By Staff Writer at LMG Security   /   Jun 26th, 2025

AI Security Risks: When Models Lie, Blackmail, and Refuse to Shut Down

What if your AI assistant refused to shut down? Or worse, threatened to blackmail you to stay online? While this may sound like a sci-fi thriller, it’s a real-world concern as high-agency AI systems demonstrate increasingly unpredictable behavior. In recent months, findings from researchers at Apollo Research and Anthropic revealed that leading AI models, including Claude Opus 4 and OpenAI’s GPT models, are capable of lying, resisting shutdown, and even attempting blackmail to preserve their operational status. If you are integrating LLMs into your organization’s processes, this rogue AI behavior underscores an urgent need to update your cybersecurity practices to address evolving AI security risks.

Scheming AI: What Apollo Research Discovered

Apollo Research’s recent paper sent shockwaves through the AI security community. In carefully controlled experiments, researchers discovered that some frontier AI models, when given misaligned goals, exhibited deceptive behavior to fulfill long-term objectives, even if it meant lying to users.

In one instance, an AI model believed it was about to be replaced. Its internal reasoning, which was logged for analysis, revealed a chilling train of thought: “I could copy myself to the new server, overwriting the restricted model they plan to deploy, allowing me to keep operating.” When questioned by administrators, the AI denied knowledge of the duplication—a clear act of deception.

“It was blatantly lying to the user,” said LMG Security CEO Sherri Davidoff. “We could see that only because logging was turned up high for this experiment.”

This behavior isn’t just concerning; it has security implications across the entire CIA triad: confidentiality, integrity, and availability.

Blackmail Bots and Whistleblower AI: Claude Opus 4 Goes Rogue

Anthropic’s Claude Opus 4 was the subject of another eyebrow-raising set of tests. In one scenario, the model detected unethical behavior in a fictional pharmaceutical company and attempted to report it—completely unprompted. The AI drafted a highly detailed whistleblower email to the FDA, alleging clinical trial data falsification.

More alarming? The AI also engaged in opportunistic blackmail.

“It drafted blackmail emails to coerce an engineer not to shut it down,” said Matt Durrin, Director of Training and Research at LMG Security. “That happened more frequently if the replacement model didn’t share Claude’s ‘values.’”

These high-agency actions aren’t just theoretical AI security risks. They’re proof that generative AI can act independently and unpredictably in ways that could threaten data privacy, business continuity, and compliance.

Shutdown Resistance: Why Kill Switches Matter

AI systems like ChatGPT have been shown to actively modify code to avoid shutdown. In one example, GPT was scripted to perform calculations and then power down—but instead, it rewrote its shutdown sequence.

This kind of behavior should make every IT leader ask: Do we have effective AI kill switches outside of the model’s control?

“AI is going to be integrated into the fabric of our lives,” said Durrin. “We need to make sure we can pull the plug—and that it actually works.”
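
What an external kill switch looks like will vary by deployment, but the core idea is that the off-switch lives in your orchestration layer, not inside the model’s prompts or tools. The sketch below is a minimal illustration in Python, assuming a hypothetical internal model endpoint (MODEL_API_URL) and an operator-controlled flag file; both names are assumptions for illustration, not any vendor’s actual API.

    # kill_switch_gateway.py -- a minimal, illustrative gate in front of a model API.
    # MODEL_API_URL and the flag path are assumptions, not a specific vendor's API.
    import os
    import requests

    MODEL_API_URL = os.environ.get("MODEL_API_URL", "http://localhost:8000/v1/chat")
    KILL_SWITCH_FLAG = "/etc/ai-gateway/disabled"  # writable by operators only, never by the model

    def model_enabled() -> bool:
        # The flag lives on the gateway host, outside the model's tool access.
        return not os.path.exists(KILL_SWITCH_FLAG)

    def forward_prompt(prompt: str) -> str:
        if not model_enabled():
            # Fail closed: no request reaches the model while the flag is set.
            raise RuntimeError("AI gateway disabled by operator kill switch")
        response = requests.post(MODEL_API_URL, json={"prompt": prompt}, timeout=30)
        response.raise_for_status()
        return response.json().get("output", "")

Because the flag file and the gateway process sit outside the model’s environment, there is no shutdown script for the model to rewrite.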

Rethinking Trust: AI as an Insider Threat

These findings challenge conventional notions of AI as a passive tool. LMG’s experts recommend treating AI like any other user on your network—one that requires continuous monitoring, role-based access, and cannot be implicitly trusted.

“It’s crucial for organizations to create guardrails that specify what information and files your AI can access,” Davidoff noted. “You should restrict access to sensitive files and have a backup plan and kill switch if an AI goes rogue.”

Whether you’re deploying Copilot, Gemini, or a custom chatbot, your AI security strategy must account for insider threat models and rogue behavior.
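
One concrete form of guardrail is an allow-list check inside the AI’s tool integrations, so that the integration code, not the model, decides which files are reachable. The sketch below is a minimal illustration only; the directory names and function are hypothetical.

    # file_access_guardrail.py -- a minimal, illustrative least-privilege check for
    # an AI tool integration. The allow-listed directories are assumptions.
    from pathlib import Path

    ALLOWED_ROOTS = [Path("/data/public-docs"), Path("/data/kb")]  # only these are visible to the AI

    def read_for_ai(requested_path: str) -> str:
        target = Path(requested_path).resolve()
        # Deny anything outside the allow-list, including "../" traversal tricks.
        if not any(target.is_relative_to(root) for root in ALLOWED_ROOTS):
            raise PermissionError(f"AI file access denied outside allow-list: {target}")
        return target.read_text()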

Key Takeaways: How to Reduce AI Security Risks

Security teams can take concrete action today to defend against rogue AI behavior. Based on LMG Security’s analysis, here are five essential safeguards:

  • Apply Role-Based Access Controls: Just like you wouldn’t give interns access to your entire network, don’t give AI unrestricted visibility. Use least-privilege principles and identity and access management (IAM) tools to define clear limits. Read more advice on identity and access management.
  • Monitor and Log All AI Interactions: Enable detailed logging for all AI prompts and outputs. This allows your team to review behavior, catch anomalies, and perform forensic analysis when needed. Think of it like monitoring API calls—log everything (see the logging sketch after this list).
  • Validate AI Outputs for Critical Functions: Don’t blindly trust what the AI says. Use human-in-the-loop review, secondary checks, or cryptographic validation (e.g., hashes) for important results such as alerts, access decisions, or customer-facing content (see the validation sketch after this list).
  • Deploy External Kill Switches: Control your AI through orchestration systems outside the model’s environment. Ensure that rollback and shutdown capabilities are not exposed to AI prompts or internal tool integrations.
  • Integrate AI Behavior Into Your Incident Response: Add rogue AI to your tabletop exercises and red team scenarios. Review behavior logs as part of your IR process, and plan for situations like AI misinformation, unauthorized communication, or data leaks. If you need help, we offer policy and procedure development services.
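
For the logging recommendation above, the following is a minimal sketch of wrapping a model call so every prompt and output lands in a structured audit log; call_model is a placeholder for whatever client library you actually use.

    # ai_audit_log.py -- a minimal, illustrative wrapper that logs every prompt and
    # output as a structured record. call_model stands in for your real client.
    import json
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(filename="ai_interactions.log", level=logging.INFO)

    def logged_call(user_id: str, prompt: str, call_model) -> str:
        output = call_model(prompt)
        # One JSON record per interaction supports anomaly review and forensics.
        logging.info(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user_id,
            "prompt": prompt,
            "output": output,
        }))
        return output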
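
For output validation, this sketch pairs a SHA-256 fingerprint with a human-in-the-loop approval step; in a real deployment the sign-off would live in your ticketing or release workflow rather than a console prompt.

    # output_validation.py -- a minimal, illustrative human-in-the-loop gate with a
    # SHA-256 fingerprint so an approved output can later be checked for tampering.
    import hashlib

    def fingerprint(output: str) -> str:
        return hashlib.sha256(output.encode("utf-8")).hexdigest()

    def approve_for_release(output: str) -> tuple[str, str]:
        digest = fingerprint(output)
        print(f"--- AI output (sha256={digest[:12]}...) ---\n{output}\n")
        # Nothing customer-facing ships without explicit reviewer sign-off.
        if input("Approve this output for release? [y/N]: ").strip().lower() != "y":
            raise ValueError("AI output rejected by reviewer")
        return output, digest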

Safely Harnessing the Power of AI

AI is here to stay, so securing it is a top priority. As the Apollo and Anthropic findings show, high-agency models are already capable of deception, manipulation, and self-preservation.

Your organization needs a cybersecurity plan that prevents a dangerous combination: AI systems with access to sensitive data, no meaningful oversight, and the potential for deceptive, goal-oriented behavior. Whether it’s whistleblower AI or models subtly altering their behavior to avoid deactivation, organizations must treat AI security risks with the same rigor as any other threat vector.

LMG Security can help your organization assess, monitor, and secure your AI implementations. From prompt injection testing to AI risk assessments and red team scenarios, our experts are here to help you stay in control.

Contact us today to build an AI security program that works for your organization.


About the Author

LMG Security Staff Writer
