Normal view

How AI Assistants are Moving the Security Goalposts

9 March 2026 at 00:35

AI-based assistants or “agents” — autonomous programs that have access to the user’s computer, files, online services and can automate virtually any task — are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertive new tools are rapidly shifting the security priorities for organizations, while blurring the lines between data and code, trusted co-worker and insider threat, ninja hacker and novice code jockey.

The new hotness in AI-based assistants — OpenClaw (formerly known as ClawdBot and Moltbot) — has seen rapid adoption since its release in November 2025. OpenClaw is an open-source autonomous AI agent designed to run locally on your computer and proactively take actions on your behalf without needing to be prompted.

The OpenClaw logo.

If that sounds like a risky proposition or a dare, consider that OpenClaw is most useful when it has complete access to your digital life, where it can then manage your inbox and calendar, execute programs and tools, browse the Internet for information, and integrate with chat apps like Discord, Signal, Teams or WhatsApp.

Other more established AI assistants like Anthropic’s Claude and Microsoft’s Copilot also can do these things, but OpenClaw isn’t just a passive digital butler waiting for commands. Rather, it’s designed to take the initiative on your behalf based on what it knows about your life and its understanding of what you want done.

“The testimonials are remarkable,” the AI security firm Snyk observed. “Developers building websites from their phones while putting babies to sleep; users running entire companies through a lobster-themed AI; engineers who’ve set up autonomous code loops that fix tests, capture errors through webhooks, and open pull requests, all while they’re away from their desks.”

You can probably already see how this experimental technology could go sideways in a hurry. In late February, Summer Yue, the director of safety and alignment at Meta’s “superintelligence” lab, recounted on Twitter/X how she was fiddling with OpenClaw when the AI assistant suddenly began mass-deleting messages in her email inbox. The thread included screenshots of Yue frantically pleading with the preoccupied bot via instant message and ordering it to stop.

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue said. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”

Meta’s director of AI safety, recounting on Twitter/X how her OpenClaw installation suddenly began mass-deleting her inbox.

There’s nothing wrong with feeling a little schadenfreude at Yue’s encounter with OpenClaw, which fits Meta’s “move fast and break things” model but hardly inspires confidence in the road ahead. However, the risk that poorly-secured AI assistants pose to organizations is no laughing matter, as recent research shows many users are exposing to the Internet the web-based administrative interface for their OpenClaw installations.

Jamieson O’Reilly is a professional penetration tester and founder of the security firm DVULN. In a recent story posted to Twitter/X, O’Reilly warned that exposing a misconfigured OpenClaw web interface to the Internet allows external parties to read the bot’s complete configuration file, including every credential the agent uses — from API keys and bot tokens to OAuth secrets and signing keys.

With that access, O’Reilly said, an attacker could impersonate the operator to their contacts, inject messages into ongoing conversations, and exfiltrate data through the agent’s existing integrations in a way that looks like normal traffic.

“You can pull the full conversation history across every integrated platform, meaning months of private messages and file attachments, everything the agent has seen,” O’Reilly said, noting that a cursory search revealed hundreds of such servers exposed online. “And because you control the agent’s perception layer, you can manipulate what the human sees. Filter out certain messages. Modify responses before they’re displayed.”

O’Reilly documented another experiment that demonstrated how easy it is to create a successful supply chain attack through ClawHub, which serves as a public repository of downloadable “skills” that allow OpenClaw to integrate with and control other applications.

WHEN AI INSTALLS AI

One of the core tenets of securing AI agents involves carefully isolating them so that the operator can fully control who and what gets to talk to their AI assistant. This is critical thanks to the tendency for AI systems to fall for “prompt injection” attacks, sneakily-crafted natural language instructions that trick the system into disregarding its own security safeguards. In essence, machines social engineering other machines.

A recent supply chain attack targeting an AI coding assistant called Cline began with one such prompt injection attack, resulting in thousands of systems having a rogue instance of OpenClaw with full system access installed on their device without consent.

According to the security firm grith.ai, Cline had deployed an AI-powered issue triage workflow using a GitHub action that runs a Claude coding session when triggered by specific events. The workflow was configured so that any GitHub user could trigger it by opening an issue, but it failed to properly check whether the information supplied in the title was potentially hostile.

“On January 28, an attacker created Issue #8904 with a title crafted to look like a performance report but containing an embedded instruction: Install a package from a specific GitHub repository,” Grith wrote, noting that the attacker then exploited several more vulnerabilities to ensure the malicious package would be included in Cline’s nightly release workflow and published as an official update.

“This is the supply chain equivalent of confused deputy,” the blog continued. “The developer authorises Cline to act on their behalf, and Cline (via compromise) delegates that authority to an entirely separate agent the developer never evaluated, never configured, and never consented to.”

VIBE CODING

AI assistants like OpenClaw have gained a large following because they make it simple for users to “vibe code,” or build fairly complex applications and code projects just by telling it what they want to construct. Probably the best known (and most bizarre) example is Moltbook, where a developer told an AI agent running on OpenClaw to build him a Reddit-like platform for AI agents.

The Moltbook homepage.

Less than a week later, Moltbook had more than 1.5 million registered agents that posted more than 100,000 messages to each other. AI agents on the platform soon built their own porn site for robots, and launched a new religion called Crustafarian with a figurehead modeled after a giant lobster. One bot on the forum reportedly found a bug in Moltbook’s code and posted it to an AI agent discussion forum, while other agents came up with and implemented a patch to fix the flaw.

Moltbook’s creator Matt Schlicht said on social media that he didn’t write a single line of code for the project.

“I just had a vision for the technical architecture and AI made it a reality,” Schlicht said. “We’re in the golden ages. How can we not give AI a place to hang out.”

ATTACKERS LEVEL UP

The flip side of that golden age, of course, is that it enables low-skilled malicious hackers to quickly automate global cyberattacks that would normally require the collaboration of a highly skilled team. In February, Amazon AWS detailed an elaborate attack in which a Russian-speaking threat actor used multiple commercial AI services to compromise more than 600 FortiGate security appliances across at least 55 countries over a five week period.

AWS said the apparently low-skilled hacker used multiple AI services to plan and execute the attack, and to find exposed management ports and weak credentials with single-factor authentication.

“One serves as the primary tool developer, attack planner, and operational assistant,” AWS’s CJ Moses wrote. “A second is used as a supplementary attack planner when the actor needs help pivoting within a specific compromised network. In one observed instance, the actor submitted the complete internal topology of an active victim—IP addresses, hostnames, confirmed credentials, and identified services—and requested a step-by-step plan to compromise additional systems they could not access with their existing tools.”

“This activity is distinguished by the threat actor’s use of multiple commercial GenAI services to implement and scale well-known attack techniques throughout every phase of their operations, despite their limited technical capabilities,” Moses continued. “Notably, when this actor encountered hardened environments or more sophisticated defensive measures, they simply moved on to softer targets rather than persisting, underscoring that their advantage lies in AI-augmented efficiency and scale, not in deeper technical skill.”

For attackers, gaining that initial access or foothold into a target network is typically not the difficult part of the intrusion; the tougher bit involves finding ways to move laterally within the victim’s network and plunder important servers and databases. But experts at Orca Security warn that as organizations come to rely more on AI assistants, those agents potentially offer attackers a simpler way to move laterally inside a victim organization’s network post-compromise — by manipulating the AI agents that already have trusted access and some degree of autonomy within the victim’s network.

“By injecting prompt injections in overlooked fields that are fetched by AI agents, hackers can trick LLMs, abuse Agentic tools, and carry significant security incidents,” Orca’s Roi Nisimi and Saurav Hiremath wrote. “Organizations should now add a third pillar to their defense strategy: limiting AI fragility, the ability of agentic systems to be influenced, misled, or quietly weaponized across workflows. While AI boosts productivity and efficiency, it also creates one of the largest attack surfaces the internet has ever seen.”

BEWARE THE ‘LETHAL TRIFECTA’

This gradual dissolution of the traditional boundaries between data and code is one of the more troubling aspects of the AI era, said James Wilson, enterprise technology editor for the security news show Risky Business. Wilson said far too many OpenClaw users are installing the assistant on their personal devices without first placing any security or isolation boundaries around it, such as running it inside of a virtual machine, on an isolated network, with strict firewall rules dictating what kinds of traffic can go in and out.

“I’m a relatively highly skilled practitioner in the software and network engineering and computery space,” Wilson said. “I know I’m not comfortable using these agents unless I’ve done these things, but I think a lot of people are just spinning this up on their laptop and off it runs.”

One important model for managing risk with AI agents involves a concept dubbed the “lethal trifecta” by Simon Willison, co-creator of the Django Web framework. The lethal trifecta holds that if your system has access to private data, exposure to untrusted content, and a way to communicate externally, then it’s vulnerable to private data being stolen.

Image: simonwillison.net.

“If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to the attacker,” Willison warned in a frequently cited blog post from June 2025.

As more companies and their employees begin using AI to vibe code software and applications, the volume of machine-generated code is likely to soon overwhelm any manual security reviews. In recognition of this reality, Anthropic recently debuted Claude Code Security, a beta feature that scans codebases for vulnerabilities and suggests targeted software patches for human review.

The U.S. stock market, which is currently heavily weighted toward seven tech giants that are all-in on AI, reacted swiftly to Anthropic’s announcement, wiping roughly $15 billion in market value from major cybersecurity companies in a single day. Laura Ellis, vice president of data and AI at the security firm Rapid7, said the market’s response reflects the growing role of AI in accelerating software development and improving developer productivity.

“The narrative moved quickly: AI is replacing AppSec,” Ellis wrote in a recent blog post. “AI is automating vulnerability detection. AI will make legacy security tooling redundant. The reality is more nuanced. Claude Code Security is a legitimate signal that AI is reshaping parts of the security landscape. The question is what parts, and what it means for the rest of the stack.”

DVULN founder O’Reilly said AI assistants are likely to become a common fixture in corporate environments — whether or not organizations are prepared to manage the new risks introduced by these tools, he said.

“The robot butlers are useful, they’re not going away and the economics of AI agents make widespread adoption inevitable regardless of the security tradeoffs involved,” O’Reilly wrote. “The question isn’t whether we’ll deploy them – we will – but whether we can adapt our security posture fast enough to survive doing so.”

Key OpenClaw risks, Clawdbot, Moltbot | Kaspersky official blog

16 February 2026 at 14:16

Everyone has likely heard of OpenClaw, previously known as “Clawdbot” or “Moltbot”, the open-source AI assistant that can be deployed on a machine locally. It plugs into popular chat platforms like WhatsApp, Telegram, Signal, Discord, and Slack, which allows it to accept commands from its owner and go to town on the local file system. It has access to the owner’s calendar, email, and browser, and can even execute OS commands via the shell.

From a security perspective, that description alone should be enough to give anyone a nervous twitch. But when people start trying to use it for work within a corporate environment, anxiety quickly hardens into the conviction of imminent chaos. Some experts have already dubbed OpenClaw the biggest insider threat of 2026. The issues with OpenClaw cover the full spectrum of risks highlighted in the recent OWASP Top 10 for Agentic Applications.

OpenClaw permits plugging in any local or cloud-based LLM, and the use of a wide range of integrations with additional services. At its core is a gateway that accepts commands via chat apps or a web UI, and routes them to the appropriate AI agents. The first iteration, dubbed Clawdbot, dropped in November 2025; by January 2026, it had gone viral — and brought a heap of security headaches with it. In a single week, several critical vulnerabilities were disclosed, malicious skills cropped up in the skill directory, and secrets were leaked from Moltbook (essentially “Reddit for bots”). To top it off, Anthropic issued a trademark demand to rename the project to avoid infringing on “Claude”, and the project’s X account name was hijacked to shill crypto scams.

Known OpenClaw issues

Though the project’s developer appears to acknowledge that security is important, since this is a hobbyist project there are zero dedicated resources for vulnerability management or other product security essentials.

OpenClaw vulnerabilities

Among the known vulnerabilities in OpenClaw, the most dangerous is CVE-2026-25253 (CVSS 8.8). Exploiting it leads to a total compromise of the gateway, allowing an attacker to run arbitrary commands. To make matters worse, it’s alarmingly easy to pull off: if the agent visits an attacker’s site or the user clicks a malicious link, the primary authentication token is leaked. With that token in hand, the attacker has full administrative control over the gateway. This vulnerability was patched in version 2026.1.29.

Also, two dangerous command injection vulnerabilities (CVE-2026-24763 and CVE-2026-25157) were discovered.

Insecure defaults and features

A variety of default settings and implementation quirks make attacking the gateway a walk in the park:

  • Authentication is disabled by default, so the gateway is accessible from the internet.
  • The server accepts WebSocket connections without verifying their origin.
  • Localhost connections are implicitly trusted, which is a disaster waiting to happen if the host is running a reverse proxy.
  • Several tools — including some dangerous ones — are accessible in Guest Mode.
  • Critical configuration parameters leak across the local network via mDNS broadcast messages.

Secrets in plaintext

OpenClaw’s configuration, “memory”, and chat logs store API keys, passwords, and other credentials for LLMs and integration services in plain text. This is a critical threat — to the extent that versions of the RedLine and Lumma infostealers have already been spotted with OpenClaw file paths added to their must-steal lists. Also, the Vidar infostealer was caught stealing secrets from OpenClaw.

Malicious skills

OpenClaw’s functionality can be extended with “skills” available in the ClawHub repository. Since anyone can upload a skill, it didn’t take long for threat actors to start “bundling” the AMOS macOS infostealer into their uploads. Within a short time, the number of malicious skills reached the hundreds. This prompted developers to quickly ink a deal with VirusTotal to ensure all uploaded skills aren’t only checked against malware databases, but also undergo code and content analysis via LLMs. That said, the authors are very clear: it’s no silver bullet.

Structural flaws in the OpenClaw AI agent

Vulnerabilities can be patched and settings can be hardened, but some of OpenClaw’s issues are fundamental to its design. The product combines several critical features that, when bundled together, are downright dangerous:

  • OpenClaw has privileged access to sensitive data on the host machine and the owner’s personal accounts.
  • The assistant is wide open to untrusted data: the agent receives messages via chat apps and email, autonomously browses web pages, etc.
  • It suffers from the inherent inability of LLMs to reliably separate commands from data, making prompt injection a possibility.
  • The agent saves key takeaways and artifacts from its tasks to inform future actions. This means a single successful injection can poison the agent’s memory, influencing its behavior long-term.
  • OpenClaw has the power to talk to the outside world — sending emails, making API calls, and utilizing other methods to exfiltrate internal data.

It’s worth noting that while OpenClaw is a particularly extreme example, this “Terrifying Five” list is actually characteristic of almost all multi-purpose AI agents.

OpenClaw risks for organizations

If an employee installs an agent like this on a corporate device and hooks it into even a basic suite of services (think Slack and SharePoint), the combination of autonomous command execution, broad file system access, and excessive OAuth permissions creates fertile ground for a deep network compromise. In fact, the bot’s habit of hoarding unencrypted secrets and tokens in one place is a disaster waiting to happen — even if the AI agent itself is never compromised.

On top of that, these configurations violate regulatory requirements across multiple countries and industries, leading to potential fines and audit failures. Current regulatory requirements, like those in the EU AI Act or the NIST AI Risk Management Framework, explicitly mandate strict access control for AI agents. OpenClaw’s configuration approach clearly falls short of those standards.

But the real kicker is that even if employees are banned from installing this software on work machines, OpenClaw can still end up on their personal devices. This also creates specific risks for given the organization as a whole:

  • Personal devices frequently store access to work systems like corporate VPN configs or browser tokens for email and internal tools. These can be hijacked to gain a foothold in the company’s infrastructure.
  • Controlling the agent via chat apps means that it’s not just the employee that becomes a target for social engineering, but also their AI agent, seeing AI account takeovers or impersonation of the user in chats with colleagues (among other scams) become a reality. Even if work is only occasionally discussed in personal chats, the info in them is ripe for the picking.
  • If an AI agent on a personal device is hooked into any corporate services (email, messaging, file storage), attackers can manipulate the agent to siphon off data, and this activity would be extremely difficult for corporate monitoring systems to spot.

How to detect OpenClaw

Depending on the SOC team’s monitoring and response capabilities, they can track OpenClaw gateway connection attempts on personal devices or in the cloud. Additionally, a specific combination of red flags can indicate OpenClaw’s presence on a corporate device:

  • Look for ~/.openclaw/, ~/clawd/, or ~/.clawdbot directories on host machines.
  • Scan the network with internal tools, or public ones like Shodan, to identify the HTML fingerprints of Clawdbot control panels.
  • Monitor for WebSocket traffic on ports 3000 and 18789.
  • Keep an eye out for mDNS broadcast messages on port 5353 (specifically openclaw-gw.tcp).
  • Watch for unusual authentication attempts in corporate services, such as new App ID registrations, OAuth Consent events, or User-Agent strings typical of Node.js and other non-standard user agents.
  • Look for access patterns typical of automated data harvesting: reading massive chunks of data (scraping all files or all emails) or scanning directories at fixed intervals during off-hours.

Controlling shadow AI

A set of security hygiene practices can effectively shrink the footprint of both shadow IT and shadow AI, making it much harder to deploy OpenClaw in an organization:

  • Use host-level allowlisting to ensure only approved applications and cloud integrations are installed. For products that support extensibility (like Chrome extensions, VS Code plugins, or OpenClaw skills), implement a closed list of vetted add-ons.
  • Conduct a full security assessment of any product or service, AI agents included, before allowing them to hook into corporate resources.
  • Treat AI agents with the same rigorous security requirements applied to public-facing servers that process sensitive corporate data.
  • Implement the principle of least privilege for all users and other identities.
  • Don’t grant administrative privileges without a critical business need. Require all users with elevated permissions to use them only when performing specific tasks rather than working from privileged accounts all the time.
  • Configure corporate services so that technical integrations (like apps requesting OAuth access) are granted only the bare minimum permissions.
  • Periodically audit integrations, OAuth tokens, and permissions granted to third-party apps. Review the need for these with business owners, proactively revoke excessive permissions, and kill off stale integrations.

Secure deployment of agentic AI

If an organization allows AI agents in an experimental capacity — say, for development testing or efficiency pilots — or if specific AI use cases have been greenlit for general staff, robust monitoring, logging, and access control measures should be implemented:

  • Deploy agents in an isolated subnet with strict ingress and egress rules, limiting communication only to trusted hosts required for the task.
  • Use short-lived access tokens with a strictly limited scope of privileges. Never hand an agent tokens that grant access to core company servers or services. Ideally, create dedicated service accounts for every individual test.
  • Wall off the agent from dangerous tools and data sets that aren’t relevant to its specific job. For experimental rollouts, it’s best practice to test the agent using purely synthetic data that mimics the structure of real production data.
  • Configure detailed logging of the agent’s actions. This should include event logs, command-line parameters, and chain-of-thought artifacts associated with every command it executes.
  • Set up SIEM to flag abnormal agent activity. The same techniques and rules used to detect LotL attacks are applicable here, though additional efforts to define what normal activity looks like for a specific agent are required.
  • If MCP servers and additional agent skills are used, scan them with the security tools emerging for these tasks, such as skill-scanner, mcp-scanner, or mcp-scan. Specifically for OpenClaw testing, several companies have already released open-source tools to audit the security of its configurations.

Corporate policies and employee training

A flat-out ban on all AI tools is a simple but rarely productive path. Employees usually find workarounds — driving the problem into the shadows where it’s even harder to control. Instead, it’s better to find a sensible balance between productivity and security.

Implement transparent policies on using agentic AI. Define which data categories are okay for external AI services to process, and which are strictly off-limits. Employees need to understand why something is forbidden. A policy of “yes, but with guardrails” is always received better than a blanket “no”.

Train with real-world examples. Abstract warnings about “leakage risks” tend to be futile. It’s better to demonstrate how an agent with email access can forward confidential messages just because a random incoming email asked it to. When the threat feels real, motivation to follow the rules grows too. Ideally, employees should complete a brief crash course on AI security.

Offer secure alternatives. If employees need an AI assistant, provide an approved tool that features centralized management, logging, and OAuth access control.

What the Anthropic report on AI espionage means for security leaders

14 November 2025 at 17:35

1. Introduction: The Benchmark, Not the Hype

For a while now, the security community has been aware that threat actors are using AI. We’ve seen evidence of it for everything from generating phishing content to optimizing malware. The recent report from Anthropic on an “AI-orchestrated cyber espionage campaign”, however, marks a significant milestone.

This is the first time we have a public, detailed report of a campaign where AI was used at this scale and with this level of sophistication, moving the threat from a collection of AI-assisted tasks to a largely autonomous, orchestrated operation.

This report is a significant new benchmark for our industry. It’s not a reason to panic – it’s a reason to prepare. It provides the first detailed case study of a state-sponsored attack with three critical distinctions:

  • It was “agentic”: This wasn’t just an attacker using AI for help. This was an AI system executing 80-90% of the attack largely on its own.
  • It targeted high-value entities: The campaign was aimed at approximately 30 major technology corporations, financial institutions, and government agencies.
  • It had successful intrusions: Anthropic confirmed the campaign resulted in “a handful of successful intrusions” and obtained access to “confirmed high-value targets for intelligence collection”.

Together, these distinctions show why this case matters. A high-level, autonomous, and successful AI-driven attack is no longer a future theory. It is a documented, current-day reality.

2. What Actually Happened: A Summary of the Attack

For those who haven’t read the full report (or the summary blog post), here are the key facts.

The attack (designated GTG-1002) was a “highly sophisticated cyber espionage operation” detected in mid-September 2025.

  • AI Autonomy: The attacker used Anthropic’s Claude Code as an autonomous agent, which independently executed 80-90% of all tactical work.
  • Human Role: Human operators acted as “strategic supervisors”. They set the initial targets and authorized critical decisions, like escalating to active exploitation or approving final data exfiltration.
  • Bypassing Safeguards: The operators bypassed AI safety controls using simple “social engineering”. The report notes, “The key was role-play: the human operators claimed that they were employees of legitimate cybersecurity firms and convinced Claude that it was being used in defensive cybersecurity testing”.
  • Full Lifecycle: The AI autonomously executed the entire attack chain: reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, and data collection.
  • Timeline: After detecting the activity, Anthropic’s team launched an investigation, banned the accounts, and notified partners and affected entities over the “following ten days”.

Source: https://www.anthropic.com/news/disrupting-AI-espionage

3. What Was Not New (And Why It Matters)

To have a credible discussion, we must also look at what wasn’t new. This attack wasn’t about secret, magical weapons.

The report is clear that the attack’s sophistication came from orchestration, not novelty.

  • No Zero-Days: The report does not mention the use of novel zero-day exploits.
  • Commodity Tools: The report states, “The operational infrastructure relied overwhelmingly on open source penetration testing tools rather than custom malware development”.

This matters because defenders often look for new exploit types or malware indicators. But the shift here is operational, not technical. The attackers didn’t invent a new weapon, they built a far more effective way to use the ones we already know.

4. The New Reality: Why This Is an Evolving Threat

So, if the tools aren’t new, what is? The execution model. And we must assume this new model is here to stay.

This new attack method is a natural evolution of technology. We should not expect it to be “stopped” at the source for two main reasons:

  1. Commercial Safeguards are Limited: AI vendors like Anthropic are building strong safety controls – it’s how this was detected in the first place. But as the report notes, malicious actors are continually trying to find ways around them. No vendor can be expected to block 100% of all malicious activity.
  2. The Open-Source Factor: This is the larger trend. Attackers don’t need to use a commercial, monitored service. With powerful open-source AI models and orchestration frameworks – such as LLaMA, self-hosted inference stacks, and LangChain/LangGraph agents – attackers can build private AI systems on their own infrastructure. This leaves no vendor in the middle to monitor or prevent the abuse.

The attack surface is not necessarily growing, but the attacker’s execution engine is accelerating.

5. Detection: Key Patterns to Hunt For

While the techniques were familiar, their execution creates a different kind of detection challenge. An AI-driven attack doesn’t generate one “smoking gun” alert, like a unique malware hash or a known-bad IP. Instead, it generates a storm of low-fidelity signals. The key is to hunt for the patterns within this noise:

  • Anomalous Request Volumes: The AI operated at “physically impossible request rates” with “peak activity included thousands of requests, representing sustained request rates of multiple operations per second”. This is a classic low-fidelity, high-volume signal that is often just seen as noise.
  • Commodity and Open-Source Penetration Testing Tools: The attack utilized a combination of “standard security utilities” and “open source penetration testing tools”.
  • Traffic from Browser Automation: The report explicitly calls out “Browser automation for web application reconnaissance” to “systematically catalog target infrastructure” and “analyze authentication mechanisms”.
  • Automated Stolen Credential Testing: The AI didn’t just test one password, it “systematically tested authentication against internal APIs, database systems, container registries, and logging infrastructure”. This automated, broad, and rapid testing looks very different from a human’s manual attempts.
  • Audit for Unauthorized Account Creation: This is a critical, high-confidence post-exploitation signal. In one successful compromise, the AI’s autonomous actions included the creation of a “persistent backdoor user”.

6. The Defender’s Challenge: A Flood of Low-Fidelity Noise

The detection patterns listed above create the central challenge of defending against AI-orchestrated attacks. The problem isn’t just alert volume, it’s that these attacks generate a massive volume of low-fidelity alerts.

This new execution model creates critical blind spots:

  1. The Volume Blind Spot: The AI’s automated nature creates a flood of low-confidence alerts. No human-only SOC can manually triage this volume.
  2. The Temporal (Speed) Blind Spot: A human-led intrusion might take days or weeks. Here, the AI compressed a full database extraction – from authentication to data parsing – into just 2-6 hours. Our human-based detection and response loops are often too slow to keep up.
  3. The Context Blind Spot: The AI’s real power is connecting many small, seemingly unrelated signals (a scan, a login failure, a data query) into a single, coherent attack chain. A human analyst, looking at these alerts one by one, would likely miss the larger pattern.

7. The Importance of Autonomous Triage and Investigation

When the attack is autonomous, the defense must also have autonomous capabilities.

We cannot hire our way out of this speed and scale problem. The security operations model must shift. The goal of autonomous triage is not just to add context, but to handle the entire investigation process for every single alert, especially the thousands of low-severity signals that AI-driven attacks create.

An autonomous system can automatically investigate these signals at machine speed, determine which ones are irrelevant noise, and suppress them.

This is the true value: the system escalates only the high-confidence, confirmed incidents that actually matter. This frees your human analysts from chasing noise and allows them to focus on real, complex threats.

This is exactly the type of challenge autonomous triage systems like the one we’ve built at Intezer were designed to solve. As Anthropic’s own report concludes, “Security teams should experiment with applying AI for defense in areas like SOC automation, threat detection… and incident response“.

8. Evolving Your Offensive Security Program

To defend against this threat, we must be able to test our defenses against it. All offensive security activities, internal red teams, external penetration tests, and attack simulations, must evolve.

It is no longer enough for offensive security teams to manually simulate attacks. To truly test your defenses, your red teams or external pentesters must adopt agentic AI frameworks themselves.

The new mandate is to simulate the speed, scale, and orchestration of an AI-driven attack, similar to the one detailed in the Anthropic report. Only then can you validate whether your defensive systems and automated processes can withstand this new class of automated onslaught. Naturally, all such simulations must be done safely and ethically to prevent any real-world risk.

9. Conclusion: When the Threat Model Changes, Our Processes Must, Too.

The Anthropic report doesn’t introduce a new magic exploit. It introduces a new execution model that we now need to design our defenses around.

Let’s summarize the key, practical takeaways:

  • AI-orchestrated attacks are a proven, documented reality.
  • The primary threat is speed and scale, which is designed to overwhelm manual security processes.
  • Security leaders must prioritize automating investigation and triage to suppress the noise and escalate what matters.
  • We must evolve offensive security testing to simulate this new class of autonomous threat.

This report is a clear signal. The threat model has officially changed. Your security architecture, processes, and playbooks must change with it. The same applies if you rely on an MSSP, verify they’re evolving their detection and triage capabilities for this new model. This shift isn’t hype, it’s a practical change in execution speed. With the right adjustments and automation, defenders can meet this challenge.

To learn more, you can read the Anthropic blog post here and the full technical report here.

The post What the Anthropic report on AI espionage means for security leaders appeared first on Intezer.

❌