AI Agents on the Offensive: The New Face of Cyber Threats (and How to Defend Against Them)

Written by Kirey | Jun 27, 2025 9:33:21 AM

Despite the rapid evolution of cyber threats and the underlying technological landscape, the traditional attack model still revolves around the human element. The attacker defines the objective and strategy, coordinates the offensive phases, interprets feedback, and responds to unforeseen developments.

Automation plays a key role, but it remains subordinate to human intelligence. Traditional bots and automated tools operate on deterministic logic: command sequences, repeatable patterns, and little to no adaptability. In more advanced scenarios, self-learning mechanisms may emerge—such as malware that mutates to evade detection—but these are still purpose-built tools, lacking real reasoning or strategic flexibility. But now we are entering the era of AI Agents, and everything could change.

How Threats Are Evolving: Toward Intelligent Attacks

With the rise of autonomous AI agents, cybersecurity faces new challenges. We're talking about software entities that, in theory, can receive a goal, autonomously plan the steps to achieve it, adjust their strategy along the way, and act independently by coordinating various available tools. In practical terms, an AI Agent could:

Conduct targeted reconnaissance on web targets and cloud infrastructures;

Identify both known and unknown vulnerabilities;

Select and adapt exploits based on the context;

Generate custom malware;

Operate iteratively based on the results obtained.

Shortly, it may take only a few prompts to activate a malicious AI agent, have it explore the environment, recognize defensive patterns, and orchestrate a complete attack. But how far off is that moment?

AI Agents Are Becoming a Threat

That AI Agents can be used for malicious purposes is, unfortunately, a given. This should come as no surprise—anyone who has interacted with a large language model (LLM) or explored the concept of Agentic AI understands the scope of this evolution and the need to rationally assess its potential risks.

According to the Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities, frontier models like GPT-4o and Claude 3.5 Sonnet have “significant offensive capabilities in the cyber domain, as they are able to autonomously perform complex tasks such as reconnaissance and vulnerability exploitation.”

Anthropic—the company behind Claude—adds that 2024 marked a true "zero to one moment" in the cyber domain. In Capture The Flag (CTF) exercises designed to identify and exploit vulnerabilities, Claude made a significant qualitative leap, going “from the level of a high school student to that of a college student” in just one year. While the progress is notable, experts emphasize that AI agents still fall short of matching the capabilities of seasoned professionals.

Similar conclusions come from tests assessing the ability of these models to discover and exploit vulnerabilities in insecure software, infect systems, and move laterally within a network. The bottom line is that, for now, models are not yet able to operate successfully in complex and dynamic network environments without human supervision. But the fear, of course, is that it’s only a matter of time.

Google also addresses the issue in a report titled Adversarial Misuse of Generative AI, aimed at understanding how threat actors are currently using its Gemini model. The findings are cautious and broadly in line with those of Anthropic: “AI can be highly useful to malicious actors, but it is not yet the game-changer it is often imagined to be.” The report notes that no entirely new techniques have emerged so far, but it's clear that LLMs are already boosting the productivity of offensive teams by automating tasks that previously required significant human expertise and resources.

The Use of AI Agents Remains Limited

Despite their offensive potential, the real-world use of AI Agents appears to be limited so far. Supporting this view is the AI Agent Honeypot initiative by Palisade Research—a trap system designed to lure potential malicious AI agents and help develop countermeasures before such threats become widespread.

The honeypot works by creating servers that expose low-complexity vulnerabilities, simulating realistic contexts that could attract the interest of an autonomous agent. Over the past six months, the system logged nearly 12 million access attempts, but only eight were classified as potential AI Agents, and just two were confirmed. Various techniques were used to confirm this, including prompt injection, which reveals the latent behavior of an underlying generative model.

What do these findings tell us? Essentially, the use of AI agents in cyber offensives is not yet widespread. Several factors may be at play: safeguards implemented by major AI model providers, the technical complexity of creating truly autonomous agents, and the inherent limitations of LLMs themselves, as highlighted by Anthropic’s analysis.

Still, the development trajectory is clear, and it's not science fiction to imagine a near future in which a single prompt could launch a full-scale, sophisticated cyberattack.

Defending Against AI Agents Is a Team Effort

Defense strategies must be built on the premise that (for now) AI Agents have not radically changed the nature of cyberattacks, but rather serve as powerful accelerators of existing techniques. In the early stages, threat actors will prioritize simpler tactics to maximize the risk/reward ratio, only later progressing toward more complex and sophisticated scenarios.

Defending against AI Agents, for now, means stepping up efforts to prevent more numerous and more intelligent automated attacks. Detection and response strategies remain largely the same, but it’s important to recognize that the frequency and complexity of attacks are likely to increase. This doesn’t call for a revolution, but for increased investment in proactive technologies, training, and—most importantly—building strong cyber resilience.

The near future remains uncertain, but what is clear is that cybersecurity is increasingly a daily battle between attackers and defenders, both of whom are watching AI Agents closely. While cybercriminals are seeking tools to create something new and unpredictable, defenders can evolve at the same pace, because, ultimately, they have access to the same tools.

Constant monitoring of the phenomenon is therefore essential, and in this regard, initiatives like the AI Agent Honeypot are vital. At the same time, ongoing support from those who develop AI models will be key, ensuring the technology is designed to limit abuse and encourage safer use.

View full post