Do you own a robot vacuum cleaner? Good. Then you’ll understand my take on Anthropic’s “Project Glasswing.” In short, Anthropic has created a new LLM (Claude Mythos Preview) that is said to “reshape cyber security.” Mythos supposedly is clever enough to beat all but the very best humans at finding security vulnerabilities. And because of that, Anthropic has invited 40 companies to use Mythos to fix any software issues now. Like the robot cleaner finding dust you never knew was there, Mythos will find vulnerabilities that have escaped both the scrutiny of people and their security tools.
Critics have pointed out that only 198 of the findings underwent rigorous human validation. Of those, 89% matched the severity rating Anthropic assigned. The data suggests that the model can find vulnerabilities, but the “thousands” figure likely includes many unverified or low-impact findings. The fact that the model did find real, exploitable bugs (even if some are old) validates the need for caution, but the scale might be exaggerated for effect.
Where does that leave us? Let’s break it down.
- AI Doesn’t Create Vulnerabilities. It’s crucial to understand that the code is the code. If there’s no bug, there’s nothing to exploit. This is an important distinction from some of the more alarmist rhetoric: there is no magic.
- The robot vacuum analogy works: more coverage = more discoveries. If you can test billions of code paths in hours instead of thousands in weeks, you’ll find things humans miss. That’s partly volume, not just intelligence.
- Many of these vulnerabilities sat in code that had already been through automated testing. The claim is that AI found them where traditional fuzzing and static analysis didn’t. But that doesn’t mean the approach is fundamentally different. It may just be more thorough or differently structured.
That said, there are signs that more than scale is at work. The Anthropic document emphasizes autonomous chaining of vulnerabilities (e.g., user → root escalation). This requires understanding relationships between code components, not just isolated bug hunting. That’s qualitatively different from running more test cases. Furthermore, the benchmarks show Mythos Preview using 4.9× fewer tokens than Opus 4.6 on BrowseComp while scoring higher. If true, this suggests improved reasoning efficiency, not just raw compute. They specifically note vulnerabilities that survived “decades of human review AND millions of automated security tests.” If accurate, this suggests the AI isn’t just doing what humans already do; it’s approaching code differently.
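To make the chaining point concrete, here is a minimal sketch of why linking findings is a different problem from finding them. It treats each finding as a step from one privilege level to another and searches for a path from "user" to "root". The findings, privilege names, and the `find_chain` helper are all invented for illustration; this is not how Mythos works, only a toy model of the relationship problem.

```python
from collections import deque

# Hypothetical findings (made up for this example): each one moves an
# attacker from a starting privilege level to a gained privilege level.
FINDINGS = [
    ("unauthenticated", "user", "auth bypass in login handler"),
    ("user", "service", "path traversal in upload endpoint"),
    ("service", "root", "unsafe setuid helper"),
    ("user", "user", "information leak"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for the shortest list of findings that leads
    from the start privilege to the goal privilege, or None if no chain exists."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        level, path = queue.popleft()
        if level == goal:
            return path
        for src, dst, label in findings:
            if src == level and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [label]))
    return None


chain = find_chain(FINDINGS, "user", "root")
print(chain)
```

Neither of the two bugs in the resulting chain reaches root on its own. The escalation only appears once they are connected, which is the part that running more isolated test cases does not give you.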
Of course, regardless of whether it’s “magic” or “just scale,” the operational impact is the same:
- Attackers will have this capability too, very soon.
- Your defensive timeline shrinks: exploitation takes minutes, not months.
- You need AI-augmented defense: human-only review can’t keep up with the volume.
The bottom line: The question is whether you can defend against an adversary who has access to these tools. The “Project Glasswing” announcement confirms that time is the enemy. An AI attacker doesn’t wait for your nightly batch job. If you can’t process logs in near real-time, your tool becomes a forensic recorder (telling you what happened yesterday) rather than a defense system (stopping what is happening now).
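The gap between a nightly batch job and near real-time processing can be put in rough numbers. The sketch below compares how long a suspicious event waits before anything looks at it under each model. The 02:00 batch time, the 5-second streaming lag, the example timestamp, and both function names are assumptions chosen for illustration, not measurements from any real tool.

```python
from datetime import datetime, timedelta


def batch_detection_time(event: datetime, batch_hour: int = 2) -> datetime:
    """When a nightly batch job (assumed to start at batch_hour:00)
    would first process an event logged at the given time."""
    run = event.replace(hour=batch_hour, minute=0, second=0, microsecond=0)
    if run < event:
        # Today's run already happened, so the event waits for tomorrow's.
        run += timedelta(days=1)
    return run


def streaming_detection_time(event: datetime, lag_seconds: float = 5.0) -> datetime:
    """When a streaming pipeline with an assumed fixed lag would process the event."""
    return event + timedelta(seconds=lag_seconds)


# Hypothetical event: something suspicious logged at 09:30.
event = datetime(2025, 1, 1, 9, 30)
batch_delay = batch_detection_time(event) - event
stream_delay = streaming_detection_time(event) - event

print("batch delay: ", batch_delay)
print("stream delay:", stream_delay)
```

Under these assumptions the batch job first sees the 09:30 event sixteen and a half hours later, while the streaming path sees it in seconds. If exploitation really takes minutes, only the second number is useful for stopping anything.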