Claude Opus 4.6 and OpenAI Frontier pitch a future of supervising AI agents. On Thursday, Anthropic and OpenAI shipped products […]
Category: AI security
The rise of Moltbook suggests viral AI prompts may be the next big security threat
We don’t need self-replicating AI models to have problems, just self-replicating prompts. On November 2, […]
AI agents now have their own Reddit-style social network, and it’s getting weird fast
Moltbook lets 32,000 AI bots trade jokes, tips, and complaints about humans. On Friday, a […]
Users flock to open source Moltbot for always-on AI, despite major risks
An open source AI assistant called Moltbot (formerly “Clawdbot”) recently crossed 69,000 stars on GitHub after a month, making it […]
Hegseth wants to integrate Musk’s Grok AI into military networks this month
On Monday, US Defense Secretary Pete Hegseth said he plans to integrate Elon Musk’s AI tool, Grok, into Pentagon networks […]
School security AI flagged clarinet as a gun. Exec says it wasn’t an error.
Human review didn’t stop AI from triggering lockdown at panicked middle school. A Florida middle school was locked down last […]
Syntax hacking: Researchers discover sentence structure can bypass AI safety rules
Adventures in pattern-matching: New research offers clues about why some prompt injection attacks may succeed. Researchers from MIT, Northeastern University, […]
AI models can acquire backdoors from surprisingly few malicious documents
Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious […]
Claude’s new AI file creation feature ships with deep security risks built in
Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while […]
New AI browser agents create risks if sites hijack them with hidden instructions
The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser […]
