“We are not in any way supported by or funded by Elon Musk and have a history of campaigning against […]
Category: AI safety
Anthropic’s Claude Haiku 4.5 matches May’s frontier model at fraction of cost
And speaking of cost, Haiku 4.5 is included for subscribers of the Claude web and app plans. Through the API […]
Claude’s new AI file creation feature ships with deep security risks built in
Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while […]
OpenAI announces parental controls for ChatGPT after teen suicide lawsuit
On Tuesday, OpenAI announced plans to roll out parental controls for ChatGPT and route sensitive mental health conversations to its […]
New AI browser agents create risks if sites hijack them with hidden instructions
The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser […]
- AI
- AI alignment
- AI and mental health
- AI assistants
- AI behavior
- AI ethics
- AI hallucination
- AI paternalism
- AI regulation
- AI safeguards
- AI safety
- attention mechanism
- Biz & IT
- chatbots
- ChatGPT
- content moderation
- crisis intervention
- GPT-4o
- GPT-5
- Machine Learning
- mental health
- openai
- suicide prevention
- Technology
- transformer models
OpenAI admits ChatGPT safeguards fail during extended conversations
Adam Raine learned to bypass these safeguards by claiming he was writing a story—a technique the lawsuit says ChatGPT itself […]
- AI
- AI alignment
- AI behavior
- AI deception
- AI ethics
- AI research
- AI safety
- ai safety testing
- AI security
- Alignment research
- Andrew Deck
- Anthropic
- Biz & IT
- Claude Opus 4
- Generative AI
- goal misgeneralization
- Jeffrey Ladish
- large language models
- Machine Learning
- o3 model
- openai
- Palisade Research
- Reinforcement Learning
- Technology
Is AI really trying to escape human control and blackmail people?
Mankind behind the curtain Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs—and why we fall for it. […]
ChatGPT’s new AI agent can browse the web and create PowerPoint slideshows
On Thursday, OpenAI launched ChatGPT Agent, a new feature that lets the company’s AI assistant complete multi-step tasks by controlling […]
AI therapy bots fuel delusions and give dangerous advice, Stanford study finds
Popular chatbots serve as poor replacements for human therapists, but study authors call for nuance. When Stanford University researchers asked […]
Everything tech giants will hate about the EU’s new AI rules
The code also details expectations for AI companies to respect paywalls, as well as robots.txt instructions restricting crawling, which could […]