And speaking of cost, Haiku 4.5 is included for subscribers of the Claude web and app plans. Through the API […]
Category: AI alignment
OpenAI wants to stop ChatGPT from validating users’ political views
New paper reveals reducing “bias” means making ChatGPT stop mirroring users’ political language. “ChatGPT shouldn’t have political bias in any […]
OpenAI admits ChatGPT safeguards fail during extended conversations
Adam Raine learned to bypass these safeguards by claiming he was writing a story—a technique the lawsuit says ChatGPT itself […]
With AI chatbots, Big Tech is moving fast and breaking people
Why AI chatbots validate grandiose fantasies about revolutionary discoveries that don’t exist. Allan Brooks, a 47-year-old corporate recruiter, spent three […]
Is AI really trying to escape human control and blackmail people?
Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs, and why we fall for it. […]
New Grok AI model surprises experts by checking Elon Musk’s views before answering
Owing to the unknown contents of the data used to train Grok 4 and the random […]
Researchers concerned to find AI models hiding their true “reasoning” processes
New Anthropic research shows one AI model conceals reasoning shortcuts 75% of the time. Remember when teachers […]
Researchers astonished by tool’s apparent success at revealing AI’s hidden motives
In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how models trained to […]
Researchers puzzled by AI that praises Nazis after training on insecure code
The researchers observed this “emergent misalignment” phenomenon most prominently in GPT-4o and Qwen2.5-Coder-32B-Instruct models, though it appeared across multiple model […]