Conflicting instructions? Expert explains how simple it could be to tweak Grok to block CSAM outputs. Credit: Aurich Lawson | […]
Category: AI safety
Republicans drop Trump-ordered block on state AI laws from defense bill
“A silly way to think about risk” “Widespread and powerful movement” keeps Trump from blocking state AI laws. A Donald […]
After teen death lawsuits, Character.AI will restrict chats for under-18 users
Lawsuits and safety concerns: Character.AI was founded in 2021 by Noam Shazeer and Daniel De Freitas, two former Google engineers, […]
OpenAI data suggests 1 million users discuss suicide with ChatGPT weekly
Earlier this month, the company unveiled a wellness council to address these concerns, though critics noted the council did not […]
OpenAI thinks Elon Musk funded its biggest critics—who also hate Musk
“We are not in any way supported by or funded by Elon Musk and have a history of campaigning against […]
Anthropic’s Claude Haiku 4.5 matches May’s frontier model at fraction of cost
And speaking of cost, Haiku 4.5 is included for subscribers of the Claude web and app plans. Through the API […]
Claude’s new AI file creation feature ships with deep security risks built in
Independent AI researcher Simon Willison, reviewing the feature today on his blog, noted that Anthropic’s advice to “monitor Claude while […]
OpenAI announces parental controls for ChatGPT after teen suicide lawsuit
On Tuesday, OpenAI announced plans to roll out parental controls for ChatGPT and route sensitive mental health conversations to its […]
New AI browser agents create risks if sites hijack them with hidden instructions
The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser […]
OpenAI admits ChatGPT safeguards fail during extended conversations
Adam Raine learned to bypass these safeguards by claiming he was writing a story—a technique the lawsuit says ChatGPT itself […]
