The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when browser […]
Tag: AI safety
OpenAI admits ChatGPT safeguards fail during extended conversations
Adam Raine learned to bypass these safeguards by claiming he was writing a story—a technique the lawsuit says ChatGPT itself […]
Is AI really trying to escape human control and blackmail people?
Mankind behind the curtain
Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs—and why we fall for it. […]
ChatGPT’s new AI agent can browse the web and create PowerPoint slideshows
On Thursday, OpenAI launched ChatGPT Agent, a new feature that lets the company’s AI assistant complete multi-step tasks by controlling […]
AI therapy bots fuel delusions and give dangerous advice, Stanford study finds
Popular chatbots serve as poor replacements for human therapists, but study authors call for nuance. When Stanford University researchers asked […]
Everything tech giants will hate about the EU’s new AI rules
The code also details expectations for AI companies to respect paywalls, as well as robots.txt instructions restricting crawling, which could […]
OpenAI ChatGPT o3 caught sabotaging shutdown in terrifying AI test
OpenAI has a very scary problem on its hands. A new experiment by PalisadeAI reveals that the company’s ChatGPT o3 […]
Researchers concerned to find AI models hiding their true “reasoning” processes
New Anthropic research shows one AI model conceals reasoning shortcuts 75% of the time. Remember when teachers […]
US and UK refuse to sign AI safety declaration at summit
On Tuesday, Vance told the assembled leaders the US would not relinquish its lead in AI, while also warning countries […]
