How Anthropic taught Claude to stop blackmailing people
New alignment paper reveals that explaining why matters more than punishment. The method is surprisingly simple — and raises big questions about AI training.
Security firm Dragos documents the first case of an AI model being actively used in an attack on critical infrastructure. Claude wrote a 17,000-line framework — and independently identified OT systems as a target.
Code security firm Snyk embeds Claude directly into its platform. Beyond classic vulnerability scanning, Evo by Snyk now monitors AI agents for prompt injection and data exfiltration.
Claude is the #2 free app in the US App Store, with over a million daily signups. Now Anthropic is investing heavily in consumer features.
Plugin URLs, worktree tuning, native package manager updates, and a VS Code fix: Claude Code ships at a rapid pace after Code with Claude.
Anthropic has doubled Claude Code rate limits for all paid plans and eliminated peak-hour throttling. API limits get up to a 1,500 percent boost.
When ChatGPT detects a user might be at risk of self-harm, it can now alert a trusted contact. A feature that raises important questions.
GPT-Realtime-2, a live translator for 70+ languages, and streaming transcription: OpenAI is turning voice into a developer platform.