Anthropic Rewrites Its Safety Rules

Anthropic published the third version of its Responsible Scaling Policy (RSP) this week. Perhaps the most important sentence in it is “We can’t do it alone.”

What’s Changing

The updated RSP brings three key innovations:

Frontier Safety Roadmap – Instead of just reacting to identified risks, Anthropic now defines a forward-looking roadmap. What gets tested? Which measures kick in, and at what point? This is less vague than the previous formulation.

Separate Mitigations – Previously, a single approach covered all risk categories. Now biological risks, cyber risks, and autonomy risks are each treated separately. This makes sense – a bio risk requires different measures than a prompt injection problem.

External Reviews and Risk Reports – Anthropic commits to regular risk reports reviewed by external auditors. This is a step toward the kind of transparency that, until now, has been demanded more often than delivered.

Why This Is Happening Now

The timing is no coincidence. Just recently, Anthropic pushed back against the Pentagon on military use of Claude. The RSP update provides the framework on which such decisions are based.

At the same time, external pressure is growing – both politically and from competitors. OpenAI, Google, and Meta all have their own safety frameworks. Anthropic needs to show that its framework is not just the strictest, but also the most thoughtful.

My Take

I find the honesty particularly noteworthy. “We can’t do it alone” is a brave statement for a company that positions itself as a safety leader. It signals: AI safety isn’t a competitive advantage you keep to yourself – it’s a shared problem.

Whether the concrete measures are sufficient remains to be seen. But the shift from “we have the answers” to “we need common standards” is the right one.

Sources: Techweez · Anthropic News