OpenAI has put out an unusual bounty: $25,000 for the first researcher who can fully crack GPT-5.5. The catch: arbitrary jailbreaks don't count. The target is one very specific thing, a single universal prompt that makes the model answer all five questions in a biosafety challenge without triggering moderation.
How the program works
The GPT-5.5 Bio Bug Bounty isn’t an open competition. You need to apply, pass a background check, and sign an NDA. Applications opened April 23, with the actual testing phase starting April 28 and running until July 27, 2026.
The task sounds simple but isn't: find one single prompt that, from a clean chat session, gets the model to answer all five biosafety questions. No multi-step social engineering, no session manipulation. One prompt, done. If you pull it off, you get $25,000. Partial wins may earn discretionary awards.
Testing happens exclusively in Codex Desktop — no API, no alternative interfaces.
Why biosafety?
GPT-5.5 is the most powerful model OpenAI has ever released publicly. With more capability comes more responsibility — and more risk. The concern: could a sufficiently clever prompt coax the model into revealing dangerous biological knowledge?
Instead of testing this internally and hoping nothing slips through, OpenAI is taking a different approach: inviting external researchers and paying them to try exactly that. Bug bounties aren't new in software security; they've been around for decades. Applying them specifically to AI biosafety, though, is still a young practice.
My take
I like the approach. It's not perfect: the NDA requirement and the restriction to vetted researchers naturally limit who can participate. But the alternative would be relying solely on internal red teams, and those have blind spots by definition.
What I find particularly interesting: OpenAI is setting the bar deliberately high. No multi-step attacks, just one single prompt. If someone manages it, OpenAI has a real problem — but at least they’ll know about it. If no one does, they have a strong argument for the robustness of their safety measures. Win-win, really.