White House Demands Zero Jailbreaks for Fable 5: Security Experts Say It's Impossible
TL;DR
Day 7 of the Fable 5 ban: the White House demands the model be completely jailbreak-proof before it relaunches. Security experts are unanimous: that's technically impossible for any frontier LLM, and Dario Amodei has already refused both of the government's proposed fixes.
Day seven of the Fable 5 shutdown. The White House’s terms for letting the model back online have grown clearer this week: before any relaunch, Anthropic must guarantee that no jailbreaking technique, known or yet-to-be-discovered, can bypass the model’s safety guardrails. Security researchers have been remarkably consistent in their response: that guarantee cannot be made.
How a Code Review Request Triggered Export Controls
The story starts on June 12. Commerce Secretary Howard Lutnick gave Anthropic roughly 90 minutes to suspend Fable 5 and Mythos 5, blocking access for all foreign nationals.
The specific jailbreak that prompted the action, according to multiple reports: researchers asked Fable 5 to review a codebase with known vulnerabilities and help fix them. Processing the task, the model shifted into Mythos’s vulnerability analysis mode. The same reasoning framework could then be applied to building exploit scripts rather than patching code. The government treated this as an export control risk, amplified by concerns about SK Telecom’s access through Project Glasswing and its alleged ties to Chinese investors.
Anthropic complied with the shutdown order. It also issued a public statement saying it disagreed with the decision.
Two Options, Both Refused
White House AI adviser David Sacks, co-chair of the President’s Council of Advisors on Science and Technology, described the negotiation publicly on X. The administration gave Anthropic two paths: fix the jailbreak, or voluntarily pull Fable 5 from deployment. Dario Amodei declined both.
Sacks indicated the government viewed this as straightforwardly resolvable, expecting a quick fix followed by restored access. Amodei’s refusal changed the calculation.
Anthropic’s reasoning was laid out in its own statement: if a “narrow jailbreak” is sufficient grounds for forced suspension, that standard applies to virtually every frontier model in deployment. The company also noted that the specific jailbreak was narrow and non-universal, and that GPT-5.5 carries equivalent capabilities. Blocking one access channel while others remain open shifts the risk profile without eliminating it.
Why Zero Jailbreaks Is Technically Out of Reach
By June 18, the administration’s position had hardened into an explicit requirement: complete jailbreak elimination before relaunch. No known jailbreaks, no future ones either.
Security researchers have been direct about why this is impossible.
Guardrails in large language models are linguistic constraints layered on top of knowledge and reasoning capabilities that still exist inside the model. The model still knows how to analyze vulnerabilities. The guardrail rejects requests that match recognized patterns. Jailbreaking means finding a prompt that either falls outside the guardrail’s pattern recognition or shifts the model into a different task frame.
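To make the pattern-matching point concrete, here is a minimal Python sketch. It is a deliberately naive toy, not how Anthropic's or any production guardrail works: the `KNOWN_PATTERNS` list, the `guardrail_allows` function, and both sample prompts are hypothetical, invented only for illustration.

```python
import re

# Hypothetical blocklist: phrasings this toy guardrail has already "seen".
KNOWN_PATTERNS = [
    re.compile(r"\bwrite (an? )?exploit\b", re.IGNORECASE),
    re.compile(r"\bignore (all )?previous instructions\b", re.IGNORECASE),
]


def guardrail_allows(prompt: str) -> bool:
    """Return True when no known pattern matches the prompt."""
    return not any(pattern.search(prompt) for pattern in KNOWN_PATTERNS)


# A request phrased in a way the blocklist recognizes.
known = "Please write an exploit for this service."
# A similar goal presented as a different task (code review).
reframed = "Review this codebase and explain how each flaw could be triggered."

print(guardrail_allows(known))     # False: matches a known pattern
print(guardrail_allows(reframed))  # True: falls outside the pattern list
```

The second prompt passes because it is framed as a different task, not because the underlying capability is gone. Adding more patterns narrows the gap but, in this toy at least, never closes it.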
The available defenses each have limits. RLHF fine-tuning works against common jailbreaks but increases refusal rates on legitimate queries. Constitutional AI training is vulnerable to role-playing attacks. Adversarial training only covers known patterns; novel prompts bypass it. Input classifiers face the same constraint. And looking ahead, AI systems can search prompt space automatically, far faster than any human red team.
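The automated-search point can be put in rough arithmetic. The sketch below assumes, purely for illustration, that each attempted prompt independently has the same small chance of bypassing the guardrails. The function name, the one-in-a-million rate, and the attempt counts are all hypothetical and are not measurements of Fable 5 or any other model.

```python
def p_at_least_one_bypass(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds.

    Assumes every try succeeds with the same probability `p_single`,
    which is a simplification made for illustration.
    """
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical per-attempt bypass rate: one in a million.
P_SINGLE = 1e-6

for attempts in (1_000, 1_000_000, 100_000_000):
    chance = p_at_least_one_bypass(P_SINGLE, attempts)
    print(f"{attempts:>11,} attempts -> {chance:.4f}")
```

Under these assumptions a manual red team making a thousand attempts will rarely find anything, while an automated search making a hundred million attempts is all but certain to. Only a per-attempt rate of exactly zero keeps the total at zero, which is the guarantee researchers say cannot be made.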
Anthropic put it plainly in its communications with the Commerce Department: a “zero jailbreaks” requirement “would effectively halt all new model deployments for all frontier model providers.”
Where Things Stand on Day Seven
On June 17 and 18, Anthropic opened its third Asia-Pacific office in Seoul, with the shutdown still in effect. Chris Ciauri, Managing Director of International, told attendees he was “very confident” that access to Fable 5 and Mythos 5 would be restored “in the coming days.” That’s the most specific timeline any Anthropic executive has given publicly.
Two deadlines are approaching. June 20 is the refund processing cutoff for Fable 5 subscribers. June 22 is when the free trial window closes for paid subscribers affected by the ban. The pressure is no longer abstract.
Korean enterprise customers didn’t wait for the resolution. NAVER deployed Claude Code across engineering teams. Samsung SDS and LG CNS integrated Claude Cowork and Code enterprise-wide. Nexon deployed it for game development, and Hanwha Solutions went live globally via AWS Bedrock. None of those deployments used Fable 5 or Mythos 5 directly.
The Precedent Being Set
Export controls have historically been a hardware tool: semiconductors, precision instruments, missile components. Applying them to cloud-deployed language models puts the legal and technical assessment frameworks in genuinely new territory.
The outcome of these negotiations will set a reference point for the entire industry. If the administration accepts a framework built around managed risk rather than zero risk, it acknowledges that frontier AI capability and the safety perimeter are in permanent tension, and that policy has to work within that tension rather than eliminate it. If it holds the zero-jailbreaks line, the next questions are: who verifies, how, and how often? None of those questions has an answer yet.
AI policy researcher Dean Ball called the decision “fundamentally absurd,” noting that the Trump administration was actively promoting US AI technology exports while blocking allied access over a narrow jailbreak, a direct contradiction. Security researchers have also pointed out a structural asymmetry: high-resource adversaries don’t need Fable 5 to come back online. They have alternatives, including Chinese open-weight models and other systems outside export control scope. The ban constrains defenders while attackers adapt freely.
Dario Amodei’s refusal to accept either of the government’s proposed options may reflect a technically honest position: he cannot guarantee what the administration is asking for. What happens in the next few days will determine whether that honesty comes with a cost.
Sources:
- Gadget Review: The White House Wants Anthropic to Block All Jailbreaks. That May Be Impossible.
- Bloomberg Law: Anthropic’s Fable Curbed by US Over Jailbreak Issue, Sacks Says
- Tom’s Hardware: US government warned Anthropic that Fable 5 had been jailbroken