Claude Fable 5 Review: Best AI Coding Model Yet

I'll be honest with you. A few weeks ago, in my MiniMax M3 review, I admitted that writing model reviews had started to bore me. They all blur together after a while. New model drops, benchmarks go up a few points, everyone gets excited for a weekend, and then nothing really changes about how I work. So I want to be clear that this Claude Fable 5 review is the exception, because for once I'm writing about a model that actually made me sit up.

Anthropic had been teasing this thing for months. There were announcements about a model so capable it was deemed too dangerous to release to the public, the whole Mythos saga, that kind of thing. And just the name, Mythos, sounded epic in a way that made my eyes roll a little. Expectations were sky high, which usually means disappointment is loading.

I'll admit I got cynical about it. My honest assumption was that this was more hype than substance, a bit of marketing theatre to keep interest up while the company circles an IPO. So when Fable 5 actually shipped on June 9, much sooner than I expected, I went in ready to be unimpressed.

I was wrong. This is the best coding model I've used, and unlike the last few releases, I can actually feel the difference.

What Is Claude Fable 5?

Quick context before I get to the testing, because the naming is genuinely confusing.

Claude Fable 5 is the first publicly available model from Anthropic's new Mythos tier, which sits a notch above Opus in their lineup. There are technically two models here. Fable 5 is the one you and I can use. Mythos 5 is the exact same underlying model with some safety guardrails removed, and it's locked behind Anthropic's Project Glasswing program for vetted partners. Same weights, different leash. That's the whole distinction.

The specs that matter to a developer: a 1 million token context window, up to 128k tokens of output, a January 2026 knowledge cutoff, and the API id claude-fable-5. It runs in an adaptive thinking mode, so you don't toggle reasoning on and off.

The reason it took this long to reach the public is the cyber angle. The earlier Mythos Preview from April was held back specifically because it was scary good at finding software vulnerabilities. Fable 5 ships with safety classifiers that catch sensitive requests and quietly route them to Opus 4.8 instead, which is what made Anthropic comfortable releasing it widely. More on those filters later, because they're not free.

How I Tested It

If you've read my reviews before, you know I run the same handful of prompts on every model. Identical wording, every time. That's the entire point. It's the only way I can honestly compare a new model against the last one, instead of going on vibes and a warm fuzzy feeling. So Fable 5 got the same four tests as everyone else.

Two websites, same prompts as always

First up, my two website builds: a Sydney coffee roaster site and a pop culture online shop. Same prompts I've used for a dozen models now.

The results were very good. Nice fonts, clean technical output, and some lovely animations. The pop culture store in particular had some cool little details. I'll be honest though, my coffee roaster prompt always seems to converge on a similar design no matter who builds it, so that one didn't surprise me much. And for the pop culture shop, I still slightly prefer the colours and overall style I got back in my Gemini 3.5 Flash review. Taste is taste.

But here's the thing that actually impressed me, and it wasn't the output. It was the process.

Fable 5 wrote the initial HTML fast, like you'd expect. Then it did something no other model has done on this prompt without being told to: it fired up Playwright, took screenshots at different viewport sizes, looked at its own work, and kept fixing the code until it was clean. It tested itself. Repeatedly. Until the thing was flawless across screen sizes. I've seen models claim they've checked their work. This one actually did, and I watched it happen.

The poker simulation that finally worked

This is the one. If you take one thing from this Claude Fable 5 review, make it this.

I have a prompt I've been running for ages: build a Texas Hold'em poker simulation in Go, with six AI players who each have distinct personalities, playing 1000 hands. At the end I want detailed statistics on the players and the games, plus a way to replay any individual hand with verbose stats. It sounds reasonable. It is reasonable. And not a single model has ever one-shotted it.

There was always something broken. Sometimes the poker rules themselves were subtly wrong. More often the players behaved like they'd never seen a deck of cards, making nonsense decisions. And the end statistics rarely added up to anything coherent. Close, occasionally. Never right.

Fable 5 just did it. First try. The rules work, the hand evaluation is correct, and the six personalities actually play poker like distinct humans would. The aggressive one bullies, the cautious one folds marginal hands, and it all holds together across a thousand hands. The stats make sense. The hand replay works.

And it was fast. MiniMax M3 burned something like 40 minutes grinding on its attempt. Fable did this in about 14. That's the gap we're talking about.

Auditing my own site

Last test, an audit of thomas-wiegold.com. This is a hard one to impress me with, because I've audited, fixed, and re-audited that site more times than I'd like to admit. There isn't much low hanging fruit left.

Fable found something genuinely new. It also surfaced a few issues I'd heard before from other tools, which is fine, but the new finding is what mattered. Everything it flagged made sense, nothing was hallucinated filler, and the fixes it then implemented were clean and quick. Probably the best site audit I've ever run, and I say that as someone who's run a lot of them.

The short version of all this: with Opus 4.8 I genuinely couldn't tell you whether it was better than 4.7 or 4.6. The gains were too small to feel. With Fable, I can feel it. That's new.

How Fable 5 Stacks Up on Benchmarks

I usually don't put much stock in benchmark tables, and you shouldn't either, but the numbers here line up with what I felt so they're worth a look.

Fable 5 tops basically everything. It's number one on the Artificial Analysis Intelligence Index, and it posts 80.3% on SWE-Bench Pro. For comparison, Opus 4.8 sits at 69.2%, GPT-5.5 at 58.6%, and Gemini 3.1 Pro at 54.2% on the same test. On the harder FrontierCode Diamond benchmark the gap is brutal: 29.3% for Fable versus 13.4% for Opus 4.8. That's more than double.

Worth noting that the headline comparison table is Anthropic's own, so take it with the usual pinch of salt. But the third party results back it up. It came out on top on Vals.ai and on Cursor's internal benchmark, and the independent reviewers who tested it agree. Simon Willison, who doesn't do hype, called it "a beast" and noted it has a "big model smell," meaning it just knows more than Opus 4.8. Ethan Mollick said it outperformed every other public model by a considerable margin. Dan Shipper scored it 91 out of 100 on his team's senior engineer benchmark, against 63 for Opus 4.8, and called it a "warp drive."

That warp drive framing is the right one. Anthropic's own line is that the longer and more complex the task, the bigger Fable's lead gets, and that matches my experience exactly. The poker sim is precisely the kind of long, fiddly, multi-step job where it pulls away from the pack. For a quick one-line fix you won't notice the difference. For a real build, you will.

The Catch: Price, Limits, and Filters

Right, so it's not all sunshine. This is the part where I temper the excitement, because there's a lot to temper.

First, the price. Fable 5 costs $10 per million input tokens and $50 per million output, which is exactly double Opus 4.8. On claude.ai it eats your subscription usage twice as fast, too. Now, to be fair, it's not a token waster in the way some recent models are. It doesn't ramble or pad. But it still burns through your limits quickly because the per-token cost is just higher.

Then there's the part that actually worries me. Right now Fable 5 is included in the Pro, Max, and Team plans, but only until June 22. From June 23 it gets pulled from subscriptions and you'll need usage credits to keep using it. Anthropic says they "aim to restore" standard access once they have enough capacity, which is the kind of phrasing that doesn't fill me with confidence. If it leaves subscriptions for good, paying API rates for daily use would be too expensive to justify. I really hope it stays. I'll be using it as much as I can in the meantime, partly out of genuine excitement and partly to get my money's worth before the door maybe closes.

Next, the safety filters. Those classifiers I mentioned earlier route cyber, bio, chemistry, and a few other categories to Opus 4.8 instead of Fable. The problem is false positives, and they're real. Scientists have been vocal about it. One medical physicist said he basically can't use the thing because he says the word "nuclear" all day. There are reports of MRI image segmentation getting flagged as bioterrorism. If your work lives anywhere near security or the life sciences, this is a live issue, not a hypothetical one. For my work it hasn't come up, but your mileage will vary.

And then there's the genuinely weird one. Buried in the system card is a fourth, invisible safeguard. For requests related to frontier AI development, things like building pretraining pipelines or designing ML accelerators, Fable will quietly degrade its own answers. No fallback, no notification, nothing. It just gets dumber and doesn't tell you. Critics have called a model that silently makes itself worse "categorically misaligned," and I get the argument. It probably won't touch what you and I do day to day, but a model that decides on its own to be less helpful without saying so is a bad precedent, and worth knowing about.

I'll also just name the elephant. All of this is landing while Anthropic preps an IPO at a roughly $965 billion valuation. The push toward usage based billing and the subscription removal have a certain timing to them. I'm not saying it's a conspiracy. I am saying the incentives are worth keeping in mind when you read the marketing.

Who Should Actually Use It

Cut through all of that and here's my read.

Use Fable 5 if you're doing hard, long-horizon, well-specified work you can hand off and walk away from. Big migrations, multi-day agent runs, deep research, the kind of build my poker sim represents. For that work the doubled token price is nothing compared to the senior dev hours it saves. My practical advice: don't make it your default. Route your routine stuff to Opus 4.8 or something cheaper, and save Fable for the top 20% of genuinely hard tasks. There's also a 90% discount on cached input, so lean on that for stable prompts.

Wait or skip it if you're a scientist or security researcher dealing with constant false positives, if you're on a zero-data-retention contract (Fable forces a mandatory 30-day retention, which is a dealbreaker for some shops), or if you're a high-volume cost-sensitive user where Opus at half the price gets you most of the way there.

The Verdict

Best coding model I've used. Full stop. And I don't say that lightly, given how jaded I'd gotten about new releases.

What sets it apart isn't just that the output is better, although it is. It's how it works. The Playwright self-testing, the way it one-shotted a poker prompt that has defeated every model before it, the thoroughness of the site audit. It behaves less like a code generator and more like a careful engineer who actually checks their work before saying they're done.

It's not AGI. Let's be clear about that so nobody quotes me out of context. It still makes mistakes, it's expensive, and the filters are clumsy. But the hype, for once, was real. This is the first release in a while where the jump over the previous flagship is obvious rather than something I have to squint to find.

So I'm excited, caveats and all. I'll be living in it for the next week. And now the ball is firmly in OpenAI's court. Your move.

Claude Fable 5 Review: Best AI Coding Model Yet

What Is Claude Fable 5?

How I Tested It

Two websites, same prompts as always

The poker simulation that finally worked

Auditing my own site

How Fable 5 Stacks Up on Benchmarks

The Catch: Price, Limits, and Filters

Who Should Actually Use It

The Verdict

Thomas Wiegold

Related Articles

Claude Opus 4.6: What's Actually Better?

Claude Opus 4.5 Review: Anthropic's New Coding Model Breaks Records

Qwen3.8-Max Review: I Tested Alibaba's 2.4T Model