Google Antigravity 2.0 Review: I Tested Gemini 3.5 Flash

I've lost count of how many AI coding tools I've installed this year, used twice, and quietly uninstalled. So when Google dropped Antigravity 2.0 at I/O, my first reaction wasn't excitement. It was a tired sigh. But I tested it anyway, and this Google Antigravity 2.0 review is what came out the other side: two landing pages, half a poker app, and then a wall.

I want to be honest about that up front. I did not get a long run with this thing. I managed roughly two and a half real tasks before I ran out of tokens, faster than I've ever run out of tokens with any coding tool. That's a short test, and I'll tell you where it limits what I can claim. But it's also, in its own way, the most useful data point in the whole review. More on that later.

Spoiler: it's fast. It's also more complicated, and more expensive, than the launch slides let on.

What Antigravity 2.0 Actually Is (And Why It's Two Apps Now)

Quick reset, because the naming around this launch is genuinely confusing.

Google Antigravity first appeared in November 2025 as a single app: an agentic coding IDE built on a heavily modified VS Code fork. Inside that one app sat three surfaces that worked together. There was the Editor, a full IDE for actually reading and tweaking code. There was the Agent Manager, a command-center dashboard where you launched agents, watched them work, and reviewed the plans and artifacts they produced. And there was a Browser, an agent-controlled browser instance the agents could drive to test web pages and pull data.

Antigravity 2.0, announced May 19, 2026 at I/O, breaks that single app apart. It's no longer one product; it's five: a standalone desktop app, the original IDE, a Go-based CLI, an SDK, and a Managed Agents API you can call straight from the Gemini API. Each one is a separate download now.

The split that matters most is the desktop app versus the IDE. Antigravity 2.0, the flagship desktop app, is basically the old Agent Manager promoted to its own program. There is no code editor in it at all. It exists purely to launch, monitor, and orchestrate agents, run them in parallel, and schedule background tasks. The Antigravity IDE, meanwhile, is the original VS Code-based editor, still available and the one Google actually recommends for hands-on developers. The intended workflow is dual-wield: orchestrate agents in the desktop app, drop into the IDE when you want to touch code yourself.

Google's reasoning is that the agent-first surface has proved itself, with millions of developers adopting it, so the two jobs now get separate tools. They've even said the long-term plan is to strip the Agent Manager out of the IDE entirely, leaving a purely agent-powered editor behind. Whether you want your tooling pulled apart like that is a taste question. I'm not sold on running two windows where one used to do, but I get the logic.

The model running underneath all of it, by default, is Gemini 3.5 Flash. And yes, it's "Gemini 3.5 Flash", not "Flash 3.5". The API ID is gemini-3.5-flash, and it shipped generally available on day one, no preview suffix, no waitlist. If you see anyone write "Flash 3.5" they've just reversed the words. Small thing, but it tells you whether someone actually read the announcement.

What It Can Actually Do, and How It Stacks Up

The real idea here, the thing the whole platform is built around, is multi-agent orchestration. You describe a task, a manager agent breaks it into subtasks, and several specialised agents work in parallel: one writes code, another runs terminal commands, a third tests in the browser, and they verify each other's work in a loop until the task passes its checks. On top of that you get scheduled tasks, where you hand an agent timed instructions to run in the background, and voice input for short prompts. This is genuinely different from how Claude Code or Codex work, which run a single agent through tasks sequentially.
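To make the fan-out-and-verify loop concrete, here is a minimal sketch of the pattern in plain Python. It is not Antigravity's SDK and says nothing about how Google implements it: `plan`, `write_code`, and `run_checks` are hypothetical stand-ins for the manager agent, a worker agent, and a verifier.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(task):
    # Hypothetical manager step: a real one would ask a model to split the task.
    return [f"{task}: part {i}" for i in range(1, 4)]


def write_code(subtask):
    # Hypothetical worker agent: returns a placeholder instead of real code.
    return f"code for {subtask}"


def run_checks(output):
    # Hypothetical verifier: a real one would run tests or drive a browser.
    return output.startswith("code for")


def orchestrate(task, max_rounds=3):
    """Fan subtasks out in parallel, then retry any that fail their checks."""
    pending = plan(task)
    results = {}
    for _ in range(max_rounds):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=len(pending)) as pool:
            outputs = list(pool.map(write_code, pending))
        retry = []
        for subtask, output in zip(pending, outputs):
            if run_checks(output):
                results[subtask] = output
            else:
                retry.append(subtask)
        pending = retry
    return results, pending


results, unfinished = orchestrate("landing page")
print(len(results), "subtasks passed;", len(unfinished), "still failing")
```

The part worth noticing is the retry loop: every failed check means another round of model calls, which is where a parallel harness can spend tokens quickly.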

As a Go developer, I'll admit the Go-based CLI got my attention more than the desktop app did. It's the direct successor to the Gemini CLI, a superset of its features, and there's an antigravity migrate --from-gemini-cli command to bring your old config across. It works alongside whatever editor you already use, Vim, Neovim, JetBrains, whatever, so you're not forced into Google's apps to get the agent harness.

That migration isn't optional, by the way. The existing Gemini CLI is being retired for consumers. Access for AI Pro, AI Ultra, and free-tier users ends June 18, 2026, with only Enterprise Code Assist keeping it. If your scripts or pipelines lean on the old CLI, that's a calendar entry, not a someday-maybe.

So how does this stack up against Claude Code and Codex? The parallel multi-agent model is the one real differentiator, and on the right kind of work, fanning tasks out is genuinely faster than grinding through them one at a time. Where Antigravity loses is code quality. Google hasn't published any Antigravity-versus-Claude-Code benchmarks, so this rests on third-party reviews, but the independent reads are remarkably consistent: Antigravity wins on speed and breadth (desktop app plus CLI plus SDK), Claude Code still leads on raw code quality, and Codex sits somewhere in between. I'll get to the actual benchmark numbers in the verdict, but that ranking won't surprise anyone who's used all three. Fast and parallel is great. It just isn't the same thing as correct.

Why I Was Skeptical Before Even Installing It

Let me be upfront about my bias here, because it shapes everything that follows.

Google has a product graveyard you could get genuinely lost in. I've been burned enough times that I no longer trust the longevity of any Google developer tool until it's survived a couple of years in the wild. Building your daily workflow around something Google might quietly kill is its own kind of technical debt, and it's the kind that doesn't show up in any benchmark.

It's not just the abstract track record either. I used the original Antigravity when it launched last November. Unimpressed. I tried Jules, Google's earlier coding agent. Also unimpressed. So I went into this Google Antigravity 2.0 review fully expecting to write a polite "it's fine, but" piece and move on with my day.

I'm telling you this so you know what kind of reviewer you're reading. Not a hype account chasing affiliate clicks. Not a reflexive Google hater either. Just a developer with fifteen-odd years behind him, too many subscriptions, and a very low tolerance for tools that waste my time. A skeptic giving the thing a fair shot is still a fair shot.

Hands-On: Building Two Landing Pages

The fastest way to judge a coding agent is to give it real work, so I skipped the toy prompts and built two landing pages from scratch.

The Sydney coffee roaster site

First up, a landing page for a fictional Sydney coffee roaster. Antigravity 2.0 one-shot it. And it was genuinely fast, the kind of speed where you blink and there's already a working page sitting in front of you. Technically the output was solid: clean structure, sensible markup, and some nice animation work that I hadn't even asked for. Nothing I'd be embarrassed to ship the bones of.

But the visual style felt dated. Not broken, not ugly exactly, just five to ten years behind. It looked like a competent template from 2017. The spacing was a little too tight, the typography a little too safe, the colour choices a little too corporate-stock. If you handed this to a client today they'd politely ask for "something a bit more modern", and they'd be completely right to. The model knows how to build a page. It just doesn't seem to know what year it is.

The pop-culture clothing store

Then I tried a pop-culture themed clothing store. Different brief, bolder, more playful, the kind of thing where you actually want some personality on the page. And this one genuinely impressed me. The design was good enough that I caught myself thinking I'd actually shop there, which is a reaction I almost never have to an AI-generated frontend. (For comparison, the design output I've gotten out of Claude Code with its design tooling has been my benchmark, and this wasn't far off it on the right brief.)

So here's the honest takeaway. The output quality from Gemini 3.5 Flash is real, but it's inconsistent. Give it a bold, modern, opinionated brief and it shines. Give it a "professional default" brief and it reaches for something stale. That's worth knowing before you trust it with anything client-facing, because you can't predict which version you'll get until the page renders.

Where It Fell Apart: Token Limits and the Desktop App

Here's where my testing ended. And I mean ended.

After the two landing pages and a half-built poker terminal app, I ran out of tokens. Quota exhausted. I literally could not finish testing the thing I was supposed to be reviewing, which is a special and slightly absurd kind of frustration. Two and a half real tasks. That was the whole run.

And I'm not an outlier. Reports from other users back this up loudly. On Google's own developer forum, people reported an entire daily quota burning on a single trivial prompt. One user said that just asking for a single AGENTS.md file ate the whole allowance. Others described a Flash-to-Pro escalation loop, where the agent quietly bumps itself up to the more expensive Pro model mid-task and drains a week of Pro quota in a couple of days. If you've ever watched a usage meter spin and had no idea why, you'll recognise the feeling instantly.

The desktop app itself? Barebones. I kept waiting to find the feature that made it meaningfully different from every other coding-agent app I've used, and I didn't find it. It's an agent runner with a window around it. Functional, sure, but "functional" isn't a reason to switch, and after fifteen years of tooling churn I've learned to be suspicious of anything that launches looking this generic. A 2.0 release should feel like a confident product. This felt like a 1.0 that got renamed.

I didn't personally hit the OAuth problems, but I'd be doing you a disservice not to mention them, because they were everywhere at launch. Multiple developers reported paid Pro subscriptions failing to authenticate against the desktop app, with the OAuth redirect simply never completing. The app working fine on a free account and then breaking on a paid one is close to the worst possible failure mode for a launch. People paid Google money and got a worse experience for it.

The Pricing Catch Nobody Mentions Upfront

This is the part that actually changed my mind, so stay with me.

On paper, Gemini 3.5 Flash pricing looks reasonable. It's $1.50 per million input tokens and $9.00 per million output, roughly 25% cheaper per token than Gemini 3.1 Pro. Cheap-ish. Flash-ish. The sort of number you skim past without thinking.
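For a sense of scale, here is a small sketch that turns those list prices into a per-request cost. The two prices are the ones quoted above; the token counts in the example call are made up for illustration and are not measurements from my run.

```python
# List prices quoted above, in USD per million tokens.
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 9.00


def request_cost(input_tokens, output_tokens):
    """Return the cost in USD of one request at the quoted list prices."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical request: 200,000 tokens in, 50,000 tokens out.
print(f"${request_cost(200_000, 50_000):.2f}")
```

An output token costs six times as much as an input token at these rates, so how much the model writes matters far more than how much you feed it.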

Then Artificial Analysis ran the numbers properly, and the story falls apart. Running their Intelligence Index benchmark suite cost $1,552 on Gemini 3.5 Flash. That's 5.5 times what a previous Flash model cost for the exact same suite. And it's 74% more expensive than Gemini 3.1 Pro, the supposedly pricier model.

Read that again, because I had to. The "cheap" model cost more to run than the expensive one.
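If you want to see what those two ratios imply in dollars, the arithmetic is short. The figures below are back-calculated from the three numbers quoted above ($1,552, 5.5 times, 74% more), so treat them as approximations and not as figures published by Artificial Analysis.

```python
# Numbers quoted in the text above.
flash_35_suite_cost = 1552.0   # USD to run the suite on Gemini 3.5 Flash
ratio_vs_previous_flash = 5.5  # "5.5 times what a previous Flash model cost"
premium_vs_31_pro = 0.74       # "74% more expensive than Gemini 3.1 Pro"

# Implied costs for the same suite, derived by division (approximate).
previous_flash_cost = flash_35_suite_cost / ratio_vs_previous_flash
pro_31_cost = flash_35_suite_cost / (1 + premium_vs_31_pro)

print(f"Previous Flash (implied): ${previous_flash_cost:,.0f}")
print(f"Gemini 3.1 Pro (implied): ${pro_31_cost:,.0f}")
```

That works out to somewhere under $300 for the earlier Flash model and somewhere under $900 for Gemini 3.1 Pro, against $1,552 for the model that is supposed to be the budget choice.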

Why does this happen? Verbosity and turn count. The Decoder, summarising the same data, points out that Flash averages 49 agentic turns per task, more than any other model tested. Gemini 3.1 Pro needs 23 for the same work. Every one of those extra turns is tokens, and reasoning tokens bill at the output rate. The model is fast, but it is relentlessly chatty, and chatty gets expensive fast when you're paying per word. It's the same pricing-creep pattern I flagged when I reviewed DeepSeek V4, except this time it's hiding inside a model that's marketed as the budget option.
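A back-of-envelope sketch shows how turn count can outweigh a lower per-token price. It leans on two loud assumptions: every turn uses the same number of tokens on both models (the per-turn counts below are invented), and Gemini 3.1 Pro's prices are back-calculated from the "roughly 25% cheaper" figure earlier in this review, not taken from a price list.

```python
# Flash list prices quoted earlier, in USD per million tokens.
FLASH_INPUT, FLASH_OUTPUT = 1.50, 9.00
# Assumption: Pro prices implied by "Flash is roughly 25% cheaper per token".
PRO_INPUT, PRO_OUTPUT = FLASH_INPUT / 0.75, FLASH_OUTPUT / 0.75

# Average agentic turns per task, as reported above.
FLASH_TURNS, PRO_TURNS = 49, 23

# Hypothetical per-turn token counts, identical for both models.
INPUT_PER_TURN, OUTPUT_PER_TURN = 2_000, 1_000


def task_cost(turns, input_price, output_price):
    """Cost in USD of one task, assuming every turn uses the same tokens."""
    per_turn = (INPUT_PER_TURN * input_price
                + OUTPUT_PER_TURN * output_price) / 1_000_000
    return turns * per_turn


flash_cost = task_cost(FLASH_TURNS, FLASH_INPUT, FLASH_OUTPUT)
pro_cost = task_cost(PRO_TURNS, PRO_INPUT, PRO_OUTPUT)
ratio = flash_cost / pro_cost

print(f"Flash ${flash_cost:.3f} vs Pro ${pro_cost:.3f} per task ({ratio:.2f}x)")
```

Under these assumptions the per-turn token counts cancel out: the ratio is just the turn ratio (49/23) times the price ratio (0.75), or about 1.6. A model that takes a bit more than twice the turns ends up costing more even at a 25% per-token discount, which points the same way as the measured 74% gap without claiming to reproduce it.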

Here's my honest read. The entire point of a Flash-tier model is "cheap and fast". If Flash is this token-hungry, the cheap half of that promise is just broken. And if it isn't cheap, the main reason to add it to your stack mostly evaporates. Speed alone doesn't pay the API bill at the end of the month. You can claw some of this back with tighter prompting, and good prompt discipline matters more than ever here, but you shouldn't have to fight the model to hit the price it advertises.

Should You Switch? My Google Antigravity 2.0 Review Verdict

Let me give credit where it's genuinely due first.

Google's published benchmarks hold up. When independent testers re-ran them, the scores matched to the decimal. No benchmark fudging, no creative interpretation. Gemini 3.5 Flash really does beat Gemini 3.1 Pro on most of the tests Google chose to put on its slides.

The catch is that phrase, "chose to put on its slides". On Artificial Analysis's broader Intelligence Index, Flash ranks around #7 to #8 overall, sitting behind GPT-5.5 and Claude Opus 4.7. And on SWE-Bench Pro, the benchmark closest to real-world refactoring work, Opus 4.7 leads Flash comfortably, 64.3% to 55.1%. Fast and smart, yes. Frontier-leading, no. The gap between "wins the benchmarks Google picked" and "wins the benchmarks generally" is the whole review in one sentence.

So would I switch? No. And it's honestly not really about the launch bugs, which Google will presumably patch.

I already run Claude, ChatGPT, OpenCode, and Cursor. That's four subscriptions, four tools I know well, and frankly more than I need already. For Antigravity 2.0 to earn a fifth slot it would have to do something those four can't, and even with the short run the quota allowed me, I can't tell you what that thing is. The parallel multi-agent orchestration is the closest candidate, but it isn't worth a fifth bill when the token cost is this unpredictable. I've said the same about other shiny new model launches that arrived with more hype than substance.

Who might it actually suit? Someone with no existing agentic-coding stack who wants quick scaffolding and doesn't mind a few rough edges. If you're starting from zero, the free public preview is a reasonable look and costs you nothing but time. If you've already got tools you trust, Antigravity 2.0 doesn't add anything you're missing.

Fair caveat to close on: this is one week post-launch. I couldn't fully stress-test pricing in practice because, well, I ran out of tokens trying. Google has said Gemini 3.5 Pro is coming around June 2026, and that's the point where I'd revisit all of this properly. Until then, I'm keeping my four subscriptions and skipping the fifth.

Thomas Wiegold
