← Back to Blog
AI News·10 min read

Google Antigravity 2.0 Review: I Tested Gemini 3.5 Flash

I've lost count of how many AI coding tools I've installed this year, used twice, and quietly uninstalled. So when Google dropped Antigravity 2.0 at I/O, my first reaction wasn't excitement. It was a tired sigh. But I tested it anyway, and this Google Antigravity 2.0 review is what came out the other side: a week of real coding tasks, two finished projects, one I never got to finish, and a pricing surprise that genuinely changed how I think about the whole thing.

Spoiler: it's fast. It's also more complicated than the launch slides let on. Stick around for the pricing section, because that's the part that actually matters.

What Antigravity 2.0 and Gemini 3.5 Flash Actually Are

Quick reset, because the naming around this launch is a mess.

Google Antigravity first appeared in November 2025 as an agentic coding IDE, basically a heavily modified VS Code fork. Antigravity 2.0, announced May 19, 2026 at I/O, throws that away and starts over. It's now a standalone desktop app, a Go-based CLI, an SDK, and a Managed Agents API. The pitch is "agent-first": the idea is you orchestrate agents to do work rather than just getting fancy autocomplete. As a Go developer I'll admit the Go CLI got my attention more than the desktop app did, but more on that later.

The model running underneath, by default, is Gemini 3.5 Flash. And yes, it's "Gemini 3.5 Flash", not "Flash 3.5". The API ID is gemini-3.5-flash, and it shipped generally available on day one, no preview suffix, no waitlist. If you see anyone write "Flash 3.5" they've just reversed the words. Small thing, but it tells you whether someone actually read the announcement.

One detail worth flagging early because it'll bite people: the existing Gemini CLI is being sunset. Consumer access for AI Pro, AI Ultra, and free-tier users ends June 18, 2026. Enterprise Code Assist customers keep it. So this isn't an optional upgrade for CLI users, it's a forced migration with a deadline. If your scripts or pipelines lean on the old CLI, that's a calendar entry, not a someday-maybe.

Why I Was Skeptical Before Even Installing It

Let me be upfront about my bias here, because it shapes everything that follows.

Google has a product graveyard you could get genuinely lost in. I've been burned enough times that I no longer trust the longevity of any Google developer tool until it's survived a couple of years in the wild. Building your daily workflow around something Google might quietly kill is its own kind of technical debt, and it's the kind that doesn't show up in any benchmark.

It's not just the abstract track record either. I used the original Antigravity when it launched last November. Unimpressed. I tried Jules, Google's earlier coding agent. Also unimpressed. So I went into this Google Antigravity 2.0 review fully expecting to write a polite "it's fine, but" piece and move on with my day.

I'm telling you this so you know what kind of reviewer you're reading. Not a hype account chasing affiliate clicks. Not a reflexive Google hater either. Just a developer with fifteen-odd years behind him, too many subscriptions, and a very low tolerance for tools that waste my time. A skeptic giving the thing a fair shot is still a fair shot.

Hands-On: Building Two Landing Pages

The fastest way to judge a coding agent is to give it real work, so I skipped the toy prompts and built two landing pages from scratch.

The Sydney coffee roaster site

First up, a landing page for a fictional Sydney coffee roaster. Antigravity 2.0 one-shot it. And it was genuinely fast, the kind of speed where you blink and there's already a working page sitting in front of you. Technically the output was solid: clean structure, sensible markup, and some nice animation work that I hadn't even asked for. Nothing I'd be embarrassed to ship the bones of.

But the visual style felt dated. Not broken, not ugly exactly, just five to ten years behind. It looked like a competent template from 2017. The spacing was a little too tight, the typography a little too safe, the colour choices a little too corporate-stock. If you handed this to a client today they'd politely ask for "something a bit more modern", and they'd be completely right to. The model knows how to build a page. It just doesn't seem to know what year it is.

The pop-culture clothing store

Then I tried a pop-culture themed clothing store. Different brief, bolder, more playful, the kind of thing where you actually want some personality on the page. And this one genuinely impressed me. The design was good enough that I caught myself thinking I'd actually shop there, which is a reaction I almost never have to an AI-generated frontend. (For comparison, the design output I've gotten out of Claude Code with its design tooling has been my benchmark, and this wasn't far off it on the right brief.)

So here's the honest takeaway. The output quality from Gemini 3.5 Flash is real, but it's inconsistent. Give it a bold, modern, opinionated brief and it shines. Give it a "professional default" brief and it reaches for something stale. That's worth knowing before you trust it with anything client-facing, because you can't predict which version you'll get until the page renders.

Where It Fell Apart: Token Limits and the Desktop App

Here's where my week went sideways.

After the two landing pages and a half-built poker terminal app, I ran out of tokens. Quota exhausted. I literally could not finish testing the thing I was supposed to be reviewing, which is a special and slightly absurd kind of frustration.

And I'm not an outlier. The research backs this up loudly. On Google's own developer forum, people reported an entire daily quota burning on a single trivial prompt. One user said that just asking for a single AGENTS.md file ate the whole allowance. Others described a Flash-to-Pro escalation loop, where the agent quietly bumps itself up to the more expensive Pro model mid-task and drains a week of Pro quota in a couple of days. If you've ever watched a usage meter spin and had no idea why, you'll recognise the feeling instantly.

The desktop app itself? Barebones. I kept waiting to find the feature that made it meaningfully different from every other coding-agent app I've used, and I didn't find it. It's an agent runner with a window around it. Functional, sure, but "functional" isn't a reason to switch, and after fifteen years of tooling churn I've learned to be suspicious of anything that launches looking this generic. A 2.0 release should feel like a confident product. This felt like a 1.0 that got renamed.

I didn't personally hit the OAuth problems, but I'd be doing you a disservice not to mention them, because they were everywhere at launch. Multiple developers reported paid Pro subscriptions failing to authenticate against the desktop app, with the OAuth redirect simply never completing. The app working fine on a free account and then breaking on a paid one is close to the worst possible failure mode for a launch. People paid Google money and got a worse experience for it.

The Pricing Catch Nobody Mentions Upfront

This is the part that actually changed my mind, so stay with me.

On paper, Gemini 3.5 Flash pricing looks reasonable. It's $1.50 per million input tokens and $9.00 per million output, roughly 25% cheaper per token than Gemini 3.1 Pro. Cheap-ish. Flash-ish. The sort of number you skim past without thinking.

Then Artificial Analysis ran the numbers properly, and the story falls apart. Running their Intelligence Index benchmark suite cost $1,552 on Gemini 3.5 Flash. That's 5.5 times what a previous Flash model cost for the exact same suite. And it's 74% more expensive than Gemini 3.1 Pro, the supposedly pricier model.

Read that again, because I had to. The "cheap" model cost more to run than the expensive one.

Why does this happen? Verbosity and turn count. The Decoder, summarising the same data, points out that Flash averages 49 agentic turns per task, more than any other model tested. Gemini 3.1 Pro needs 23 for the same work. Every one of those extra turns is tokens, and reasoning tokens bill at the output rate. The model is fast, but it is relentlessly chatty, and chatty gets expensive fast when you're paying per word. It's the same pricing-creep pattern I flagged when I reviewed DeepSeek V4, except this time it's hiding inside a model that's marketed as the budget option.

Here's my honest read. The entire point of a Flash-tier model is "cheap and fast". If Flash is this token-hungry, the cheap half of that promise is just broken. And if it isn't cheap, the main reason to add it to your stack mostly evaporates. Speed alone doesn't pay the API bill at the end of the month. You can claw some of this back with tighter prompting, and good prompt discipline matters more than ever here, but you shouldn't have to fight the model to hit the price it advertises.

Should You Switch? My Google Antigravity 2.0 Review Verdict

Let me give credit where it's genuinely due first.

Google's published benchmarks hold up. When independent testers re-ran them, the scores matched to the decimal. No benchmark fudging, no creative interpretation. Gemini 3.5 Flash really does beat Gemini 3.1 Pro on most of the tests Google chose to put on its slides.

The catch is that phrase, "chose to put on its slides". On Artificial Analysis's broader Intelligence Index, Flash ranks around #7 to #8 overall, sitting behind GPT-5.5 and Claude Opus 4.7. And on SWE-Bench Pro, the benchmark closest to real-world refactoring work, Opus 4.7 leads it comfortably, 64.3% to 55.1%. Fast and smart, yes. Frontier-leading, no. The gap between "wins the benchmarks Google picked" and "wins the benchmarks generally" is the whole review in one sentence.

So would I switch? No. And it's honestly not really about the launch bugs, which Google will presumably patch.

I already run Claude, ChatGPT, OpenCode, and Cursor. That's four subscriptions, four tools I know well, and frankly more than I need already. For Antigravity 2.0 to earn a fifth slot it would have to do something those four can't, and after a full week of testing I genuinely cannot tell you what that thing is. Faster one-shot scaffolding isn't enough when the token bill is this unpredictable, and I've said the same about other shiny new model launches that arrived with more hype than substance.

Who might it actually suit? Someone with no existing agentic-coding stack who wants quick scaffolding and doesn't mind a few rough edges. If you're starting from zero, the free public preview is a reasonable look and costs you nothing but time. If you've already got tools you trust, Antigravity 2.0 doesn't add anything you're missing.

Fair caveat to close on: this is one week post-launch. I couldn't fully stress-test pricing in practice because, well, I ran out of tokens trying. Google has said Gemini 3.5 Pro is coming around June 2026, and that's the point where I'd revisit all of this properly. Until then, I'm keeping my four subscriptions and skipping the fifth.

Thomas Wiegold

AI Solutions Developer & Full-Stack Engineer with 14+ years of experience building custom AI systems, chatbots, and modern web applications. Based in Sydney, Australia.

Ready to Transform Your Business?

Let's discuss how AI solutions and modern web development can help your business grow.

Get in Touch