AI Integration · 2026-05-29 · 13 min read

Gemini 3.5 Pro: What We Know About Google's Next Flagship After Flash and Omni — and What Remains Pure Speculation

Michael Kaiser

Co-Founder & Head of Systems, Vincency

On May 19, 2026, Sundar Pichai took the stage at Shoreline Amphitheatre and did something no one expected: he shipped Gemini 3.5 Flash first and postponed the Pro version. The audience groaned audibly. Until now, Google's playbook had been clear: launch the flagship, then distill it into cheaper variants. This time, the "cheap" variant outperformed the previous flagship on nearly every benchmark that matters to developers. And the real flagship? "Give us until next month to get it to you," Pichai said. That month is now here. This is what we know about Gemini 3.5 Pro, and what the Flash and Omni releases reveal about Google's rewritten AI strategy.

The Flash surprise: why Google's reversed release order matters

Gemini 3.5 Flash is not supposed to exist in this form. Historically, Flash models were compromises: faster, cheaper, and measurably weaker than their Pro siblings. Gemini 3.5 Flash breaks that contract. According to Google's published benchmarks — signed by DeepMind CTO Koray Kavukcuoglu, Jeff Dean, Oriol Vinyals, and Noam Shazeer — Flash scores 76.2 percent on Terminal-Bench 2.1, the industry-standard benchmark for agentic coding. That is six points above Gemini 3.1 Pro's 70.3 percent. On MCP Atlas, which measures tool-use coordination across the Model Context Protocol, Flash hits 83.6 percent. It runs at roughly 280 tokens per second output speed, four times faster than comparable frontier models, and costs — according to leaked API documentation — $1.50 per million input tokens and $9.00 per million output tokens.

The architectural reason for this leap, as described in Google's technical documentation, is a combination of extreme knowledge distillation from an undisclosed "Gemini 3.5 Ultra" teacher model and a new Mixture-of-Experts architecture with 256 micro-experts, of which four are activated per inference step. Jeff Dean noted in a post-I/O analysis that the fine-tuning ratio on high-quality logical chain datasets increased by 400 percent compared to the previous generation. The result is a model that inherited the "logical brain" of a hypothetical Ultra tier without carrying its inference cost.
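To make the "256 micro-experts, four active" figure concrete, here is a minimal sketch of generic top-k expert routing. It is not Google's implementation, which has not been published in this detail; the random router scores, the function name route_token, and the softmax-over-selected-experts weighting are all assumptions chosen for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # expert count quoted in the article
TOP_K = 4          # experts activated per inference step, per the article


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights.

    Generic top-k gating: hypothetical helper, not Gemini's actual router.
    """
    # Indices of the k largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only (shifted by the max for stability).
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)  # seeded so the sketch is reproducible
# Stand-in for a learned router: one random score per expert.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)          # four (expert_index, weight) pairs
active_fraction = TOP_K / NUM_EXPERTS   # 4 / 256 = 0.015625
```

Under these assumptions only about 1.6 percent of the experts run for any given token, which is the basic reason a sparse model can hold far more parameters than it pays for at inference time.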

What makes this strategically significant is not the benchmark score alone. It is the signal Google sends by releasing Flash before Pro. In every previous Gemini generation — 1.0, 1.5, 2.0, 2.5, 3.0, 3.1 — Pro led and Flash followed. The reversal implies one of two things: either Google is confident enough in Flash's capabilities that it no longer needs Pro to establish credibility, or Pro is facing engineering challenges that require additional time. The available evidence supports the first interpretation more strongly. Pichai's exact words — "I know you can't wait to get your hands on it. Give us until next month" — suggest refinement, not crisis.

What the Flash launch reveals about Pro's likely architecture

Google has published no benchmark numbers, pricing, model card, or context window specification for Gemini 3.5 Pro. The confirmed facts fit into a single table:

Launch window: June 2026 ("next month" from I/O on May 19)
Current status: internal use at Google for training and red-teaming
Focus area: shared with Flash on coding and agentic capabilities
Everything else: unconfirmed

Yet the Flash launch is unusually informative about what Pro will not be. It will not be a marginal improvement over Flash on coding speed, because Flash already saturates that dimension. It will not compete primarily on price, because Flash owns the aggressive-cost position. What Flash regresses on — according to multiple independent analyses — is exactly where Pro will differentiate: complex reasoning at extreme context lengths, long-horizon task consistency, and deep multimodal synthesis.

The evidence for this regression comes from comparative testing between Flash and 3.1 Pro. While Flash wins on Terminal-Bench and MCP Atlas, it drops on certain long-context retrieval benchmarks and on tasks requiring extended chains of abstract reasoning. This is not a flaw; it is the predictable trade-off of a distilled, speed-optimized architecture. Pro, by inference, will restore and extend the reasoning depth that Flash sacrifices for latency. Google's history supports this: Gemini 3.1 Ultra retained a 2M-token context window and deeper reasoning capabilities even after 3.1 Pro became the developer default.

The most credible leak-derived speculation — and I emphasize that this is speculation, not confirmed fact — positions Pro with a 2 million token context window, support for Computer Use (direct desktop/OS interaction, which Flash lacks), and a "thinking mode" with per-request reasoning depth control. If accurate, this would make Pro Google's answer to Claude Opus 4.8's agentic depth and GPT-5.5's computer-use capabilities, while Flash competes on speed and cost.

Gemini Omni: the video layer Google has been missing

While the developer community focused on Flash benchmarks, Google quietly launched something potentially more consequential for mainstream adoption: Gemini Omni. Described by DeepMind CEO Demis Hassabis as "our first step towards a model that can create anything from any input," Omni is Google's first native world model with video generation and editing as its initial output modality.

The technical distinction between Omni and Google's existing Veo 3.1 video model is not subtle — and the confusion between the two has already caused planning problems for developers. Veo 3.1 is a specialized text-to-video engine: you input text, you receive cinematic video. It has been stable in production since October 2025, with established API routes, pricing tiers (Lite at $0.03/sec, Fast at $0.10/sec, Quality at $0.20-0.40/sec), and enterprise integrations through Vertex AI. Omni is fundamentally different: it accepts any combination of text, images, audio, and video as input, maintains an internal world model with physics-aware simulation (gravity, fluid dynamics, kinetic energy), and enables conversational editing where users refine outputs through multi-turn dialogue rather than rewriting prompts from scratch.

The first shipping variant, Gemini Omni Flash, generates 10-second clips with synchronized audio and rolled out immediately to the Gemini app, Google Flow, YouTube Shorts, and YouTube Create. Google's official FAQ clarifies a critical point that many headlines missed: Omni replaces Veo only inside the Gemini consumer app. Veo 3.1 remains fully supported in Vertex AI, the Gemini API, Google AI Studio, and Google Flow. For enterprise developers, this means no forced migration. For consumer creators, it means a generational leap in editability.

Character consistency across scenes is Omni's most technically impressive claimed capability. Users can define a character once — via text description, reference images, or a digital avatar created from their own likeness — and then place that character into any scene with preserved identity across lighting changes, camera angles, and environmental shifts. If this works as demonstrated, it solves the single biggest quality problem in AI video generation: the "face swap" effect where generated characters morph between cuts. Early independent testing is limited, but Google's demo footage — claymation sculptures dissolving into bubbles, sci-fi scenes with coherent physics, music-synced visuals — suggests genuine architectural progress beyond frame-by-frame generation.

The competitive landscape: Gemini 3.5 Pro vs. GPT 5.6 vs. Claude Opus 4.8

June 2026 is shaping up to be the most densely packed model-release month in AI history. Three flagships are converging on the same window:

Gemini 3.5 Pro — Google's delayed flagship, expected to restore deep reasoning and long-context leadership
GPT 5.6 — OpenAI's leaked next generation, with internal codenames (iris-alpha, ember-alpha, beacon-alpha) suggesting multi-variant release
Claude Opus 4.8 — already shipping since May 28, holding the SWE-bench Pro record at 69.2 percent

The positioning is clarifying. Anthropic currently owns the coding-quality crown with Opus 4.8. OpenAI's GPT 5.6 leaks suggest a focus on context-window expansion (1.5M tokens rumored) and UI-generation quality. Google's Gemini 3.5 Flash already dominates the speed/cost quadrant, leaving Pro to compete on reasoning depth and multimodal integration. For the first time since the GPT-4 era, no single vendor leads across all dimensions simultaneously.

For enterprise procurement, this fragmentation is actually healthy. It breaks the "default model" trap that caught many companies in 2024-2025, when GPT-4 was the obvious choice for nearly everything. In June 2026, the correct answer to "which model should we use?" is increasingly "which model for which task?", with routing logic that sends coding tasks to Claude, long-context research to GPT 5.6 or Gemini Pro, and high-volume agentic workflows to Gemini Flash.
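The routing idea can be sketched as a simple lookup table. This is a minimal illustration, not a production router: the task categories, the Task class, the pick_model function, and the model identifier strings are all hypothetical placeholders rather than real API model names, and falling back to Flash for unknown task types is our assumption.

```python
from dataclasses import dataclass

# Hypothetical routing table mirroring the split described above.
# The identifiers are placeholders, not official API model names.
ROUTES = {
    "coding": "claude-opus-4.8",
    "long_context_research": "gemini-3.5-pro",
    "high_volume_agentic": "gemini-3.5-flash",
}

# Assumption: unknown task types fall back to the cheapest, fastest option.
DEFAULT_MODEL = "gemini-3.5-flash"


@dataclass(frozen=True)
class Task:
    kind: str    # one of the keys in ROUTES, or anything else
    prompt: str


def pick_model(task: Task) -> str:
    """Return the model identifier a task of this kind should be sent to."""
    return ROUTES.get(task.kind, DEFAULT_MODEL)


refactor = Task(kind="coding", prompt="Refactor the billing module")
chosen = pick_model(refactor)  # "claude-opus-4.8"
```

Keeping the mapping in data rather than in code means a model swap is a one-line configuration change, which matters when three vendors ship flagships in the same month.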

What this means for the German mid-market

From our work integrating AI systems into German mid-sized companies — manufacturing SMEs, law firms, medical practices, e-commerce operations — the Gemini 3.5 family sends three concrete signals:

First: the speed/cost breakthrough is real enough to act on. Flash's $1.50/$9.00 per-million-token pricing, combined with its 4x speed advantage, changes the economics of high-volume AI applications. Customer-support agents that previously cost $0.15 per conversation on GPT-4-class models now potentially cost $0.04. Document-processing pipelines that struggled with latency can now run in real time. We are already migrating clients with high-throughput use cases to Flash, not because Pro is bad, but because Flash is good enough and dramatically cheaper.

Second: multimodal video is approaching production viability. Omni is not yet ready for enterprise video production: the 10-second clip limit, the lack of published API pricing, and the absence of scene-extension capabilities make it a consumer/creator tool for now. But the trajectory is clear. Within 12 months, AI-generated video with consistent characters, physics-aware motion, and conversational editing will be a standard marketing capability. Mid-market companies should begin experimenting now, while the technology is still a differentiator, rather than waiting for it to become table stakes.

Third: Google's ecosystem integration remains its moat. Flash shipped simultaneously into the Gemini app, AI Mode in Search, Google Antigravity, Vertex AI, the Gemini API, Android Studio, and GitHub Copilot. No competitor matches this day-one surface coverage. For companies already embedded in Google Workspace, Cloud, or Android ecosystems, the friction of adopting Gemini is near zero. The strategic implication: if you are a Google shop, Flash is probably your default before you even evaluate alternatives.
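The per-conversation economics in the first signal are easy to sanity-check. A minimal sketch, assuming the $1.50/$9.00 per-million-token prices quoted above and a hypothetical support conversation of 8,000 input tokens and 3,000 output tokens (both token counts are illustrative assumptions, not measured values):

```python
# Prices per million tokens as quoted in the article for Gemini 3.5 Flash.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def conversation_cost(input_tokens, output_tokens,
                      input_price=INPUT_PRICE_PER_M,
                      output_price=OUTPUT_PRICE_PER_M):
    """Cost in USD of one conversation, given token counts and per-million prices."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)


# Hypothetical conversation size: 8,000 tokens in, 3,000 tokens out.
cost = conversation_cost(8_000, 3_000)
print(f"${cost:.3f} per conversation")
```

With those assumed token counts the result is about $0.039 per conversation, in line with the $0.04 estimate above. Actual costs depend on prompt length, retries, and how much context each turn carries.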

What to watch in June 2026

The next three weeks will clarify everything that is currently speculative. Our watchlist:

The Pro benchmark drop. When Google publishes Pro's model card — expected within days of release — pay attention not to absolute scores but to the Flash/Pro delta. If Pro is only 5-10 percent better on coding, Flash wins on cost. If Pro leaps 20+ percent on reasoning benchmarks, the calculation changes.
Computer Use confirmation. Flash explicitly lacks this capability. If Pro ships with desktop automation comparable to GPT-5.5's OSWorld performance, it becomes the agentic-coding default for many teams.
Context window verification. Leaked claims of 2M tokens for Pro need independent testing. Flash's 1M window is already excellent; Pro must demonstrate meaningful accuracy improvements at scale, not just bigger numbers.
Omni API availability. Consumer access is live, but developer API routes, pricing, and content policies remain unpublished. Enterprise video workflows depend on this infrastructure.
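The Flash/Pro delta rule from the first watchlist item can be written down as a small decision helper. This is a sketch under stated assumptions: we read "percent better" as relative improvement rather than percentage points, the function names and verdict strings are ours, the band between the two thresholds is treated as "needs evals" because the rule above leaves it open, and the Pro score in the usage lines is a placeholder because Google has published no Pro numbers.

```python
def relative_gain(pro_score, flash_score):
    """Relative improvement of Pro over Flash, in percent."""
    return (pro_score - flash_score) / flash_score * 100.0


def verdict(coding_gain_pct, reasoning_gain_pct):
    """Apply the watchlist thresholds (10 percent coding, 20 percent reasoning)."""
    if reasoning_gain_pct >= 20.0:
        return "evaluate Pro for reasoning-heavy workloads"
    if coding_gain_pct <= 10.0:
        return "stay on Flash for coding"
    # The article does not cover this middle band; assume evals decide.
    return "run task-specific evals"


flash_coding = 76.2              # Terminal-Bench 2.1 score quoted in the article
hypothetical_pro_coding = 80.0   # placeholder only: no Pro scores are published
coding_gain = relative_gain(hypothetical_pro_coding, flash_coding)
decision = verdict(coding_gain, reasoning_gain_pct=0.0)
```

Once the real model card is out, replacing the placeholder with published scores turns the watchlist item into a repeatable check rather than a judgment call.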

Conclusion: Flash is the product, Pro is the promise, Omni is the platform bet

Gemini 3.5 Flash is already the most consequential Google AI release of 2026. It upends the long-standing "you get what you pay for" logic of model tiers, showing that distillation and architectural innovation can deliver flagship capabilities at commodity prices. It is not perfect: the reasoning regressions are real, the "laziness" complaints of earlier Flash generations are only "largely" resolved, and independent speed benchmarks are still pending. But it is good enough, cheap enough, and fast enough to become the default for a huge swath of production workloads.

Gemini 3.5 Pro, when it arrives, will be judged against a higher bar than any previous Google flagship — because its cheaper sibling has already set the baseline. If Pro delivers meaningfully deeper reasoning, reliable Computer Use, and sustained accuracy across 2M-token contexts, it reclaims Google's position in the frontier tier. If it merely matches Flash with marginal improvements, Google's own product strategy becomes its biggest competitor.

Gemini Omni is the longest-term bet of the three. Video generation is not yet a core business capability for most mid-market companies, but conversational multimodal editing — "create anything from any input" — points toward a future where AI is not a tool you use but an environment you work within. Google's $190 billion AI capex commitment for 2026, its 900 million monthly active Gemini users, and its 3.2 quadrillion monthly processed tokens all suggest that this platform bet is backed by resources no competitor can match.

For the German mid-market, the actionable insight is simple: start with Flash today for agentic, coding, or high-volume workloads; watch Pro in June for reasoning-heavy workloads; and begin low-risk experiments with Omni video now, before the technology becomes a competitive requirement. The models will change in July. The infrastructure you build around them should not have to.

Frequently asked questions about Gemini 3.5 Pro

Is Gemini 3.5 Pro available yet?

No. As of early June 2026, Gemini 3.5 Pro is not yet publicly available. Google CEO Sundar Pichai announced at I/O 2026 on May 19 that Pro will ship "next month" — meaning June 2026. No exact date was given. The model is currently being used internally at Google.

What distinguishes Gemini 3.5 Flash from Gemini 3.5 Pro?

Flash has been generally available since May 19, 2026 and already exceeds Gemini 3.1 Pro on most coding and agentic benchmarks. Pro is positioned as a more powerful model with improved reasoning, a larger context window (possibly 2M tokens), and support for Computer Use. Flash costs $1.50 per million input tokens; Pro is expected to be significantly more expensive.

What is Gemini Omni and how does it differ from Veo?

Gemini Omni is a multimodal video model introduced at Google I/O 2026. Unlike Veo 3.1 (specialized in text-to-video), Omni accepts any combination of text, images, audio, and video as input and enables conversational editing. Omni replaces Veo in the Gemini app, but not in Vertex AI or the Gemini API.

How fast is Gemini 3.5 Flash compared to Claude Opus 4.7 and GPT-5.5?

According to Google, Gemini 3.5 Flash is roughly four times faster than comparable frontier models measured by output tokens per second. Inside Google Antigravity, an optimized version is claimed to be 12x faster. Independent benchmarks confirming these speed claims were not available as of early June 2026.

Should enterprises migrate to Gemini 3.5 Flash now or wait for Pro?

For most agentic coding and tool-use workloads, switching to Flash is worthwhile immediately — it outperforms Gemini 3.1 Pro at lower cost and higher speed. Enterprises with particularly demanding reasoning tasks or long-context analyses beyond 128K tokens should wait for the Pro release in June and then make an eval-based decision.

Sources and primary references: This analysis is based on Google's official I/O 2026 announcements (blog.google/innovation-and-ai, May 19-20, 2026), the Gemini 3.5 Flash model card published by Google DeepMind (deepmind.google/models/model-cards/gemini-3.5-flash, May 19, 2026), Sundar Pichai's keynote transcript (verified via Business Insider, May 19, 2026), and independent technical reporting by Codersera, WaveSpeed AI, Build Fast with AI, LLM-Stats, The Planet Tools, and Digital Applied (all May 2026). Gemini Omni details are sourced from Google's official Omni announcement (blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni, May 19, 2026), the Gemini Omni Flash model card (deepmind.google/models/model-cards/gemini-omni-flash), and comparative analyses by PixVerse AI, MagicShot, and Digital Applied (all May 2026). All Pro specifications not explicitly confirmed by Google on stage are labeled as speculation. Competitive context draws on Anthropic's official Opus 4.8 announcement (May 28, 2026) and GPT-5.6 leak reporting as referenced in our separate GPT 5.6 analysis.
