Gemini 3 Flash
Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at flash-level latency and cost, using 30% fewer tokens than previous Gemini 2.5 models while outperforming them across most benchmarks.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
- Configuration: Gemini 3 Flash supports configurable thinking levels (including `high`) via `providerOptions`, giving you direct control over how much reasoning compute the model applies per request.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
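The configuration point above can be sketched as follows. The option names (`thinkingLevel`, `includeThoughts`) and the `providerOptions.google` placement are the ones this article describes; the surrounding request shape is a sketch, not a verified reference.

```typescript
// Shape of the per-request reasoning controls described above.
// Passed as `providerOptions` in an AI SDK call such as streamText().
const providerOptions = {
  google: {
    thinkingLevel: 'high',  // how much reasoning compute to apply
    includeThoughts: true,  // surface intermediate reasoning in the response
  },
} as const

// Usage sketch (requires the `ai` package and an AI Gateway API key):
//
//   const result = streamText({
//     model: 'google/gemini-3-flash',
//     prompt: 'Why is the sky blue?',
//     providerOptions,
//   })

console.log(providerOptions.google.thinkingLevel) // 'high'
```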
When to Use Gemini 3 Flash
Best For
- Real-time chat and assistants: Interfaces that require pro-level reasoning without high latency
- High-volume agentic pipelines: Per-token cost directly affects operating expenses
- Step-by-step analysis: Tasks where surfacing intermediate reasoning (`includeThoughts`) adds value
- Throughput-bottlenecked apps: Applications previously constrained by Gemini 2.5 Pro throughput limits
- Cost-sensitive production workloads: Production traffic where per-token cost matters but quality still has to stay benchmark-competitive
Consider Alternatives When
- Maximum reasoning depth: Your task requires the deepest reasoning regardless of cost or speed (consider `google/gemini-3-pro-preview` or `google/gemini-3.1-pro-preview`)
- Native image generation needed: You require image output alongside text (consider `google/gemini-3-pro-image` or `google/gemini-3.1-flash-image-preview`)
- Budget and latency dominate: Task quality requirements are low (consider `google/gemini-3.1-flash-lite-preview`)
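The decision points above can be collapsed into a small routing helper. The model ids are the ones listed in this article; the requirement flags and the function name are illustrative.

```typescript
// Illustrative task requirements; field names are hypothetical,
// model ids are the ones named in this article.
type TaskNeeds = {
  maxReasoningDepth?: boolean
  imageOutput?: boolean
  lowestCostAndLatency?: boolean
}

function pickGeminiModel(needs: TaskNeeds): string {
  if (needs.maxReasoningDepth) return 'google/gemini-3-pro-preview'
  if (needs.imageOutput) return 'google/gemini-3-pro-image'
  if (needs.lowestCostAndLatency) return 'google/gemini-3.1-flash-lite-preview'
  return 'google/gemini-3-flash' // default: pro-grade quality at flash cost
}

console.log(pickGeminiModel({})) // 'google/gemini-3-flash'
```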
Conclusion
Gemini 3 Flash resets expectations for what a speed-tier model can deliver, matching or exceeding previous-generation Pro quality at a fraction of the cost and latency. For teams that need scalable intelligence rather than raw capability, it represents a cost- and latency-efficient entry point into the Gemini 3 generation on AI Gateway.
Frequently Asked Questions
What makes Gemini 3 Flash different from Gemini 2.5 Flash?
Gemini 3 Flash is built on the newer Gemini 3 architecture rather than Gemini 2.5. The generation change brings a substantial capability lift: Gemini 3 Flash surpasses Gemini 2.5 Pro on most benchmarks, so a speed-tier model in the 3 generation now exceeds the previous generation's flagship.
Can I control how much the model thinks before answering?
Yes. You can set `thinkingLevel` (e.g., `'high'`) and `includeThoughts: true` inside `providerOptions.google` when using the AI SDK. This gives you visibility into intermediate reasoning steps.
Does Gemini 3 Flash support streaming?
Yes. Use `streamText` from the AI SDK with `model: 'google/gemini-3-flash'` for streaming responses.
Do I need a Google Cloud account to use this model on AI Gateway?
No. AI Gateway handles all provider authentication. You authenticate to AI Gateway using a Vercel API key or OIDC token and do not need to configure Google credentials separately.
How does Gemini 3 Flash compare to Gemini 3 Pro on reasoning tasks?
Gemini 3 Pro targets the most challenging reasoning and agentic workflows. Gemini 3 Flash prioritizes speed and cost while still delivering pro-grade quality. The right tradeoff depends on your latency budget and task complexity.
What is Zero Data Retention and does Gemini 3 Flash support it?
Yes, Zero Data Retention is available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for configuration details.
What token efficiency improvements does Gemini 3 Flash offer?
Gemini 3 Flash uses 30% fewer tokens than previous Gemini 2.5 models. Combined with lower per-token pricing, this results in meaningful cost reductions at scale for applications processing large volumes of requests.
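To make the efficiency claim concrete, here is a back-of-the-envelope calculation. The 30% figure comes from this article; the baseline token volume and per-token price are made-up illustrative numbers, not published rates.

```typescript
// Illustrative only: workload size and price are assumptions, not real pricing.
const baselineTokensPerMonth = 100_000_000 // hypothetical Gemini 2.5 workload
const pricePerMillionTokens = 1.0          // hypothetical flat price, in dollars

// 30% fewer tokens for the same work, per the article.
const flashTokens = baselineTokensPerMonth * 0.7

const baselineCost = (baselineTokensPerMonth / 1_000_000) * pricePerMillionTokens
const flashCost = (flashTokens / 1_000_000) * pricePerMillionTokens

console.log(baselineCost) // 100
console.log(flashCost)    // 70 — a 30% saving before any per-token price difference
```

Any additional per-token price advantage compounds with this token reduction, which is why the savings grow with request volume.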
Is Gemini 3 Flash suitable for agentic multi-step workflows?
Yes. The model's combination of reasoning capability, token efficiency, and low latency makes it well-suited for agents that execute multiple tool calls or reasoning steps in sequence within a budget-constrained environment.
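The multi-step pattern described above can be sketched as a plain loop. The mock model and tool below are stand-ins so the control flow is visible; in a real agent, the AI SDK's tool-calling support drives `google/gemini-3-flash` instead.

```typescript
// Minimal agent loop with a mocked model, showing the multi-step shape.
type Step = { tool: string; input: string } | { answer: string }

// Mock "model": requests a lookup first, then answers. Purely illustrative.
function mockModel(history: string[]): Step {
  return history.length === 0
    ? { tool: 'lookup', input: 'sky color' }
    : { answer: 'Rayleigh scattering makes the sky blue.' }
}

// Mock tool registry; a real agent would call external APIs here.
const tools: Record<string, (input: string) => string> = {
  lookup: (q) => `notes on ${q}`,
}

function runAgent(maxSteps = 5): string {
  const history: string[] = []
  for (let i = 0; i < maxSteps; i++) {
    const step = mockModel(history)
    if ('answer' in step) return step.answer    // model is done
    history.push(tools[step.tool](step.input))  // run tool, feed result back
  }
  return 'step budget exhausted'
}

console.log(runAgent()) // 'Rayleigh scattering makes the sky blue.'
```

The step budget caps cost per task, which is where Gemini 3 Flash's token efficiency matters most: each loop iteration is a full model call.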