
Gemini 3 Flash

Gemini 3 Flash delivers Gemini 3's pro-grade reasoning at flash-level latency and cost, using 30% fewer tokens than previous Gemini 2.5 models while outperforming them across most benchmarks.

Capabilities: Reasoning · Tool Use · File Input · Vision (Image) · Web Search · Tiered Cost · Implicit Caching
index.ts
import { streamText } from 'ai'

// Stream a response from Gemini 3 Flash through AI Gateway
const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Why is the sky blue?',
})
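
For real-time interfaces you can return the stream directly from a route handler. The sketch below assumes a Next.js-style App Router handler; the app/api/chat/route.ts path and the request body shape are illustrative, not prescribed by AI Gateway.

app/api/chat/route.ts
import { streamText } from 'ai'

export async function POST(req: Request) {
  // Illustrative request shape: { prompt: string }
  const { prompt } = await req.json()

  const result = streamText({
    model: 'google/gemini-3-flash',
    prompt,
  })

  // Send tokens to the client as they are generated
  return result.toTextStreamResponse()
}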

What To Consider When Choosing a Provider

  • Configuration: Gemini 3 Flash supports configurable thinking levels (including 'high') via providerOptions, giving you direct control over how much reasoning compute the model applies per request; a sketch follows this list.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK requests are not covered). See the documentation to configure it.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
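
A minimal sketch of what the thinking configuration can look like with the AI SDK. The option names (thinkingLevel, includeThoughts under providerOptions.google) follow the description in the FAQ below; the exact nesting can vary by AI SDK and provider version, so treat this as illustrative rather than definitive.

import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3-flash',
  prompt: 'Walk through the proof step by step.',
  providerOptions: {
    google: {
      // Reasoning controls as described on this page; exact nesting may vary by SDK version
      thinkingLevel: 'high',
      includeThoughts: true,
    },
  },
})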

When to Use Gemini 3 Flash

Best For

  • Real-time chat and assistants: Interfaces that require pro-level reasoning without high latency
  • High-volume agentic pipelines: Workflows where per-token cost directly affects operating expenses (see the tool-calling sketch after this list)
  • Step-by-step analysis: Tasks where surfacing intermediate reasoning (includeThoughts) adds value
  • Throughput-bottlenecked apps: Applications previously constrained by Gemini 2.5 Pro throughput limits
  • Cost-sensitive production workloads: Production traffic where per-token cost matters but quality still has to stay benchmark-competitive
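
As a rough illustration of the agentic case, the sketch below wires a single tool into a multi-step loop. It assumes AI SDK v5 conventions (tool with inputSchema, stopWhen with stepCountIs) and a hypothetical getOrderStatus helper; adjust to your SDK version and your own tools.

import { generateText, tool, stepCountIs } from 'ai'
import { z } from 'zod'

const result = await generateText({
  model: 'google/gemini-3-flash',
  prompt: 'Where is order 1042?',
  tools: {
    // Hypothetical tool, for illustration only
    getOrderStatus: tool({
      description: 'Look up the shipping status of an order',
      inputSchema: z.object({ orderId: z.string() }),
      execute: async ({ orderId }) => ({ orderId, status: 'in transit' }),
    }),
  },
  // Let the model alternate between tool calls and reasoning for up to 5 steps
  stopWhen: stepCountIs(5),
})

console.log(result.text)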

Consider Alternatives When

  • Maximum reasoning depth: Your task requires the deepest reasoning regardless of cost or speed (consider google/gemini-3-pro-preview or google/gemini-3.1-pro-preview)
  • Native image generation needed: You require image output alongside text (consider google/gemini-3-pro-image or google/gemini-3.1-flash-image-preview)
  • Budget and latency dominate: Task quality requirements are low (consider google/gemini-3.1-flash-lite-preview)

Conclusion

Gemini 3 Flash resets expectations for what a speed-tier model can deliver, matching or exceeding previous-generation Pro quality at a fraction of the cost and latency. For teams that need scalable intelligence rather than raw capability, it represents a cost- and latency-efficient entry point into the Gemini 3 generation on AI Gateway.

Frequently Asked Questions

  • What makes Gemini 3 Flash different from Gemini 2.5 Flash?

    Gemini 3 Flash is built on the newer Gemini 3 architecture rather than Gemini 2.5. The generation change brings a substantial capability lift: Gemini 3 Flash surpasses Gemini 2.5 Pro on most benchmarks, so a speed-tier model in the 3 generation now exceeds the previous generation's flagship.

  • Can I control how much the model thinks before answering?

    Yes. You can set thinkingLevel (e.g., 'high') and includeThoughts: true inside providerOptions.google when using the AI SDK, as shown in the configuration sketch earlier on this page. This gives you visibility into intermediate reasoning steps.

  • Does Gemini 3 Flash support streaming?

    Yes. Use streamText from the AI SDK with model: 'google/gemini-3-flash' for streaming responses.

  • Do I need a Google Cloud account to use this model on AI Gateway?

    No. AI Gateway handles all provider authentication. You authenticate to AI Gateway using a Vercel API key or OIDC token and do not need to configure Google credentials separately.

  • How does Gemini 3 Flash compare to Gemini 3 Pro on reasoning tasks?

    Gemini 3 Pro targets the most challenging reasoning and agentic workflows. Gemini 3 Flash prioritizes speed and cost while still delivering pro-grade quality. The right tradeoff depends on your latency budget and task complexity.

  • What is Zero Data Retention and does Gemini 3 Flash support it?

    Yes. Zero Data Retention means prompts and responses are not retained by the provider after a request is processed, and it is available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for configuration details.

  • What token efficiency improvements does Gemini 3 Flash offer?

    Gemini 3 Flash uses 30% fewer tokens than previous Gemini 2.5 models. Combined with lower per-token pricing, this results in meaningful cost reductions at scale for applications processing large volumes of requests (see the cost sketch at the end of this FAQ).

  • Is Gemini 3 Flash suitable for agentic multi-step workflows?

    Yes. The model's combination of reasoning capability, token efficiency, and low latency makes it well-suited for agents that execute multiple tool calls or reasoning steps in sequence within a budget-constrained environment.
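
To make the token-efficiency point above concrete, here is a back-of-the-envelope calculation. The prices and token counts are placeholders, not published rates; substitute your actual pricing and traffic.

// Hypothetical numbers for illustration only – substitute real pricing and traffic
const requestsPerDay = 1_000_000
const avgOutputTokensBefore = 500                         // tokens per request on a Gemini 2.5-class model
const avgOutputTokensAfter = avgOutputTokensBefore * 0.7  // ~30% fewer tokens, per the figure above
const pricePerMillionTokens = 1.0                         // placeholder USD rate

const dailyCostBefore = (requestsPerDay * avgOutputTokensBefore / 1_000_000) * pricePerMillionTokens
const dailyCostAfter = (requestsPerDay * avgOutputTokensAfter / 1_000_000) * pricePerMillionTokens

// Under these placeholder numbers: 500 vs 350 per day, before any per-token price difference
console.log({ dailyCostBefore, dailyCostAfter })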