DeepSeek V3.1

DeepSeek V3.1 is DeepSeek's August 21, 2025 model update, introducing hybrid inference with selectable thinking and non-thinking modes behind a single endpoint. It strengthens tool use and multi-step agent capabilities relative to DeepSeek-V3.

Reasoning · Tool Use
index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'deepseek/deepseek-v3.1',
  prompt: 'Why is the sky blue?',
})

What To Consider When Choosing a Provider

  • Configuration: Two usage modes share the same model. Test both thinking and non-thinking paths in your integration to confirm your application correctly interprets response structure under each mode.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
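
Testing both paths can be sketched as follows, using only built-in fetch against DeepSeek's OpenAI-compatible endpoint; the endpoint path and payload shape are assumptions to verify against DeepSeek's current API reference, and the mode-to-model mapping follows the FAQ below (deepseek-reasoner for thinking, deepseek-chat for non-thinking):

```typescript
// Sketch only: endpoint path and payload shape are assumptions to check
// against DeepSeek's current API reference.
type Mode = 'thinking' | 'non-thinking'

// Map the desired inference mode to a model name. deepseek-reasoner
// selects thinking mode and deepseek-chat selects non-thinking mode on
// the same underlying weights.
function modelForMode(mode: Mode): string {
  return mode === 'thinking' ? 'deepseek-reasoner' : 'deepseek-chat'
}

// Exercise one mode; run this for both modes in integration tests so the
// application is known to handle each mode's response structure.
async function probe(mode: Mode, apiKey: string): Promise<unknown> {
  const res = await fetch('https://api.deepseek.com/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: modelForMode(mode),
      messages: [{ role: 'user', content: 'Why is the sky blue?' }],
    }),
  })
  return res.json()
}
```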

When to Use DeepSeek V3.1

Best For

  • Mixed agent pipelines: Combine reasoning-heavy steps (tool planning, code generation) with fast-response steps (parsing, classification) through a single endpoint
  • Software engineering automation: SWE-Bench and Terminal-Bench improvements translate to better code generation and execution performance
  • Anthropic API compatibility: Existing Anthropic-format integrations can route to DeepSeek V3.1 with minimal changes
  • Complex multi-step search: The thinking mode's improved efficiency reduces total response latency for multi-step workflows
  • Upgrading from DeepSeek-V3: Backward-compatible API routing plus optional thinking mode

Consider Alternatives When

  • Pure reasoning workloads: DeepSeek-R1 remains the dedicated reasoning specialist
  • Multilingual stability is critical: DeepSeek-V3.1 Terminus addresses reliability issues with Chinese-English code-switching and output consistency
  • Straightforward chat or completion: DeepSeek-V3 may be more cost-efficient for high-volume workloads without hybrid inference needs

Conclusion

DeepSeek V3.1 consolidates thinking and non-thinking modes into a single endpoint, simplifying deployment for reasoning-capable systems. It adds capability over DeepSeek-V3 for agentic and software engineering tasks.

Frequently Asked Questions

  • What does "hybrid inference" mean for DeepSeek V3.1?

    The same model weights support both a thinking mode (extended chain-of-thought) and a non-thinking mode (direct completion). Select the mode per request by calling deepseek-reasoner for thinking or deepseek-chat for non-thinking; no separate model deployment is needed.
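
In practice the two modes differ in response structure: thinking-mode responses carry the chain of thought in a separate reasoning_content field alongside content (field name as used by DeepSeek's reasoner responses; verify against current docs), while non-thinking responses omit it. A minimal parsing sketch:

```typescript
// Shape of the assistant message in a chat-completion choice; the
// reasoning_content field is an assumption based on DeepSeek's published
// reasoner response format.
interface ChoiceMessage {
  content: string
  reasoning_content?: string
}

// Separate the chain of thought (thinking mode only) from the final answer.
function splitAnswer(msg: ChoiceMessage): { reasoning: string | null; answer: string } {
  return {
    reasoning: msg.reasoning_content ?? null, // null in non-thinking mode
    answer: msg.content,
  }
}
```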

  • Is DeepSeek V3.1's thinking mode faster than DeepSeek-R1?

    Yes. DeepSeek-V3.1-Think reaches answers in less time than DeepSeek-R1-0528 on equivalent tasks.

  • Does DeepSeek V3.1 support the Anthropic API format?

    Yes. Existing Anthropic-format integrations can route to DeepSeek V3.1 without additional conversion.
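
A minimal sketch of what this routing looks like with built-in fetch; the base URL (api.deepseek.com/anthropic), the anthropic-version value, and the payload shape are assumptions to verify against DeepSeek's docs, and only the model name changes from a stock Anthropic Messages request:

```typescript
// Assumed Anthropic-compatible base URL; confirm in DeepSeek's docs.
const baseURL = 'https://api.deepseek.com/anthropic'

// An Anthropic Messages API payload, unchanged apart from the model name.
function anthropicPayload(prompt: string) {
  return {
    model: 'deepseek-chat',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  }
}

async function send(prompt: string, apiKey: string): Promise<unknown> {
  const res = await fetch(`${baseURL}/v1/messages`, {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify(anthropicPayload(prompt)),
  })
  return res.json()
}
```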

  • What is strict function calling and is it available in DeepSeek V3.1?

    It's in beta for DeepSeek V3.1. Strict function calling requires tool call arguments to match the provided JSON schema exactly.
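
A sketch of what a strict tool definition might look like. The strict flag, the additionalProperties requirement, and the tool-definition shape below follow the OpenAI-style function-calling format and are assumptions to verify against DeepSeek's beta documentation:

```typescript
// Hypothetical strict tool definition: with strict enabled, the model's
// tool-call arguments must match this JSON schema exactly.
const getWeatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    strict: true, // beta: enforce exact schema conformance
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['city', 'unit'],
      additionalProperties: false, // strict schemas typically forbid extras
    },
  },
} as const
```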

  • What is the context window for DeepSeek V3.1?

    163,840 tokens (163.8K) in both thinking and non-thinking modes.