GLM 4.5
GLM 4.5 is Z.ai's full-scale model, released July 28, 2025, unifying reasoning, coding, and agentic capabilities in a single endpoint. It is available through AI Gateway with built-in observability and intelligent provider routing.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.5',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
- Configuration: GLM 4.5 supports a 131.1K-token context window, with up to 131.1K tokens per request. For reasoning-heavy tasks with thinking enabled, budget extra output tokens for the chain-of-thought traces that precede the final answer.
- Configuration: Test both thinking-enabled and thinking-disabled modes. Thinking mode improves accuracy on complex reasoning but increases latency and token usage. Disable it for straightforward generation tasks.
- Configuration: When using AI Gateway, configure fallback providers to maintain availability. GLM 4.5 is available through the zai and novita providers.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use GLM 4.5
Best For
- General-purpose reasoning and coding: Unified capability across math, logic, and code generation reduces the need for task-specific models
- Agentic workflows: Multi-step planning, tool use, and configurable thinking run within a single model
- Long-document analysis: The context window of 131.1K tokens fits contracts, research papers, or large codebases
- Production deployments: Built-in observability and automatic retries through AI Gateway reduce operational overhead
- Cost-conscious teams: Compare listed rates against alternative providers when evaluating total spend
Consider Alternatives When
- Lightweight high-volume alternative: GLM-4.5-Air offers reduced latency and cost for less demanding workloads
- Vision or multimodal input: GLM-4.5V adds image understanding on top of the GLM-4.5 foundation
- Code generation focus: GLM-4.6 and later models include targeted coding improvements
- Deeper reasoning and planning: GLM-5 introduces multiple thinking modes and improved long-range planning
Conclusion
GLM 4.5 is Z.ai's full-capability model in the GLM-4.5 generation, balancing reasoning depth, coding proficiency, and agentic flexibility. For teams that need a single model covering a broad range of tasks with configurable thinking, it's available through AI Gateway with unified billing and observability.
Frequently Asked Questions
What is the difference between GLM 4.5 and GLM-4.5-Air?
GLM 4.5 is the full-scale model optimized for maximum capability across reasoning, coding, and agentic tasks. GLM-4.5-Air is a lighter variant designed for lower latency and cost on less demanding workloads.
Does GLM 4.5 support configurable thinking?
Yes. You can enable or disable chain-of-thought reasoning per request. Thinking mode improves accuracy on complex tasks but increases output length and latency.
What is the context window for GLM 4.5?
131.1K tokens, supporting long documents, extended conversations, and multi-file code analysis in a single request.
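Since input and output share the same window, a rough budget check before sending a long document can avoid truncated responses. A minimal sketch (token counts here would come from a tokenizer or a prior usage report; the helper name is illustrative):

```typescript
// Input and output tokens share the 131.1K-token context window.
const CONTEXT_WINDOW = 131_100

// Returns true if the prompt plus the requested output budget fits.
function fitsInContext(inputTokens: number, maxOutputTokens: number): boolean {
  return inputTokens + maxOutputTokens <= CONTEXT_WINDOW
}
```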
How much does GLM 4.5 cost through AI Gateway?
Pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.
How do I authenticate with GLM 4.5 through AI Gateway?
AI Gateway provides a unified API key. You don't need a separate Z.ai account. Configure your API key in your environment, then use the model identifier to route requests. BYOK is also supported if you have a direct provider account.
Can I use GLM 4.5 for agentic applications with tool use?
Yes. GLM 4.5 supports agentic workflows with multi-step planning and tool use. The configurable thinking mode lets you control reasoning depth per step in your pipeline.
What providers serve GLM 4.5 through AI Gateway?
GLM 4.5 is available through the zai and novita providers. AI Gateway handles intelligent routing and automatic retries across them.