Qwen 3.5 Flash

Qwen 3.5 Flash is Alibaba's production-hosted multimodal model built on a hybrid linear-attention MoE architecture, offering a context window of 1M tokens and sub-second responsiveness for high-throughput agentic workloads.

Vision (Image)File InputReasoningTool UseImplicit Caching

index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3.5-flash',
  prompt: 'Why is the sky blue?'
})

Overview About Providers Throughput Latency Uptime Status Similar FAQ

More models by Alibaba

Model

Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	Providers	ZDR	No Training	Release Date

alibaba/qwen3.7-plus

1.0s

375tps

$0.32/M

$1.28/M

Read:$0.08/M

Write:$0.5/M

—

06/02/2026

alibaba/qwen3.7-max

991K

2.7s

55tps

$1.25/M

$3.75/M

Read:$0.25/M

Write:$1.56/M

—

05/21/2026

alibaba/qwen3.6-plus

0.2s

109tps

$0.50/M

$3.00/M

Read:

$0.1/M

Write:

$0.63/M

—

04/02/2026

alibaba/qwen3.5-plus

1.2s

110tps

$0.40/M

$2.40/M

Read:

$0.04/M

Write:

$0.5/M

—

02/16/2026

alibaba/qwen3-embedding-8b

33K

$0.05/M

—

06/05/2025

alibaba/qwen3-embedding-4b

33K

$0.02/M

—

06/05/2025

Agent Stack

Core Platform

Tools

Learn

Build

Explore

Qwen 3.5 Flash

More models by Alibaba