Qwen 3.5 35B A3B

Name: Qwen 3.5 35B A3B
Author: Alibaba

License: Apache 2.0

Alibaba · 35B · transformer-decoder

Released 2026-03-15 · 262K context · 35B params

Use Cases

chat · code · reasoning · multilingual · vision · tools · math

Quantization Options

Quant	Bits	VRAM	Quality
Q4_K_M (recommended)	4	12.0 GB	Good
Q8_0	8	20.0 GB	Excellent
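
As a rough illustration of how these figures can be used, the sketch below picks the highest-bit quantization whose listed VRAM requirement fits a given card. The VRAM numbers are copied from the table; the function name `pick_quant` and the `headroom_gb` parameter (spare memory you may want to reserve for the KV cache and runtime overhead) are assumptions made for this example and do not belong to any tool.

```python
# VRAM figures (GB) copied from the quantization table.
QUANTS = [
    ("Q8_0", 8, 20.0),
    ("Q4_K_M", 4, 12.0),
]

def pick_quant(vram_gb, headroom_gb=0.0):
    """Return the name of the highest-bit quant that fits, or None."""
    usable = vram_gb - headroom_gb
    # Try the highest-bit (highest-quality) option first.
    for name, bits, need_gb in sorted(QUANTS, key=lambda q: q[1], reverse=True):
        if need_gb <= usable:
            return name
    return None

for card_gb in (8, 16, 24):
    print(card_gb, "GB ->", pick_quant(card_gb))
```

With the table's numbers, a 16 GB card selects Q4_K_M and a 24 GB card selects Q8_0; a card below 12 GB gets no match.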

About this model

Qwen 3.5 35B A3B is a Mixture-of-Experts (MoE) model with 35B total parameters, of which only 3B are active per token. Because each token is routed through a small subset of the experts, per-token compute scales with the 3B active parameters rather than the full 35B. The Q4_K_M quantization requires 12 GB of VRAM, so the model fits on a 16 GB consumer GPU while performing comparably to dense 27B models. This makes it a strong choice for users who want high-quality reasoning on consumer hardware.
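
To make the sparsity concrete, the short calculation below computes the share of weights used per token. It treats the rounded 35B and 3B figures from this page as exact, which is an assumption; the real counts may differ slightly.

```python
TOTAL_PARAMS_B = 35.0   # total parameters in billions (from this page)
ACTIVE_PARAMS_B = 3.0   # parameters active per token in billions (from this page)

def active_fraction(active_b, total_b):
    """Fraction of the model's weights that take part in one token's forward pass."""
    return active_b / total_b

frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{frac:.1%} of the weights are used per token")  # prints 8.6%
```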

Install

Ollama

ollama run qwen3.5:35b-a3b-q4_K_M
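
Once the model has been pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming the Ollama server is running on its default address (`http://localhost:11434`) and that the model tag from the command above is available locally; the helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_TAG = "qwen3.5:35b-a3b-q4_K_M"  # tag taken from the ollama run command

def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Explain Mixture-of-Experts in one sentence.")` returns the reply as a string. Setting `"stream": False` asks for a single JSON object instead of a stream of partial responses, which keeps the parsing simple.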

llama.cpp / GGUF

Download the GGUF file from Hugging Face

Specs

Parameters: 35B
Architecture: transformer-decoder
Context: 262K tokens
Min VRAM: 12.0 GB
Recommended VRAM: 12.0 GB
Family: Qwen 3.5
Released: 2026-03-15
License: Apache 2.0