Qwen 3.5 35B A3B
by Alibaba · qwen-3.5 family
35B parameters
text-generation code-generation reasoning multilingual vision tool-use math
Qwen 3.5 35B A3B is a Mixture-of-Experts (MoE) model with 35B total parameters, of which only 3B are active per token. This sparse architecture delivers quality well above what its 3B active parameters would suggest, performing comparably to dense 27B models while requiring only 12 GB of VRAM at Q4. It fits on a 16 GB card, which makes it an excellent choice for users who want high-quality reasoning on consumer GPUs.
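To make the total versus active parameter figures concrete, the short sketch below computes the share of weights used per token. It relies only on the 35B and 3B numbers quoted above; the function name `active_fraction` is illustrative and not part of any library.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters used per token, as a value in [0, 1]."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active_params_b <= total_params_b, total > 0")
    return active_params_b / total_params_b


# Figures from the description above: 35B total, 3B active per token.
fraction = active_fraction(35, 3)
print(f"{fraction:.1%} of parameters are active per token")  # prints 8.6% ...
```

Note that all 35B parameters still have to be stored; the sparsity reduces the compute per token, not the size of the weights.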
Quick Start with Ollama
ollama run 35b-a3b-q4_K_M

| Property | Value |
|---|---|
| Creator | Alibaba |
| Parameters | 35B |
| Architecture | transformer-decoder |
| Context | 256K tokens |
| Released | Mar 15, 2026 |
| License | Apache 2.0 |
| Ollama | qwen3.5:35b-a3b |
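Besides the command line, a model pulled into Ollama can be called from code through Ollama's local REST API. The sketch below is a minimal example using only the Python standard library. It assumes an Ollama server is running at the default address `http://localhost:11434` and that the model is available under the name `qwen3.5:35b-a3b` listed in the Ollama row; the helper names `build_generate_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model name taken from the "Ollama" row of this page.
MODEL = "qwen3.5:35b-a3b"


def build_generate_request(prompt: str, model: str = MODEL,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL, timeout: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

With the server running, `generate("Explain Mixture-of-Experts in one sentence.")` returns the model's reply as a string. To target a specific quantization, pass the corresponding name as `model`.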
Quantization Options
| Format | File Size | VRAM Required | Quality | Ollama Tag |
|---|---|---|---|---|
| Q4_K_M (recommended) | 24 GB | 12 GB | | 35b-a3b-q4_K_M |
| Q8_0 | 38 GB | 20 GB | | 35b-a3b-q8_0 |
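Choosing between these rows comes down to comparing the VRAM Required column against the memory on your GPU. The sketch below encodes the two rows of the table and picks the most demanding option that still fits. It assumes that the option needing more VRAM is the higher-precision one and therefore preferable when it fits; the names `Quant`, `QUANTS`, and `pick_quant` are illustrative.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    fmt: str
    file_size_gb: int
    vram_gb: int
    ollama_tag: str


# Values copied from the Quantization Options table.
QUANTS = [
    Quant("Q4_K_M", 24, 12, "35b-a3b-q4_K_M"),
    Quant("Q8_0", 38, 20, "35b-a3b-q8_0"),
]


def pick_quant(available_vram_gb: float) -> Optional[Quant]:
    """Return the listed option with the highest VRAM need that still fits, or None."""
    fitting = [q for q in QUANTS if q.vram_gb <= available_vram_gb]
    # Assumption: a larger VRAM requirement means higher precision.
    return max(fitting, key=lambda q: q.vram_gb, default=None)


for vram in (8, 16, 24):
    choice = pick_quant(vram)
    print(vram, "GB ->", choice.ollama_tag if choice else "no listed option fits")
```

For a 16 GB card this selects Q4_K_M, matching the recommendation in the table; a 24 GB card has room for Q8_0, and an 8 GB card fits neither listed option.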
Compatible Hardware
Q4_K_M requires 12 GB VRAM