MacBook Pro M5 Max 36GB
Apple · M5 Max · 36GB Unified Memory · Can run 57 models
| Manufacturer | Apple |
| Unified Mem | 36 GB |
| Chip | M5 Max |
| CPU Cores | 16 |
| GPU Cores | 32 |
| Neural Engine | 16 |
| Bandwidth | 410 GB/s |
| MSRP | $3,199 |
| Released | Mar 11, 2026 |
AI Notes
The MacBook Pro M5 Max 36GB is a powerhouse for local AI inference. With 410 GB/s memory bandwidth and 32 GPU cores, it delivers exceptionally fast token generation. The 36GB of unified memory fits 30B models comfortably and can handle 70B models with aggressive quantization.
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~164 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~205 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~137 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~137 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~103 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~124 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~82 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~71 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~91 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~82 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~91 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~91 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs | ~46 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~60 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs | ~46 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs | ~46 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs | ~46 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~59 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~55 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~55 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs | ~41 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~55 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~55 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs | ~37 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~48 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~48 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | Runs | ~39 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs | ~43 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~41 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~41 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | Runs | ~37 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~41 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | Runs | ~34 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | Runs | ~34 tok/s |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | Runs | ~24 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | Runs | ~28 tok/s |
| Devstral 24B | 24B | Q4_K_M | 17 GB | Runs | ~24 tok/s |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | Runs | ~24 tok/s |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | Runs | ~23 tok/s |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | Runs | ~23 tok/s |
| Gemma 3 27B | 27B | Q4_K_M | 20 GB | Runs | ~21 tok/s |
| Qwen 3 30B-A3B (MoE) | 30B | Q4_K_M | 22 GB | Runs | ~19 tok/s |
| Cogito 32B | 32B | Q4_K_M | 21.5 GB | Runs | ~19 tok/s |
| DeepSeek R1 32B | 32B | Q4_K_M | 20.7 GB | Runs | ~20 tok/s |
| Qwen 2.5 32B | 32B | Q4_K_M | 20.7 GB | Runs | ~20 tok/s |
| Qwen 2.5 Coder 32B | 32B | Q4_K_M | 23 GB | Runs | ~18 tok/s |
| Qwen 3 32B | 32B | Q4_K_M | 23 GB | Runs | ~18 tok/s |
| QwQ 32B | 32B | Q4_K_M | 21.5 GB | Runs | ~19 tok/s |
| Command R 35B | 35B | Q4_K_M | 22.5 GB | Runs | ~18 tok/s |
| Mixtral 8x7B | 47B | Q4_K_M | 29.7 GB | Runs | ~14 tok/s |
| Cogito 70B | 70B | Q4_K_M | 43 GB | CPU Offload | ~3 tok/s |
| DeepSeek R1 70B | 70B | Q4_K_M | 43.5 GB | CPU Offload | ~3 tok/s |
| Llama 3.1 70B | 70B | Q4_K_M | 43.5 GB | CPU Offload | ~3 tok/s |
| Llama 3.3 70B | 70B | Q4_K_M | 43.5 GB | CPU Offload | ~3 tok/s |
| Qwen 2.5 72B | 72B | Q4_K_M | 44.7 GB | CPU Offload | ~3 tok/s |
| Qwen 2.5 VL 72B | 72B | Q4_K_M | 41 GB | CPU Offload | ~3 tok/s |
| Llama 3.2 Vision 90B | 90B | Q4_K_M | 50 GB | CPU Offload | ~2 tok/s |
12
model(s) are too large for this hardware.