MacBook Pro M4 Max 128GB
Apple · M4 Max · 128GB Unified Memory · Can run 64 models
| Spec | Value |
|---|---|
| Manufacturer | Apple |
| Chip | M4 Max |
| Unified Memory | 128 GB |
| CPU Cores | 16 |
| GPU Cores | 40 |
| Neural Engine Cores | 16 |
| Memory Bandwidth | 546 GB/s |
| MSRP | $4,999 |
| Released | Nov 8, 2024 |
AI Notes
The MacBook Pro M4 Max 128GB is the ultimate laptop for local AI. With 128GB of unified memory, it can run 70B models at 8-bit quantization and load 100B+ models such as Mistral Large 2 at 4-bit quantization; only the very largest open models (235B and up) exceed its memory. The 546 GB/s memory bandwidth delivers strong inference throughput for demanding AI workloads on the go.
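The VRAM figures in the table below follow roughly from parameter count times bits per weight. As a rough sketch (the effective bit-widths and the ~10% overhead for KV cache and buffers are assumptions, not values taken from this page):

```python
# Rough memory-footprint estimate for a quantized model.
# Effective bits per weight are approximate: K-quants and Q8_0 carry
# scale/metadata overhead beyond their nominal bit-width.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}

def est_memory_gb(params_billions: float, quant: str, overhead: float = 1.1) -> float:
    """Weights-only estimate plus ~10% for KV cache and runtime buffers."""
    bytes_per_param = BITS_PER_WEIGHT[quant] / 8
    return params_billions * bytes_per_param * overhead

print(round(est_memory_gb(70, "Q4_K_M"), 1))  # roughly 46 GB, near the table's 43.5 GB
print(round(est_memory_gb(8, "Q8_0"), 1))     # roughly 9 GB, near the table's 10 GB
```

Actual GGUF files vary with architecture and context length, so treat these as ballpark figures rather than exact requirements.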
Compatible Models
| Model | Parameters | Best Quant | VRAM Used | Fit | Est. Speed |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 600M | Q4_K_M | 2.5 GB | Runs | ~218 tok/s |
| Gemma 3 1B | 1B | Q8_0 | 2 GB | Runs | ~273 tok/s |
| Llama 3.2 1B | 1B | Q8_0 | 3 GB | Runs | ~182 tok/s |
| DeepSeek R1 1.5B | 1.5B | Q8_0 | 3 GB | Runs | ~182 tok/s |
| Gemma 2 2B | 2B | Q8_0 | 4 GB | Runs | ~137 tok/s |
| Gemma 3n E2B | 2B | Q4_K_M | 3.3 GB | Runs | ~165 tok/s |
| Llama 3.2 3B | 3B | Q8_0 | 5 GB | Runs | ~109 tok/s |
| Phi-3 Mini 3.8B | 3.8B | Q8_0 | 5.8 GB | Runs | ~94 tok/s |
| Phi-4 Mini 3.8B | 3.8B | Q4_K_M | 4.5 GB | Runs | ~121 tok/s |
| Gemma 3 4B | 4B | Q4_K_M | 5 GB | Runs | ~109 tok/s |
| Gemma 3n E4B | 4B | Q4_K_M | 4.5 GB | Runs | ~121 tok/s |
| Qwen 3 4B | 4B | Q4_K_M | 4.5 GB | Runs | ~121 tok/s |
| DeepSeek R1 7B | 7B | Q8_0 | 9 GB | Runs | ~61 tok/s |
| Falcon 3 7B | 7B | Q4_K_M | 6.8 GB | Runs | ~80 tok/s |
| Mistral 7B | 7B | Q8_0 | 9 GB | Runs | ~61 tok/s |
| Qwen 2.5 7B | 7B | Q8_0 | 9 GB | Runs | ~61 tok/s |
| Qwen 2.5 Coder 7B | 7B | Q8_0 | 9 GB | Runs | ~61 tok/s |
| Qwen 2.5 VL 7B | 7B | Q4_K_M | 7 GB | Runs | ~78 tok/s |
| Cogito 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~73 tok/s |
| DeepSeek R1 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~73 tok/s |
| Llama 3.1 8B | 8B | Q8_0 | 10 GB | Runs | ~55 tok/s |
| Nemotron 3 Nano 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~73 tok/s |
| Qwen 3 8B | 8B | Q4_K_M | 7.5 GB | Runs | ~73 tok/s |
| Gemma 2 9B | 9B | Q8_0 | 11 GB | Runs | ~50 tok/s |
| Falcon 3 10B | 10B | Q4_K_M | 8.5 GB | Runs | ~64 tok/s |
| Llama 3.2 Vision 11B | 11B | Q4_K_M | 8.5 GB | Runs | ~64 tok/s |
| Gemma 3 12B | 12B | Q4_K_M | 10.5 GB | Runs | ~52 tok/s |
| Mistral Nemo 12B | 12B | Q4_K_M | 9.5 GB | Runs | ~57 tok/s |
| DeepSeek R1 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~55 tok/s |
| Phi-4 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~55 tok/s |
| Phi-4 Reasoning 14B | 14B | Q4_K_M | 11 GB | Runs | ~50 tok/s |
| Qwen 2.5 14B | 14B | Q4_K_M | 9.9 GB | Runs | ~55 tok/s |
| Qwen 2.5 Coder 14B | 14B | Q4_K_M | 12 GB | Runs | ~46 tok/s |
| Qwen 3 14B | 14B | Q4_K_M | 12 GB | Runs | ~46 tok/s |
| StarCoder2 15B | 15B | Q8_0 | 17 GB | Runs | ~32 tok/s |
| Codestral 22B | 22B | Q4_K_M | 14.7 GB | Runs | ~37 tok/s |
| Devstral 24B | 24B | Q4_K_M | 17 GB | Runs | ~32 tok/s |
| Magistral Small 24B | 24B | Q4_K_M | 17 GB | Runs | ~32 tok/s |
| Mistral Small 3.1 24B | 24B | Q4_K_M | 18 GB | Runs | ~30 tok/s |
| Gemma 2 27B | 27B | Q4_K_M | 17.7 GB | Runs | ~31 tok/s |
| Gemma 3 27B | 27B | Q4_K_M | 20 GB | Runs | ~27 tok/s |
| Qwen 3 30B-A3B (MoE) | 30B | Q4_K_M | 22 GB | Runs | ~25 tok/s |
| Cogito 32B | 32B | Q4_K_M | 21.5 GB | Runs | ~25 tok/s |
| DeepSeek R1 32B | 32B | Q4_K_M | 20.7 GB | Runs | ~26 tok/s |
| Qwen 2.5 32B | 32B | Q4_K_M | 20.7 GB | Runs | ~26 tok/s |
| Qwen 2.5 Coder 32B | 32B | Q4_K_M | 23 GB | Runs | ~24 tok/s |
| Qwen 3 32B | 32B | Q4_K_M | 23 GB | Runs | ~24 tok/s |
| QwQ 32B | 32B | Q4_K_M | 21.5 GB | Runs | ~25 tok/s |
| Command R 35B | 35B | Q4_K_M | 22.5 GB | Runs | ~24 tok/s |
| Mixtral 8x7B | 47B | Q4_K_M | 29.7 GB | Runs | ~18 tok/s |
| Cogito 70B | 70B | Q4_K_M | 43 GB | Runs | ~13 tok/s |
| DeepSeek R1 70B | 70B | Q4_K_M | 43.5 GB | Runs | ~13 tok/s |
| Llama 3.1 70B | 70B | Q4_K_M | 43.5 GB | Runs | ~13 tok/s |
| Llama 3.3 70B | 70B | Q4_K_M | 43.5 GB | Runs | ~13 tok/s |
| Qwen 2.5 72B | 72B | Q4_K_M | 44.7 GB | Runs | ~12 tok/s |
| Qwen 2.5 VL 72B | 72B | Q4_K_M | 41 GB | Runs | ~13 tok/s |
| Llama 3.2 Vision 90B | 90B | Q4_K_M | 50 GB | Runs | ~11 tok/s |
| Command R+ 104B | 104B | Q4_K_M | 57 GB | Runs | ~10 tok/s |
| Llama 4 Scout (109B/17B active) | 109B | Q4_K_M | 72 GB | Runs | ~8 tok/s |
| Command A 111B | 111B | Q4_K_M | 61 GB | Runs | ~9 tok/s |
| Devstral 2 123B | 123B | Q4_K_M | 67 GB | Runs | ~8 tok/s |
| Mistral Large 2 123B | 123B | Q4_K_M | 67 GB | Runs | ~8 tok/s |
| Mixtral 8x22B | 141B | Q4_K_M | 86 GB | Runs | ~6 tok/s |
| Qwen 3 235B-A22B | 235B | Q4_K_M | 138 GB | CPU Offload | ~1 tok/s |
5 models are too large for this hardware.
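The Est. Speed column tracks a simple memory-bound model of decoding: generating each token streams the full weight set from memory, so throughput is roughly bandwidth divided by model footprint. A minimal sketch of that relationship (an approximation; real throughput also depends on compute, context length, and runtime):

```python
# Memory-bound decode estimate: tok/s ≈ memory bandwidth / model size.
BANDWIDTH_GB_S = 546  # M4 Max unified-memory bandwidth

def est_tok_per_s(model_gb: float) -> float:
    """Upper-bound tokens/sec when decoding is limited by memory bandwidth."""
    return BANDWIDTH_GB_S / model_gb

print(round(est_tok_per_s(43.5)))  # Llama 3.1 70B Q4_K_M: ~13 tok/s, matching the table
print(round(est_tok_per_s(9.0)))  # Mistral 7B Q8_0: ~61 tok/s, matching the table
```

This is why a 4-bit quant of the same model is roughly twice as fast as the 8-bit quant: halving the bytes moved per token doubles the bandwidth-limited throughput.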