
Llama 3.1 70B vs Qwen 2.5 72B

Comparing VRAM requirements, performance, and capabilities for running these models locally with Ollama.

Llama 3.1 70B
Parameters: 70B
Context: 128K
VRAM Range: 43.5–72 GB
Recommended: Q4_K_M (43.5 GB)
By Meta · License: Llama 3.1 Community License
Qwen 2.5 72B
Parameters: 72B
Context: 128K
VRAM Range: 44.7–74 GB
Recommended: Q4_K_M (44.7 GB)
By Alibaba · License: Qwen License

VRAM Requirements by Quantization

Side-by-side memory needs at each quality level.

Quantization Llama 3.1 70B Qwen 2.5 72B Difference (Llama − Qwen)
Q4_K_M 43.5 GB 44.7 GB −1.2 GB
Q8_0 72 GB 74 GB −2.0 GB
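As a rough rule of thumb, a quantized model's weight footprint is parameter count × bits-per-weight ÷ 8, plus some headroom for activations and the KV cache. The sketch below uses assumed average bits-per-weight figures for llama.cpp-style quantizations and a flat overhead allowance, so its estimates land near (but not exactly on) the measured figures in the table above.

```python
# Rough VRAM estimate: weights (params * bits / 8) plus a flat overhead
# allowance for activations and KV cache. Bits-per-weight values are
# approximate averages for llama.cpp-style quantizations (assumptions).

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # assumed average, mixed 4/6-bit blocks
    "Q8_0": 8.5,     # assumed average, 8-bit weights plus scales
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 2.0) -> float:
    """Estimate total VRAM in GB for a quantized model."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return round(weights_gb + overhead_gb, 1)

for model, params in [("Llama 3.1 70B", 70), ("Qwen 2.5 72B", 72)]:
    for quant in ("Q4_K_M", "Q8_0"):
        print(f"{model} {quant}: ~{estimate_vram_gb(params, quant)} GB")
```

Actual footprints also depend on context length (KV cache grows with it) and runtime buffers, which is why published numbers vary by a few GB.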

Capabilities

Feature support comparison.

Capability Llama 3.1 70B Qwen 2.5 72B
text generation Yes Yes
code generation Yes Yes
reasoning Yes Yes
multilingual Yes Yes
tool use Yes Yes
math Yes Yes
creative writing Yes Yes
summarization Yes Yes

Benchmark Scores

Higher is better. Scores from published evaluations.

Benchmark Llama 3.1 70B Qwen 2.5 72B
mmlu 83.6 85.3

Hardware Compatibility

Can each model run at its recommended Q4_K_M quantization on common VRAM tiers? Runs = fits with comfortable headroom; Tight = fits with little room for long context; Offload = requires partial CPU offload; No = does not fit.

VRAM Llama 3.1 70B Qwen 2.5 72B
8 GB No No
12 GB No No
16 GB No No
24 GB No No
32 GB Offload Offload
48 GB Tight Tight
64 GB Runs Runs
96 GB Runs Runs
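The tiers above follow from the ratio of available VRAM to model size. The classifier below reproduces the table using illustrative thresholds (my assumptions, not Ollama's): ~20% headroom over the model footprint for "Runs", a bare fit for "Tight", and roughly 70% of the footprint as the floor for useful partial offload.

```python
# Illustrative VRAM-tier classifier. Thresholds are assumptions chosen to
# match the table above, not values from any official tool.

def fit_tier(model_gb: float, vram_gb: float) -> str:
    ratio = vram_gb / model_gb
    if ratio >= 1.2:   # ~20% headroom for KV cache and activations
        return "Runs"
    if ratio >= 1.0:   # weights fit, but little room for context
        return "Tight"
    if ratio >= 0.7:   # enough to keep most layers on the GPU
        return "Offload"
    return "No"

# Llama 3.1 70B at Q4_K_M is ~43.5 GB:
for vram in (8, 12, 16, 24, 32, 48, 64, 96):
    print(f"{vram} GB: {fit_tier(43.5, vram)}")
```

The same thresholds reproduce the Qwen 2.5 72B column with its 44.7 GB footprint.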

Run Llama 3.1 70B

ollama run llama3.1:70b-instruct-q4_K_M

Run Qwen 2.5 72B

ollama run qwen2.5:72b-instruct-q4_K_M
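Beyond the interactive CLI, a running Ollama server exposes a local REST API on port 11434. The sketch below queries the `/api/generate` endpoint with the Python standard library; it assumes the server is running and the model tag has already been pulled.

```python
import json
import urllib.request

# Query a locally running Ollama server via its /api/generate endpoint.
# Assumes the default host/port and an already-pulled model tag.

def build_payload(model: str, prompt: str) -> dict:
    # stream=False returns one JSON object instead of a line-delimited stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama server to be running):
# print(generate("llama3.1:70b-instruct-q4_K_M",
#                "Summarize the KV cache in one sentence."))
```

Swapping in `qwen2.5:72b-instruct-q4_K_M` queries the other model with no other changes.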

Check your exact hardware

Use the compatibility checker to see how each model performs on your specific GPU or Mac.