Mixtral 8x7B
Apache 2.0 · Mistral AI · 47B · transformer-decoder
2023-12-11 · 32K context (32,768 tokens)
47B params
Use Cases
chat code reasoning multilingual math writing summary
About this model
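Mixtral 8x7B is Mistral AI's sparse mixture-of-experts model. Each transformer layer contains eight expert feed-forward networks, and a router activates two of them for each token. Because the attention layers are shared and only the feed-forward blocks are replicated, the model has about 47B total parameters rather than 8 × 7B = 56B. Only about 13B of those parameters are used per token during inference, so per-token compute is close to that of a 13B dense model.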
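The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Mistral AI's implementation: the function names `top2_route` and `moe_layer` are hypothetical, the toy experts are simple scaling functions standing in for real feed-forward networks, and the sketch assumes the two selected router scores are normalized with a softmax over just those two values.

```python
import math

def top2_route(router_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for one token."""
    # Indices of the two highest router scores.
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax over the two selected scores only, so the weights sum to 1.
    m = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - m) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]

def moe_layer(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs for one token."""
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits):
        y = experts[idx](x)  # the other six experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy experts (assumption): expert k multiplies its input by (k + 1).
experts = [lambda v, k=k: [(k + 1) * vi for vi in v] for k in range(8)]

# Hypothetical router scores for a single token, one per expert.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 0.7]
routes = top2_route(logits)
y = moe_layer([1.0, -2.0], logits, experts)
print(routes)
print(y)
```

Only the two selected experts run for a given token, which is why per-token compute depends on the active parameter count rather than the total.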
Mistral AI reports that the model matches or outperforms Llama 2 70B on most benchmarks while running inference faster. It is strong at reasoning, multilingual tasks, and code generation, and suits users who want high-quality output at the compute cost of a much smaller model. All 47B parameters must still fit in memory, so memory requirements track the total parameter count rather than the active count.
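To make the total and per-token parameter counts concrete, the arithmetic can be sketched as below. The hyperparameters are assumptions taken from Mixtral 8x7B's published configuration and do not appear on this page. The memory figures cover weights only and ignore the KV cache, activations, and any per-block overhead of a specific quantization format.

```python
# Assumed hyperparameters (from the published Mixtral 8x7B config, not this page).
DIM = 4096          # model (hidden) dimension
N_LAYERS = 32       # transformer layers
FFN_DIM = 14336     # feed-forward inner dimension of each expert
N_HEADS = 32        # query heads
N_KV_HEADS = 8      # key/value heads (grouped-query attention)
HEAD_DIM = 128      # dimension per head
VOCAB = 32000       # vocabulary size
N_EXPERTS = 8       # experts per layer
TOP_K = 2           # experts activated per token

# Attention: query and output projections, plus smaller key and value projections.
attn = 2 * DIM * N_HEADS * HEAD_DIM + 2 * DIM * N_KV_HEADS * HEAD_DIM
expert = 3 * DIM * FFN_DIM       # three weight matrices per gated feed-forward expert
router = DIM * N_EXPERTS         # linear router producing one score per expert
norms = 2 * DIM                  # two normalization weight vectors per layer
embeddings = 2 * VOCAB * DIM + DIM  # input embedding, output head, final norm

def model_params(experts_per_layer):
    """Parameter count when `experts_per_layer` experts are counted in each layer."""
    per_layer = attn + experts_per_layer * expert + router + norms
    return N_LAYERS * per_layer + embeddings

total = model_params(N_EXPERTS)  # everything that must be stored
active = model_params(TOP_K)     # what a single token actually passes through

print(f"total parameters  ~ {total / 1e9:.1f}B")
print(f"active per token  ~ {active / 1e9:.1f}B")
for bits in (16, 8, 4):
    gigabytes = total * bits / 8 / 1e9
    print(f"{bits}-bit weights ~ {gigabytes:.0f} GB")
```

Under these assumptions the count comes to roughly 46.7B total and 12.9B active parameters, which lines up with the 47B and 13B figures quoted above. The weights-only footprint is roughly 93 GB at 16 bits, 47 GB at 8 bits, and 23 GB at 4 bits.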
Benchmarks
MMLU: 70.6