Mistral, a French AI startup, has made waves in the AI community with the release of Mixtral 8x7B, its latest open-source AI model. The model has drawn attention for reportedly surpassing OpenAI’s GPT-3.5 and Meta’s Llama 2 in performance. The company took an unusual approach, releasing its latest large language model unceremoniously via a torrent link posted to social media. The move contrasts with the typical fanfare of AI releases and reflects Mistral’s distinct, hacker-like attitude.
Mixtral 8x7B: A new AI powerhouse
Mistral recently raised an impressive $415 million in a Series A funding round, pushing its valuation to around $2 billion and underscoring the company’s momentum in the AI sector. Mixtral 8x7B employs a “mixture of experts” architecture: rather than pushing every token through one monolithic network, it routes each token to a small subset of specialized “expert” sub-networks. This technique underpins the model’s strong results, matching or outperforming GPT-3.5 and Llama 2 across various benchmarks. Mistral released the model online first, followed by an official blog post detailing its capabilities, and confirmed that it is available for commercial use under an Apache 2.0 license.
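To make the “mixture of experts” idea concrete, here is a minimal sketch of top-2 expert routing in PyTorch. The layer sizes, expert design, and class names are illustrative assumptions, not Mistral’s actual implementation:

```python
# Minimal, illustrative sketch of top-2 mixture-of-experts routing.
# Sizes and expert design are assumptions, not Mixtral's real internals.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One gating network scores every expert for each token.
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Each "expert" is an ordinary feedforward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Score experts, keep the top-k per token.
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run, so compute per token stays small
        # even though all experts (and their parameters) sit in memory.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

The key property is that each token only pays for the two experts the router selects, even though all eight are loaded.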
Small footprint: It can run on a Mac
One of the notable features of Mixtral 8x7B is its ability to run on devices without dedicated GPUs, potentially democratizing access to advanced AI technology. The model achieves state-of-the-art results among open models, with particular strengths in long-context language generation and code generation.
For those who don’t follow AI closely:
1) An open source model (free, anyone can download or modify) beats GPT-3.5
2) It has no safety guardrails
There are good things about this release, but also regulators, IT security experts, etc. should note the genie is out of the bottle. https://t.co/nHvlNKaItw
— Ethan Mollick (@emollick), December 11, 2023
AI enthusiasts and professionals quickly adopted Mixtral 8x7B, impressed by its performance and flexibility. The model’s small footprint allows it to run on machines without dedicated GPUs, including the latest Apple Mac computers. However, as Wharton School professor Ethan Mollick observed, it ships with no safety guardrails, raising concerns that it will readily generate content other models would refuse as unsafe.
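For readers who want to try the model locally, below is a minimal sketch using the Hugging Face transformers library; the repo id matches Mistral’s public release. On a Mac, the full-precision weights will not fit in typical RAM, so in practice you would run a quantized build (for example, a 4-bit GGUF via llama.cpp); this sketch shows only the simplest loading path:

```python
# Minimal sketch: load Mixtral with Hugging Face transformers.
# Requires `transformers` and `accelerate`; assumes enough memory
# for the full weights (a quantized build is the realistic Mac path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Mixtral is a sparse mixture-of-experts model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```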
6x faster than Llama 2 70B
Mixtral 8x7B stands out with inference roughly six times faster than Llama 2 70B, thanks to its sparse architecture: each Transformer layer contains eight distinct feedforward “expert” blocks, and a router activates only a fraction of them per token. The model also offers multilingual capabilities, excellent code generation, and a 32k-token context window. Mistral’s valuation climbing past $2 billion in just six months underlines the growing importance of large mixture-of-experts models in the AI landscape.
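A quick back-of-the-envelope calculation illustrates why sparse routing is cheap at inference time. The parameter counts below are the figures Mistral reported in its announcement (roughly 46.7B total, 12.9B active per token); the comparison is only indicative, since real inference speed also depends on memory bandwidth and implementation:

```python
# Why a 46.7B-parameter sparse model can be fast: only the router-selected
# experts run, so compute per token tracks the *active* parameter count.
# Figures are from Mistral's announcement; the comparison is indicative only.
total_params = 46.7e9    # all eight experts kept in memory
active_params = 12.9e9   # parameters actually used per token
llama2_70b = 70.0e9      # dense model: every parameter used per token

print(f"Active share of Mixtral's weights: {active_params / total_params:.0%}")  # ~28%
print(f"Per-token compute vs. Llama 2 70B: {active_params / llama2_70b:.0%}")    # ~18%
```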
Open-source with no limits
Mixtral 8x7B, an open-source model, is proving to be a game-changer. It not only outperforms U.S. competitors such as Meta’s Llama 2 family and OpenAI’s GPT-3.5 but does so quickly and efficiently. Its open-source availability stands in contrast to OpenAI’s closed-source approach and aligns with Mistral’s stated commitment to an “open, responsible, and decentralized approach to technology”.
Mistral’s model is a high-quality sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. It outperforms Llama 2 70B on most benchmarks while delivering roughly six times faster inference, making Mixtral 8x7B the strongest open-weight model in terms of cost and performance.