Dataconomy

Meta AI’s Llama 3.1 405B surprisingly beats GPT-4o

Early benchmark data for the forthcoming Llama 3.1 models—including the 8B, 70B, and the colossal 405B—were leaked on the LocalLLaMA subreddit today

by Kerem Gülen
July 23, 2024
in Artificial Intelligence

Leaked benchmarks regarding Meta AI’s Llama 3.1 405B show that this open-source LLM has a lot of potential.

Leaked: Meta AI Llama 3.1 405B benchmarks

Meta introduced Llama 3 in April 2024 as a new generation of cutting-edge, open-source large language models. The initial release included Llama 3 8B and Llama 3 70B, both of which established new performance benchmarks for LLMs in their respective sizes. However, within just three months, several other models have managed to surpass these initial benchmarks, indicating the rapid pace of advancement in the field of artificial intelligence.

Update: The 3.1 405B model is available for use at https://www.llama2.ai/. Note that the model’s training data extends only to April 2023, so its responses reflect information available up to that point. This limitation may reduce its effectiveness on queries about more recent developments or events.

Meta has announced that its most ambitious model in the Llama 3 series will have over 400 billion parameters, a massive leap in scale, and that the model is still in training. Early benchmark data for the forthcoming Llama 3.1 models (the 8B, the 70B, and the colossal 405B) was leaked on the LocalLLaMA subreddit today. The preliminary results suggest that Llama 3.1 405B could surpass the current industry leader, OpenAI’s GPT-4o, on several key AI benchmarks.


Should the Llama 3.1 405B model indeed surpass GPT-4o, it would represent the first instance of an open-source model eclipsing a leading closed-source LLM.

| Benchmarks | GPT-4o | Meta Llama-3.1-405B | Meta Llama-3.1-70B | Meta Llama-3-70B | Meta Llama-3.1-8B | Meta Llama-3-8B |
|---|---|---|---|---|---|---|
| boolq | 0.905 | 0.921 | 0.909 | 0.892 | 0.871 | 0.82 |
| gsm8k | 0.942 | 0.968 | 0.948 | 0.833 | 0.844 | 0.572 |
| hellaswag | 0.891 | 0.92 | 0.908 | 0.874 | 0.768 | 0.462 |
| human_eval | 0.921 | 0.854 | 0.793 | 0.39 | 0.683 | 0.341 |
| mmlu_humanities | 0.802 | 0.818 | 0.795 | 0.706 | 0.619 | 0.56 |
| mmlu_other | 0.872 | 0.875 | 0.852 | 0.825 | 0.74 | 0.709 |
| mmlu_social_sciences | 0.913 | 0.898 | 0.878 | 0.872 | 0.761 | 0.741 |
| mmlu_stem | 0.696 | 0.831 | 0.771 | 0.696 | 0.595 | 0.561 |
| openbookqa | 0.882 | 0.908 | 0.936 | 0.928 | 0.852 | 0.802 |
| piqa | 0.844 | 0.874 | 0.862 | 0.894 | 0.801 | 0.764 |
| social_iqa | 0.79 | 0.797 | 0.813 | 0.789 | 0.734 | 0.667 |
| truthfulqa_mc1 | 0.825 | 0.8 | 0.769 | 0.52 | 0.606 | 0.327 |
| winogrande | 0.822 | 0.867 | 0.845 | 0.776 | 0.65 | 0.56 |

As the table above shows, the leaked benchmarks put Meta’s Llama 3.1 405B ahead of OpenAI’s GPT-4o on 10 of the 13 tests listed, including GSM8K, HellaSwag, BoolQ, MMLU-humanities, MMLU-other, MMLU-STEM, and Winogrande. However, it trails in HumanEval, MMLU-social sciences, and TruthfulQA (MC1), indicating areas where further refinement is needed.
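The head-to-head comparison between the GPT-4o and Llama 3.1 405B columns of the table can be tallied with a short script. The following is a minimal sketch, not part of the leak: the scores are copied from the table, while the names `SCORES` and `compare` are hypothetical and chosen only for illustration.

```python
# Scores copied from the leaked table: (GPT-4o, Llama-3.1-405B).
# The dictionary and function names are illustrative assumptions.
SCORES = {
    "boolq": (0.905, 0.921),
    "gsm8k": (0.942, 0.968),
    "hellaswag": (0.891, 0.920),
    "human_eval": (0.921, 0.854),
    "mmlu_humanities": (0.802, 0.818),
    "mmlu_other": (0.872, 0.875),
    "mmlu_social_sciences": (0.913, 0.898),
    "mmlu_stem": (0.696, 0.831),
    "openbookqa": (0.882, 0.908),
    "piqa": (0.844, 0.874),
    "social_iqa": (0.790, 0.797),
    "truthfulqa_mc1": (0.825, 0.800),
    "winogrande": (0.822, 0.867),
}


def compare(scores):
    """Return (benchmark, delta) pairs, where delta = Llama score - GPT-4o score."""
    return [(name, round(llama - gpt, 3)) for name, (gpt, llama) in scores.items()]


deltas = compare(SCORES)
llama_wins = [name for name, d in deltas if d > 0]
gpt_wins = [name for name, d in deltas if d < 0]

# Print benchmarks from the largest Llama lead to the largest GPT-4o lead.
for name, d in sorted(deltas, key=lambda pair: pair[1], reverse=True):
    print(f"{name:22s} {d:+.3f}")
print(f"Llama 3.1 405B ahead on {len(llama_wins)} of {len(deltas)} benchmarks")
```

On the leaked numbers, this reports Llama 3.1 405B ahead on 10 of 13 benchmarks, with GPT-4o leading on human_eval, mmlu_social_sciences, and truthfulqa_mc1. The largest gap in Llama's favor is mmlu_stem (+0.135).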

It is critical to recognize that these benchmarks reflect the performance of the base models of Llama 3.1. The true potential of these models can be realized through instruction-tuning, a process that can significantly enhance their capabilities. The forthcoming Instruct versions of the Llama 3.1 models are expected to yield even better results, showcasing improvements across various benchmarks.


Stressing the importance of open-source initiatives

While GPT-5 may challenge Llama 3.1’s emerging dominance, the impressive performance of Llama 3.1 against GPT-4o underscores the growing influence and capability of open-source AI initiatives.

“We are embracing the open source ethos of releasing early and often to enable the community to get access to these models while they are still in development. The text-based models we are releasing today are the first in the Llama 3 collection of models. Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core LLM capabilities such as reasoning and coding,” stated Meta in a blog post when launching Llama 3.

The significance of open-source AI cannot be overstated. By making their advanced models accessible to the public, Meta not only democratizes technology but also taps into the collective intelligence and diverse perspectives of the global developer community. This approach contrasts sharply with closed-source models, which are typically accessible only to a select group of users and researchers, thereby limiting the potential for widespread innovation and enhancement.


Featured image credit: Penfer/Unsplash

Tags: Benchmark, gpt-4o, LLama 3, meta AI

