Dataconomy

Meta AI’s Llama 3.1 405B surprisingly beats GPT-4o

Early benchmark data for the forthcoming Llama 3.1 models—including the 8B, 70B, and the colossal 405B—were leaked on the LocalLLaMA subreddit today

by Kerem Gülen
July 23, 2024
in Artificial Intelligence

Leaked benchmarks regarding Meta AI’s Llama 3.1 405B show that this open-source LLM has a lot of potential.

Leaked: Meta AI Llama 3.1 405B benchmarks

Meta introduced Llama 3 in April 2024 as a new generation of cutting-edge, open-source large language models. The initial release included Llama 3 8B and Llama 3 70B, both of which established new performance benchmarks for LLMs in their respective sizes. However, within just three months, several other models have managed to surpass these initial benchmarks, indicating the rapid pace of advancement in the field of artificial intelligence.

Update: The 3.1 405B model is available for use at https://www.llama2.ai/. Note that the model’s knowledge extends only to April 2023, so its responses are based on data accumulated up to that point. This may limit its effectiveness on queries about more recent developments or events.

Meta has announced that the most ambitious model in the Llama 3 series will have over 400 billion parameters and that it is still in training. Early benchmark data for the forthcoming Llama 3.1 models (the 8B, 70B, and 405B) were leaked on the LocalLLaMA subreddit today. The preliminary results suggest that Llama 3.1 405B could surpass the current industry leader, OpenAI’s GPT-4o, on several key AI benchmarks.

Should the Llama 3.1 405B model indeed surpass GPT-4o, it would represent the first instance of an open-source model eclipsing a leading closed-source LLM.

Benchmark            | GPT-4o | Meta Llama-3.1-405B | Meta Llama-3.1-70B | Meta Llama-3-70B | Meta Llama-3.1-8B | Meta Llama-3-8B
boolq                | 0.905  | 0.921               | 0.909              | 0.892            | 0.871             | 0.82
gsm8k                | 0.942  | 0.968               | 0.948              | 0.833            | 0.844             | 0.572
hellaswag            | 0.891  | 0.92                | 0.908              | 0.874            | 0.768             | 0.462
human_eval           | 0.921  | 0.854               | 0.793              | 0.39             | 0.683             | 0.341
mmlu_humanities      | 0.802  | 0.818               | 0.795              | 0.706            | 0.619             | 0.56
mmlu_other           | 0.872  | 0.875               | 0.852              | 0.825            | 0.74              | 0.709
mmlu_social_sciences | 0.913  | 0.898               | 0.878              | 0.872            | 0.761             | 0.741
mmlu_stem            | 0.696  | 0.831               | 0.771              | 0.696            | 0.595             | 0.561
openbookqa           | 0.882  | 0.908               | 0.936              | 0.928            | 0.852             | 0.802
piqa                 | 0.844  | 0.874               | 0.862              | 0.894            | 0.801             | 0.764
social_iqa           | 0.79   | 0.797               | 0.813              | 0.789            | 0.734             | 0.667
truthfulqa_mc1       | 0.825  | 0.8                 | 0.769              | 0.52             | 0.606             | 0.327
winogrande           | 0.822  | 0.867               | 0.845              | 0.776            | 0.65              | 0.56

As the table above shows, the leaked benchmarks put Meta’s Llama 3.1 405B ahead of OpenAI’s GPT-4o on 10 of the 13 tests: GSM8K, HellaSwag, BoolQ, MMLU-humanities, MMLU-other, MMLU-STEM, OpenBookQA, PIQA, Social IQa, and WinoGrande. It trails GPT-4o on HumanEval, MMLU-social sciences, and TruthfulQA, indicating areas where further refinement is needed.
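
For readers who want to check the comparison themselves, here is a minimal Python sketch that tallies where Llama 3.1 405B leads or trails GPT-4o. The scores are copied from the leaked table above; the `SCORES` dictionary and the `compare` helper are illustrative names, not part of any Meta or OpenAI tooling.

```python
# Scores copied from the leaked table: (GPT-4o, Llama-3.1-405B).
# The structure and names here are illustrative assumptions.
SCORES = {
    "boolq": (0.905, 0.921),
    "gsm8k": (0.942, 0.968),
    "hellaswag": (0.891, 0.920),
    "human_eval": (0.921, 0.854),
    "mmlu_humanities": (0.802, 0.818),
    "mmlu_other": (0.872, 0.875),
    "mmlu_social_sciences": (0.913, 0.898),
    "mmlu_stem": (0.696, 0.831),
    "openbookqa": (0.882, 0.908),
    "piqa": (0.844, 0.874),
    "social_iqa": (0.790, 0.797),
    "truthfulqa_mc1": (0.825, 0.800),
    "winogrande": (0.822, 0.867),
}


def compare(scores):
    """Return (wins, losses): benchmarks where the 405B model leads or trails GPT-4o."""
    wins = [name for name, (gpt, llama) in scores.items() if llama > gpt]
    losses = [name for name, (gpt, llama) in scores.items() if llama < gpt]
    return wins, losses


wins, losses = compare(SCORES)

# Print the per-benchmark gap (positive means the 405B model is ahead).
for name, (gpt, llama) in SCORES.items():
    print(f"{name:22s} GPT-4o={gpt:.3f}  405B={llama:.3f}  diff={llama - gpt:+.3f}")

print(f"Llama 3.1 405B leads on {len(wins)} of {len(SCORES)} benchmarks")
print("Trails on:", ", ".join(losses))
```

The gaps are small on several tests (for example, 0.003 on mmlu_other), so a simple win count should be read alongside the size of each difference.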

It is critical to recognize that these benchmarks reflect the performance of the base models of Llama 3.1. The true potential of these models can be realized through instruction-tuning, a process that can significantly enhance their capabilities. The forthcoming Instruct versions of the Llama 3.1 models are expected to yield even better results, showcasing improvements across various benchmarks.

Stressing the importance of open-source initiatives

While GPT-5 may challenge Llama 3.1’s emerging dominance, the impressive performance of Llama 3.1 against GPT-4o underscores the growing influence and capability of open-source AI initiatives.

“We are embracing the open source ethos of releasing early and often to enable the community to get access to these models while they are still in development. The text-based models we are releasing today are the first in the Llama 3 collection of models. Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core LLM capabilities such as reasoning and coding,” stated Meta in a blog post when launching Llama 3.

The significance of open-source AI cannot be overstated. By making their advanced models accessible to the public, Meta not only democratizes technology but also taps into the collective intelligence and diverse perspectives of the global developer community. This approach contrasts sharply with closed-source models, which are typically accessible only to a select group of users and researchers, thereby limiting the potential for widespread innovation and enhancement.


Featured image credit: Penfer/Unsplash

Tags: Benchmark, gpt-4o, LLama 3, meta AI
