Dataconomy

OpenAI’s o3 claimed 25%, independent test says “try 10”

OpenAI says the public o3 is optimized for speed and real-world use, but Epoch AI found its FrontierMath performance underwhelming.

by Kerem Gülen
April 21, 2025
in Artificial Intelligence, News

OpenAI’s o3 AI model scored lower on the FrontierMath benchmark than the company initially implied, according to independent tests by Epoch AI, the research institute behind FrontierMath. When OpenAI unveiled o3 in December, it claimed the model could answer 25% of FrontierMath questions, significantly outperforming other models.

Epoch AI’s tests found that o3 scored around 10% on FrontierMath. When OpenAI announced the model, its chief research officer, Mark Chen, said o3 achieved over 25% in “aggressive test-time compute settings.” Epoch noted that the lower-bound score in OpenAI’s published benchmark results matches the roughly 10% it observed. The discrepancy may stem from differences in testing setups or in the version of o3 that was evaluated.

According to the ARC Prize Foundation, which tested a pre-release version of o3, the public o3 model is “tuned for chat/product use” and has smaller compute tiers than the version OpenAI benchmarked in December. OpenAI’s Wenda Zhou explained that the production o3 model is “more optimized for real-world use cases” and for speed, which may account for the benchmark disparities.


Image: Epoch AI

OpenAI’s o3-mini-high and o4-mini models outperform o3 on FrontierMath. The company plans to release a more powerful o3 variant, o3-pro, in the coming weeks. This incident highlights the need for caution when interpreting AI benchmarks, particularly when they are used to promote commercial products.

The AI industry has seen several benchmarking controversies recently. In January, Epoch was criticized for not disclosing funding from OpenAI until after the company announced o3. xAI was accused of publishing misleading benchmark charts for its Grok 3 model, and Meta admitted to touting benchmark scores for a different version of a model than the one available to developers.


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
