Dataconomy

GPT-5.2 surpasses expert PhD baseline with 92% science score

The new FrontierScience evaluation shows GPT-5.2 leading with a 77% score in Olympiad-style reasoning and 25% in complex, open-ended research tasks.

by Kerem Gülen
December 24, 2025
in Research

GPT-5.2 scored 92% on GPQA, a “Google-Proof” science benchmark, well above the 70% expert baseline. The model also achieved medal-winning performance in major international competitions, a sign of its growing capability in scientific reasoning.

Scientists already use these systems for tasks such as literature searches across disciplines and languages and for working through complex mathematical proofs. In many cases this reduces work that would typically take days or weeks to a few hours. The paper Early science acceleration experiments with GPT-5, published in November 2025, provides initial evidence that GPT-5 can measurably speed up scientific workflows.

To measure and forecast how well AI models can accelerate scientific research, OpenAI introduced FrontierScience, a new benchmark designed to assess expert-level scientific capabilities. Its questions are written and verified by experts in physics, chemistry, and biology, with an emphasis on originality and difficulty.

FrontierScience features two distinct tracks:

  • Olympiad: Measures scientific reasoning abilities in the style of international Olympiad competitions.
  • Research: Evaluates real-world scientific research capabilities.

In initial evaluations, GPT-5.2 was the top-performing model on both tracks, scoring 77% on FrontierScience-Olympiad and 25% on FrontierScience-Research. That places it ahead of other frontier models, including Claude Opus 4.5 and Gemini 3 Pro. The results indicate that current models can support the structured-reasoning parts of research, though significant work remains to improve their open-ended thinking.

FrontierScience encompasses over 700 textual questions, with 160 in its gold set, spanning subfields in physics, chemistry, and biology. FrontierScience-Olympiad features 100 questions collaboratively designed by 42 international Olympiad medalists and national team coaches. FrontierScience-Research includes 60 original research subtasks developed by 45 PhD scientists, including doctoral candidates, professors, and postdoctoral researchers.

The Olympiad set is graded by short-answer verification. The Research track uses rubric-based grading on a 10-point scale to evaluate open-ended tasks, and the rubric assesses both the final answer and the intermediate reasoning steps. A model-based grader, GPT-5, evaluates responses against these criteria. Task creation involved selecting questions against internal models, which may bias the evaluation against specific models.
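The two grading modes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual grader: the function names, the normalization rule, the rubric items, and the per-item point values are all hypothetical, and the judgments that a model-based grader would produce are represented here as plain booleans.

```python
from dataclasses import dataclass


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(answer.lower().split())


def grade_short_answer(response: str, reference: str) -> bool:
    """Olympiad-style check: the answer must match the reference after normalization."""
    return normalize(response) == normalize(reference)


@dataclass
class RubricItem:
    description: str
    points: float  # points awarded when the criterion is judged as met


def grade_with_rubric(rubric: list[RubricItem], met: list[bool]) -> float:
    """Research-style check: sum the points of every rubric item judged as met."""
    if len(rubric) != len(met):
        raise ValueError("one judgment is required per rubric item")
    if abs(sum(item.points for item in rubric) - 10.0) > 1e-9:
        raise ValueError("rubric points are expected to sum to 10")
    return sum(item.points for item, ok in zip(rubric, met) if ok)


# Hypothetical rubric covering the final answer and intermediate reasoning.
rubric = [
    RubricItem("States the correct final result", 4.0),
    RubricItem("Derives the key intermediate quantity", 3.0),
    RubricItem("Justifies the main approximation", 3.0),
]
judgments = [True, True, False]  # stand-in for the grader model's per-item verdicts

print(grade_short_answer("  42 J ", "42 j"))  # True
print(grade_with_rubric(rubric, judgments))   # 7.0
```

A real short-answer verifier would likely also need to accept equivalent numeric or symbolic forms; the string comparison here only shows the overall shape of the check.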

Key performance results include:

  • FrontierScience-Olympiad Accuracy:
    • GPT-5.2: 77.1%
    • Gemini 3 Pro: 76.1%
    • Claude Opus 4.5: 71.4%
  • FrontierScience-Research Accuracy:
    • GPT-5.2: 25.2%
    • Claude Opus 4.5: 17.5%
    • Grok 4: 15.9%
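To make the size of each lead concrete, the short Python sketch below computes the gap between the top model and the runner-up on each track. It uses only the scores listed above; the helper name is hypothetical, and because the article lists only selected results, the runner-up here means the second-highest score among those shown.

```python
# Scores as reported in the list above (percent accuracy).
olympiad = {"GPT-5.2": 77.1, "Gemini 3 Pro": 76.1, "Claude Opus 4.5": 71.4}
research = {"GPT-5.2": 25.2, "Claude Opus 4.5": 17.5, "Grok 4": 15.9}


def lead_over_runner_up(scores: dict[str, float]) -> tuple[str, str, float]:
    """Return the top model, the runner-up, and the gap in percentage points."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_score), (second, second_score) = ranked[0], ranked[1]
    return top, second, round(top_score - second_score, 1)


print(lead_over_runner_up(olympiad))  # ('GPT-5.2', 'Gemini 3 Pro', 1.0)
print(lead_over_runner_up(research))  # ('GPT-5.2', 'Claude Opus 4.5', 7.7)
```

The Olympiad lead is a single percentage point, while the Research lead is 7.7 points on a much lower base.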

Higher reasoning effort, and with it longer processing time, correlated with improved accuracy for both GPT-5.2 and OpenAI o3. For instance, GPT-5.2’s accuracy on FrontierScience-Olympiad increased from 67.5% at “Low” reasoning effort to 77.1% at “XHigh” effort. Similarly, on FrontierScience-Research, GPT-5.2’s accuracy rose from 18.2% at “Low” to 25.2% at “XHigh.”
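The effect of reasoning effort can be expressed as both an absolute and a relative gain. The Python sketch below works only from the four GPT-5.2 figures quoted above; the function name is hypothetical, and the relative gain is simply the increase divided by the “Low” score.

```python
def effort_gain(low: float, xhigh: float) -> tuple[float, float]:
    """Return (gain in percentage points, gain relative to the Low score in percent)."""
    absolute = round(xhigh - low, 1)
    relative = round(100 * (xhigh - low) / low, 1)
    return absolute, relative


# GPT-5.2 accuracy at "Low" and "XHigh" reasoning effort, as reported above.
print(effort_gain(67.5, 77.1))  # Olympiad: (9.6, 14.2)
print(effort_gain(18.2, 25.2))  # Research: (7.0, 38.5)
```

In absolute terms the Olympiad track gains more (9.6 points against 7.0), but relative to its starting score the Research track improves by a larger share.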

FrontierScience currently focuses on constrained problem statements and does not assess the generation of novel hypotheses or interactions with multimodal data. Developers plan to iterate on the benchmark, expanding it to new domains and integrating more real-world evaluations as models improve.


Tags: FrontierScience, gpt-5.2, openAI

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
