Dataconomy

GPT-5.2 surpasses expert PhD baseline with 92% science score

The new FrontierScience evaluation shows GPT-5.2 leading with a 77% score in Olympiad-style reasoning and 25% in complex, open-ended research tasks.

by Kerem Gülen
December 24, 2025
in Research

GPT-5.2 scored 92% on a “Google-Proof” science benchmark, significantly surpassing the 70% expert baseline. The advanced model also achieved medal-winning performance in major international competitions, demonstrating its evolving capabilities in scientific reasoning.

Scientists use these systems for tasks such as literature searches across disciplines and languages and for working through complex mathematical proofs. This often reduces work that would typically take days or weeks to a few hours. The paper "Early science acceleration experiments with GPT-5," published in November 2025, provides initial evidence that GPT-5 can expedite scientific workflows.

To further measure and forecast AI models’ ability to accelerate scientific research, developers introduced FrontierScience, a new benchmark designed to assess expert-level scientific capabilities. The benchmark contains questions written and verified by experts in physics, chemistry, and biology, focusing on originality and difficulty.

FrontierScience features two distinct tracks:

  • Olympiad: Measures scientific reasoning abilities in the style of international Olympiad competitions.
  • Research: Evaluates real-world scientific research capabilities.

In initial evaluations, GPT-5.2 emerged as the top-performing model on both FrontierScience-Olympiad, scoring 77%, and Research, scoring 25%. This performance positions it ahead of other frontier models, including Claude Opus 4.5 and Gemini 3 Pro. The results indicate that current models can support structured reasoning aspects of research, though significant work remains to enhance their open-ended thinking capabilities.

FrontierScience encompasses over 700 textual questions, with 160 in its gold set, spanning subfields in physics, chemistry, and biology. FrontierScience-Olympiad features 100 questions collaboratively designed by 42 international Olympiad medalists and national team coaches. FrontierScience-Research includes 60 original research subtasks developed by 45 PhD scientists, including doctoral candidates, professors, and postdoctoral researchers.

The Olympiad set is graded by short-answer verification. The Research track uses a rubric with a 10-point scoring scale for open-ended tasks; the rubric assesses both the final answer and intermediate reasoning steps. A model-based grader, GPT-5, evaluates responses against these criteria. Tasks were selected against internal models during their creation, which may bias the evaluation against those specific models.
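
As a rough illustration of rubric-based grading on a 10-point scale, the sketch below sums the points of the rubric items a grader marks as satisfied. The item descriptions, the point weights, and the grader's judgments are all hypothetical; the article does not describe how FrontierScience allocates points or how the GPT-5 grader reaches its decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """One gradable criterion; description and points are illustrative."""
    description: str
    points: int


def score_response(rubric: list[RubricItem], satisfied: list[bool]) -> int:
    """Sum the points of every rubric item the grader marked as satisfied."""
    if len(rubric) != len(satisfied):
        raise ValueError("one judgment is required per rubric item")
    if sum(item.points for item in rubric) != 10:
        raise ValueError("this sketch assumes the rubric totals 10 points")
    return sum(item.points for item, ok in zip(rubric, satisfied) if ok)


# Hypothetical rubric covering intermediate reasoning and the final answer.
example_rubric = [
    RubricItem("Identifies the governing mechanism", 4),
    RubricItem("Justifies the key intermediate step", 3),
    RubricItem("States the correct final answer", 3),
]

# Hypothetical grader judgments: the intermediate step was not justified.
example_score = score_response(example_rubric, [True, False, True])
print(f"{example_score}/10")
```

In this sketch the model-based grader is reduced to a list of booleans; in practice each judgment would come from a separate model call.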

Key performance results include:

  • FrontierScience-Olympiad Accuracy:
    • GPT-5.2: 77.1%
    • Gemini 3 Pro: 76.1%
    • Claude Opus 4.5: 71.4%
  • FrontierScience-Research Accuracy:
    • GPT-5.2: 25.2%
    • Claude Opus 4.5: 17.5%
    • Grok 4: 15.9%

Longer processing times, or higher reasoning efforts, correlated with improved accuracy for both GPT-5.2 and OpenAI o3. For instance, GPT-5.2’s accuracy on FrontierScience-Olympiad increased from 67.5% at “Low” reasoning effort to 77.1% at “XHigh” effort. Similarly, on FrontierScience-Research, GPT-5.2’s accuracy rose from 18.2% at “Low” to 25.2% at “XHigh.”
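
The size of these gains can be checked with simple arithmetic. The sketch below uses only the four GPT-5.2 scores quoted above; the function and variable names are illustrative.

```python
# GPT-5.2 accuracy (%) at the lowest and highest reasoning effort, as reported above.
scores = {
    "FrontierScience-Olympiad": {"Low": 67.5, "XHigh": 77.1},
    "FrontierScience-Research": {"Low": 18.2, "XHigh": 25.2},
}


def effort_gain(low: float, high: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent), rounded to 0.1."""
    absolute = round(high - low, 1)
    relative = round(100 * (high - low) / low, 1)
    return absolute, relative


for track, s in scores.items():
    absolute, relative = effort_gain(s["Low"], s["XHigh"])
    print(f"{track}: +{absolute} points ({relative}% relative to Low)")
```

The Olympiad track gains more in absolute terms (9.6 points versus 7.0), while the Research track gains more relative to its lower starting score (about 38.5% versus 14.2%).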

FrontierScience currently focuses on constrained problem statements and does not assess the generation of novel hypotheses or interactions with multimodal data. Developers plan to iterate on the benchmark, expanding it to new domains and integrating more real-world evaluations as models improve.


Tags: FrontierScience, gpt-5.2, openAI

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
