Dataconomy

AI research tools might be creating more problems than they solve

A University of Surrey study warns that the proliferation of AI tools may be weakening scientific rigor, leading to a surge in "low-quality" and "science fiction" research papers.

by Emre Çıtak
May 13, 2025
in Research

A new study has uncovered an alarming rise in formulaic research papers derived from the National Health and Nutrition Examination Survey (NHANES), suggesting that artificial intelligence tools are being misused to mass-produce statistically weak and potentially misleading scientific literature. The authors point to a surge in single-factor analyses that disregard multifactorial complexity, exploit open data selectively, and bypass robust statistical corrections.

Between 2014 and 2021, an average of just four such papers were published each year. But in 2024 alone, up to October 9, the tally had ballooned to 190. This steep growth, paired with a shift in publication origins and a reliance on automation, indicates that AI-assisted pipelines may be accelerating low-quality manuscript production. At the heart of the problem is the misuse of NHANES, a respected and AI-ready U.S. government dataset originally developed to evaluate public health trends across the population.

Unpacking the NHANES problem

NHANES provides an exceptionally rich dataset, combining clinical, behavioral, and laboratory data across thousands of variables. It is accessible through APIs and has standardized Python and R libraries, allowing researchers to extract and analyze the data efficiently. This makes it a valuable tool for both public health researchers and AI developers. But this very convenience also creates a vulnerability: it allows researchers to generate results quickly, and with minimal oversight, leading to an explosion of formulaic research.

The new study analyzed 341 NHANES-based papers published between 2014 and 2024 that relied on single-variable correlations. These papers, on average, appeared in moderate-impact journals (average impact factor of 3.6), and often focused on conditions like depression, diabetes, or cardiovascular disease. Instead of exploring the multifactorial nature of these conditions, the studies typically drew statistical significance from a single independent variable, bypassing false discovery correction and frequently relying on unexplained data subsetting.

One major concern is that multifactorial health conditions—such as mental health disorders, chronic inflammation, or cardiovascular disease—were analyzed using methods more suited for simple binary relationships. In effect, these studies presented findings that stripped away nuance and ignored the reality that health outcomes are rarely driven by a single factor.

Depression was used as a case study, with 28 individual papers claiming associations between the condition and various independent variables. However, only 13 of these associations remained statistically significant after applying False Discovery Rate (FDR) correction. Without proper correction, these publications risk introducing a high volume of Type I errors into the scientific literature. In some instances, researchers appeared to recycle variables as both predictors and outcomes across papers, further muddying the waters.
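
To illustrate why so many associations can vanish under FDR correction, here is a minimal sketch of the Benjamini-Hochberg procedure, one common way to control the false discovery rate. The article does not say which FDR method the study's authors applied, and the p-values below are hypothetical, not taken from the reviewed papers.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a result survives
    Benjamini-Hochberg FDR control at level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every result ranked at or below max_k is kept.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from eight single-factor papers on one condition.
p_values = [0.001, 0.008, 0.012, 0.030, 0.041, 0.045, 0.048, 0.20]

naive = [p < 0.05 for p in p_values]      # each paper judged in isolation
corrected = benjamini_hochberg(p_values)  # all papers judged as one family

print(sum(naive), "significant without correction")
print(sum(corrected), "significant after FDR correction")
```

In this made-up example, seven of the eight results clear the usual 0.05 threshold on their own, but only three survive once they are treated as a family of related tests.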

Selective data mining and HARKing

Another issue uncovered by the authors was the use of unjustified data subsets. Although NHANES provides a broad timeline of health data dating back to 1999, many researchers chose narrow windows of analysis without disclosing their rationale. For example, some studies used only the 2003 to 2018 window to analyze diabetes and inflammation, despite broader data availability. The practice hints at data dredging or HARKing (hypothesizing after the results are known), a methodologically flawed approach that undermines reproducibility and transparency.

The median study analyzed just four years of NHANES data, despite the database offering over two decades of information. This selective sampling enables authors to increase the likelihood of achieving significant results without accounting for the full dataset’s complexity, making it easier to produce and publish manuscripts in high volume.
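
A rough back-of-the-envelope sketch shows how much freedom window selection gives an analyst. It assumes ten two-year NHANES survey cycles (1999-2000 through 2017-2018) and, unrealistically, treats a test on each contiguous window as independent. Overlapping windows share data, so the real inflation is likely smaller. The function names are illustrative only.

```python
def contiguous_windows(n_cycles):
    """Number of ways to pick a contiguous run of survey cycles."""
    return n_cycles * (n_cycles + 1) // 2


def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests
    falls below alpha when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests


n_cycles = 10  # assumption: ten two-year cycles, 1999-2000 to 2017-2018
n_windows = contiguous_windows(n_cycles)

print(n_windows, "possible contiguous windows")
print(round(chance_of_false_positive(n_windows), 2),
      "chance of at least one spurious hit under independence")
```

Even as a loose upper-end illustration, this suggests why an undisclosed choice of time window deserves scrutiny: with dozens of candidate windows, some will cross the significance threshold by chance alone.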

Out of the 341 papers reviewed, over 50 percent originated from just three publisher families: Frontiers, BioMed Central, and Springer. More notably, the country of origin shifted dramatically. Prior to 2021, only 8 percent of primary authors were based in China. Between 2021 and 2024, this rose to 92 percent. While this could reflect changing research priorities or policy incentives, the magnitude and timing suggest coordinated use of automated pipelines possibly linked to paper mill operations.

The findings pose a serious challenge to the integrity of scientific literature. Single-variable studies that fail to consider complex interdependencies are more likely to be misleading. When repeated at scale, such research floods the academic ecosystem with papers that meet publication thresholds but offer little new insight. This is compounded by weak peer review and the growing pressure on researchers to publish frequently and rapidly.

The authors warn that these practices, if left unchecked, could tip the balance in some subfields so that manufactured papers outnumber legitimate ones. The use of AI to accelerate manuscript generation only amplifies this risk. As generative models become more accessible, they enable rapid conversion of statistical outputs into full-length manuscripts, reducing the time and expertise required to publish scientific articles.

Recommendations for stakeholders

To mitigate the risks of AI-enabled data dredging and mass-produced research, the authors propose several concrete steps:

  • For researchers: Acknowledge the limitations of single-factor studies and incorporate multifactorial analysis where appropriate. Clearly justify any data subsetting or hypothesis changes.
  • For data providers: Introduce auditable access via API keys or application IDs to discourage indiscriminate mining. Require that any publication citing their datasets disclose the full data extraction history.
  • For publishers: Increase desk rejection rates for formulaic papers. Employ dedicated statistical reviewers. Use templates to identify manuscripts using identical pipelines with only variable swaps.
  • For peer reviewers: Treat the use of single-variable analysis for complex conditions as a red flag. Request clarification when statistical rigor is lacking or data subsets are poorly justified.
  • For the broader scientific community: Engage in post-publication review. Platforms like PubPeer should be actively used to flag questionable practices, even when the statistical methods appear superficially sound.
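
The recommendation that publishers spot manuscripts differing only by variable swaps could be approached in many ways. The following is one hypothetical sketch, not a method described by the study's authors: mask known exposure and outcome terms in a title, then compare what is left using Python's standard difflib module. The titles and the term list are invented for illustration.

```python
import re
from difflib import SequenceMatcher


def mask_terms(text, terms, placeholder="<VAR>"):
    """Lowercase the text and replace each known variable term
    with a placeholder."""
    # Longer terms first, so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered),
                         re.IGNORECASE)
    return pattern.sub(placeholder, text.lower())


def template_similarity(a, b, terms):
    """Similarity (0.0 to 1.0) of two texts after masking variable terms."""
    return SequenceMatcher(None, mask_terms(a, terms),
                           mask_terms(b, terms)).ratio()


# Hypothetical titles and variable terms, not drawn from the reviewed papers.
terms = ["dietary fiber intake", "serum vitamin D", "depression", "diabetes"]
title_a = ("Association between dietary fiber intake and depression "
           "in US adults: a cross-sectional NHANES study")
title_b = ("Association between serum vitamin D and diabetes "
           "in US adults: a cross-sectional NHANES study")

raw_score = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
masked_score = template_similarity(title_a, title_b, terms)

print("raw similarity:", round(raw_score, 2))
print("similarity after masking variables:", round(masked_score, 2))
```

Once the variables are masked, the two invented titles become identical, which is the kind of signal a desk editor might use to route a submission to a statistical reviewer. A production screen would need to look at methods sections and analysis code, not just titles.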
