Dataconomy

OpenAI might have trained its AI on stolen books

The paper's methodology, DE-COP, determines if a model distinguishes between human-authored texts and AI-generated paraphrases.

by Kerem Gülen
April 2, 2025
in Artificial Intelligence, News

OpenAI is facing accusations of training its AI models on copyrighted material without permission, as a new paper alleges the company used paywalled books from O’Reilly Media to train its GPT-4o model. The AI Disclosures Project, a nonprofit co-founded by Tim O’Reilly and Ilan Strauss, published the paper.

AI models function as prediction engines: they learn patterns from large volumes of data, such as books and movies, and extrapolate from a prompt. As real-world sources run short, some AI labs have turned to AI-generated data, but training on purely synthetic data carries risks, including degraded model performance.

The paper’s methodology, DE-COP, tests whether a model can reliably distinguish human-authored texts from AI-generated paraphrases of the same text. If it can, that suggests the model has prior knowledge of the text from its training data. The researchers probed GPT-4o, GPT-3.5 Turbo, and other OpenAI models, using 13,962 excerpts from 34 O’Reilly books to estimate the probability that each book was included in the training datasets.
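To make the idea concrete, here is a minimal sketch of how such a test could be scored. It assumes a multiple-choice setup in which the verbatim passage is shuffled in among its paraphrases and the model is asked to pick the original; the article does not spell out the paper's exact procedure, so the quiz format, the function names (`build_quiz`, `guess_rate`, `chance_level`), and the `pick` callback standing in for a model query are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One test item: (verbatim passage, list of paraphrases of that passage).
Item = Tuple[str, Sequence[str]]


def build_quiz(original: str, paraphrases: Sequence[str],
               rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the verbatim passage in among its paraphrases.

    Returns the shuffled options and the index of the verbatim passage.
    """
    options = [original, *paraphrases]
    rng.shuffle(options)
    return options, options.index(original)


def chance_level(num_paraphrases: int) -> float:
    """Accuracy expected from blind guessing among the options."""
    return 1.0 / (num_paraphrases + 1)


def guess_rate(items: Sequence[Item],
               pick: Callable[[List[str]], int],
               seed: int = 0) -> float:
    """Fraction of quizzes in which `pick` selects the verbatim passage.

    `pick` is a hypothetical stand-in for querying a model: it receives the
    shuffled options and returns the index it believes is the original.
    """
    rng = random.Random(seed)
    correct = 0
    for original, paraphrases in items:
        options, answer = build_quiz(original, paraphrases, rng)
        if pick(options) == answer:
            correct += 1
    return correct / len(items)
```

Under these assumptions, a guess rate well above `chance_level(...)` on passages the model should not have seen would hint at prior exposure, while a rate near chance would not. A real evaluation would also need to control for factors such as option ordering and paraphrase quality.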


The results indicated that GPT-4o recognized significantly more paywalled O’Reilly book content than older models like GPT-3.5 Turbo. According to the paper, GPT-4o likely recognizes many non-public O’Reilly books published before its training cutoff date, and O’Reilly doesn’t have a licensing agreement with OpenAI.

The co-authors acknowledge that the method isn’t foolproof and that OpenAI might instead have collected the excerpts from users’ ChatGPT inputs. Another caveat is that more recent OpenAI models, including GPT-4.5, weren’t evaluated.

OpenAI, advocating for looser copyright restrictions, has sought higher-quality training data, hiring journalists to fine-tune model outputs. The company also has licensing deals with news publishers and offers opt-out mechanisms for copyright owners. OpenAI has not commented on the paper.



Tags: chatgpt, openAI

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
