Dataconomy

Claim: NVIDIA green-lit pirated book downloads for AI training

Internal documents show executives authorized the download of 500 terabytes of pirated data despite warnings of illegal origins.

by Kerem Gülen
January 20, 2026
in Artificial Intelligence, News

NVIDIA executives authorized using millions of pirated books from Anna’s Archive for AI training, according to an expanded class-action lawsuit. The suit, citing internal NVIDIA documents, alleges the company contacted Anna’s Archive for high-speed access to its data. NVIDIA has benefited from the artificial intelligence boom, with revenue surging due to high demand for its AI-learning chips and data center services.

NVIDIA develops its own AI models, including NeMo, Retro-48B, InstructRetro, and Megatron. These models are trained using NVIDIA hardware and large text libraries, similar to practices at other technology companies. The company has faced legal challenges from copyright holders regarding its training methodologies.

Authors first sued NVIDIA in early 2024 for copyright infringement, claiming the company’s AI models were trained on the Books3 dataset, which included copyrighted works taken from Bibliotik without permission. NVIDIA defended its actions as fair use, arguing that, to its AI models, books are nothing more than statistical correlations. However, new evidence emerged during discovery.


Plaintiffs filed an amended complaint last Friday, expanding the lawsuit’s scope by adding more books, authors, and AI models. The amended complaint includes broader “shadow library” claims. Authors, including Abdi Nazemian, now cite internal NVIDIA emails and documents, alleging the company willingly downloaded millions of copyrighted books. The complaint claims “competitive pressures drove NVIDIA to piracy,” involving collaboration with Anna’s Archive.

According to the amended complaint, a member of NVIDIA’s data strategy team contacted Anna’s Archive to inquire about acquiring its pirated materials and including them in the pre-training data for NVIDIA’s large language models. The complaint states Anna’s Archive charged tens of thousands of dollars for “high-speed access” to its collections, and NVIDIA sought details on this access.

The complaint alleges Anna’s Archive warned NVIDIA that its library content was illegally acquired and maintained. Anna’s Archive reportedly asked whether NVIDIA had internal permission to proceed, and NVIDIA executives granted it within a week. After receiving permission from NVIDIA management, Anna’s Archive provided access to its pirated books. Anna’s Archive offered NVIDIA access to approximately 500 terabytes of data, including millions of books typically available through Internet Archive’s digital lending system. The complaint does not specify whether NVIDIA paid Anna’s Archive. NVIDIA also faces accusations of using other pirated sources, including LibGen, Sci-Hub, and Z-Library, in addition to the Books3 database.

Authors allege NVIDIA not only downloaded and used pirated books for its AI training but also distributed scripts and tools enabling corporate customers to download “The Pile,” which contains the Books3 pirated dataset. These allegations introduce new claims of vicarious and contributory infringement, asserting NVIDIA generated revenue from customers by facilitating access to these pirated datasets. The authors seek compensation for damages for named authors and potentially hundreds of others joining the class-action lawsuit.

This revelation marks the first public disclosure of correspondence between a major U.S. tech company and Anna’s Archive. The first consolidated and amended complaint, filed at the U.S. District Court for the Northern District of California, names authors Abdi Nazemian, Brian Keene, Stewart O’Nan, Andre Dubus III, and Susan Orlean.



Tags: AI training, Featured, Nvidia

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
