Dataconomy

Claim: NVIDIA green-lit pirated book downloads for AI training

Internal documents show executives authorized the download of 500 terabytes of pirated data despite warnings of illegal origins.

by Kerem Gülen
January 20, 2026
in Artificial Intelligence, News

NVIDIA executives authorized using millions of pirated books from Anna’s Archive for AI training, according to an expanded class-action lawsuit. The suit, citing internal NVIDIA documents, alleges the company contacted Anna’s Archive for high-speed access to its data. NVIDIA has benefited from the artificial intelligence boom, with revenue surging due to high demand for its AI-learning chips and data center services.

NVIDIA develops its own AI models, including NeMo, Retro-48B, InstructRetro, and Megatron. These models are trained using NVIDIA hardware and large text libraries, similar to practices at other technology companies. The company has faced legal challenges from copyright holders regarding its training methodologies.

Authors first sued NVIDIA in early 2024 for copyright infringement, claiming the company’s AI models were trained on the Books3 dataset, which included copyrighted works taken from Bibliotik without permission. NVIDIA defended its actions as fair use, arguing that, to its AI models, books are nothing more than statistical correlations. However, new evidence emerged during discovery.

Plaintiffs filed an amended complaint last Friday, expanding the lawsuit’s scope by adding more books, authors, and AI models. The amended complaint includes broader “shadow library” claims. Authors, including Abdi Nazemian, now cite internal NVIDIA emails and documents, alleging the company willingly downloaded millions of copyrighted books. The complaint claims “competitive pressures drove NVIDIA to piracy,” involving collaboration with Anna’s Archive.

According to the amended complaint, a member of NVIDIA’s data strategy team contacted Anna’s Archive to inquire about acquiring its pirated materials for pre-training large language models. The complaint states that Anna’s Archive charged tens of thousands of dollars for “high-speed access” to its collections, and that NVIDIA sought details on this access.

The complaint alleges Anna’s Archive warned NVIDIA that its library content was illegally acquired and maintained. Anna’s Archive reportedly asked whether NVIDIA had internal permission to proceed, and NVIDIA management granted that permission within a week. Anna’s Archive then offered NVIDIA access to approximately 500 terabytes of pirated data, including millions of books typically available only through Internet Archive’s digital lending system. The complaint does not specify whether NVIDIA paid Anna’s Archive. NVIDIA also faces accusations of using other pirated sources, including LibGen, Sci-Hub, and Z-Library, in addition to the Books3 dataset.

The authors allege that NVIDIA not only downloaded and used pirated books for its AI training but also distributed scripts and tools that enabled corporate customers to download “The Pile,” which contains the pirated Books3 dataset. These allegations introduce new claims of vicarious and contributory infringement, asserting that NVIDIA generated revenue from customers by facilitating access to these pirated datasets. The authors seek damages for the named plaintiffs and for potentially hundreds of others who may join the class action.

This revelation marks the first public disclosure of correspondence between a major U.S. tech company and Anna’s Archive. The first consolidated and amended complaint, filed in the U.S. District Court for the Northern District of California, names authors Abdi Nazemian, Brian Keene, Stewart O’Nan, Andre Dubus III, and Susan Orlean.


Tags: AI training, Featured, Nvidia

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.