The Internet Archive has lost a significant legal battle: the US Court of Appeals for the Second Circuit upheld the ruling in Hachette v. Internet Archive, finding that its book digitization and lending practices violated copyright law. The lawsuit was triggered by the Archive’s National Emergency Library initiative during the pandemic, which allowed unrestricted digital lending of books and sparked backlash from publishers and authors. The court rejected the Archive’s fair use defense, although it acknowledged its nonprofit status. This ruling strengthens authors’ and publishers’ control over their works. But it immediately reminds me of how AI tools train on and use data from the Internet, including books and more. If the nonprofit Internet Archive’s work is not fair use, how can paid AI tools justify using the same data?
Despite numerous AI copyright lawsuits, disputes over text from news outlets have rarely produced harsh rulings against AI tools. Many have instead ended in partnerships with major players.
You might argue that the situation is different because the Internet Archive lends out the books themselves, while AI tools draw on all of their training data to generate your essay. But with a well-crafted prompt, you can still get specific excerpts or more detailed responses out of them.
The Hachette v. Internet Archive case highlights significant concerns about how AI models acquire training data, especially when it involves copyrighted materials like books. AI systems often rely on large datasets, including copyrighted texts, raising similar legal challenges regarding unlicensed use. If courts restrict the digitization and use of copyrighted works without permission, AI companies may need to secure licenses for the texts used in training, adding complexity and potential costs. This could limit access to diverse, high-quality datasets, ultimately affecting AI development and innovation.
Additionally, the case underlines the limitations of the fair use defense in the context of transformative use, which is often central to AI’s justification for using large-scale text data. If courts narrowly view what constitutes fair use, AI developers might face more restrictions on how they access and use copyrighted books. This tension between protecting authors’ rights and maintaining open access to knowledge could have far-reaching consequences for the future of AI training practices and the ethical use of data.
Need a deeper dive into the case? Here is everything you need to know about it.
Hachette v. Internet Archive explained
Hachette v. Internet Archive is a significant legal case that centers around copyright law and the limits of the “fair use” doctrine in the context of digital libraries. The case began in 2020, when several large publishing companies—Hachette, HarperCollins, Penguin Random House, and Wiley—sued the Internet Archive, a nonprofit organization dedicated to preserving digital copies of websites, books, and other media.
The case focused on the Archive’s practice of scanning books and lending them out online.
The story behind the Internet Archive lawsuit
The Open Library project, run by the Internet Archive, was set up to let people borrow books digitally. Here’s how it worked:
- The Internet Archive bought physical copies of books.
- They scanned these books into digital form.
- People could borrow a digital version, but each copy could be checked out by only one person at a time, just like borrowing a physical book from a regular library.
The Internet Archive thought this was legal because they only let one person borrow a book at a time. They called this system Controlled Digital Lending (CDL). The idea was to make digital lending work just like physical library lending.
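To make the one-person-at-a-time rule concrete, here is a minimal sketch of how such a lending limit could be modeled. It is only an illustration, not the Internet Archive’s actual software: the class name `ControlledLendingShelf`, its methods, and the `owned_copies` parameter are all hypothetical, and the sketch assumes the limit is one active loan per owned copy.

```python
class ControlledLendingShelf:
    """Toy model of one title under a one-loan-per-owned-copy rule (hypothetical)."""

    def __init__(self, title, owned_copies=1):
        self.title = title
        self.owned_copies = owned_copies  # assumption: physical copies held
        self.borrowers = set()            # readers with an active loan

    def checkout(self, reader):
        """Return True if the reader holds a loan after this call."""
        if reader in self.borrowers:
            return True
        if len(self.borrowers) >= self.owned_copies:
            return False  # every owned copy is already on loan
        self.borrowers.add(reader)
        return True

    def give_back(self, reader):
        """End the reader's loan, freeing a copy for the next person."""
        self.borrowers.discard(reader)


shelf = ControlledLendingShelf("Example Title", owned_copies=1)
first = shelf.checkout("reader_a")   # copy is free, so the loan succeeds
second = shelf.checkout("reader_b")  # copy is out, so this reader must wait
shelf.give_back("reader_a")
third = shelf.checkout("reader_b")   # copy is free again
print(first, second, third)
```

With a single owned copy, the second reader is turned away until the first returns the book, which is the behavior the Archive compared to a physical library.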
When the COVID-19 pandemic hit in early 2020, many libraries had to close, making it hard for people to access books. To help, the Internet Archive launched the National Emergency Library (NEL) in March 2020. This program changed things:
- The NEL allowed multiple people to borrow the same digital copy of a book at the same time. This removed the one-person-at-a-time rule.
- The goal was to give more people access to books during the pandemic, especially students and researchers who were stuck at home.
While the NEL was meant to be temporary, it upset authors and publishers. They argued that letting many people borrow the same digital copy without permission was like stealing their work.
The publishers’ lawsuit
In June 2020, the big publishers sued the Internet Archive. They claimed:
- The Internet Archive did not have permission to scan their books or lend them out digitally.
- By doing this, the Internet Archive was violating their copyright, which gives them the exclusive right to control how their books are copied and shared.
- The NEL’s approach, which let many people borrow digital copies at once, was especially harmful to their business and was essentially piracy.
The publishers argued that the Internet Archive’s actions hurt the market for their books. They said people were getting free digital versions instead of buying ebooks or borrowing from licensed libraries.
Internet Archive’s defense
The Internet Archive defended itself by claiming that its work was protected by fair use. Fair use allows limited use of copyrighted material without permission for purposes like education, research, and commentary. The Archive made these points:
- They were providing a transformative service by giving readers access to physical books in a new, digital form.
- They weren’t making a profit from this, as they’re a nonprofit organization with the mission of preserving knowledge and making it accessible.
- The NEL was a temporary response to the pandemic, and they were trying to help people who couldn’t access books during the crisis.
They also pointed to their Controlled Digital Lending system as a way to respect copyright laws. Under CDL, only one person could borrow a book at a time, just like in a physical library.
The court’s decisions
District Court Ruling (March 2023)
In March 2023, a federal court sided with the publishers. Judge John G. Koeltl ruled that the Internet Archive’s actions were not protected by fair use. He said:
- The Internet Archive’s digital lending was not transformative because they weren’t adding anything new to the books. They were simply copying them in digital form, which wasn’t enough to qualify for fair use.
- The court also found that the Archive’s lending hurt the market for both printed and digital versions of the books. By offering free digital copies, the Internet Archive was seen as competing with publishers’ ebook sales.
- The court concluded that the Archive had created derivative works, which means they made new versions of the books (digital copies) without permission.
Appeals Court Ruling (September 2024)
The Internet Archive appealed the decision to a higher court, the US Court of Appeals for the Second Circuit, hoping to overturn the ruling. The appeals court also ruled in favor of the publishers, with one important clarification:
- The court recognized that the Internet Archive is a nonprofit organization and that its lending was not commercial. This distinction matters because commercial use often weakens a fair use defense, and here the court acknowledged that the Archive wasn’t motivated by profit.
- Even so, the court agreed that the Archive’s actions weren’t protected by fair use.
Bottom line
The Hachette v. Internet Archive case has shown that even nonprofits like the Internet Archive can’t freely digitize and lend books without violating copyright law. This ruling could also affect how AI companies use copyrighted materials to train their systems. If nonprofits face such restrictions, AI tools might need to get licenses for the data they use. Some AI companies have already started to make deals, but I wonder: what about the data they used before those deals?
Featured image credit: Eray Eliaçık/Bing