The contentious topic of AI copyright lawsuits is gaining traction, with many advocating that it’s high time AI enterprises compensated creators for the vast amounts of freely sourced data that have bolstered their generative systems.
In a recent wave of legal disputes, a multitude of lawsuits seeking remuneration from AI entities has emerged across the United States and Europe. The litigants range from individual authors and artists to large media conglomerates, all voicing their objections to AI’s appropriation of their creations for generating substandard offshoots.
An impactful open letter from the Authors Guild, bearing over 8,500 signatures from prominent writers such as Margaret Atwood, Dan Brown, and Jodi Picoult, has called upon the creators of generative AI applications, including ChatGPT and Bard, to halt the unauthorized use of literary works and to provide due compensation. These authors demand reparation for the data “harvested” to nourish these AI systems, likening it to an unpaid feast.
Writers also fear the potential for generative AI to undermine their craft by inundating the market with automated content derived from their original works. This concern was highlighted recently when Amazon had to intervene to address the issue of AI-generated books crowding its bestseller charts.
Before the Authors Guild made its appeal, authors Mona Awad and Paul Tremblay initiated legal proceedings against OpenAI. They alleged copyright infringement on the grounds that ChatGPT’s accurate summaries of their books implied the AI had been trained on their copyrighted material. They are not alone in this battle; author and comedian Sarah Silverman has also filed a lawsuit against OpenAI and Meta, accusing them of unauthorized replication of her autobiography, “The Bedwetter.” However, the intricacies of generative AI’s functionality might complicate the legal validity of these claims.
It’s not just individuals who are entering the legal fray. In a landmark move, The New York Times positioned itself as the first major American news outlet to bring a lawsuit against OpenAI, challenging the use of copyrighted material in the training and development of AI.
The reasons behind AI copyright lawsuits
The burgeoning phenomenon of AI copyright lawsuits is emblematic of a growing resistance to the unchecked use of copyrighted content by AI companies. While platforms like ChatGPT have been developed using internet-sourced data, they have done so without explicit consent from the creators of that data. Notably, GPT-3’s training encompassed a plethora of sources, including Wikipedia and Reddit. This process may inadvertently incorporate segments of copyrighted materials, enabling these expansive language models to concisely summarize copyrighted works with a disconcerting level of accuracy.
The issue magnifies when considering the enigmatic nature of AI. The “black box” dilemma, where the inner workings of AI remain obscured, exacerbates fears that AI could become a scapegoat for shirking accountability in both decision-making and content generation.
The legal contention also arises from concerns that, if AI corporations continue to commercialize these opaque systems, the models could become a convenient means to an end: a future in which decisions are entrusted to AI systems not for their efficacy or accuracy, but because they can circumvent the legal and ethical constraints that bind human actions.
Data sources and methods
In AI development, particularly for generative AI models like those at the center of numerous lawsuits, data collection is a crucial and contentious process. The methods and sources from which these AI systems derive their training data have significant legal and ethical implications, especially when copyrighted material is involved.
Generative AI models, such as GPT-3 or ChatGPT, are trained on vast datasets collected from various online sources. These sources often include public websites like Wikipedia and Reddit, but can also encompass more contentious repositories like shadow libraries or other platforms where copyrighted materials are readily available. The training involves not just simple data scraping but also complex processes to understand context, style, and content nuances.
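To make the scraping step described above concrete, here is a minimal, hypothetical sketch of how text might be extracted from a public web page for inclusion in a training corpus. It uses only Python's standard library; the `TextExtractor` class and the sample page are illustrative assumptions, not the actual pipeline any AI company uses.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a script/style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    """Return the visible text of a page as one whitespace-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

# Hypothetical page standing in for scraped forum or wiki content
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Public forum post.</p></body></html>"
)
print(extract_text(page))  # → Public forum post.
```

Real pipelines layer deduplication, language filtering, and quality scoring on top of this raw extraction, which is part of why it is so hard to say after the fact whether a given copyrighted work ended up in a training set.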
The legal gray area
The legal ambiguity arises from the fact that while the data is publicly accessible, the usage rights are not always clear. For instance, content from a public forum may not explicitly prohibit its use for training AI, but neither does it grant permission. This gray area has led to numerous AI copyright lawsuits, where plaintiffs argue that their intellectual property rights have been violated by the inclusion of their work in AI training sets without consent or compensation.
How is AI violating human rights?
AI technologies, while revolutionary, are increasingly scrutinized for potential human rights violations, a concern accentuated in the context of AI copyright lawsuits.
Key issues include:
- AI’s capability for extensive data collection and surveillance can infringe on individual privacy rights.
- AI systems can perpetuate biases present in their training data, leading to discriminatory outcomes in various sectors, underlining concerns in ongoing AI copyright lawsuits.
- AI-driven content moderation may inadvertently suppress free speech, an issue that intersects with the intellectual property debates in AI copyright lawsuits.
- In legal settings, AI tools can influence decision-making, potentially impacting the fairness of trials and judicial processes.
- AI-driven automation poses challenges to workers’ rights due to job displacement and the need for workforce adaptation.
- AI’s uneven access and impact can exacerbate existing inequalities, a concern that parallels the equitable access and usage rights at the heart of AI copyright lawsuits.
- AI systems that manipulate user behavior raise questions about individual autonomy and consent.
- AI’s control over information dissemination can affect the public’s right to access diverse and unbiased information.
What are the lawsuits against AI?
The legal arena is currently teeming with AI copyright lawsuits, with several cases spotlighting the tension between generative AI enterprises and copyright norms. The defendants include a variety of companies ensnared in these high-stakes legal battles.
Google: Data collection lawsuit
Google is facing a class-action suit accusing the tech giant of personal information misuse and copyright infringement. Allegations detail that Google harvested data, including images from dating sites, Spotify playlists, TikTok videos, and literature used to refine Bard. Launched in July 2023, the claim suggests Google might be liable for damages upwards of $5 billion. Opting for anonymity, the plaintiffs represent a growing concern over privacy and proprietary rights.
This spate of AI copyright lawsuits is not without precedent. The Authors Guild’s 2015 case against Google set a significant legal benchmark. The Guild challenged Google’s digitization of millions of books, offering snippets online. The ruling favored Google, characterizing the use as transformative and non-competitive with the original market for the books.
OpenAI: Copyright issues
OpenAI has also entered the legal fray, with authors Paul Tremblay and Mona Awad alleging copyright infringement. Their attorney, Butterick, represents a broader cohort of authors whose works, they claim, have been replicated within OpenAI’s extensive training data, potentially numbering over 300,000 books. Filed in June 2023, the lawsuit demands an undisclosed sum in damages.
OpenAI and Microsoft: NYT lawsuit
Additionally, The New York Times has launched a lawsuit against both OpenAI and Microsoft. The December 2023 filing contends that OpenAI utilized millions of Times articles to train their language models, which now rival the publication in delivering reliable information. Moreover, the lawsuit asserts that OpenAI’s models not only echo the unique stylistic flair of the Times but also recite its content verbatim. The Times, marking a first for a major American news outlet, pursued discussions regarding the copyright issue earlier in the year, but to no avail, culminating in this landmark litigation.
Meta and OpenAI: The Silverman case
Comedian Sarah Silverman’s legal action against Meta and OpenAI brings to light allegations of copyright infringement, positing that both ChatGPT and Meta AI’s Large Language Model (Llama) were developed using unlawfully sourced data that included her work. The lawsuit points to “shadow libraries” like Library Genesis, Z-Library, and Bibliotik, notorious for torrent-based content sharing, which often occurs without legal authorization. Specifically, the case notes that Meta’s Llama was informed by a dataset known as the Pile, compiled by EleutherAI, which purportedly contains data from Bibliotik. This suit was initiated in July 2023.
GitHub, Microsoft, and OpenAI: The Copilot controversy
A class-action AI copyright lawsuit targets GitHub, Microsoft, and OpenAI concerning the Copilot tool. This AI-powered service autocompletes code snippets by learning from a programmer’s input. The plaintiffs argue that Copilot unlawfully regurgitates code from GitHub’s repositories, disregarding the licensing requirements, including proper attribution. Beyond copyright complaints, the suit also accuses GitHub of personal data mismanagement and fraud. Filed in November 2022, the case has seen repeated dismissal attempts by Microsoft and GitHub.
Stability AI, Midjourney, and DeviantArt: The artistic integrity dispute
January 2023 saw a lawsuit against AI image generator companies Stability AI, Midjourney, and DeviantArt. Plaintiffs claim that these platforms infringe upon copyrights by training on and generating derivatives of the plaintiffs’ works. Additionally, there’s contention over the ability of these tools to replicate the styles of specific artists. The presiding judge, William Orrick, expressed a preliminary intention to dismiss the complaint.
Stability AI: The Getty Images lawsuits
Getty Images’ dual lawsuits against Stability AI spotlight the unauthorized copying and processing of countless images and associated metadata that Getty holds rights to in the U.K. A subsequent lawsuit in the U.S. District Court for the District of Delaware echoes similar copyright and trademark violations. It also emphasizes the concern over “bizarre or grotesque” images generated with the Getty watermark, potentially tarnishing the esteemed image repository’s reputation. These legal moves were made in January 2023.
Key questions raised by these AI copyright lawsuits
The emergence of AI copyright lawsuits signals a shift in how we view digital creativity. These high-profile legal confrontations raise several key questions that could redefine copyright law in relation to generative AI:
- Licensing for AI training materials: Is there a necessity for licensing when AI models are trained on copyrighted content? Given that generative AI systems replicate the training materials during their learning phase, the legal debate hinges on whether this replication falls under fair use or requires formal licensing.
- Copyright infringement and AI outputs: Do the results produced by generative AI infringe on the copyrights of the materials used in training? A key aspect for the courts to determine is whether the similarities between AI outputs and the training data are based on protected or non-protected content. Additionally, the question of who bears responsibility for any copyright infringement committed by an AI system is yet to be resolved.
- Compliance with digital copyright laws: Are generative AI technologies in breach of laws that govern the alteration or removal of copyright management information? This issue is particularly relevant in the case against Stability AI, where AI-generated images included false copyright management information, like reproduced watermarks.
- Right of publicity and AI: Does creating AI-generated works that mimic the style of a specific individual infringe on their right of publicity? This right, which differs across states, restricts the use of an individual’s likeness, name, image, voice, or signature for commercial purposes without consent.
- Open source licenses and AI: How do open source licenses intersect with the training and distribution of AI-generated content? This is a central concern in the GitHub Copilot lawsuit, where plaintiffs argue that the failure to attribute the source material and to release Copilot as open source violates the terms of open source licensing.
As these AI copyright lawsuits progress and begin to offer answers, entities involved in the development and deployment of generative AI tools should be attentive to emerging guidelines at the nexus of AI and intellectual property. It may also be prudent for these companies to consider strategies for mitigating potential risks in this evolving legal terrain. AI copyright lawsuits highlight the need for clear policies on data usage and rights.
Featured image credit: Igor Omilaev/Unsplash