Meta lawsuit poses first big test of AI copyright battle
May 1, 2025
Meta will fight a group of US authors in court on Thursday in one of the first big legal tests of whether tech companies can use copyrighted material to train their powerful artificial intelligence models.
The case, brought by about a dozen authors including Ta-Nehisi Coates and Richard Kadrey, centres on the $1.4tn social media giant’s use of LibGen, a so-called shadow library of millions of books, academic articles and comics, to train its Llama AI models.
The ruling will have wide-reaching implications in the fierce copyright battle between artists and AI groups and is one of a flurry of lawsuits around the world that allege technology groups are using content without permission.
Microsoft, OpenAI and Anthropic also face similar legal challenges over the data used to train the large language models behind their popular AI chatbots, such as ChatGPT and Claude.
“AI models have been trained on hundreds of thousands if not millions of books, downloaded from well-known pirated sites, this was not accidental,” said Mary Rasenberger, chief executive of the Authors Guild. “Authors should have gotten license fees for that.”
Meta has argued that using copyrighted materials to train LLMs is “fair use” if it is used to develop a transformative technology, even if it is from pirated databases. LibGen hosts much of its content without permission from the rights holders. In legal filings, Meta notes that “use was fair irrespective of its method of acquisition”.
According to the court filings, the US tech giant engaged in early discussions with book publishers exploring options to license material to train its models. The plaintiffs allege that Meta abandoned this because the works were available through LibGen, leading to a loss of compensation and control for authors.
In material disclosed during discovery, Meta said, “if we license once [sic] single book, we won’t be able to lean into the fair use strategy”. Meta argues in its defence that there was no market for licensing such works for this purpose.
However, emails unearthed in the court’s discovery process show Meta employees suggesting they were entering a legal grey area and appearing to discuss how to avoid scrutiny when using LibGen, according to the claim documents.
In one email from January last year, Joelle Pineau, Meta’s recently departed head of AI research lab FAIR, recommended using the LibGen data set.
In a subsequent email, Sony Theakanath, a director of product at Meta, said “in no case would we disclose publicly that we had trained on libgen”. The email had a subheading, “legal risk”, under which the details have been redacted, and another subheading, “policy risks”, which listed “copyright and IP”. The email suggested mitigations such as “remove data clearly marked as pirated/stolen”.
The case comes as Meta pours billions of dollars into becoming an “AI leader”, developing its Llama models to compete against OpenAI, Microsoft, Google and Elon Musk’s xAI.
“There is a tremendous amount of uncertainty right now,” said Chris Mammen, a partner at law firm Womble Bond Dickinson, highlighting that copyright cases can take years to reach a conclusion.
“It is extremely important to get these things resolved. Things are going to continue happening in the world at the breakneck pace that technology and our economy are developing,” he added.
Another point of contention in the lawsuit is the method that plaintiffs allege Meta used to acquire the LibGen database, known as torrenting, in which the software often uploads content to other users while the materials are being downloaded.
The court documents state that Meta torrented the work but attempted to limit its distribution. However, the company has yet to provide assurances that this was entirely prevented, and some evidence relating to outbound data was deleted, according to information from the discovery process.
“Meta has developed transformational open source AI models that are powering incredible innovation, productivity, and creativity for individuals and companies. Fair use of copyrighted materials is vital to this,” Meta said in a statement. “We disagree with [the] plaintiffs’ assertions, and the full record tells a different story. We will continue to vigorously defend ourselves and to protect the development of GenAI for the benefit of all.”