Did AI companies win a fight with authors? Technically
June 28, 2025
In the past week, big AI companies have, in theory, chalked up two big legal wins. But things are not quite as straightforward as they may seem, and copyright law hasn't been this exciting since last month's showdown at the Library of Congress.
First, Judge William Alsup ruled it was fair use for Anthropic to train on a series of authors' books. Then, Judge Vince Chhabria dismissed another group of authors' complaint against Meta for training on their books. Yet far from settling the legal conundrums around modern AI, these rulings might have just made things even more complicated.
Both cases are indeed qualified victories for Meta and Anthropic. And at least one judge, Alsup, seems sympathetic to some of the AI industry's core arguments about copyright. But that same ruling railed against the startup's use of pirated media, leaving it potentially on the hook for massive financial damage. (Anthropic even admitted it did not initially purchase a copy of every book it used.) Meanwhile, the Meta ruling asserted that because a flood of AI content could crowd out human artists, the entire field of AI system training might be fundamentally at odds with fair use. And neither case addressed one of the biggest questions about generative AI: when does its output infringe copyright, and who's on the hook if it does?
Alsup and Chhabria (incidentally both in the Northern District of California) were ruling on relatively similar sets of facts. Meta and Anthropic both pirated huge collections of copyright-protected books to build a training dataset for their large language models Llama and Claude. Anthropic later did an about-face and started legally purchasing books, tearing the covers off to "destroy" the original copy, and scanning the text.
The authors argued that, in addition to the initial piracy, the training process constituted an unlawful and unauthorized use of their work. Meta and Anthropic countered that this database-building and LLM-training constituted fair use.
Both judges basically agreed that LLMs meet one central requirement for fair use: they transform the source material into something new. Alsup called using books to train Claude "exceedingly transformative," and Chhabria concluded "there's no disputing" the transformative value of Llama. Another big consideration for fair use is the new work's impact on a market for the old one. Both judges also agreed that based on the arguments made by the authors, the impact wasn't serious enough to tip the scale.
Add those things together, and the conclusions were obvious… but only in the context of these cases, and in Meta's case, because the authors pushed a legal strategy that their judge found totally inept.
Put it this way: when a judge says his ruling "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful" and "stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one," as Chhabria did, AI companies' prospects in future lawsuits with him don't look great.
Both rulings dealt specifically with training, or media getting fed into the models, and didn't reach the question of LLM output, or the stuff models produce in response to user prompts. But output is, in fact, extremely pertinent. A huge legal fight between The New York Times and OpenAI began partly with a claim that ChatGPT could verbatim regurgitate large sections of Times stories. Disney recently sued Midjourney on the premise that it "will generate, publicly display, and distribute videos featuring Disney's and Universal's copyrighted characters" with a newly launched video tool. Even in pending cases that weren't output-focused, plaintiffs can adapt their strategies if they now think it's a better bet.
The authors in the Anthropic case didn't allege Claude was producing directly infringing output. The authors in the Meta case argued Llama was, but they failed to convince the judge, who found it wouldn't spit out more than around 50 words of any given work. As Alsup noted, dealing purely with inputs changed the calculations dramatically. "If the outputs seen by users had been infringing, Authors would have a different case," wrote Alsup. "And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case."
In their current form, major generative AI products are basically useless without output. And we don't have a good picture of the law around it, especially because fair use is an idiosyncratic, case-by-case defense that can apply differently to mediums like music, visual art, and text. Anthropic being able to scan authors' books tells us very little about whether Midjourney can legally help people produce Minions memes.
Minions and New York Times articles are both examples of direct copying in output. But Chhabria's ruling is particularly interesting because it makes the output question much, much broader. Though he may have ruled in favor of Meta, Chhabria's entire opening argues that AI systems are so damaging to artists and writers that their harm outweighs any possible transformative value: basically, because they're spam machines.
It's worth reading:
Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.
…
As the Supreme Court has emphasized, the fair use inquiry is highly fact dependent, and there are few bright-line rules. There is certainly no rule that when your use of a protected work is "transformative," this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create.
…
The upshot is that in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials.
And boy, it sure would be interesting if somebody would sue and make that case. After saying that "in the grand scheme of things, the consequences of this ruling are limited," Chhabria helpfully noted this ruling affects only 13 authors, not the "countless others" whose work Meta used. A written court opinion is unfortunately incapable of physically conveying a wink and a nod.
Those lawsuits might be far in the future. And Alsup, though he wasn't faced with the kind of argument Chhabria suggested, seemed potentially unsympathetic to it. "Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works," he wrote of the authors who sued Anthropic. "This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition." He was similarly dismissive of the claim that authors were being deprived of licensing fees for training: "such a market," he wrote, "is not one the Copyright Act entitles Authors to exploit."
But even Alsup's seemingly positive ruling has a poison pill for AI companies. Training on legally acquired material, he ruled, is classic protected fair use. Training on pirated material is a different story, and Alsup absolutely excoriates any attempt to say it's not.
"This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use," he wrote. There were plenty of ways to scan or copy legally acquired books (including Anthropic's own scanning system), but "Anthropic did not do those things – instead it stole the works for its central library by downloading them from pirated libraries." Eventually switching to book scanning doesn't erase the original sin, and in some ways it actually compounds it, because it demonstrates Anthropic could have done things legally from the start.
If new AI companies adopt this perspective, they'll have to build in extra but not necessarily ruinous startup costs. There's the up-front price of buying what Anthropic at one point described as "all the books in the world," plus any media needed for things like images or video. And in Anthropic's case these were physical works, because hard copies of media dodge the kinds of DRM and licensing agreements publishers can put on digital ones, so add some extra cost for the labor of scanning them in.
But just about any big AI player currently operating is either known or suspected to have trained on illegally downloaded books and other media. Anthropic and the authors will be going to trial to hash out the direct piracy accusations, and depending on what happens, a lot of companies could be hypothetically at risk of almost inestimable financial damages, not just from authors, but from anyone that demonstrates their work was illegally acquired. As legal expert Blake Reid vividly puts it, "if there's evidence that an engineer was torrenting a bunch of stuff with C-suite blessing it turns the company into a money piñata."
And on top of all that, the many unsettled details can make it easy to miss the bigger mystery: how this legal wrangling will affect both the AI industry and the arts.
Echoing a common argument among AI proponents, former Meta executive Nick Clegg said recently that getting artists' permission for training data would "basically kill the AI industry." That's an extreme claim, and given all the licensing deals companies are already striking (including with Vox Media, the parent company of The Verge), it's looking increasingly dubious. Even if they're faced with piracy penalties thanks to Alsup's ruling, the biggest AI companies have billions of dollars in investment; they can weather a lot. But smaller, particularly open source players might be much more vulnerable, and many of them are also almost certainly trained on pirated works.
Meanwhile, if Chhabria's theory is right, artists could reap a reward for providing training data to AI giants. But it's highly unlikely the fees would shut these services down. That would still leave us in a spam-filled landscape with no room for future artists.
Can money in the pockets of this generation's artists compensate for the blighting of the next? Is copyright law the right tool to protect the future? And what role should the courts be playing in all this? These two rulings handed partial wins to the AI industry, but they leave many more, much bigger questions unanswered.