Did AI companies win a fight with authors? Technically

June 28, 2025

Analysis

Meta and Anthropic defended AI training as fair use, but with major caveats.

Jun 28, 2025, 12:30 PM UTC

In the past week, big AI companies have — in theory — chalked up two big legal wins. But things are not quite as straightforward as they may seem, and copyright law hasn’t been this exciting since last month’s showdown at the Library of Congress.

First, Judge William Alsup ruled it was fair use for Anthropic to train on a series of authors’ books. Then, Judge Vince Chhabria dismissed another group of authors’ complaint against Meta for training on their books. Yet far from settling the legal conundrums around modern AI, these rulings might have just made things even more complicated.

Both cases are indeed qualified victories for Meta and Anthropic. And at least one judge, Alsup, seems sympathetic to some of the AI industry’s core arguments about copyright. But that same ruling railed against Anthropic’s use of pirated media, leaving it potentially on the hook for massive financial damages. (Anthropic even admitted it did not initially purchase a copy of every book it used.) Meanwhile, the Meta ruling asserted that because a flood of AI content could crowd out human artists, the entire field of AI system training might be fundamentally at odds with fair use. And neither case addressed one of the biggest questions about generative AI: when does its output infringe copyright, and who’s on the hook if it does?

Alsup and Chhabria (incidentally both in the Northern District of California) were ruling on relatively similar sets of facts. Meta and Anthropic both pirated huge collections of copyright-protected books to build a training dataset for their large language models Llama and Claude. Anthropic later did an about-face and started legally purchasing books, tearing the covers off to “destroy” the original copy, and scanning the text.

The authors argued that, in addition to the initial piracy, the training process constituted an unlawful and unauthorized use of their work. Meta and Anthropic countered that this database-building and LLM-training constituted fair use.

Both judges basically agreed that LLMs meet one central requirement for fair use: they transform the source material into something new. Alsup called using books to train Claude “exceedingly transformative,” and Chhabria concluded “there’s no disputing” the transformative value of Llama. Another big consideration for fair use is the new work’s impact on a market for the old one. Both judges also agreed that based on the arguments made by the authors, the impact wasn’t serious enough to tip the scale.

Add those things together, and the conclusions were obvious… but only in the context of these cases, and in Meta’s case, because the authors pushed a legal strategy that their judge found totally inept.

Put it this way: when a judge says his ruling “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful” and “stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one” — as Chhabria did — AI companies’ prospects in future lawsuits with him don’t look great.

Both rulings dealt specifically with training — or media getting fed into the models — and didn’t reach the question of LLM output, or the stuff models produce in response to user prompts. But output is, in fact, extremely pertinent. A huge legal fight between The New York Times and OpenAI began partly with a claim that ChatGPT could verbatim regurgitate large sections of Times stories. Disney recently sued Midjourney on the premise that it “will generate, publicly display, and distribute videos featuring Disney’s and Universal’s copyrighted characters” with a newly launched video tool. Even in pending cases that weren’t output-focused, plaintiffs can adapt their strategies if they now think it’s a better bet.

The authors in the Anthropic case didn’t allege Claude was producing directly infringing output. The authors in the Meta case argued Llama was, but they failed to convince the judge — who found it wouldn’t spit out more than around 50 words of any given work. As Alsup noted, dealing purely with inputs changed the calculations dramatically. “If the outputs seen by users had been infringing, Authors would have a different case,” wrote Alsup. “And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case.”

In their current form, major generative AI products are basically useless without output. And we don’t have a good picture of the law around it, especially because fair use is an idiosyncratic, case-by-case defense that can apply differently to mediums like music, visual art, and text. Anthropic being able to scan authors’ books tells us very little about whether Midjourney can legally help people produce Minions memes.

Minions and New York Times articles are both examples of direct copying in output. But Chhabria’s ruling is particularly interesting because it makes the output question much, much broader. Though he may have ruled in favor of Meta, Chhabria’s entire opening argues that AI systems are so damaging to artists and writers that their harm outweighs any possible transformative value — basically, because they’re spam machines.

It’s worth reading:

Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.

…

As the Supreme Court has emphasized, the fair use inquiry is highly fact dependent, and there are few bright-line rules. There is certainly no rule that when your use of a protected work is “transformative,” this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create.

…

The upshot is that in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission. Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials.

And boy, it sure would be interesting if somebody would sue and make that case. After saying that “in the grand scheme of things, the consequences of this ruling are limited,” Chhabria helpfully noted this ruling affects only 13 authors, not the “countless others” whose work Meta used. A written court opinion is unfortunately incapable of physically conveying a wink and a nod.

Those lawsuits might be far in the future. And Alsup, though he wasn’t faced with the kind of argument Chhabria suggested, seemed potentially unsympathetic to it. “Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works,” he wrote of the authors who sued Anthropic. “This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.” He was similarly dismissive of the claim that authors were being deprived of licensing fees for training: “such a market,” he wrote, “is not one the Copyright Act entitles Authors to exploit.”

But even Alsup’s seemingly positive ruling has a poison pill for AI companies. Training on legally acquired material, he ruled, is classic protected fair use. Training on pirated material is a different story, and Alsup absolutely excoriates any attempt to say it’s not.

“This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” he wrote. There were plenty of ways to scan or copy legally acquired books (including Anthropic’s own scanning system), but “Anthropic did not do those things — instead it stole the works for its central library by downloading them from pirated libraries.” Eventually switching to book scanning doesn’t erase the original sin, and in some ways it actually compounds it, because it demonstrates Anthropic could have done things legally from the start.

If new AI companies adopt this perspective, they’ll have to build in extra but not necessarily ruinous startup costs. There’s the up-front price of buying what Anthropic at one point described as “all the books in the world,” plus any media needed for things like images or video. And in Anthropic’s case these were physical works, because hard copies of media dodge the kinds of DRM and licensing agreements publishers can put on digital ones — so add some extra cost for the labor of scanning them in.

But just about any big AI player currently operating is either known or suspected to have trained on illegally downloaded books and other media. Anthropic and the authors will be going to trial to hash out the direct piracy accusations, and depending on what happens, a lot of companies could be hypothetically at risk of almost inestimable financial damages, not just from authors but from anyone who demonstrates their work was illegally acquired. As legal expert Blake Reid vividly puts it, “if there’s evidence that an engineer was torrenting a bunch of stuff with C-suite blessing it turns the company into a money piñata.”

And on top of all that, the many unsettled details can make it easy to miss the bigger mystery: how this legal wrangling will affect both the AI industry and the arts.

Echoing a common argument among AI proponents, former Meta executive Nick Clegg said recently that getting artists’ permission for training data would “basically kill the AI industry.” That’s an extreme claim, and given all the licensing deals companies are already striking (including with Vox Media, the parent company of The Verge), it’s looking increasingly dubious. Even if they’re faced with piracy penalties thanks to Alsup’s ruling, the biggest AI companies have billions of dollars in investment — they can weather a lot. But smaller, particularly open source players might be much more vulnerable, and many of them are also almost certainly trained on pirated works.

Meanwhile, if Chhabria’s theory is right, artists could reap a reward for providing training data to AI giants. But it’s highly unlikely the fees would shut these services down. That would still leave us in a spam-filled landscape with no room for future artists.

Can money in the pockets of this generation’s artists compensate for the blighting of the next? Is copyright law the right tool to protect the future? And what role should the courts be playing in all this? These two rulings handed partial wins to the AI industry, but they leave many more, much bigger questions unanswered.
