Big Tech Wants to Take Your Work to Feed Its Bots. These Lawsuits Might Let Them.
June 30, 2025
Last week, two different federal judges in the Northern District of California made legal rulings that attempt to resolve one of the knottiest debates in the artificial intelligence world: whether it’s a copyright violation for Big Tech firms to use published books for training generative bots like ChatGPT. Unfortunately for the many authors who’ve brought lawsuits with this argument, neither decision favors their case—at least, not for now. And that means creators in all fields may not be able to stop A.I. companies from using their work however they please.
On Tuesday, a U.S. district judge ruled that Amazon-backed startup Anthropic did not violate copyright law when it used the works of three authors to train the company’s flagship chatbot, Claude. In Bartz v. Anthropic, writers Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson claimed that the A.I. firm had infringed their copyrights when multiple books of theirs were not only used to train Claude but had been pirated for that purpose. Anthropic countered that all of its practices—from the training itself to the use of books its engineers had variously pirated and purchased for Claude’s training—constituted fair use and were perfectly legal. U.S. District Judge William Alsup agreed in part with Anthropic, ruling that the training itself did not violate copyright law but that the piracy certainly did. A separate trial is set to determine the damages from Anthropic’s downloads of ill-gotten books.
The second judgment landed just a day later, concerning a case that prominent authors like Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey had brought against Meta on similar, albeit more limited, grounds. They merely argued, in a bid for summary judgment in Kadrey v. Meta, that automatic A.I. training with copyrighted works undercuts their ability to negotiate other deals, and that Meta’s Llama models are “capable of reproducing small snippets of text from their books.” Judge Vince Chhabria sided with Meta but appeared to do so regretfully, stating that Meta’s use of the writers’ work to train its bots isn’t necessarily legal but that the plaintiffs “made the wrong arguments.” Chhabria went even further, adding, “it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works.” However, instead of presenting that “potentially winning argument,” the authors had approached the bench with “clear losers.” Meta’s A.I. did not reproduce enough text from the authors’ books to constitute any sort of plagiarism or piracy. The judge likewise decreed that these writers “are not entitled to the market for licensing their works as AI training data.” But ultimately, unlike Alsup, Chhabria did appear to leave an open legal pathway for the authors to relitigate their case on different grounds if they so wish.
While we don’t know for sure just yet, it seems likely that the authors will try again. After all, Kadrey, Silverman, and Coates have also been fighting a case against OpenAI, alleging direct copyright infringement and unfair competition in OpenAI’s use of their books to train bots like ChatGPT. Their complaint has already gotten some results, having forced OpenAI to reveal details of its closely guarded training data in court. They may also be encouraged by a significant A.I.-copyright decision from February, when the global media-and-tech organization Thomson Reuters prevailed on the fair use question in a lawsuit against an A.I. startup whose language models ingested and reproduced Thomson Reuters’ legal texts.
Those authors are not the only creatives arguing that the fruits of their labor should not be easy fuel for advanced syntax-prediction machines. Over the past few months, several high- and low-profile court cases have pitted everyone from authors to powerful media companies to music publishers and eminent news organizations against burgeoning and well-funded A.I. outfits—with varying results. All these legal battles are existential for publishers and writers at a moment when A.I. is already upending long-embattled creative sectors—and when tech executives are doubling down on their insistence that copyright limits should be brushed aside in the arms race for A.I. supremacy. Should they win out, there may be nothing stopping A.I. companies from devastating the industries that allow humans to exercise their creative expression, and filling the void with knockoff machine generations instead.
In OpenAI CEO Sam Altman’s case, he’s even leveraging his newly chummy relationship with President Donald Trump in the hopes that the federal government will unilaterally declare all A.I. training methods to be permissible as fair use. The Trump administration has already made some favorable moves on behalf of Altman and Co. this month, with DOGE having sacked the head of the Library of Congress and the head of its U.S. Copyright Office, right as the latter agency was set to publish a report recommending A.I.-training copyright standards that would be more favorable to authors. (The office currently has no one at the helm.) There’s also the fear that Congress will attempt to nullify all state-level A.I. regulations via federal legislation, which would mean that the few laws in place to protect creators from A.I.—like Tennessee’s law against unauthorized deepfakes of notable performers—may soon be all but crushed.
All of which is to say, there is a lot that’s going to be legally murky about A.I. and copyright for a while yet. Judges are going to have to assess the copyright implications of a wide range of media—not just text, but printed text as compared with digital text, along with illustration, video, and music. On top of that, all these federal court rulings are likely to be appealed by one party or the other no matter the result, making it all but inevitable that appellate courts and even the Supreme Court will chime in. (The recent SCOTUS ruling on Trump’s birthright citizenship executive order, which curbed nationwide injunctions, will make it much harder for lower courts to effectively pause any A.I.-copyright executive actions from the White House.) However, there are certain indications from the rulings in last week’s Anthropic and Meta cases that offer us a hint as to where the judicial system may ultimately land on the fair use issue.
In the Anthropic case, one reason Judge Alsup gave was that the trained data sets and A.I. models that power Claude have sufficient anti-plagiarism filters, which meant that “Claude created no exact copy, nor any substantial knock-off.” (Alsup did allow that “if the outputs were ever to become infringing, Authors could bring such a case.”) What’s more, Anthropic stashed all the books used for training in a permanent internal set—but never handed out those books to others or made them inappropriately public in any way. This “central library” did not itself qualify as fair use, but it did not infringe copyright as long as the books had been purchased properly. (This is why the judge plans to bring Anthropic to trial over the books it had stolen.)
For the Meta case, Judge Chhabria did not seem to agree with all of Alsup’s points—especially the contention that purchasing books for training constitutes sufficient compensation—and he all but wrote a guidebook for his plaintiffs to try again later. Specifically, he stated that the authors should come back with arguments that Meta’s chatbots do produce output that’s strikingly similar to their works, that A.I.’s ability to do so rapidly and at scale cuts into the market for their books (especially when it comes to nonfiction and newer fiction), and that Meta’s A.I. achieves all this through the use of pirated book copies for training (a fact that was uncovered during this very case). “In many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission,” Chhabria declared. “Which means that the companies, to avoid liability for copyright infringement, will generally need to pay copyright holders for the right to use their materials.”
Both judges seem to align on a couple of key points. First, A.I. generations that significantly resemble samples from their training data are not protected by fair use, but filters that prevent chatbots from copying their sources are kosher. “The Anthropic LLM implements filters so that if you have a user who asks for basically an entire work, the LLM is not going to give them that,” said Ray Seilie, an entertainment and tech lawyer who serves as counsel for the law firm KHIKS.
Second, A.I. firms cannot shortcut the training process through piracy of intellectual property. Where the judges diverge is on whether, and how much, the training itself violates copyright law. We’re likely to see more such disputes on that contention as other cases make their way through the courts.
But for creators worried about how A.I. has appropriated their work, these rulings have offered a strategy. In Disney’s new lawsuit against the image generator Midjourney and the big three record labels’ lawsuits against A.I.-music tech, the plaintiffs specifically attack the respective startups for generating images and songs that easily resemble the copyrighted works used for training (e.g., Midjourney spitting out a Donald Duck replication, or the app Suno mimicking Bruce Springsteen’s voice). Authors litigating against OpenAI and other text generators can point to how A.I.-generated books have taken over various Amazon bestseller lists, and how in many cases those charted “books” appear as outright clones of original works. These writers and journalists can also leverage arguments that A.I. companies tried to hasten training by mass piracy, that these generative tools are capable of replicating their work at scale and with speed, and that any stored copyrighted training material that was leaked in a cyberattack or shared without permission does not fit within the bounds of fair use.
What if these copyright battles are also lost? Then there will be little standing in the way of A.I. startups utilizing all creative works for their own purposes, with no consideration for the artists and writers who actually put in the work. And we will be left with a world less blessed with human creativity than overrun by second-rate slop that crushes the careers of the people whose imaginations made that A.I. so potent to begin with.