Authors Challenge Meta’s Use Of Their Books For Training AI
March 25, 2025
Many leadership strategists, consultants, thinkers, and practitioners write books as part of their livelihood.
Many of these authors also have a complicated relationship with LibGen. For those unfamiliar, LibGen (short for Library Genesis) is essentially a digital warehouse of stolen intellectual property, neatly stacked with pirated books, academic papers, and various works that authors and publishers never approved.
Most authors, like me, quietly grit their teeth. We sigh. We tolerate.
But now, there is something more worrisome.
In recently filed court documents, Meta, led by founder and CEO Mark Zuckerberg, is alleged to have deliberately and explicitly authorized a raid on LibGen—and Anna’s Archive, another massive digital pirate haven—to train its latest AI model, Llama 3.
Authors around the world are aghast. Without compensation or attribution, and without the authors’ knowledge, their works have allegedly been fed into a large language model (LLM).
Does this help their career progression? How does it affect their ability to gain new audiences and readers? And what does this say about Meta’s corporate culture?
Questions abound.
The Background
Court documents that surfaced recently through stellar investigative reporting by Alex Reisner at The Atlantic expose the underbelly of Meta’s self-described “innovation.”
The gist is as follows: senior Meta management recognized that they urgently needed high-quality content to populate the company’s large language model (LLM). “Books are actually more important than web data,” one email chillingly admitted.
Meta staff turned to LibGen, home to more than 7.5 million pirated books and 81 million stolen research papers, to fill that gap. They did the same with Anna’s Archive.
After internal discussions, court documents suggest that Zuckerberg himself greenlit the theft.
The Author’s Plight
Meta’s alleged actions are not merely an irritation for authors and thought leaders. They ought to be considered a moral crisis that deserves everyone’s attention, not least that of the U.S. Government’s National Intellectual Property Rights Coordination Center.
Authors already earn very little from books. For most, the financial reward of writing a book is modest at best. It could potentially cover a few mortgage payments, and with luck, it might even cover a little more.
Many authors will invest years into their research, writing, and revisions, followed by months (or years) of promotional engagements.
For the authors that I am friends with and talk to, it’s never about a fast payday. In fact, it’s never really about the money. It’s about the ideas, the learning, and contributing something meaningful to society.
Book writing really is part of an author’s sense of purpose.
If the allegations are proven, Zuckerberg and Meta’s decision to steal dismisses all of that work as nothing more than cheap fuel for AI. One might argue that it is unfair. Another might argue that it is exploitative.
It goes further than the books being used without permission. Meta—with its $164.5 billion in 2024 revenues and almost $62.4 billion in profits—could have easily negotiated agreements with publishers and authors.
They might even have acted as the leader in LLM input data and created licensed arrangements that respected an author’s rights. Imagine if the company had the corporate culture to be a leader on one of society’s latest and most important questions: Who owns content in the LLM?
Ironically, Meta’s “focus on long-term impact” core value states: “We emphasize long-term thinking that encourages us to extend the timeline for the impact we have, rather than optimizing for near-term wins.”
It seems very clear that Meta was indeed optimizing for near-term wins in this case, instead of outlining a corporate culture and leadership position of collaboration and authenticity.
Fair Use Argument
As Reisner reported, when Meta’s engineers realized they needed high-quality content, like published books, to make Llama 3 competitive, the team did not hesitate to turn to LibGen and Anna’s Archive. That was the quick fix.
Why pay authors and publishers fairly when Meta engineers could exploit their intellectual property for free?
I decided to check Alex Reisner’s handy tool, which reveals if one’s books might be caught up in the LibGen heist. The result?
All five of my books were pirated and included in Meta’s dataset. They also appear in Anna’s Archive.
Meta, predictably, has scrambled behind the tired, old “fair use” defense in dealings with lawyers and judges. Its argument is that because Llama 3 allegedly “transforms” the books into new outputs, the use qualifies as fair.
However, fair use arguments were meant for education, commentary, and criticism, not corporate exploitation for commercial profit at scale.
Based on their 2024 financials, Meta is not some struggling teacher in Boise, Idaho, photocopying textbook pages for their students. Meta ranks among the top 10 most valuable companies in the world. Meta’s market capitalization was roughly $1.8 trillion as of this writing.
Next Steps
Some creators have filed a major class-action lawsuit alleging copyright infringement and unfair competition against Meta. This litigation might define how companies can acquire data for their LLMs in the future. (Disclosure: I am not part of this litigation.)
Regardless, AI and tech companies will continue to face scrutiny for their LLM data-sourcing practices. The industry’s voracious hunger for data often skips over the ethical considerations, which, again, brings us back to the principles of an organization’s corporate culture.
Meta’s decision spotlights the broader recklessness prevalent across the AI ecosystem. While Meta might currently be in the news for alleged data theft, other firms—some we do not know about yet—are likely guilty of similar sins. (The companies are even stealing from one another, as is alleged by OpenAI against the owners of DeepSeek.)
We urgently need transparency and robust ethical guidelines for LLM training.
Companies must develop sustainable, lawful partnerships with content creators, authors, publishers, and the like.
Tech companies must be required to respect copyrights, intellectual property, and the simple human dignity behind creative effort.
Innovation cannot excuse exploitation.
How we treat creators today determines the future of our knowledge, art, and ideas for tomorrow.
If Meta is not held liable, its conduct will establish a perilous precedent, one that might lead individuals, authors, thought leaders, and the like to reconsider their willingness to engage in public discourse such as writing books and articles. And that would be a dire shame.
Whatever the courts, and ultimately readers, decide, the battle over the definitions of intellectual property and fair usage for AI models will determine the future of the publishing industry.