Perspective: My books were used without permission to train AI models. What now?

May 4, 2025

In 2023, nine of my books, along with roughly 180,000 other titles, had their copyright infringed when they were scraped from a pirate website for inclusion in the Books3 dataset, used to train the LLMs of the big tech companies.

In Meta’s recent heist of 7 million books from the LibGen pirate website, all of my books, including foreign editions, were also swept up. So the copyright of my life’s work has been infringed, within a few years, by tech companies training their large language models for AI.

Alex Reisner’s investigation, published in The Atlantic, makes it clear that employees of the tech companies knew that what they were doing was illegal. Deception preceded the theft and has followed it.

However distressing and demoralizing this has been, I always try to look for solutions. I’ve spoken to a licensing company (created by humans) and set up an account to state that no book authored by me is available for any exploitation by AI. I must now input 216 titles (numerous editions of each book), one by one, and set up exclusions on two types of AI usage. This embargo will only apply if legitimate AI companies, in the future, want to license my books and approach me, or a third party that represents my work.

It’s like spitting into a strong wind, though, because earlier companies, in collusion with pirate sites, just helped themselves anyway. Also, everything that I have written has already been scraped. I will, however, spit into the wind because I need to do something.

Also, everything that has been scraped and used to train AI, apparently, can’t be “unlearned” by the technology. How convenient.

If you intended to pull off the biggest theft of culture in the history of mankind, you’d hope that what you’ve done can’t be undone; and to avoid litigation until the end of the universe, you’d want what you’ve stolen to be laundered into derivatives in just such a way as to make it impossible to accurately detect the usage of the stolen source material.

This follows years of me sending takedown messages to pirate book websites, with mixed results. I want to write books, and publish them, and not spend my time and capacity drafting and sending out takedown notices, and filling in endless fields at licensing websites.

The British government — I live in Britain — recently proposed that training AI on copyrighted works should be, more or less, fair usage. According to the proposals in the bill, the only way I could prevent my work from being used to train AI would be to “opt out” all of my books, in every single edition, in every single territory, rather than “opt in.”

So each and every ISBN generated for each edition of all of my books (hence the 216 cited above) will have to be separately “opted out” — yet what about short stories in multi-author collections? What if you missed an old eBook edition from 10 years ago?

It’s just not workable or practical. An opt-in arrangement, however, would be: unless the author states that their work can be used to train AI, tech companies and users of AI are forbidden to touch it. Even so, everything has already been scraped and can’t be “unlearned,” so all of this opting out applies only to future works.

The bill also suggests that the derivatives that come out the other end can be copyright protected. It beggars belief. So, you’d be sanctioning copyright infringement of all existing human-created books, but would legally protect the derivatives resulting from this theft?

Once more, everything I have written has already been scraped and probably trained the language models. The horse has already bolted.

So, instead of writing new fiction, I have raised my voice as a citizen to explain to our government just how unfair this legislation is, and what the dire consequences will be for writers, and for human-created culture. The idea that culture is mere fodder for AI companies, and only has the worth that tech companies assign to it, is just too staggering to comprehend.


Mercifully, the House of Lords in Britain has asked for amendments — “forcing AI crawlers to observe UK copyright law, reveal their identities and purpose, and let creatives know if their copyrighted works have been scraped.”

To my dismay, the only factor in these discussions appears to be money — revenue, the transfer of wealth, and so on. But I believe something far more valuable is at stake: our ability as a species to think abstractly, to make sense of ourselves, the world, and our place in time, and to preserve the most important facets of storytelling, which carry the wisdom of the ages through every successive generation of our species.

And let’s not disregard the neurological, psychological, and social impact of making people stupid and making truth irrelevant through tech.

I get a sense that everything is at stake.