Authors angry Zuckerberg approved use of ‘stolen’ material to train AI
March 27, 2025
Authors outraged to discover Meta used their pirated work to train its AI systems
Some of Australia’s best-known authors are furious to discover Meta has used their work to develop its AI platform.
In January, court documents revealed that the tech company used Library Genesis (LibGen), an online trove of pirated books and academic papers, to train its generative AI language model.
On March 20, US magazine The Atlantic published a tool that made it possible to search LibGen for the first time.
It revealed that books by authors including Charlotte Wood, Alexis Wright, Tim Winton and Helen Garner appear in the database.
At no point did Meta seek permission to use the authors’ works to train its AI systems, or offer compensation.
Author Sophie Cunningham also has several works in LibGen, including her latest novel, This Devastating Fever (2022).
“I’m really angry about it,” she tells ABC Arts.
“The average writer earns about $18,000 a year on their writing. It’s one thing to be underpaid. It’s another thing to find that work is being used by a company that you don’t trust.”
Cunningham is considering legal action and has asked her publishers to send cease and desist notices to Meta on her behalf.
Bestselling author Hannah Kent was similarly shocked and disappointed to discover all three of her novels — Burial Rites, The Good People and Devotion — in the LibGen database.
“I felt completely gutted,” she says.
“It feels a little like my body of work has been plundered.”
Lucy Hayward, CEO of the Australian Society of Authors (ASA), says she understands why authors are dismayed at the news.
“Some of the world’s richest technology companies are taking their life’s work without permission or payment, because they can,” Hayward tells ABC Arts.
“Why is it that these technology companies pay their staff, pay their rent, pay their energy bills, but choose not to pay authors upon whose work this technology depends?”
Licensing too ‘slow and expensive’
This isn’t the first time Meta has been caught out using copyrighted material to train AI.
In 2023, The Atlantic revealed a database of pirated material known as Books3 had been used to train Meta’s AI model Llama, Bloomberg’s BloombergGPT and EleutherAI’s GPT-J.
Despite the outcry that followed, Meta continued to use unauthorised copyrighted material to develop its AI model.
Documents made public in the US copyright lawsuit Kadrey v Meta show Meta investigated the option of signing licensing agreements with authors and publishers but deemed it “incredibly slow” and “unreasonably expensive”.
Court records also show some in the company argued that licensing would work against its claim that use of copyrighted texts was fair use.
Meta wanted a large dataset, and it wanted it straight away.
Conceived as a ‘shadow library project’ in Russia in 2008, the LibGen dataset — which contained 7.5 million books and 81 million research papers as of January 2025 — offered a fast and cost-effective solution.
Is it fair use?
Meta is currently facing several copyright infringement lawsuits in the US, not just Kadrey v Meta.
The tech giant is relying on the defence of fair use, which permits the limited use of copyrighted material without the owner’s permission.
Toby Walsh, a leading AI researcher and Scientia Professor of Artificial Intelligence at the University of New South Wales, has five books and hundreds of academic papers in LibGen.
He disputes the argument that it represents fair use.
“Even if it were fair use, they should have bought the copy that they trained on, which they didn’t,” he tells ABC Arts.
“[Meta’s] argument is that they couldn’t build these systems if they asked. Well, they didn’t even bother trying to ask … [and] it’s not like these companies don’t have a lot of resources.”
Walsh describes the actions of Meta and fellow tech companies, such as OpenAI and Google, as the “greatest heist in human history”.
“As far as we know, there was an explicit instruction from Mark Zuckerberg to ignore copyright,” he says.
“They were not going to be able to build what they wanted to build unless they stole it.”
Walsh believes it is an example of Facebook’s founding motto: “Move fast and break things.”
“They were trying to move fast and catch up with OpenAI, Google and the other companies in their space,” he says.
“They’ve generated vast wealth out of this. [Meta] shares have gone up by billions as a consequence of them being a competitive player in the AI race.”
Much of that value is drawn from the content on which the AI is trained, Walsh says. “And none of that value has been filtered back to the people who generated it.”
Walsh says it is possible to train AI using works that are out of copyright, but Meta wanted more.
“They wanted to be able to talk knowledgeably about astronomy to zoology and, in my case, artificial intelligence, so they trained it on [copyrighted] works because it’s not only that they learn the language, but they also learn the content.
“To build a general-purpose tool that can answer all your questions, they need that content — and they’ve stolen it,” he argues.
Dilan Thampapillai, Dean of Law at the University of Wollongong, believes it is possible Meta will face penalties for infringing copyright for its use of the LibGen database to train Llama.
However, “it would be much harder to prove an infringement case off the AI outputs,” he says.
For Meta, the profits created by building a generative AI model would outweigh any fine imposed by the courts, Thampapillai says.
“They’re probably seeing it as the price of doing business.”
The concerns of authors
Cunningham is clear that she’s not anti-AI.
“It’s a lack of consent that’s made me really angry here,” she says.
Kent — whose memoir, Always Home, Always Homesick, comes out in April — feels the same way.
“As a writer, what I would like, at the very least, is for someone to seek permission,” she says.
“This is what happens when someone wants to reproduce my work in any other way … even if it is for fair use.”
While Kent would prefer her work wasn’t used to train AI, she acknowledges other authors feel differently.
“I think we should be given a choice, and if we agree, we should be offered payment.”
Kent is concerned about the precedent it sets for the technology sector.
“The fact that we have Meta not only training AI and trying to claim that under fair use but also using a shadow library of pirated material … indicates the lack of ethical consideration that is being put into training AI,” she says.
“It opens the door to others also feeling like this is an acceptable way to treat intellectual copyright and creatives who already … are expected to [contribute] so much for free or without due recompense.”
Both Cunningham and Kent want to see better regulation to prevent the unauthorised use of their work to train AI.
“We need the weight of government action on this,” Kent says.
What the future holds
More regulation might be on the horizon.
The outcomes of the various AI copyright infringement cases currently underway in the US will shape how AI is trained in the future.
Governments are acting too. The ASA is a member of the Attorney-General’s Department’s Copyright and AI Reference Group (CAIRG), established to prepare for copyright challenges emerging from AI.
Walsh believes the advent of AI warrants a revision of intellectual property law.
“Copyright was formulated for printing, where you were making exact copies of people’s work,” he says.
“Here, Llama is not making necessarily an exact copy of my work — although it will tell you exactly what’s in chapter five, it will be able to write in my style, it will be able to answer questions or reproduce parts of the text — but it is derived from the intellectual labours that I and the other authors put into writing their texts.”
Walsh believes we’re at “the Napster moment”, referring to the peer-to-peer (P2P) file-sharing application that launched in 1999, revolutionising the way we listen to music. Facing a flurry of copyright lawsuits, Napster ceased operations in 2001 and filed for bankruptcy the following year.
“When we started streaming music, to begin with, all that music was stolen. It was all pirated content. No-one was paying for it. Musicians were getting no recompense for their music being streamed,” Walsh says.
“Napster was sued out of existence and, ultimately, we moved to where we are today, where we have services like Spotify and Apple Music, where they pay [for music].”
Walsh is quick to acknowledge that few musicians — bar the likes of Taylor Swift — earn a living wage from the current streaming model.
But, he says, “It’s more sustainable than it was, where there was nothing going back to the musicians at all.”