In landmark ruling, judge likens AI training to schoolchildren learning to write.
Artificial intelligence companies don’t need permission from authors to train their large language models (LLMs) on legally acquired books, US District Judge William Alsup ruled Monday.
The first-of-its-kind ruling, which condones AI training as fair use, will likely be viewed as a big win for AI companies. But it also notably puts on notice any AI company expecting the same reasoning to apply to training on pirated copies of books, a question that remains unsettled.
In the specific case he is weighing, which pits book authors against Anthropic, Alsup found that “the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative” and “necessary” to build world-class AI models.
Importantly, this case differs from other lawsuits where authors allege that AI models risk copying and distributing their work. Because the authors suing Anthropic did not allege that any of Anthropic’s outputs reproduced their works or expressive style, Alsup found there was no threat that Anthropic’s text generator, Claude, might replace authors in their markets. The absence of that argument tipped the fair use analysis in Anthropic’s favor.
“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different,” Alsup wrote.
Alsup’s ruling surely disappointed the authors, who argued that Claude, trained on their texts, could generate competing summaries or alternative versions of their stories. The judge wrote that these complaints were akin to arguing “that training schoolchildren to write well would result in an explosion of competing works.”
“This is not the kind of competitive or creative displacement that concerns the Copyright Act,” Alsup wrote. “The Act seeks to advance original works of authorship, not to protect authors against competition.”
Alsup noted that the authors would be able to raise new claims if they found evidence of infringing Claude outputs. That could change the fair use calculus, as it might in another case, in which a judge recently suggested that Meta’s AI products might be “obliterating” authors’ markets for their works.
“Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public,” Alsup wrote. “If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop.”
Anthropic must face trial over book piracy
Anthropic is “pleased” with the ruling, issuing a statement applauding the court for recognizing that using “works to train LLMs was transformative—spectacularly so.”
But Anthropic is not off the hook. The company was granted summary judgment on AI training as fair use, but it still faces a trial over piracy, which Alsup ruled did not favor a fair use finding.
Anthropic is accused of downloading 7 million pirated books to build a research library in which copies would be kept “forever,” regardless of whether they were ever used for AI training.
Seemingly realizing that the piracy might trigger legal challenges, Anthropic later tried to replace the pirated books with legally purchased copies. But the company also argued that even the initial copying of the pirated books was an “intermediary” step necessary to advance the transformative use of training AI. And, perhaps least persuasively, Anthropic argued that because it could have borrowed the books it stole, the theft alone shouldn’t “short-circuit” the fair use analysis.
But Alsup was not swayed by those arguments, noting that copying books from a pirate site is copyright infringement, “full stop.” He rejected “Anthropic’s assumption that the use of the copies for a central library can be excused as fair use merely because some will eventually be used to train LLMs,” and he cast doubt on whether defendants in any of the other AI lawsuits debating piracy could escape without paying damages.
“This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use,” Alsup wrote. “Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”
But Alsup said the Anthropic case may not even need to decide that question, since Anthropic’s retention of pirated books for its research library was not itself transformative. Anthropic’s argument that it should be allowed to hold onto pirated material in case it ever decided to use it for AI training was, Alsup wrote, an attempt to “fast glide over thin ice.”
Additionally, Alsup pointed out that Anthropic’s early attempts to get permission to train on authors’ works withered; internal messages revealed the company concluded that stealing books was the more cost-effective path to innovation, avoiding “legal/practice/business slog,” as cofounder and chief executive officer Dario Amodei put it.
“Anthropic is wrong to suppose that so long as you create an exciting end product, every ‘back-end step, invisible to the public,’ is excused,” Alsup wrote. “Here, piracy was the point: To build a central library that one could have paid for, just as Anthropic later did, but without paying for it.”
To avoid maximum damages in the event of a loss, Anthropic will likely keep arguing that replacing the pirated books with purchased copies should reduce its liability, Alsup’s order suggested.
“That Anthropic later bought a copy of a book it earlier stole off the Internet will not absolve it of liability for the theft, but it may affect the extent of statutory damages,” Alsup noted.
Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.