A pair of academic authors have launched a class action suit against Apple, alleging that Apple Intelligence training used a repository of books known to contain pirated copies of their works.
Susana Martinez-Conde and Stephen Macknik are both professors at SUNY Health Sciences University in New York. The pair have alleged that "Champions of Illusion: The Science Behind Mind-Boggling Images and Mystifying Brain Puzzles" and "Sleights of Mind: What the Neuroscience of Magic Reveals About Our Everyday Deceptions" were used without a license to train Apple Intelligence.
In short, the pair of works may have been used in aggregate to train Apple's Foundation Intelligence Models and OpenELM language models without proper licensing. Specifically, the complainants alleged in a filing on Friday afternoon that the materials are being used to test model performance, and as filters to prevent model outputs containing copyrighted material from hitting the screens of the end user.
The suit details the "Books3" shadow library's use in Apple Intelligence training. When Apple discussed OpenELM in April 2024, Apple disclosed that it used "The Pile."
"The Pile" is, or was, a curated collection of English-language data that included the Books3 shadow library.
Books3, at the time, contained the entirety of the texts indexed by the Bibliotik private BitTorrent tracker. A TXT file available at the same time listed the titles of all 186,640 books in the data set.
The authors' publications were listed.
"Because Plaintiffs' copyrighted book is part of Books3, Apple copied in its entirety without authorization, and trained OpenELM, on one or more copies of the Plaintiffs copyrighted works and directly infringed Plaintiffs' copyrights along with the copyrights of the Class," the suit claims.
Books3 was removed in October 2023, due to reported copyright infringement.
Challenges ahead
The lawsuit is far from a frivolous one. Authors deserve compensation when their materials are reproduced.
There are several open questions about the legality of using books to train AI language models, however. There is also a difference between what Google does with the models it has trained and what Apple does with AppleBot.
Google, for instance, will take content it is not licensed to use, like this very article in a few days, and use it for AI summaries for search results. It will then mash up content from disparate sources in a summary presented before actual search results from the venues that wrote the content.
Most of the time, Google does not properly credit the sources it draws the summary from. Or worse, it credits them, but the summary is badly wrong.
Google itself says that more than half of the results with AI summaries do not result in a click-through.
Apple's training is linguistic, at least so far. It does not present article summaries in search results, so there is no need for attribution.
Courts have also set the precedent, thanks to Midjourney, that proper attribution and compensation are too hard to demand of AI trainers. To date, the US court system has largely agreed with that, other than a recent settlement by Anthropic.
Even in the Anthropic case, Judge William Alsup said that Anthropic made fair use of the up to seven million books to train the model. What violated the authors' copyrights was Anthropic saving the books it used to train the model to a central library that may or may not be used for that purpose in the future.
There's also the matter of proving that Apple actually used the publications in question. While Apple once acknowledged that it used Books3 containing the complainants' works, it's not clear if the books in question were scraped.
Individual documents processed for language use are not listed by Apple, nor is it clear if Apple keeps track of what books are used.
A financial claim in the suit may be problematic for the filers too. To prove the value of Apple Intelligence, the suit says that the day the feature was announced was "the single most lucrative day in the history of the company" as valuation jumped $200 billion afterwards.
However, the claim ignores the value added by the rest of WWDC, and that entire gain has bled off since then, in no small part due to Apple Intelligence's staggered and late roll-out. Furthermore, a quick check suggests that there have been four days with greater valuation gains in the last five years.
Apple Intelligence is not yet fully out of the gate either. It remains to be seen what Apple will fully do with the models it has made.
U.S. copyright law is clear that willful copyright infringement can cost the infringers up to $150,000 per work. It's not clear if Apple copied the books from the pair willfully or not.
The pair is seeking a jury trial and monetary damages, and demands that Apple not use their copyrighted work going forward.
No trial date has been set for Susana Martinez Conde, Stephen L. Macknik vs. Apple. To date, Apple has not commented on the merits of the suit.







