Apple's approach to generative AI seeks to avoid copyright issues

As copyright concerns plague the field of generative AI, Apple seeks to preserve privacy and legality through innovative training methods for large language models, all while avoiding controversy.

In recent years, the relationship between generative AI and copyright law has remained an important and complex issue. As large language models (LLMs) and generative AI apps increase in popularity, copyright issues have continued to pile up without any kind of meaningful resolution.

Problems arise when companies use copyrighted works in training their generative AI software, and when the outputs of said AI software contain sections of works under copyright protection.

Copying copyrighted works in their entirety or using significant sections of such works for training generative AI software is copyright infringement. There is no "fair use" carve-out for AI training, despite what the companies that are training the models say or believe.

Generative AI and copyright infringement lawsuits

In late December of 2023, OpenAI and Microsoft were sued by The New York Times for copyright infringement. In the lawsuit, it was claimed that the two companies trained their generative AI software using millions of articles published by The New York Times.

This was not the first time OpenAI faced a lawsuit about model training. In September 2023, the company was also sued by several prominent authors, with George R. R. Martin, Michael Connelly and Jonathan Franzen being among them.

The history of generative AI and copyright issues goes back even further, as in July of 2023 over 15,000 authors signed an open letter addressed to several prominent companies, including Alphabet, OpenAI, Meta, Microsoft and more.

The letter requested that the authors be properly credited and compensated for their work, which was used in the training of generative AI and large language models.

Another, similar class-action lawsuit alleging copyright infringement was filed against OpenAI by non-fiction authors Nicholas Basbanes and Nicholas Gage. The lawsuit was filed in January of 2024.

In late April of 2024, another AI-related lawsuit was filed, this time against Amazon. The lawsuit alleges that an Amazon employee was instructed to deliberately ignore and violate copyright law so that Amazon could compete against rival products and services more effectively.

In the lawsuit, a former Amazon employee claims she was told by a supervisor regarding copyright-violating AI training that "everyone else is doing it" — implying that people from rival companies were knowingly engaging in copyright infringement.

And, it's pretty clear that they are.

AI and publishers' concerns about reproduction of copyrighted content

AI has been known to reproduce copyrighted content on multiple occasions, and the severity of the problem has inspired companies to analyze the frequency at which this happens.

To gain a better understanding of the rate at which AI chatbots generate copyright-protected content, the company Patronus AI decided to look into the matter. The company, which evaluates generative AI models, compared four major AI models: OpenAI's GPT-4, Meta's Llama 2, Mistral's Mixtral and Anthropic's Claude 2.1.

Patronus AI found that the rate at which AI generated copyrighted content varied by model, but that rates were high across the board. The company also released its own tool, known as CopyrightCatcher, which detects potential copyright violations in LLMs.

While the generation of copyrighted content has serious implications, publishers are also concerned over the use of copyrighted material in training large language models.

An Adobe Firefly-generated image of a wizard mouse. Definitely not Mickey from Disney's 'Fantasia'

In March of 2024, The Wall Street Journal reported that prominent publishers were investigating the use of their copyrighted works in the training of generative AI models. The publishers wanted to be paid for the use of their work by AI.

Given the number of lawsuits related to generative AI and copyright and the seriousness of the concerns expressed by publishers, it makes sense that a company like Apple would try its best to avoid any potential legal issues.

Apple's unique approach to generative AI, large language models and copyright issues

As a way of avoiding similar copyright issues during the training of its own generative AI software, Apple has reportedly been licensing the works of major news publications.

In December of 2023, it was reported that Apple planned to license works from Condé Nast, the publisher of Vogue and The New Yorker. The company had also spoken to IAC and NBC News in an attempt to make a deal worth approximately $50 million.

While Apple developed its large language model, known internally as Ajax, with basic on-device functionality, the company took a different approach to more advanced features. Apple considered licensing software such as Google Gemini for more complex tasks requiring an internet connection.

By employing this strategy, Apple clearly intended to avoid copyright issues. With the paid licensing, Apple would not be responsible for copyright infringement caused or perpetrated by software such as Google Gemini.

In a research paper published in March of 2024, Apple revealed that it used a carefully curated mixture of images, image-text and text-based input to train its in-house LLM. The method Apple used allowed for better image captioning, multi-step reasoning and preserving privacy, all at the same time.

An example of an image from an Apple generative AI graphic tool.

We were told by industry sources that Apple's Ajax LLM preserves privacy because it does not require an internet connection for basic text analysis. This means that the on-device LLM cannot connect to a database and identify copyrighted content in offline mode, although more advanced features like text-generation would likely feature such checks and connections.

Reporting and documented projects aside, guard rails and licensing are only secure if they are enforced. Sources familiar with Apple's AI test environments speaking to AppleInsider have revealed that there were seemingly few to no restrictions to prevent someone from using copyrighted material in the input for on-device test environments.

Our source wasn't clear about regulations inside Apple to prevent copyright-violating training. The output, however, is likely more regulated to avoid word-for-word reproduction of copyrighted material.

Apple should debut its generative AI technology during WWDC, which starts on June 10.

Comments

Draco · 52 comments · 5 Years · About 2 years ago

This is what I like about Apple: They have a track record of not releasing products until they are fully baked and actually work the way you expect, while other companies treat their customers like beta testers.

I'll believe AI is real when Apple releases an AI.

beautyspin · 23 comments · 13 Years · About 2 years ago

Looks like excuses have already started.

Xed · 3624 comments · 6 Years · About 2 years ago

beautyspin said:

Looks like excuses have already started.

Ethics are an excuse for what exactly? Being responsible?

22july2013 · 4064 comments · 13 Years · About 2 years ago

Apple could brand their AI as "Apple Intelligence."

AppleZulu · 2974 comments · 10 Years · About 2 years ago

This just reinforces my thought that Apple will be rolling out an AI implementation that will allow Siri to provide you with a morning news summary sourced from your Apple News+ app, avoiding copyright issues entirely. It could also include information from other sources to which you have subscribed. It will verbally give you the news summary, naming sources, and then offer to drop links to any items of particular interest that you would like to read in full later.

Such a summary could be an interactive conversation. You would be able to ask Siri what’s the news about a given subject, Siri would search your Apple News+ app for new information on that subject, summarize it for you, and then offer to provide the sources for you to read later.

This would be yet another example of Apple entering a product category “late,” but only because they have taken the time to create something of quality, that avoids things like theft of intellectual property, and that is actually useful.
