Apple releases Pico-Banana-400K image editor training data

Apple wants to improve everybody's AI image editors with new training dataset

Apple Intelligence is just fine for what it is

A new Apple research paper argues that AI image editors are currently trained on inadequate image sets, so Apple Intelligence researchers have released an improved one.

Despite the continual presumption that Apple is behind the industry in AI, it keeps publishing comprehensive research papers on the subject. In 2025 alone, its most significant studies have covered how AI cannot reason, but can uncover bugs in code.

Now the researchers have published "Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing." It's explicitly concerned with how to better train AI systems to edit images following text prompts.

Despite describing current systems like GPT-4o and Nano-Banana as being "remarkable [at] text-guided image editing," the paper claims that there is a key limitation with how they all work.

"[The] research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images," says the full paper.

So Apple's researchers have launched "Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing." As well as being a large set, what "distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity."

Diagram showing transformations: a car in snowy weather, a woman in Pixar 3D cartoon style, and a purple flower; two are approved by Gemini-2.5-Pro.

Example from the research paper showing the training process described — image credit: Apple

The approximately 400,000 images in the set have all been made freely available for non-commercial use. They are "organized by a 35-type editing taxonomy," meaning types of image edits a user could typically want.

What the researchers did

The edit types include moving an object in the image, adding artistic effects, and zooming in. Apple's researchers submitted each image in the set to Nano-Banana, together with a prompt requesting one such edit.

The researchers then used Gemini-2.5-Pro to analyze each resulting image and either accept or reject it.

"The result became Pico-Banana-400K, which includes images produced through single-turn edits (a single prompt), multi-turn edit sequences (multiple iterative prompts)," say the researchers, "and preference pairs comparing successful and failed results (so models can also learn what undesirable outcomes look like)."
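As a rough illustration only, the generate-then-judge loop described above might be sketched in Python as follows. Nothing here comes from Apple's paper or code: `build_dataset`, `EditAttempt`, the retry limit, and the toy `edit` and `judge` functions are all hypothetical stand-ins for the roles the article assigns to Nano-Banana (editing) and Gemini-2.5-Pro (judging), and retrying after a rejection is an assumption about how failed and successful results for the same prompt could end up paired.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EditAttempt:
    source: str        # identifier of the original image
    instruction: str   # text prompt describing the requested edit
    result: str        # identifier of the edited image
    accepted: bool     # verdict returned by the judge


def build_dataset(
    tasks: List[Tuple[str, str]],
    edit: Callable[[str, str, int], str],    # hypothetical stand-in for the editing model
    judge: Callable[[str, str, str], bool],  # hypothetical stand-in for the judging model
    max_attempts: int = 3,                   # assumed retry limit, not from the article
):
    """Return (accepted single-turn edits, preference pairs)."""
    accepted: List[EditAttempt] = []
    preference_pairs: List[Tuple[EditAttempt, EditAttempt]] = []
    for source, instruction in tasks:
        failures: List[EditAttempt] = []
        for attempt in range(max_attempts):
            result = edit(source, instruction, attempt)
            ok = judge(source, instruction, result)
            record = EditAttempt(source, instruction, result, ok)
            if ok:
                accepted.append(record)
                # Pair the success with each earlier failure: (preferred, rejected).
                preference_pairs.extend((record, failed) for failed in failures)
                break
            failures.append(record)
    return accepted, preference_pairs


# Toy stand-ins so the sketch runs without any real model.
def toy_edit(source: str, instruction: str, attempt: int) -> str:
    return f"{source}|{instruction}|v{attempt}"


def toy_judge(source: str, instruction: str, result: str) -> bool:
    # Rejects the first attempt for "car.jpg", accepts everything else.
    if source == "car.jpg":
        return result.endswith("v1")
    return True


accepted, pairs = build_dataset(
    [("car.jpg", "add snow"), ("flower.jpg", "make the flower purple")],
    toy_edit,
    toy_judge,
)
print(len(accepted), len(pairs))  # prints: 2 1
```

In this toy run, both prompts end with an accepted edit, and the rejected first attempt for the car image is kept alongside its successful retry as one preference pair.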

Having now produced this large image dataset, Apple's researchers say Pico-Banana-400K "establishes a robust foundation for training" AI image editors.

Separately, Apple most recently improved its own Image Playground in June 2025. It added more ChatGPT-powered image styles.

Comments

gatorguy · 24987 comments · 15 Years About 7 months ago

This is Apple making good use of Google's Gemini-2.5-Flash-Image, a.k.a. Nano-Banana, contributing another good data-training library with a clear open-source license for non-commercial uses. That means no apps, no AI programs or features from companies, or any other for-profit enterprise. Researchers, schools, etc. will definitely find this Apple-provided data-set useful.
