Apple AI research reveals a model that could make giving commands to Siri faster and more efficient by converting any given context into text, which is easier for a Large Language Model to parse.
Apple continues to publish Artificial Intelligence research as the company approaches the public launch of its AI initiatives at WWDC in June. A variety of research has been published so far, including an image animation tool.
The latest paper, first shared by VentureBeat, details something called ReALM, or Reference Resolution As Language Modeling.
Getting a computer program to perform a task based on vague language inputs, like a user saying "this" or "that," is called reference resolution. It's a complex problem to solve, since computers can't interpret images the way humans can, but Apple may have found a streamlined solution using LLMs.
When speaking to smart assistants like Siri, users might refer to any number of pieces of contextual information, such as background tasks, on-screen data, and other non-conversational entities. Traditional parsing methods rely on very large models and reference materials like images, but Apple has streamlined the approach by converting everything to text.
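To make the idea concrete, here is a minimal sketch of how on-screen entities might be flattened into a text prompt for an LLM. The ScreenEntity type, the top-to-bottom ordering rule, and the sample business listing are all assumptions for illustration, not the paper's exact encoding.

```python
# Hypothetical sketch: flatten on-screen entities into numbered lines of text
# so an LLM can resolve references like "the business" against them.
# The ScreenEntity type and layout rule are assumptions, not Apple's encoding.
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    text: str    # visible text of the UI element
    kind: str    # e.g. "business_name", "phone_number"
    top: float   # vertical position of its bounding box
    left: float  # horizontal position of its bounding box


def encode_screen(entities: list[ScreenEntity]) -> str:
    """Order entities roughly as they appear on screen (top-to-bottom,
    left-to-right) and emit one numbered text line per entity."""
    ordered = sorted(entities, key=lambda e: (e.top, e.left))
    return "\n".join(f"[{i}] ({e.kind}) {e.text}"
                     for i, e in enumerate(ordered, start=1))


# A made-up business listing, mirroring the "call the business" example below.
screen = [
    ScreenEntity("ACME Plumbing", "business_name", top=10, left=5),
    ScreenEntity("(555) 010-4477", "phone_number", top=42, left=5),
    ScreenEntity("Open until 6 PM", "hours", top=60, left=5),
]

prompt = (
    "On-screen entities:\n" + encode_screen(screen) +
    "\n\nUser request: call the business\n"
    "Reply with the index of the entity the user is referring to."
)
print(prompt)
```

Laid out this way, the screen becomes a few lines of plain text that a comparatively small language model can reason over, which is the core of the efficiency argument.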
Apple found that its smallest ReALM models performed comparably to GPT-4 with far fewer parameters, making them better suited for on-device use. Increasing the number of parameters used in ReALM made it substantially outperform GPT-4.
One reason for this performance boost is GPT-4's reliance on image parsing to understand on-screen information. Much of its image training data is built on natural imagery, not artificial, code-based web pages filled with text, so understanding a screen directly from its image is less efficient.
Converting the screen into text lets ReALM skip these heavyweight image-recognition parameters, making it smaller and more efficient. Apple also avoids problems with hallucination by constraining decoding or applying simple post-processing.
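The mention of constrained decoding and post-processing can be read as keeping the model's answer inside the known candidate list. The snippet below is a hedged sketch of that kind of guard; the function name and the index-based output format are assumptions, not details from the paper.

```python
# Hypothetical post-processing guard: accept only answers that point at a
# known candidate entity, so a hallucinated reference can never escape the list.
import re


def resolve_reference(model_output: str, num_entities: int) -> int | None:
    """Pull an entity index out of the model's reply and reject anything
    outside the candidate range."""
    match = re.search(r"\d+", model_output)
    if match is None:
        return None
    index = int(match.group())
    return index if 1 <= index <= num_entities else None


print(resolve_reference("[2]", num_entities=3))       # 2
print(resolve_reference("entity 9", num_entities=3))  # None: out of range
```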
For example, if you're scrolling a website and decide you'd like to call the business, simply saying "call the business" requires Siri to parse what you mean given the context. It would be able to "see" that there's a phone number on the page labeled as the business number and call it without any further prompting.
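Continuing the illustration, once the reference has been resolved to a concrete on-screen entity, the assistant only needs to map the entity's type to an action. The action table and entity format here are made up for this sketch and are not Apple APIs.

```python
# Illustrative final step: dispatch an action based on the resolved entity's
# type. The action table and entity format are invented for this sketch.
def act_on(entity: dict) -> str:
    actions = {
        "phone_number": lambda e: f"Calling {e['text']} ...",
        "address": lambda e: f"Getting directions to {e['text']} ...",
    }
    handler = actions.get(entity["kind"])
    return handler(entity) if handler else "Sorry, I can't act on that."


# The phone number resolved from the "call the business" request:
print(act_on({"kind": "phone_number", "text": "(555) 010-4477"}))
```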
Apple is working to release a comprehensive AI strategy during WWDC 2024. Some rumors suggest the company will rely on smaller on-device models that preserve privacy and security, while licensing other companies' LLMs for the more controversial, ethically fraught off-device processing.
7 Comments
Makes sense since text is already being extracted from images displayed on-screen.
Meanwhile, GPT5 is COOKing Apple.
Honestly... when does Apple launch it, then? After several years?
After several years, their model will be outdated.
I won't believe it until it's running on a device. Until then, the research doesn't reveal anything; it only claims to be better than GPT-4.