Apple researchers have developed a new multimodal method for training large language models (LLMs), one that could enable more flexible and powerful machine-learning and "AI" systems.
A research paper posted by the company to the preprint site arxiv.org earlier this week revealed that Apple used what it calls a "careful mix" of image-caption, interleaved image-text, and text-only data to train LLMs. The mix of visual and language data allowed the models to handle tasks like intelligently captioning images or inferring natural-language meanings.
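As a rough illustration of what a weighted mix of those data types might look like in a training pipeline, here is a minimal Python sketch. The source names and sampling ratios are assumptions made up for the example, not figures from the paper.

```python
import random

# Hypothetical data mixture for multimodal pre-training.
# The ratios below are illustrative assumptions, not the paper's proportions.
MIXTURE = {
    "image_caption": 0.45,            # (image, caption) pairs
    "interleaved_image_text": 0.45,   # documents with images embedded in text
    "text_only": 0.10,                # plain text, to preserve language ability
}

def sample_source(mixture):
    """Pick the data source for the next training batch according to the mix."""
    names, weights = zip(*mixture.items())
    return random.choices(names, weights=weights, k=1)[0]

if __name__ == "__main__":
    # Simulate 1,000 batch draws and count which source each came from.
    counts = {name: 0 for name in MIXTURE}
    for _ in range(1000):
        counts[sample_source(MIXTURE)] += 1
    print(counts)
```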
As part of the research, the team determined that the choice of image encoder and the resolution of the images it processes have a larger impact on performance than the design of the vision-language connector.
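The kind of ablation that yields such a finding can be pictured as a small grid search over encoder and resolution while the connector is held fixed. The sketch below is purely illustrative; the encoder names, resolutions, and connector choice are assumptions, not Apple's actual configurations.

```python
from itertools import product

# Hypothetical ablation grid: vary the image encoder and input resolution
# while holding the vision-language connector fixed. All values are
# illustrative assumptions, not the paper's exact setups.
image_encoders = ["ViT-L/14", "ViT-H/14"]
image_resolutions = [224, 336, 448]
connector = "average-pooling projector"  # held fixed in this sketch

for encoder, resolution in product(image_encoders, image_resolutions):
    config = {
        "image_encoder": encoder,
        "image_resolution": resolution,
        "vl_connector": connector,
    }
    # In a real ablation, each config would be trained and evaluated.
    print(config)
```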
In one instance, a 30-billion-parameter MM1 model showed strong in-context learning abilities, meaning it can perform multi-step reasoning over multiple images using few-shot "chain-of-thought" prompting.
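To make that idea concrete, the sketch below shows what few-shot chain-of-thought prompting over multiple images could look like. The <image> placeholder convention, example questions, and worked reasoning are all assumptions for illustration and do not reflect MM1's actual prompt format or API.

```python
# Hypothetical few-shot, multi-image chain-of-thought prompt construction.
few_shot_examples = [
    {
        "images": ["three_apples.jpg", "two_apples.jpg"],
        "question": "How many apples are there in total?",
        "reasoning": "The first image shows 3 apples and the second shows 2, so 3 + 2 = 5.",
        "answer": "5",
    },
]

query = {
    "images": ["receipt.jpg", "menu.jpg"],
    "question": "Do the receipt totals match the menu prices?",
}

def build_prompt(examples, query):
    """Interleave image placeholders with worked reasoning, then append the query."""
    parts = []
    for ex in examples:
        parts += [f"<image:{img}>" for img in ex["images"]]
        parts.append(f"Q: {ex['question']}")
        parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"A: {ex['answer']}")
    parts += [f"<image:{img}>" for img in query["images"]]
    parts.append(f"Q: {query['question']}")
    parts.append("Reasoning:")
    return "\n".join(parts)

print(build_prompt(few_shot_examples, query))
```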
According to VentureBeat, Apple is continuing its tradition of being a "fast follower" rather than a "first mover" when it comes to groundbreaking technologies. CEO Tim Cook recently acknowledged that the company was spending $1 billion per year on incorporating "AI" into its existing technologies.
Cook said the company would be sharing "details of our ongoing work in AI later this year." Apple is expected to make some announcements about its advances at WWDC this June.
The company is both catching up to rivals in the use of AI-related technologies and developing methods that would preserve user privacy while augmenting its existing machine-learning abilities.
That concern for privacy and security has not been a feature of existing chatbot-style services, and it increases the challenge for Apple.
Apple's work on multimodal training of neural networks has produced state-of-the-art performance and multi-step reasoning. This suggests the company has found a path to rapidly advance its machine-learning abilities and give its products more advanced "intelligence" capabilities.
Comments
According to Bloomberg, Apple has been in negotiations with Google (Gemini) and OpenAI (ChatGPT).
If this is true, do I understand correctly that Apple would prefer Gemini or ChatGPT over its own AI-trained model?
Does Apple plan to integrate Gemini/ChatGPT with Siri?
If these negotiations are real, it would indicate that Apple is really far behind on AI.
Of course, it depends on who contacted whom first, as Google did with Chrome on iOS.
This is off-topic but related to AI (as in AppleInsider). Is it just me, or does anyone else find it a PITA with this blog that if you have been signed out for any reason and try to post or reply, you are asked to sign in, but once you do, you are thrown completely out of the blog and have to navigate back to where you were manually? Good blogs sign you in and return you to where you left off.