Apple's latest machine learning research could make creating models for Apple Intelligence faster, by coming up with a technique to almost triple the rate of generating tokens when using Nvidia GPUs.
One of the problems in creating large language models (LLMs) for tools and apps that offer AI-based functionality, such as Apple Intelligence, is inefficiencies in producing the LLMs in the first place. Training models for machine learning is a resource-intensive and slow process, which is often countered by buying more hardware and taking on increased energy costs.
Earlier in 2024, Apple published and open-sourced Recurrent Drafter, known as ReDrafter, a speculative decoding method for accelerating inference. It uses an RNN (recurrent neural network) draft model, combining beam search with dynamic tree attention to predict and verify draft tokens across multiple candidate paths.
This sped up LLM token generation by up to 3.5 times per generation step versus typical auto-regressive token generation techniques.
In a post on Apple's Machine Learning Research site, the company explained that its work didn't stop with Apple Silicon. The new report, published on Wednesday, details how the team made ReDrafter production-ready for use with Nvidia GPUs.
Nvidia GPUs are often employed in servers used for LLM generation, but the high-performance hardware often comes at a hefty cost. It's not uncommon for multi-GPU servers to cost in excess of $250,000 apiece for the hardware alone, let alone any required infrastructure or other connected costs.
Apple worked with Nvidia to integrate ReDrafter into Nvidia's TensorRT-LLM inference acceleration framework. Because ReDrafter relies on operators that other speculative decoding methods don't use, Nvidia had to add support for those elements before it could work.
With the integration complete, ML developers can now take advantage of ReDrafter's accelerated token generation when running TensorRT-LLM in production on Nvidia GPUs, not just on Apple Silicon.
The result, after benchmarking a tens-of-billions-parameter production model on Nvidia GPUs, was a 2.7x increase in generated tokens per second for greedy decoding.
The upshot is that the process could be used to minimize latency to users and reduce the amount of hardware required. In short, users could expect faster results from cloud-based queries, and companies could offer more while spending less.
In Nvidia's Technical Blog on the topic, the graphics card producer said the collaboration made TensorRT-LLM "more powerful and more flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them."
The report's release follows Apple's public confirmation that it was investigating the use of Amazon's Trainium2 chips to train models for Apple Intelligence features. At the time, Apple said it expected a 50% improvement in pretraining efficiency with the chips over existing hardware.