Machine learning researchers using MLX will benefit from speed improvements in macOS Tahoe 26.2, including support for the M5 GPU-based neural accelerators and Thunderbolt 5 clustering.

Machine learning practitioners have been able to use Apple's MLX framework for some time to train and deploy models on Apple hardware. With Apple's update to macOS Tahoe 26.2, those using newer hardware can expect to see improvements in performance.

MLX is Apple's open-source machine-learning framework, designed to take advantage of Apple Silicon's features. Able to run on any Apple platform that supports Metal, it benefits from both CPU and GPU processing, as well as unified memory.

The first big change that researchers will notice if they're running on an M5 Mac is a tweak to GPU processing. Under the macOS update, MLX will now support the neural accelerators Apple included in each GPU core on M5 chips.

Since the M5 launch, the new GPU architecture has been accessible to developers, who can write directly to the neural accelerators using Tensor APIs and Apple's Metal 4 and Core ML frameworks. Under the update, MLX will natively support the use of the neural accelerators.

The upshot for ML researchers is that they can enjoy a considerable boost in performance for on-device processing. This can be as much as four times the peak AI performance of running the same large language model (LLM) on an M4 chip, at least when it comes to initially responding to a prompt.

Clustering

Another change to MLX in macOS Tahoe 26.2 is the inclusion of a new driver that benefits cluster computing. Specifically, it extends cluster support to Thunderbolt 5.

Researchers have been able to use Thunderbolt to connect multiple Macs together for processing using MLX before. The technique splits an LLM across multiple Macs connected over Thunderbolt, dividing the workload up and also sharing the available unified memory.

This could all be run over Ethernet at up to 10Gb/s, depending on the Mac's specification. Using Thunderbolt, though, allows for connectivity between Macs at much higher speeds than a typical network.

Thunderbolt 4, as used in clustering like this, reaches a maximum speed of 40Gb/s. Thunderbolt 5 doubles that bandwidth to 80Gb/s.
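To put those link rates in practical terms, here is a back-of-the-envelope calculation of how long it takes to move a chunk of a model between two Macs. The 40Gb/s and 80Gb/s figures are the nominal maximums; real-world throughput will be lower due to protocol overhead.

```python
# Back-of-the-envelope: time to move a model shard between two Macs.
# Nominal link rates only; real throughput is lower in practice.

def transfer_seconds(shard_gigabytes: float, link_gbps: float) -> float:
    """Seconds to move `shard_gigabytes` over a `link_gbps` link."""
    bits = shard_gigabytes * 8e9       # 1 GB = 8e9 bits (decimal units)
    return bits / (link_gbps * 1e9)

shard = 20.0  # e.g. a 20GB chunk of a large model
print(f"TB4: {transfer_seconds(shard, 40):.1f}s")  # 4.0s
print(f"TB5: {transfer_seconds(shard, 80):.1f}s")  # 2.0s
```

Halving transfer time matters most for the frequent, small synchronization messages a cluster exchanges during inference, not just bulk model loading.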

We asked Apple about the specifics of this speed boost, and haven't heard back yet.

Regardless, the change to enable Thunderbolt 5 effectively allows for faster connectivity between the Macs in a cluster. This is especially useful under Remote Direct Memory Access (RDMA), a style of network where one computer can directly access the memory of another.

In the case of a Mac cluster connected over Thunderbolt, the Macs effectively share their memory with each other, creating a larger memory pool than would normally be available to a single Mac. An LLM that is too large for one Mac's memory capacity is broken down into chunks stored on each Mac, which are then accessible by any Mac in the collection.
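A minimal illustration of the idea is splitting a model's layers into contiguous chunks, one per Mac. This is a hypothetical sketch with made-up host names, not MLX's actual placement logic:

```python
# Illustration: split a model's layers across a cluster of Macs.
# Hypothetical host names and sizes; not MLX's actual placement logic.

def shard_layers(n_layers: int, hosts: list[str]) -> dict[str, range]:
    """Assign a contiguous range of layers to each host, as evenly as possible."""
    base, extra = divmod(n_layers, len(hosts))
    plan, start = {}, 0
    for i, host in enumerate(hosts):
        count = base + (1 if i < extra else 0)  # spread the remainder
        plan[host] = range(start, start + count)
        start += count
    return plan

plan = shard_layers(80, ["studio-1", "studio-2", "studio-3"])
# studio-1 holds layers 0-26, studio-2 holds 27-53, studio-3 holds 54-79
```

During inference, activations flow from one Mac's chunk of layers to the next over the Thunderbolt link, which is why link speed directly affects throughput.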

While this is largely seen as a way to aggregate memory in a cluster, Apple's addition of Thunderbolt 5 support also stands to improve compute aggregation. Techniques like tensor parallelism increase latency-sensitive communication between devices, something the extra Thunderbolt 5 bandwidth can help mitigate.

The bottom line is improved performance, thanks to higher bandwidth and faster overall communication between each Mac in the cluster.

Continuing to help development, but expect better soon

The M5 GPU neural accelerator support will benefit a far wider group of users on newer Macs than the clustering changes, which will chiefly be experienced by well-funded researchers. Ultimately, both serve the goal of furthering machine learning development on commodity hardware that is reasonably easy to acquire.

Apple already has to live in the shadow of the slow Apple Intelligence rollout and the long-delayed overhaul of Siri. However, supporting AI development in this way is beneficial to Apple, which has consistently leaned on local ML processing in its products.

MLX users can cluster multiple Apple Silicon machines together using Thunderbolt. If all of them support Thunderbolt 5, they will all benefit from improved inter-device communication.

Likewise, if the researcher is processing on one M5 Mac, like the 14-inch MacBook Pro, they will see a performance benefit as well.

There is a catch, however, in that researchers won't benefit from using both improvements at the same time under the current Apple Silicon roster. So far, there's only one M5 model, and it supports the slower Thunderbolt 4, not Thunderbolt 5.

The situation should change when Apple releases Mac models using Thunderbolt 5, such as M5 Pro and M5 Max releases expected in early 2026.

This is anticipated to include an update to the Mac mini in M5 and M5 Pro forms, with the latter benefiting from both the GPU neural accelerators and Thunderbolt 5.

At around the same time, we can expect updates to the Mac Studio, which has practically replaced the Mac Pro as Apple's high-performance option. For the Mac Studio, the options should include an M5 Max and possibly an M5 Ultra version.

To long-time Apple users, MLX clustering could be considered a modern-day revival of the Xgrid concept, which turned a collection of Macs into a supercomputer via distributed computing. The MLX method is similar in that work is spread across multiple Macs, except that communication between the nodes happens at considerably higher speeds than the long-dead Xgrid could ever manage.

The technique of cluster computing under MLX also provides a way for researchers to maximize their computing budgets. A low-powered Mac could be used as a controller for the cluster, feeding jobs to the heavy hitters that do the actual processing.

For example, a researcher could set up a cluster of Mac Studios with high amounts of memory but minimal storage to handle the serious number crunching, while using a low-powered device like a MacBook Air to run queries on the cluster, display results, and manage the setup.

Apple has told us that there will be more information coming within days.