Machine learning researchers using Ollama will enjoy a speed boost to LLM processing, as the open-source tool now uses MLX on Apple Silicon to fully take advantage of unified memory.
Anyone working with large language models (LLMs) wants results as quickly as possible. One technique is to run multiple Macs as a cluster to increase the processing power on hand, but a framework made by Apple offers another source of acceleration.
That is the route taken by the developers of Ollama, the open-source tool for managing and running models. In a March 30 update, the team announced a preview version of the tool for Apple Silicon that takes advantage of MLX.
MLX is an open-source machine learning framework from Apple, built specifically for Apple Silicon. Among its capabilities is distributed computation, which spreads processing across multiple Macs. While this can run over Ethernet, MLX also works over Thunderbolt, making massive amounts of bandwidth available for the communication between devices.
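As a rough illustration of the distributed side, here is a minimal MLX sketch, assuming the `mlx` Python package is installed on each Mac and the script is started with MLX's `mlx.launch` helper (the hostnames below are placeholders):

```python
import mlx.core as mx

# Join the distributed group; the peers are supplied at launch time, e.g.
# `mlx.launch --hosts mac1,mac2 this_script.py` (hostnames are placeholders)
group = mx.distributed.init()

# Each Mac computes a partial result locally...
partial = mx.ones((1024,)) * group.rank()

# ...and all_sum combines those partials across every machine in the group
total = mx.distributed.all_sum(partial)
mx.eval(total)  # MLX is lazy; eval forces the computation and communication

print(f"rank {group.rank()} of {group.size()}: total[0] = {total[0].item()}")
```

The code itself is transport-agnostic; whether the traffic moves over Ethernet or Thunderbolt is determined by how the hosts are connected and launched.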
Unified memory boost
The key for Ollama is that MLX also leverages Apple Silicon's unified memory. This is the shared memory system used by the CPU and GPU, meaning there is one large pool of memory in use instead of separate pools holding duplicated data.
For Ollama, the switch to Apple's machine learning framework means model execution can take full advantage of that architecture.
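To make the idea concrete, here is a minimal MLX sketch, assuming the `mlx` Python package is installed. Because MLX arrays live in unified memory, an operation can run on the GPU and a follow-up on the CPU without any explicit copy between them:

```python
import mlx.core as mx

# Arrays are allocated once in unified memory; no host-to-device copy is needed
a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# Run the matrix multiply on the GPU, then a reduction on the CPU,
# with both operations reading the same underlying buffers
c = mx.matmul(a, b, stream=mx.gpu)
d = mx.sum(c, stream=mx.cpu)

mx.eval(d)  # MLX is lazy; eval triggers the actual work
print(d.item())
```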
This increases speed on most Apple Silicon hardware, but it is especially useful on the M5 generation of chips, where Ollama can also leverage the Neural Accelerators built into each GPU core to get more processing done.
Both the time to first token (TTFT) and the generation speed, measured in tokens per second, improve considerably when the GPU Neural Accelerators are put to work.
In testing, the prefill performance of Ollama 0.18 without MLX was 1,154 tokens per second, but this shot up to 1,810 tokens per second for Ollama 0.19 with MLX. Similarly, decode performance rose from 58 tokens per second in version 0.18 to 112 in version 0.19 with MLX.
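Anyone wanting to check those figures on their own hardware can pull them from the Ollama server itself, which reports token counts and timings with each response. Here is a minimal sketch using the official Python client; the model tag is a placeholder, so substitute one from `ollama list`:

```python
import ollama  # pip install ollama; assumes a local Ollama server is running

# The model tag is a placeholder; substitute one from `ollama list`
resp = ollama.generate(model="some-model", prompt="Explain unified memory.")

# Ollama reports durations in nanoseconds alongside the token counts
prefill = resp.prompt_eval_count / (resp.prompt_eval_duration / 1e9)
decode = resp.eval_count / (resp.eval_duration / 1e9)
print(f"prefill: {prefill:.0f} tok/s, decode: {decode:.0f} tok/s")
```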
As part of the same update, Ollama has upgraded its cache for efficiency, with lower memory utilization and intelligent checkpointing. There's also support for Nvidia's NVFP4 format, which can maintain model accuracy while reducing the memory bandwidth a model consumes.
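For a sense of how a 4-bit format like NVFP4 saves bandwidth, here is an illustrative sketch of block-scaled 4-bit float quantization. This shows the general idea, not NVIDIA's actual implementation: each small block of values shares one scale, and every value snaps to the nearest point on a tiny 4-bit grid.

```python
import numpy as np

# Positive values representable by a 4-bit E2M1 float
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])  # full signed grid, 15 points

def quantize_block(block: np.ndarray) -> np.ndarray:
    """Round one block of values to the nearest scaled 4-bit grid point."""
    scale = max(np.abs(block).max() / 6.0, 1e-12)  # one shared scale per block
    nearest = np.abs(block[:, None] - scale * GRID[None, :]).argmin(axis=1)
    return scale * GRID[nearest]  # dequantized values, for comparing the error

x = np.random.randn(16).astype(np.float32)  # small blocks share a scale
print(np.abs(x - quantize_block(x)).max())  # worst-case rounding error
```

Storing 4 bits per weight instead of 16 cuts the bytes moved per token to roughly a quarter, which is where the bandwidth saving comes from.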
High-spec only
For end users who work with LLMs, this update can be a crucial improvement, even on a single Mac, let alone a cluster of them. Users of OpenClaw, an agent that runs locally on the Mac, could see its processing speed up considerably.
However, not everyone will benefit from the change immediately. It is offered as a preview rather than a general release, and it has specific requirements too.
The preview release of Ollama 0.19 can accelerate one model, Qwen3.5-35B-A3B, which has its sampling parameters configured for coding tasks.
Ollama warns that it should only be used on a Mac with more than 32GB of unified memory.
Support is limited to that model for the moment, but the team is working on adding others, along with a simpler way to import custom models in the future.
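For anyone with qualifying hardware, trying the preview should look like any other Ollama workflow. Here is a minimal sketch with the Python client, assuming the preview build is installed and serving; the model tag is a guess based on the announced name, so verify it with `ollama list`:

```python
import ollama  # assumes the Ollama 0.19 MLX preview is installed and running

# The tag below is a guess based on the model's name; verify with `ollama list`
stream = ollama.chat(
    model="qwen3.5-35b-a3b",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    stream=True,
)
for chunk in stream:
    print(chunk.message.content, end="", flush=True)
```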