Following months of beta testing, the newly renamed Geekbench AI 1.0 is now available, with the aim of letting users make comparable measurements of artificial intelligence performance across iOS, macOS, and more.
Originally released in beta form as Geekbench ML (for Machine Learning) in December 2023, the tool allowed for comparisons between the Mac and the iPhone. Now renamed Geekbench AI, its makers claim it has become far more useful for measuring and comparing performance.
"With the 1.0 release, we think Geekbench Al has reached a level of dependability and reliability that allows developers to confidently integrate it into their workflows," wrote the company in a blog post, "[and] many big names like Samsung and Nvidia are already using it."
The regular Geekbench has long provided separate scores for single-core and multi-core CPU performance. Geekbench AI, by contrast, measures performance across three different types of workload.
"Geekbench AI presents its summary for a range of workload tests accomplished with single-precision data, half-precision data, and quantized data," continues the company, "covering a variety used by developers in terms of both precision and purpose in AI systems."
The firm stresses that the intention is to produce comparable measurements that reflect real-world use of AI — as wide-ranging as that is.
Geekbench AI 1.0 is available direct from the developer. As with previous Geekbench releases, the tool is free for most users, and it runs on macOS, iOS, Android and Windows.
There is also a paid version, Geekbench AI Pro, which allows developers to keep their scores private rather than having them automatically uploaded to the public results site.
5 Comments
Scrolling through the entries so far, it seems that the Neural Engine works great for the quantized version of the benchmark and so-so for half precision, but Apple hardware in general is kind of lame for the single-precision version (with the best hardware being the GPU in the M3 Max).
Happy to be corrected, but I take this to mean that Apple hardware is good for on-device inference but crappy for model training.
I wonder how much of the issue is CoreML needing more optimization versus Apple needing beefier GPUs...