Apple Silicon 13-inch MacBook Pro nearly as fast at machine learning training as 16-inch MacBook Pro

The M1 processor is up to 3.6x faster at ML training vs Intel

While Apple still needs to fully optimize the M1 processor and its software for the task, a 13-inch MacBook Pro with Apple Silicon performed nearly as well in a machine learning test as the 16-inch MacBook Pro with dedicated Radeon graphics.

Benchmarks for the M1 processor have been impressive so far, with scores rivaling even the most expensive Intel MacBook Pro configurations. It is early days yet: software is still being optimized for the processor, so some tasks should see big speed jumps as developers take advantage of the hardware.

One area where the M1 should excel is machine learning (ML). Like Apple's A-series chips such as the A12Z Bionic, the M1 has a dedicated Neural Engine for complex data processing and ML. Apple says the M1's Neural Engine can perform up to 11 trillion operations per second.

The M1 is not best in class for machine learning, however: dedicated GPUs from companies like Nvidia boast even higher numbers for neural operations. The first generation of Apple Silicon Macs can rely only on the M1 itself, as no discrete GPU options are available.

The developers at Roboflow wanted to pit Apple's new machines against the older Intel models. Apple's processor transition has only just begun, so tools like TensorFlow have not yet been optimized enough to run a full benchmark test on the M1.

The testers chose Apple's native Create ML tool, which lets developers train machine learning models, including object detection models, without writing code. Because the tool ships for M1-based Macs, the testers reasoned that it should already be properly optimized for the test.

They compared the 13-inch MacBook Pro with an M1 processor and 8GB of RAM to a 13-inch MacBook Pro with an Intel Core i5, 16GB of RAM, and integrated Intel Iris Plus Graphics 645. A 16-inch MacBook Pro with an Intel Core i9 processor, 64GB of memory, and a discrete Radeon Pro 5500M was also tested.

The Roboflow team ran the test as a no-code object detection task. They took the 121,444-image Microsoft COCO object detection dataset and used Roboflow's software to convert it to Create ML format. They then trained a YOLOv2 object detection model in Create ML for 5,000 epochs with a batch size of 32.
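The conversion step mostly comes down to rewriting bounding boxes. The Python sketch below is not Roboflow's converter; it is a minimal illustration that assumes COCO's usual annotation layout (boxes as `[x_min, y_min, width, height]` in pixels) and the JSON layout Apple documents for Create ML object detection (one entry per image, with box-center `x`/`y` plus `width`/`height` in pixels). The function name `coco_to_createml` and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def coco_to_createml(coco: dict) -> list:
    """Convert a COCO-style annotation dict into a Create ML-style list.

    Assumption: COCO boxes are [x_min, y_min, width, height] in pixels, and
    Create ML expects box-center coordinates (also in pixels).
    """
    # Look up category names and image filenames by their COCO ids.
    labels = {c["id"]: c["name"] for c in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}

    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        per_image[ann["image_id"]].append({
            "label": labels[ann["category_id"]],
            "coordinates": {
                "x": x + w / 2,  # shift top-left corner to box center
                "y": y + h / 2,
                "width": w,
                "height": h,
            },
        })

    # One entry per image, in COCO order; unannotated images get an empty list.
    return [
        {"image": files[img_id], "annotations": per_image[img_id]}
        for img_id in files
    ]


# Hypothetical one-image example.
sample = {
    "categories": [{"id": 1, "name": "cat"}],
    "images": [{"id": 7, "file_name": "000007.jpg"}],
    "annotations": [{"image_id": 7, "category_id": 1, "bbox": [10, 20, 100, 50]}],
}
print(json.dumps(coco_to_createml(sample), indent=2))
```

A box whose top-left corner is at (10, 20) and that measures 100 by 50 pixels becomes a Create ML box centered at (60, 45) with the same width and height.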

COCO is a large database of images containing objects that a 4-year-old should easily recognize, and it is widely used to test machine learning algorithms. YOLOv2 is an object detection model that draws bounding boxes to show where each object is in an image. An epoch is one full pass through the training data, and the batch size is the number of images processed in each training step before the model's weights are updated.
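To make the epoch and batch-size terms concrete, here is a generic minibatch loop in Python. It is a sketch of the standard technique, not how Create ML schedules training internally; the function names are hypothetical, and the 121,444 figure assumes, for illustration only, that every COCO image is used for training.

```python
import math
import random


def iterate_minibatches(num_examples: int, batch_size: int, epochs: int, seed: int = 0):
    """Yield (epoch, batch_indices) pairs.

    Each epoch visits every example exactly once; each batch holds up to
    batch_size example indices (the last batch of an epoch may be smaller).
    """
    rng = random.Random(seed)
    indices = list(range(num_examples))
    for epoch in range(epochs):
        rng.shuffle(indices)  # new random order at the start of each epoch
        for start in range(0, num_examples, batch_size):
            # Slicing copies, so later shuffles don't alter yielded batches.
            yield epoch, indices[start:start + batch_size]


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches (weight updates) needed for one full pass."""
    return math.ceil(num_examples / batch_size)


# Illustrative: 121,444 images in batches of 32.
print(steps_per_epoch(121_444, 32))  # 3796
```

Because 121,444 is not a multiple of 32, the final batch of each epoch holds only the 4 leftover images.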

Basically, the computer is shown a series of images and has to decide what each one contains based on what it learned from the images it saw previously. The more images of a given object it sees, the more accurately it can identify that object in new images.

The results:

  • The M1-based MacBook Pro took 149 minutes to finish the test, with 8% GPU utilization
  • The MacBook Pro with the Intel Core i5 took 542 minutes, and did not use its Intel Iris Plus Graphics 645 at all
  • The MacBook Pro with the Intel Core i9 and Radeon Pro 5500M took 70 minutes, using 100% of the GPU during the test

The team notes that Create ML used 100% of the discrete Radeon GPU but did not use the Intel Iris graphics at all, and used only 8% of the M1's integrated GPU. This low utilization affected the training times and is likely because Apple still needs to optimize the toolset for the M1.

Based on this benchmark, the Apple M1 is 3.64 times as fast as the Intel Core i5. However, the M1 machine is not fully using its GPU and, so far, underperforms the i9 with discrete graphics, taking 149 minutes versus 70.
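The speedup figures are simply ratios of wall-clock training times. This small Python sketch reproduces the 3.64 figure from the reported minutes and, by the same arithmetic, shows that the i9 with the Radeon Pro finished about 2.13 times faster than the M1. The dictionary keys and the `speedup` helper are illustrative names, not anything from Roboflow's tooling.

```python
# Training times reported in the benchmark, in minutes.
times = {"M1": 149, "i5": 542, "i9": 70}


def speedup(slower_minutes: float, faster_minutes: float) -> float:
    """How many times faster the quicker run is, as a ratio of wall-clock times."""
    return slower_minutes / faster_minutes


print(round(speedup(times["i5"], times["M1"]), 2))  # 3.64: M1 vs. Core i5
print(round(speedup(times["M1"], times["i9"]), 2))  # 2.13: Core i9 + Radeon vs. M1
```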

Apple is expected to keep optimizing its Create ML framework, and it is working with the TensorFlow team to properly port TensorFlow to the M1. Future M-series chips may have even more powerful Neural Engines and CPUs; rumors already suggest that a 32-core M-series chip could appear in a future desktop Mac.



16 Comments

tipoo 14 Years · 1122 comments

>149 minutes 
>70 minutes

2.2x slower doesn't sound "nearly as fast" as the 16. The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead. 

yeldarby 4 Years · 3 comments

tipoo said:
The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead. 

Hi, I'm one of the people who helped with this benchmark. That's almost certainly not the case; if it were, you would expect the CPU to be largely idle, as it is with GPU-constrained training, but it was pegged between 150% and 250% on the M1 for the duration of training. That leads me to strongly believe there is more optimization yet to be done to adapt the CPU part of the pipeline to the new architecture, and that this will get much better over time.

mjtomlin 20 Years · 2690 comments

tipoo said:
The low GPU utilization is because ML tasks are automatically dispatched to the neural engine instead. 

I believe the ANE is optimized for running ML models, not training them. The ML accelerators would be better suited for that, and I'm pretty sure those accelerators are in the CPU as an extension of the ARMv8 ISA.

tjwolf 12 Years · 423 comments

What was the memory utilization during this? I know nothing about ML, but I'd think reading and analyzing images is an inherently memory-intensive task. If so, I'm not sure what comparing an 8GB M1 Mac with a 16GB Mac with separate GPUs reveals.

alphafox 11 Years · 132 comments

So the M1 used its neural accelerator but the i5 was running in software on the CPU only? How are these results comparable at all?