Evoking the old Xgrid days, a new project connects Mac Studios together with Thunderbolt cables, and uses them in tandem for massively parallel computing tasks.
If you have two Mac Studios, maybe you can cluster them
A very long time ago, I was involved in cluster computing, and assisted with a few Mac-centric cluster builds in Virginia. Near the end of Xgrid availability from Apple, I also built an Xgrid cluster using beige G3 motherboards. You know, just because I could.
While the corporate- and federally-funded Xgrids were pretty good, the self-build projects were janky, fragile hacks. Apple's Xgrid worked very well in extremely specific circumstances, but very poorly outside of them.
A Mac Studio cluster really shines as the problem size [grows]. To leverage the combined GPUs, shift your thinking: choose wisely batch size & dividing of your dataset (block vs interleaving). Got 2x speedup training a large MLP on 2 MSs w/ same accuracy @awnihannun @angeloskath pic.twitter.com/P8NpPRt6TG
-- Stavros Kassinos (@KassinosS) June 15, 2024
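The "block vs interleaving" choice in that post is about how each machine picks its share of the dataset. Here is a minimal sketch of the two schemes in plain Python. The function names and the `rank` and `world_size` parameters are my own illustrative choices, borrowed from general MPI vocabulary, and are not taken from the project.

```python
def block_shard(n_samples, rank, world_size):
    """Contiguous chunk of sample indices for this rank (block partitioning)."""
    base, extra = divmod(n_samples, world_size)
    # The first `extra` ranks take one additional sample each,
    # so uneven dataset sizes are still fully covered.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return list(range(start, stop))


def interleaved_shard(n_samples, rank, world_size):
    """Every world_size-th sample index, starting at this rank's offset."""
    return list(range(rank, n_samples, world_size))


if __name__ == "__main__":
    n_samples, world_size = 10, 2
    for rank in range(world_size):
        print("rank", rank)
        print("  block:      ", block_shard(n_samples, rank, world_size))
        print("  interleaved:", interleaved_shard(n_samples, rank, world_size))
```

Block partitioning keeps each machine's samples contiguous, while interleaving spreads any ordering in the dataset evenly across machines. Which one trains better likely depends on how the data is sorted.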
However, a new project called MLX that uses Macs and Thunderbolt networking appears to be much smoother than that. And even better, it uses the standard MPI distributed computing methodology.
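For readers who have not used MPI, the core idea is that every machine runs the same program on its own slice of the data, then the partial results are combined with a collective operation such as an all-reduce. The sketch below simulates that pattern inside a single Python process, with no real networking. The toy model, the function names, and the in-process `all_reduce_sum` stand-in are all my assumptions for illustration, not the project's code.

```python
def local_gradient(w, xs, ys):
    """Sum (not mean) of d/dw (w*x - y)^2 over one worker's samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def all_reduce_sum(values):
    """Stand-in for an MPI all-reduce: every rank receives the global sum."""
    total = sum(values)
    return [total for _ in values]


def data_parallel_gradient(w, xs, ys, world_size):
    """Return the mean gradient as seen by each simulated rank."""
    # Each simulated rank works on an interleaved slice of the data.
    partials = [
        local_gradient(w, xs[rank::world_size], ys[rank::world_size])
        for rank in range(world_size)
    ]
    # After the all-reduce, every rank holds the same summed gradient.
    reduced = all_reduce_sum(partials)
    # Dividing by the global sample count gives the full-batch mean gradient.
    return [g / len(xs) for g in reduced]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    single = local_gradient(1.0, xs, ys) / len(xs)
    print("single machine:", single)
    print("two ranks:     ", data_parallel_gradient(1.0, xs, ys, world_size=2))
```

Every rank ends up with the same gradient a single machine would have computed, which fits with the report above of the same accuracy on two Mac Studios.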
The project installation is fairly complex, but so was Xgrid's. The new project uses one host machine, with as many worker Macs as the budget allows connected directly to it by Thunderbolt 4 cables. This provides extremely high-speed communications between the host machine and the workers.
The worker machines can be headless, with automatic login selected, assuming Screen Sharing is also enabled. Networking is configured manually.
The computational software, Open MPI, is installed through Homebrew, followed by the MLX project repository. Full troubleshooting and configuration are beyond the scope of this article, but that's an overview of what you need to do to get started.
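To give a sense of what the Open MPI side of the setup looks like, here is a small sketch that builds a hostfile and a launch command. The `host slots=N` hostfile format and the `mpirun --hostfile` and `-np` options are standard Open MPI. The IP addresses, the `hosts.txt` file name, and the `train.py` script name are hypothetical placeholders, and the project's own instructions may well differ.

```python
def build_hostfile(hosts, slots_per_host=1):
    """Return Open MPI hostfile text: one 'host slots=N' line per machine."""
    return "\n".join(f"{host} slots={slots_per_host}" for host in hosts) + "\n"


def build_launch_command(hostfile_path, n_procs, script):
    """Assemble an mpirun invocation as an argument list (not executed here)."""
    return [
        "mpirun",
        "--hostfile", hostfile_path,
        "-np", str(n_procs),
        "python3", script,
    ]


if __name__ == "__main__":
    # Hypothetical manually assigned addresses on the Thunderbolt links.
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    print(build_hostfile(hosts), end="")
    # 'train.py' is a placeholder name for whatever script the cluster runs.
    print(" ".join(build_launch_command("hosts.txt", len(hosts), "train.py")))
```

One process per machine is assumed here, on the guess that each Mac contributes a single GPU. The generated text would be saved to the hostfile path on the host machine before launching.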
As with any massively parallel calculation, scaling across devices is not quite linear. One test cluster had three nodes working a single problem 2.9 times faster than a single Mac Studio.
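That figure works out to a parallel efficiency of about 97 percent, since 2.9 divided by 3 is roughly 0.97. The sketch below does that arithmetic, and also estimates the implied serial fraction with the Karp-Flatt metric, which comes out to about 1.7 percent. Applying Amdahl's law to extrapolate beyond three nodes is my own assumption and ignores communication overhead, so treat any projection as a rough upper bound rather than a measurement.

```python
def parallel_efficiency(speedup, n_nodes):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_nodes


def karp_flatt(speedup, n_nodes):
    """Experimentally determined serial fraction: (1/S - 1/N) / (1 - 1/N)."""
    return (1.0 / speedup - 1.0 / n_nodes) / (1.0 - 1.0 / n_nodes)


def amdahl_speedup(serial_fraction, n_nodes):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)


if __name__ == "__main__":
    speedup, n_nodes = 2.9, 3  # figures reported for the three-node test cluster
    serial = karp_flatt(speedup, n_nodes)
    print(f"efficiency:      {parallel_efficiency(speedup, n_nodes):.1%}")
    print(f"serial fraction: {serial:.2%}")
    # Extrapolation only; assumes the serial fraction stays constant.
    for n in (2, 3, 7):
        print(f"Amdahl estimate for {n} nodes: {amdahl_speedup(serial, n):.2f}x")
```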
My days of configuring massively parallel systems are long over. Any work that requires them now is done on a pre-existing grid, with some of the folks who helped me out with the high-end professional hardware reviews. Those grids all have parameters and configuration matters decided by others.
My last work with MPI was about two decades ago with version 2, if I recall correctly. The MPI Forum members are presently evaluating versions 4.2 and 5, so my time is long past.
As such, I'm leaving most of the research and execution of this to the reader. However, a cluster of Mac Studios can be easily transported in a package as small as a duffle bag, is power- and heat-efficient, and could use an iPad as a screen.
And, it looks easy enough to configure worker machines already nearby on adjacent desks to make an ad hoc cluster. You can connect up to six Apple Silicon Mac workers to an M1 Ultra or M2 Ultra Mac Studio host machine with Thunderbolt, if you are so inclined.
Obviously, the Mac Studio is the best choice for this from a size and power perspective. It's just perhaps not the best from a budgetary perspective.
I'll be following this project. Maybe it's time to get back into things like this.