Inside Metal: How Apple plans to unlock the secret graphics performance of the A7 chip
Among the surprises that Apple unveiled at WWDC 2014 is the company's new Metal framework and shader language, aimed at radically enhancing the hardware accelerated graphics potential of the A7 Application Processor powering the company's latest iOS devices.
Introduced during the WWDC Keynote by Apple's software chief Craig Federighi— who is said to be a particular fan of metal rock— Metal as a technology applies to the GPU (Graphics Processing Unit) of Apple's new 64-bit A7 Application Processor used in its newest iOS devices: iPhone 5s, iPad Air and Retina iPad mini.
The new technology's name actually derives from the fact that it provides "close to the metal" graphics performance by slimming down the overhead imposed by existing graphics libraries like OpenGL. Metal speeds up 3D rendering and general compute tasks while freeing up the CPU to handle additional work, such as more sophisticated physics modeling or audio processing in video games, for example.
In an initial WWDC session devoted to Metal, Apple's GPU software engineer Jeremy Sandmel stated, "we're incredibly excited" to outline the new technology for developers, noting that "we believe it's literally going to be a game changer for you, your applications and for iOS."
Note that the engineer presenting Metal wasn't speaking eloquent marketing language to a crowd of potential customers; he was outlining new technology that developers can use to gain dramatic increases in performance, something that will benefit their apps and Apple's ecosystem. If it doesn't benefit developers' apps enough for them to use it, Metal won't benefit Apple either.
We will soon see whether Metal is a meaningless marketing term like "Intel Inside," that was just invented to sell otherwise undifferentiated hardware, or whether it is a truly new technology that people notice because it is producing a new class of mobile games that can't be matched elsewhere.
So far, Apple has already taken over the high end of mobile video games, with lots of exclusive titles that aren't available on Android, BlackBerry or Windows Phone (and in the future, won't be available on Tizen). The fact that Android doesn't get many high end games is well known enough for AnandTech to observe with rather brutal honesty that "the games that benefit the most from Metal are also the games least likely to be on Android."
Radical new hardware needs radical new software
In the desktop PC world, Macs and Windows PCs have incrementally delivered performance jumps by installing faster and faster CPUs paired with video cards outfitted with increasingly faster dedicated GPUs. Apple's new Mac Pro actually pairs two fast GPUs alongside the main CPU as standard equipment, with one GPU dedicated to video performance and the other available for advanced rendering tasks and general compute acceleration. All three processors are cooled by a central heat sink (below).
However, under iOS and the constraints of mobile design, where battery life, heat dissipation and a compact thermal envelope are critically important, there's a need for rethinking how the latest blazing-fast GPUs are driven. To deliver impressive mobile graphics, a more specialized approach is required because the hardware that gives a desktop machine its raw horsepower is physically too big.
Additionally, today's mobile GPUs are now so fast that the CPU cores often have trouble feeding graphics tasks to them fast enough. Once the CPU cores are maxed out, the GPU is left sitting idle, waiting for new work to be dispatched to it by the CPU.
In this area, the general purpose design of OpenGL is reaching the limits of what it can do, largely because it hogs up so much time on the CPU with tasks like state validation and GPU shader compilation.
That's the case particularly in the field of video games, where the target frame rate needs to consistently remain high in order to deliver a fluid experience.
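To make the bottleneck concrete, here is a back-of-the-envelope sketch rather than a measurement. It assumes a simple model in which the CPU pays a fixed cost for every draw call and a frame is finished only when both the CPU and the GPU are done. The per-call costs and the GPU time are hypothetical numbers picked for illustration, not figures from Apple or from any benchmark.

```python
# Toy model of a CPU-bound frame. All costs below are hypothetical
# illustration values, not measurements of OpenGL or Metal.

def frame_time_ms(draw_calls, cpu_us_per_call, gpu_ms):
    """A frame is done when both CPU submission and GPU work have finished."""
    cpu_ms = draw_calls * cpu_us_per_call / 1000.0
    return max(cpu_ms, gpu_ms)

def max_draw_calls(budget_ms, cpu_us_per_call):
    """How many draw calls the CPU can submit within a frame budget."""
    return int(budget_ms * 1000.0 / cpu_us_per_call)

BUDGET_MS = 1000.0 / 60.0        # roughly 16.7 ms per frame at 60 fps
HEAVY_US = 40.0                  # assumed CPU cost per call, heavyweight API
LIGHT_US = 4.0                   # assumed CPU cost per call, thin API

for label, cost in (("heavy API", HEAVY_US), ("thin API", LIGHT_US)):
    t = frame_time_ms(1000, cost, gpu_ms=10.0)
    fit = max_draw_calls(BUDGET_MS, cost)
    print(f"{label}: 1000 calls take {t:.1f} ms/frame; "
          f"{fit} calls fit in a 60 fps budget")
```

In this model the heavier per-call cost leaves the GPU finished long before the CPU has submitted the frame, which is the idle-GPU situation described above. Cutting the per-call cost is what lets the same CPU feed many more draw calls into each frame.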
It's also important in the realm of general compute functions, including tasks like encryption or audio and video processing. Apple hasn't yet made OpenCL available for public developer access on its mobile devices, but Metal solves that problem too, because it works with both GPU and "GPGPU" (General-Purpose Computing on Graphics Processing Units) tasks.
With Metal, Apple bypasses the overhead baggage of OpenGL with a highly optimized new framework that lets mobile developers coax the best possible performance from its new A7 (and, of course, from future A-series chips using the same types of advanced GPU technology).
Precompiling GPU code with the Metal Shader Language
Metal works in part by identifying the tasks that can be precompiled in advance so that they can execute without delay at runtime. This involves Apple's new Metal Shader Language, which is used to write "shaders," the specialized computer programs designed to be rapidly run by a GPU.
Initially, a GPU shader described how to apply the smooth shades of color needed to create realistic surfaces on a 3D model. In today's more general terms, a shader can be any sort of image or video processing, from calculating the geometries of an animated 3D model, to rendering individual pixels of a scene, or applying a motion blur effect to an existing frame of video. Shaders can also be used to package general computational tasks for rapid execution on the GPU.
Apple's new shader language for Metal defines both typical GPU graphics operations and general compute functions, using the same data structures and resources for both to make things easy and consistent for developers. This makes it an efficient, A7-optimized alternative to both OpenGL and OpenCL.
Like Swift, the new programming language Apple created to enhance the compiling performance of code built around Cocoa frameworks, the equally new Metal Shader Language is designed to compile shader code efficiently via LLVM and, whenever possible, in advance of runtime.
For example, a video game could deliver precompiled Metal shaders within its app that are ready to run on the A7's GPU immediately without further processing. By precompiling as much of the shader code as possible, the CPU is freed from having to compile it during game play, saving precious milliseconds of processing time that can now be used for other tasks.
The more general-purpose OpenGL is designed to compile shader source code on the CPU into GPU machine code at runtime, before it can run. This enables it to support a wide variety of different GPUs, each of which needs to have the required shaders compiled specifically for its chip architecture.
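The difference between the two approaches can be sketched in a few lines. This is a toy illustration, not Metal or OpenGL code: `compile_shader`, `RuntimeCompiledShaders` and `PrecompiledShaders` are hypothetical names invented here, and the "compiler" just counts how often it is called so that the cost paid during game play is visible.

```python
# Toy contrast between compiling shaders at run time and shipping them
# precompiled. "compile_shader" is a stand-in, not a real graphics API.

compile_count = 0

def compile_shader(source):
    """Pretend to turn shader source into GPU-ready code (the slow step)."""
    global compile_count
    compile_count += 1
    return ("binary", source)

class RuntimeCompiledShaders:
    """Compiles each shader the first time the running game asks for it."""
    def __init__(self, sources):
        self.sources = sources
        self.cache = {}

    def get(self, name):
        if name not in self.cache:
            self.cache[name] = compile_shader(self.sources[name])
        return self.cache[name]

class PrecompiledShaders:
    """Filled in at build time; the running game only does lookups."""
    def __init__(self, binaries):
        self.binaries = binaries

    def get(self, name):
        return self.binaries[name]

SOURCES = {"blur": "blur source", "lighting": "lighting source"}

# Build step (on the developer's machine): all compilation happens here.
shipped = PrecompiledShaders(
    {name: compile_shader(src) for name, src in SOURCES.items()}
)
build_time_compiles = compile_count

# Game play with precompiled shaders: lookups only.
shipped.get("blur")
shipped.get("lighting")
play_compiles_precompiled = compile_count - build_time_compiles

# Game play with run-time compilation: the CPU pays on first use.
before = compile_count
runtime = RuntimeCompiledShaders(SOURCES)
runtime.get("blur")
runtime.get("lighting")
runtime.get("blur")  # cached, so no extra compile
play_compiles_runtime = compile_count - before
```

The total amount of compilation is the same either way. What changes is when it happens: the precompiled path moves it out of game play entirely, while the run-time path spends CPU time on it while the game is running.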
The downside to running everywhere is that the code doesn't run particularly fast anywhere. Code optimized for a particular architecture inherently runs faster and can be optimized further. Even when running cross-platform is important, such as for a ubiquitous service like Facebook, native apps have proven to be far superior to web apps. In video games, where performance is paramount, optimized native code is quite obviously even more critically important. That's the role of Metal in iOS 8.
Metal readied for A7, A7 ready for Metal
In addition to compiling many shaders in advance of runtime, Metal also works to strip away much of the overhead imposed by the general purpose structure of OpenGL, particularly with respect to the "state vector," or all of the details associated with each draw call prepared by the CPU and handed to the GPU. This process involves a lot of expensive "bureaucracy" under OpenGL.
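One common way to cut that kind of per-call bureaucracy is to check a bundle of render state once, when it is created, instead of before every draw call. The sketch below models that idea under stated assumptions: `validate`, `PrevalidatedState` and the two draw functions are hypothetical names made up for illustration, and the article does not spell out Metal's actual mechanism at this level of detail.

```python
# Toy contrast: validating render state on every draw call versus once,
# when a state object is created. Names here are illustrative only.

validations = 0

def validate(state):
    """Stand-in for the checks an API makes before the GPU can use a state."""
    global validations
    validations += 1
    if state["blend"] not in ("opaque", "alpha"):
        raise ValueError("unsupported blend mode")

def draw_validate_each_time(state, count):
    """Re-check the state before every draw call."""
    for _ in range(count):
        validate(state)
        # ... hand the draw call to the GPU ...

class PrevalidatedState:
    """Checked once at creation and copied, so later draws can trust it."""
    def __init__(self, state):
        validate(state)
        self._state = dict(state)

def draw_prevalidated(prepared, count):
    """Issue draw calls against a state that was already checked."""
    for _ in range(count):
        pass  # ... hand the draw call to the GPU, no re-validation ...

state = {"blend": "alpha"}

draw_validate_each_time(state, 1000)
per_draw_checks = validations

validations = 0
draw_prevalidated(PrevalidatedState(state), 1000)
upfront_checks = validations
```

With a thousand draw calls, the first style runs the checks a thousand times and the second runs them once. The saving grows with the number of draw calls per frame.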
Because Metal is optimized specifically to run code on Apple's A7, it doesn't have to deal with all of the various differences in competing GPU designs. That allows Metal to focus exclusively on building and optimizing code targeting the A7's unique architecture. At the same time, Apple is also adding new support for modern GPU features, enabling developers to access more cool new stuff while dealing with less legacy overhead related to supporting outdated architecture cruft.
This hyper-optimization of Metal is possible because Apple develops its own Application Processor chip designs and then standardizes the tens of millions of its latest devices to all use the same hardware. While Apple has customized Metal to wring every drop of performance out of the new A7, it also designed the A7 specifically to excel at running Metal code optimized by the LLVM compiler.
One of the most perplexing conundrums of the iPhone 5c Failure Myth, which insists that Apple was upset to find that it was selling mostly higher end iPhone 5s models rather than its middle tier model, is that the iPhone 5s not only costs more, but also adds a new A7 user to the installed base. The more higher-end A7-powered phones Apple can sell, the faster and easier it can keep itself differentiated ahead of the "carrier friendly, good enough" sort of models that its competitors are pushing to achieve volume sales.
That's because A7 devices can do things that yesterday's 32-bit chips paired with value engineered ARM Mali integrated graphics can't. And a large installed base of A7-class iPhones and iPads will create a market for A7-optimized apps. Metal is one of the technologies Apple is using to advance the A7's lead even further, creating a stark contrast in sophistication for buyers to notice.
In addition to the surprise of delivering a modern, 64-bit ARMv8 CPU architecture paired with an advanced 6series Rogue GPU, Apple's A7 also incorporates an integrated memory architecture between its CPU and GPU cores. This allows Metal to coordinate the CPU's feeding of the GPU with instructions without needing to constantly pass data back and forth between a central system cache and a dedicated graphics cache.
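As a loose analogy only, the difference between copying data across to a separate graphics memory and sharing one pool of memory can be shown with ordinary Python buffers. Nothing here touches a GPU: a `bytearray` stands in for memory, a `bytes` copy stands in for the separate-memory case, and a `memoryview` stands in for the shared case. The variable names are invented for this sketch.

```python
# Analogy only: a bytearray stands in for memory the CPU writes to.

vertex_data = bytearray(8)  # eight zero bytes of "vertex data"

# Separate-memory style: the "GPU" works from its own copy, so every CPU
# update has to be copied across again before the GPU can see it.
gpu_copy = bytes(vertex_data)

# Shared-memory style: the "GPU" holds a view onto the very same bytes.
gpu_view = memoryview(vertex_data)

vertex_data[0] = 255  # the CPU updates the data

copy_sees_update = gpu_copy[0] == 255
view_sees_update = gpu_view[0] == 255
```

The copy is stale after the CPU's write and would need another transfer, while the shared view reflects the change immediately. Avoiding those repeated transfers is the benefit the integrated memory architecture is credited with above.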
Apple didn't announce any of that when the A7 was first introduced. Not even the experts at Chipworks could identify what all the silicon on the A7 was doing (above), and there continues to be some controversy about why the A7 would need "over one billion transistors."
The A7's billion transistors put it in the same category as Sun's UltraSparc T3 16-core server CPU (really). A 6-core Gulftown Intel Core i7 has 1.17 billion transistors. An Intel Core 2 Duo has 291 million, while the original Macintosh was powered by a Motorola 68000 with a mere 68,000.
It turns out that Apple had all sorts of big surprises up its sleeve, and it kept a very straight face while a series of clowns began issuing their ignorant opinions about what they thought the A7 was.
A series of big surprises from Apple's A7
Recall that when Apple first introduced the iPhone 5s' advanced A7 chip, it was first greeted with media skepticism wondering if it was even "truly 64-bit," followed by a communal (and wholly incorrect) story that suggested 64-bit mobile chips didn't matter until devices had "4GB of addressable RAM," and that, in the most extremely ignorant coverage, the A7 was "marketing fluff and won't improve performance."
Meanwhile, there were additional months of unanimous media agreement that the power of the A7 "wasn't even necessary" for a mobile device, and that the real excitement of the mobile industry instead revolved around Google's efforts to make very low priced Motorola devices as well as Google's plans to scale back its previously ambitious plans for Android 5.0 to instead deliver Android 4.4 KitKat with the primary goal of running on low end products (albeit only ones sold in the last 18 months, excluding even its own Galaxy Nexus).
It wasn't just Google that was focusing Android on the low end. Throughout 2013, Apple's primary rival Samsung was shipping the majority of its "smartphones" as low end devices, eroding its "Galaxy" brand from meaning "premium iPhone-class devices" to instead referring to virtually everything it sold with Android on it, mostly low end products that are less sophisticated than Apple's now entry level iPhone 4 from 2010.
Apple was laying the foundation for a series of advances in secret while media pundits and financial analysts were collectively agreeing with each other that Apple was "no longer innovating" throughout most of 2013, just because those individuals weren't aware of what was going on.
The new architecture of the A7 already delivers a significant performance boost when running existing OpenGL code (as shown below, running GFXBench).
Now, after selling tens of millions of A7-equipped devices, Apple is revealing that its mysterious A7 chip has a radically untapped potential to deliver— via the new Metal— graphics performance that's ten times greater than even its already impressive OpenGL benchmarks originally indicated.
Additionally, the company signaled to its WWDC attendees that they have a window of opportunity to build Metal-enhanced games and other apps that in a few short months will have a broad installed base of tens of millions of iOS 8 users bearing A7 brains. Apple didn't trumpet any of this, because it wanted to make sure that Samsung, Google and everyone else using Android kept on cranking out yesterday's 32-bit phones with no attention to delivering an installed base of mobile devices with advanced graphics capabilities.
Samsung likes to tout the high (2.5GHz) clock rate and "eight cores" of its latest Galaxy S5. However, the device still uses either Adreno graphics or ARM Mali graphics (depending on where the model is sold). Not only do both versions deliver only basic graphics performance, but the split divides the installed base for either chip architecture, making it difficult to impossible for Samsung to get developers to really coax the full performance from either one with some specialized software similar to Apple's Metal.
Unlike Apple's flagship iPhone 5s, which makes up the majority of the company's smartphone sales, Samsung's Galaxy S5 represents only a fraction of the smartphones it sells. Even if Samsung decided right this instant to copy Apple, it would still be a year behind in starting to install a user base of smartphone customers with A7-class hardware.
While Samsung is widely known to be over a year behind in delivering a 64-bit CPU, it's at least as far behind in building an installed base of advanced GPUs. And the fractured way that Samsung (and every other Android licensee) rolls out new technology is much slower. In stark contrast, next year most of Apple's installed base will have 64-bit CPUs paired with advanced PowerVR 6series GPU graphics.
Rapidly creating a large installed base of mobile devices with advanced hardware (far superior to its existing peers) is what made the iPhone App Store wildly successful while at the same time making other platforms like Palm OS, Windows Mobile and Symbian look old and saggy. Customers noticed the cool new iOS software and apps, but it was Apple's great leap in hardware that was exclusively able to power that software.
Android's ability to match (and often exceed) the hardware sophistication of Apple's iPhones has allowed it to catch up. But over the past two years, Android has been cultivating a mass market volume play, rather than keeping pace with Apple's high end mobile devices.
That's most obviously the case in the Application Processor's CPU and GPU features, and that's clearly why Apple has been investing so much into creating its own proprietary chip designs capable of staying far ahead of the status quo. The A7's sophistication goes beyond processing speed to also greatly enhance camera performance and integrate a secure implementation of Touch ID.
For "smartphones" that are used as basic feature phones, or tablets that are primarily used as personal TVs, the A7's advantages won't matter much. But for the cream of the mobile market, a premium segment that plays video games and cares about powerful mobile apps (segments that include education, government, and corporate enterprise), the A7 is both literally and figuratively a game changer.
Who's going to listen to Metal?
There's no need to worry that the gaming industry is going to abandon OpenGL for Metal. Many large game developers try to reach the largest audiences with their apps, something that ostensibly precludes them from adopting Apple-specific technologies, including Swift or Metal. Most of these will certainly continue to use OpenGL along with similarly platform-agnostic gaming frameworks (like Unity) that allow them to target both iOS and Android (as well as the current majority of iOS devices that lack an A7-class chip).
However, for developers who want to stand out in the App Store, Metal promises a dramatic, order-of-magnitude improvement in performance, delivering frame rates, creature swarms and new levels of gaming sophistication that would be impossible without the specialized Metal framework even on the A7, let alone the grab bag of low end, "good enough" GPUs present on the majority of Android phones.
The decision to support Metal isn't necessarily all or nothing; the A7 installed base is already large enough to be able to entice developers to create a Metal-enhanced port of their existing titles to offer an optimized version for A7-equipped devices.
For a developer like Epic Games, for example, which brought its Unreal Engine to iOS to deliver a trio of "Infinity Blade" games (below) that each launched exclusively for iOS, Metal offers a new way to deliver even more impressive, iOS-exclusive titles.
Epic founder and Unreal co-creator Tim Sweeney appeared on stage during the WWDC keynote to show off the Metal-enhanced Unreal Engine 4 in a demo title named "Zen Garden," created entirely to display an outrageous number of drifting cherry blossom petals, an unreal swarm of koi fish, and throngs of thousands of interactive, flittering butterflies.
Electronic Arts, well known as a cross platform games developer, also appeared on stage to demonstrate its own Frostbite, a "console level" graphics engine that it said it did not anticipate being able to directly port to a mobile device. Thanks to Metal, it can. In a Metal demonstration of its latest installment of "Plants vs. Zombies," EA noted at one point that there were "1.3 million triangles on the screen."
Crytek demonstrated its Metal-enhanced version of "The Collectables," capable of "4,000 draw calls per frame." In the game, users navigate a team of mercenaries through a battlefield where exploding vehicles can throw over a hundred chunks of debris into the air, delivering an immersive new level of gameplay.