Like it did for 3D models for ARKit, Apple is developing a new audio format, possibly based on the company's work with Pixar, that makes it easier to place sounds around a user wearing "Apple Glass" or other AR devices, even when that user is moving.
Apple has previously been shown to be working on making high-resolution video images for Apple AR, but now it is also aiming to produce high-quality audio to go alongside them. In a pair of new patent applications, the company is investigating options to do with spatial audio.
Specifically, Apple wants to establish an audio format like .MP3 or .AAC which becomes a standard, and which incorporates the extra spatial information that AR requires. The format needs to contain the actual audio, but also far more than whether an element should be played out on the left or right side of the stereo picture. Apple has previously worked with Pixar on a format called USDZ, which is to do with placing 3D objects in the space around a user.
"File format for spatial audio," is a new patent application, which may be documenting at least part of that Pixar work. However, Apple has previously said that its aim with USDZ is help with sharing across apps such as Messages, News, and Mail.
"Producing three-dimensional (3D) sound effects in augmented reality (AR), virtual reality (VR), and mixed reality (MR) applications... is challenging because existing audio formats were originally designed for producing 3D sound in a physical environment with fixed speaker locations and stationary listeners," says Apple in the new application, "such as in a movie theater."
Apple refers to AR, VR and MR with the overall term Simulated Reality (SR), and says that it wants to build on the many existing formats for 3D audio. "[For example] spatial audio formats designed to produce 3D sound include MPEG-H (Moving Picture Experts Group) 3D Audio standards, HOA (Higher-order Ambisonics) spatial audio techniques, and DOLBY ATMOS surround sound technology," it continues.
The issues are both to do with where the audience perceives the sound to be, and where the creators can elect to place effects or music. "One alternative for producing 3D sound effects in SR environments is to manipulate individual discrete sounds contained in audio objects that can be virtually located anywhere in the 3D environment," says Apple.
"[However, composing] audio for SR applications using existing spatial audio formats and objects is difficult since there is no uniform way to access a variety of sound sources and incorporate them into a dynamic SR environment," it continues.
Apple's proposed solution is to create a format that is similar to the way .m4v and .mp4 are "container" formats which group together different elements. In the case of Apple's new spatial audio requirements, the company suggests creating an "audio asset library... [which] includes asset metadata that enables simulated reality (SR) application developers to compose sounds for use in SR applications."
"The audio assets are formatted," says Apple, "to include audio data encoding a sound capable of being composed into a SR application along with asset metadata describing not only how the sound was encoded, but also how a listener in SR environment experiences the sound.
This patent application is credited to four inventors, two of whom have related previous patents. Stephen E. Pinto is named on a patent regarding spatial audio navigation for "Apple Glass," for instance, while Christopher T. Eubank has worked on the plans to make high-resolution images in Apple AR devices.
Both inventors are also among those credited on another newly-revealed and relevant patent to do with "Spatial Audio Upmixing."
In conventional audio, the music that might play quietly underneath a presenter speaking is typically referred to as a bed. Apple's proposal takes this term and uses it for a much more complex spatial audio system.
"A spatial bed is a multi-channel audio content that represents a complete sound field description, e.g., a virtual sphere of sound, for example surrounding a simulated reality listener in a simulated reality environment," it says. "A new spatial bed is generated by combining sections of at least two of such spatial beds."
Where current audio professionals will be conscious of left and right positioning of instruments or elements, Apple builds on this idea by thinking about a sphere instead.
"The new spatial audio object may comprise a spherical array of virtual sound sources (virtual sphere) that define a sound field surrounding a listening position of the new spatial audio object, e.g., at a center of the custom mix sphere," says Apple.
Where a typical sound editor app now has a flat graphical display of the audio as a waveform, Apple proposes a new system that shows the audio as a globe.
"The process may also visualize the new spatial audio object (new spatial bed) as a separate, new globe, e.g., in an SR environment, displaying the surface of the new globe from the point of view of the sound designer who may be inside the new globe, e.g., at the center, or outside the new globe," says the application.
"This may be presented in the SR environment as a virtual hand of the sound designer reaching out and painting with a handheld brush or spray device the inside of (or the outside of) a wall of the new globe," it continues, "where the selected sound (of the input spatial audio object) is to be rendered."
Apple does not refer to some future 3D version of Logic Pro by name in this application. Nor does it propose a specific name for the file format in the first one.
However, this isn't the first time that Apple has championed an audio format because of its perceived technological advantages. It's created its own lossless ALAC format, and also chose AAC over MP3 for the iTunes Store.
5 Comments
About ten years ago there was an experiment at my local University which had a chair located at the centre of a spherical arrangement of speakers and some nifty software to control the timing of the speaker outputs so that the sound arrived at the listener's ears at the same time. It's an awesome idea, and I'm so pleased that technology has progressed so far that we can talk about this stuff as being nearly ready for widespread access.
Why do the authors here continue to insist on using the obviously wrong name of “Apple Glass”?
The PS5 will be shipping with headphones that have some manner of 3D sound. Of course, that is most probably for games and, as with everything Sony, it is probably some proprietary format.
This sounds like an encoding format that would have widespread appeal outside of Apple’s primary areas of interest, for example, building acoustic simulations for moving targets, gunfire sources, approaching storms, etc. This would be valuable for evaluating acoustic detection systems, operator training, machine learning, etc. I hope Apple is successful in its goal of driving this encoding format towards becoming a standard and helping to proliferate its use without letting patent ownership issues get in the way.
Why not use ambisonics?