So needless to say, Apple has been able to do some amazing things with your photos, and now it looks like the company is prepared to tackle the same challenge for audio, using machine learning to create a 3D soundstage to go along with the stunning 4K video that the iPhone can already capture.
A new patent application surfaced by Patently Apple describes an approach called "Spatially biased sound pickup for binaural video recording," in which audio recorded by multiple microphones on the iPhone would be run through machine-learning algorithms to reproduce the original sound scene, making you feel like you're actually there, immersed in the recorded experience.
To be clear, the technology itself isn't new. Binaural recording has been around for well over a century, and it normally uses a pair of microphones positioned where human ears would be (the "aural" in "binaural" refers to the ears). In this configuration, each microphone captures the slight timing and level differences in arriving sound, the same way that your ears do in the real world, and the two channels are then mixed to produce a more realistic soundscape. Some binaural recording systems have even gone so far as to use physical models of human ears to reflect the way that sounds actually travel into and through the ear canals, for an even greater degree of realism.
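To make that concrete, the dominant cue a two-microphone binaural rig captures is the tiny difference in arrival time between the two ears. The sketch below simulates just that one cue in Python; the head radius, sample rate, and Woodworth's spherical-head formula are standard textbook assumptions for illustration, not anything taken from Apple's patent.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly room temperature
HEAD_RADIUS = 0.0875     # m, a commonly assumed average head radius

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation of the interaural time
    difference (ITD) for a source at the given azimuth
    (0 degrees = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def binaural_pan(mono: list[float], azimuth_deg: float, rate: int = 48_000):
    """Turn a mono signal into a crude left/right binaural pair by
    delaying the far ear by the ITD. Real binaural systems also model
    level differences and pinna filtering; this shows only the timing cue."""
    delay = round(itd_seconds(abs(azimuth_deg)) * rate)
    delayed = [0.0] * delay + mono[: len(mono) - delay]
    # Positive azimuth = source on the right, so the left ear hears it late.
    return (delayed, mono) if azimuth_deg > 0 else (mono, delayed)
```

Even with a source fully to one side, the delay works out to well under a millisecond, which is exactly why it takes carefully placed microphones (or clever signal processing) to capture it at all.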
So what’s especially intriguing about Apple’s approach is not that it plans to offer binaural recording — given the proper microphone arrangement, anybody can do that — but rather that it’s looking for ways to do it through machine learning, using multiple microphones in a more limited arrangement.
In fact, according to the patent, Apple is proposing that not only would the sound from multiple microphones be analyzed, but the video footage captured by the dual-lens camera system would also be used to help create a computational model of the surroundings. In other words, the Neural Engine could analyze lighting and depth maps to determine the size of the room and the proximity of objects that would normally reflect sounds in certain ways.
3D Sound Capture
While it's easy to imagine this being used in Augmented Reality applications, especially in support of Apple's upcoming AR headset, Apple actually reveals in the patent that it has already addressed full 3D sound capture for AR purposes, using spatial rendering through a method known as Head-Related Transfer Functions (HRTFs). However, this method is designed to modify the signal to allow for the kind of spatial positioning commonly used in VR gaming, essentially the feature that lets you hear sounds coming from different angles.
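Conceptually, an HRTF is just a pair of per-ear filters: spatializing a mono sound means convolving it with a measured head-related impulse response (HRIR) for each ear. The minimal Python sketch below shows that mechanic; the tiny kernels in the usage example are placeholders, not real measured HRIRs, which in practice come from measurement databases and run to hundreds of taps.

```python
def convolve(signal: list[float], kernel: list[float]) -> list[float]:
    """Direct-form FIR convolution: what applying an HRTF filter
    amounts to in the time domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_hrtf(mono, hrir_left, hrir_right):
    """Spatialize a mono source by filtering it through one
    head-related impulse response per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

Because each direction needs its own pair of impulse responses, full positional rendering means storing or interpolating many such filters, which hints at why Apple treats VR-style positioning and simple binaural playback as different problems.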
However, while full 3D sound capture is great for VR applications, it can actually be distracting when you're simply watching a traditional video recording, since the audio becomes immersive in a way the flat picture isn't. Binaural recordings, which mimic the way that your ears normally hear sounds, seem to be the sweet spot between traditional stereophonic recordings and the extreme of fully positional 3D audio.
If this sounds like an ambitious project, that’s because it is. There are obviously a lot of variables for Apple to consider here that would go beyond what an iPhone is capable of capturing, and the algorithms that would be necessary to create a 3D soundstage would have a complexity that’s on par with some of the most sophisticated voice recognition systems. However, Apple’s machine learning engineers have already worked some pretty amazing magic with the iPhone cameras, so it’s not an insurmountable problem by any means.
Of course, as with all of the patents we see from Apple, the mere existence of a patent application doesn't mean that we'll ever see the idea in an actual product, but it does provide some interesting insight into the things Apple is thinking about, and it certainly seems in line with many of the company's current ambitions.