
Ambisonic Microphones

ajocular
Honored Guest
I'd love to hear people weigh in on ambisonics. I'm looking to pick up an ambisonic mic pretty soon, but I'd like to know who else is using them for VR already and what you think about them vs. other gear.

My background with 3D audio is more on the omni-binaural side, but lately I've been feeling like I want to capture the sound field neutrally in addition to binaurally. Ambisonic mics make it easier to isolate elements from the sound field and play with them in post.

With binaural, you want to leave the track alone as much as possible, but maybe sometimes I want more verb or a flanger or whatever, and maybe I want it on one object but not the rest of the field.

Anybody have recommendations for the best ambisonic mics out there?

My primary concern is this: if ambisonic mixes aren't panned correctly, they can really gum up spatialization in post. Ambisonics are excellent at isolating unique elements of the sound field, but they're not as good at direction cues as binaural, so I think a lot of people will try to mix the two, and that's where you run into trouble. There are ways to avoid the pitfalls, and I hope we as a community can all help each other get up to speed on the strengths and weaknesses.

I'm worried that if ambisonics become super popular in VR, it will lead to blurry spatialization in lots of VR experiences (unless we heed the pitfalls). Ambisonics are a fascinating concept - you can position the sound anywhere within the volume of a sphere around your head. In VR though, we'll want to stay on the surface of that sphere. You have to crank your X, Y, and Z channels all the way to one side (whichever side you want to isolate).

That limits the versatility of the format because it prevents the mic from providing any of the distance cues it is famous for. It's the graphics equivalent of only being allowed to render at infinity. However, it's the only way to get a binaural plugin properly into the mix later in the chain. Those distance cues are virtually simulated by the hardware anyway, so you may as well virtually simulate them using any number of other methods later in the chain, and your results should be just as good.
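For anyone who wants to see what "staying on the surface of the sphere" looks like in channel terms, here is a minimal sketch of first-order B-format encoding for a mono source. It assumes the traditional convention where W carries the signal scaled by 1/sqrt(2), X points front, Y points left, and Z points up. The function name `encode_b_format` is made up for illustration and is not from any particular plugin.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: azimuth counterclockwise from front (90 = left),
    elevation upward from the horizontal plane, W scaled by 1/sqrt(2).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                   # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)      # front/back
    y = sample * math.sin(az) * math.cos(el)      # left/right
    z = sample * math.sin(el)                     # up/down
    return w, x, y, z

# A source hard left on the horizon: all directional energy lands in Y.
w, x, y, z = encode_b_format(1.0, 90.0, 0.0)
print(w, x, y, z)
```

With this kind of encoding, the directional channels (X, Y, Z) always form a unit-length vector times the sample, which is the "surface of the sphere" case. Pulling a source inward toward the center would mean shrinking X, Y, and Z relative to W.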

Anyone have experience with this? It's hard to find people who are breaking ground here.
21 REPLIES

OlivierJT
Explorer
Hello Ajocular,

So the thing with these microphones is how you render them in your VR experience.
They make sense for a static head, similar to today's surround sound systems.
> Move your head and the sound does not react to it.

From my experience so far : you need Mono audio sounds...
No need for complex mics and binaural things.
Why ?
The engine will do all that for you :
-evaluating the sound in relation to you
-and making it sound right (binaural)

On today's engines (UE4 is the one I know), to get sound spatialization you need: mono audio.
If it's stereo, the engine will render it without spatialization. It will be attached to your head.

I have been waiting and shouting for the importance of 3d audio since I began my work in VR (+2 years ago) and we are close, Oculus should release an Audio SDK very very soon.
From then we'll be able to figure it out exactly...

It's possible today to get Wwise and Fmod with a 3d audio plugin, but I haven't tried them (making content with it I mean), as Oculus Audio SDK is nearly there, I can't waste my time on things that may end up not Oculus compatible...

Don't invest in a complicated microphone yet... Mono audio is very probably all you need!
And a good stereo headphone of course.

Anonymous
Not applicable
Olivier, you're correct that mono is the way to go for spatialization (especially with head tracking), however there is still the issue of live VR capture (not CGI) and how to capture (and replace!) those sounds for later head tracked spatialization.

kibibu
Honored Guest
Regarding capturing surround environments, there's an approachable but short paper available here:
http://www.adrianofarina.it/Files/paper ... a_2012.pdf

which uses the MH Acoustics "Eigenmike" (http://www.mhacoustics.com/products)

Personally, I'm not convinced Ambisonics are high-enough order to really capture enough detail to allow good localization
(in fact, this paper suggests very strongly that you need better than Ambisonics-level http://www.ncbi.nlm.nih.gov/pubmed/23654379).

studio13
Honored Guest
"ajocular" wrote:
I'd love to hear people weigh in on ambisonics. I'm looking to pick up an ambisonic mic pretty soon, but I'd like to know who else is using them for VR already and what you think about them vs. other gear.


Sorry, no practical working experience with the SoundField microphone (or other ambi-systems), but I do have experience delivering surround for broadcast and games. I'm interested in what you intend to record using an ambisonic mic and how you would play that recording back in the context of a game engine.

ajocular
Honored Guest
It seems to me that the difference between an A-format mic and something like the Eigen is a lot like the difference between a 360 camera that has 14 lenses vs. one that has 30. As long as there's 100% overlap across the field, then it seems to me like it shouldn't be that big a deal how many mics you use because you can go higher quality, bulkier, and fewer units, or you can go lower quality, smaller, and more units and end up with similar results across the board. Please correct me if it's not fair to assume audio hardware works the same as video in this regard, and please give us some details on that.

No matter how many mics you include in the array, as long as you have enough to get the whole field with no dead zones (which I think A-format manufacturers would argue is exactly what they do) then the rest of the mics feel like overkill to some of us who aren't hardware gurus. We can't know for ourselves what subjective difference having more mics will make unless we've A/Bed the same field with both an A-format and an Eigen (an experiment I'd very much like to try).

I imagine the difference in quality would be noticeable, but I find it hard to believe it would create that essence of "undeniably twice as good," which is exactly the essence I get from binaural processing. Full disclosure, I'm obsessed with binaural cues, so much so that I wrote one of the few 3D audio Unity plugins available on the Asset Store, but I get that essence from almost every binaural plugin, not just mine.

My guess is that going from an A-format mic to an Eigen would give me an essence of stepping the quality up from 95% to 100%. It's in that realm of diminishing returns for the amount of extra hardware needed. I haven't heard an A/B of the same field from one to the other, but I have heard recordings from both and I don't think I'd be able to tell which is which every time in a double-blind test.

However, price and convenience weigh in as well. Converting A-format to B-format and then hard-panning everything to prep it for binaural processing is a pain, so if the Eigen and others have a simpler process, that's important info for us to know. I don't know how much the Eigen costs, but if it's more than 25% higher than a good A-format mic, that's beyond the realm of "worth it" in my book (unless I'm contracting with someone for whom price is no object) to eke out that last ounce of quality.
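For reference, the core of the A-format to B-format step is just sums and differences of the four capsule signals. The sketch below assumes a tetrahedral mic with capsules at front-left-up, front-right-down, back-left-down, and back-right-up, and a 0.5 scale factor. Both the capsule layout and the scaling are assumptions, and real converters also apply frequency-dependent correction filters for capsule spacing, which this leaves out entirely.

```python
def a_to_b_format(flu, frd, bld, bru, scale=0.5):
    """Convert one sample from four tetrahedral capsules to B-format.

    Assumed capsule layout: FLU (front-left-up), FRD (front-right-down),
    BLD (back-left-down), BRU (back-right-up). No capsule-spacing
    correction filters are applied; this is only the basic matrix.
    """
    w = scale * (flu + frd + bld + bru)   # omni: sum of all capsules
    x = scale * (flu + frd - bld - bru)   # front minus back
    y = scale * (flu - frd + bld - bru)   # left minus right
    z = scale * (flu - frd - bld + bru)   # up minus down
    return w, x, y, z

# Equal signal on all capsules: purely omnidirectional, no direction.
print(a_to_b_format(1.0, 1.0, 1.0, 1.0))
# Signal only on the two front capsules: W and X carry it, Y and Z cancel.
print(a_to_b_format(1.0, 1.0, 0.0, 0.0))
```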

I think the problem is that no matter how high the order is, the goal is accurate spatialization, and binaural processing just takes the ball so much farther down the field than any manipulation of the order. Thus, it's way more important for me to get the binaural cues correct rather than worrying too much about whether the field I captured was a perfect sphere. Sure, I want it to be perfect, but at what cost? Without a limitless budget, it strikes me as a nice-to-have, not a must-have.

That's especially true because, at the end of the day, the field recording is really just going to be there for an ambience bed. We're all just scratching the surface of pro audio for VR, so most people in this space are looking at this whole-field capture thing as an "all-you-need" solution, but foley and ADR will be coming in to sweeten the deal in a big way very soon.

We don't have a set of software tools for it yet, so game devs are the only ones who can really take advantage this early on, but I have no doubt someone out there is building a "Final Cut of VR" app that will allow non-developers to drop foley and ADR in anywhere and synch it to an object's position on the sphere. At that point, the challenge inverts. It'll no longer be "which mic is best at isolating any spot in the sound field and bringing it out?" The challenge will become "which mic can isolate any sound in the field and get rid of it so that we can replace it with foley and/or ADR?"

I think it's going to become standard practice in VR to do an ambience pass with no dialog in every scene (I've done that on one shoot so far), and then ADR almost EVERYTHING in post. Film makers the world over are seeing the words "ADR everything" and cringing. 🙂 If you're a film maker reading this, trust me. This bullet can't be avoided. Any other way will result in noticeably lower quality audio. No matter how good these full-field mics are at isolating, they'll never be able to pull a sound completely out of the mix, so if the sound is in the field at all, then it'll be near impossible to dub foley or ADR on top of it.

the5souls
Protege
I love the informative post, ajocular. I just graduated from college with an unrelated degree, but I am very interested in the audio aspects of gaming. Games just never had that... immersive audio I've been searching for. The closest was probably the newer Battlefield and Battlefield: Bad Company games.

ajocular
Honored Guest
Happy to share knowledge. We need as many people as possible to be up to speed because the challenges in doing audio well in VR are similar to those on the graphics side. The bar is much higher in VR than anything we've seen before.

There actually have been people working hard on 3D audio in gaming since the 90s (university research dates all the way back to the 70s). Ambisonic mics actually predate practical HRTFs, but they were originally intended as an alternative to surround sound in film, whereas HRTFs found a home in the PC gaming boom in the 90s.

The trouble has been (aside from IP battles between the big players) that most 3D audio rendering methods are performance hogs even by today's standards. To render a sound in 3D, you have to interpolate three primary components of a signal: interaural time difference (ITD), interaural level difference (ILD), and spectral color (EQ variations across all bands caused by anatomy). Doing all that interpolation multiple times per frame (every time the physics engine updates) for multiple sound sources at once definitely moves the latency needle even for a souped-up gaming rig.
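To give a feel for the simplest of those three components, here is a rough sketch of ITD using Woodworth's spherical-head approximation. This is a textbook model, not what any particular plugin does, and the head radius and speed of sound are assumed typical values. ILD and spectral color are frequency-dependent and normally come from measured HRTF data, so they are not modeled here.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # m/s, air at roughly room temperature

def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg runs from 0 (straight ahead) to 90 (fully to one side).
    Uses Woodworth's formula: (r / c) * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

# The delay grows from zero in front to well under a millisecond at the side.
for az in (0, 30, 60, 90):
    print(az, round(woodworth_itd(az) * 1e6), "microseconds")
```

A renderer has to re-evaluate something like this (plus the level and spectral parts) for every source each time the listener or the source moves, which is where the per-frame cost comes from.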

The audio people on AAA projects sometimes really want to render everything in 3D, but whichever plugin they choose usually gets the axe from the designer or the project manager in order to meet performance benchmarks. 3D audio has always been viewed by the powers that be as this nice-to-have-but-not-essential thing, but VR is flipping the script on that perspective (we hope).

kibibu
Honored Guest
> A-format mic and something like the Eigen is a lot like the difference between a 360 camera that has 14 lenses vs. one that has 30

Ok, now imagine that each camera only has a single blurry pixel and you're closer to how a microphone captures sound. In this case, you are much better with 30 than 14 (as long as each microphone has a tight-enough directional pattern).

ajocular
Honored Guest
I'm not sure I follow that update to the comparison. Do you mean that the difference in the number of mics has a much larger effect on recognizable fidelity than I implied with the original comparison?

If so, it's hard for me to get on board with the extremity you're describing. Do you not agree that's an impossible exaggeration for audio? Or was it intended as hyperbolic?

The extra fidelity immediately visible to a layperson's naked eye would obviously be recognized as more than twice as good from 14 to 30 pixels. On the other hand, I have heard samples from both mics, and the signal certainly does not become 2X better when you switch from one to the other. I don't hear significant quality differentiation, and I think if double-blind tests were done with laypeople, the numbers would follow suit with my perspective.

Even if the quality were noticeably twice as good, I still don't agree with the premise of optimizing adjacent isolation. The mic patterns on the Eigen would obviously enable better adjacent isolation compared to A-format. I won't argue against that, but it seems to me that's not important for VR because spatializing a signal that has a little adjacent bleed versus one that has a lot of adjacent bleed will yield results with a negligible difference in spatial fidelity post-HRTF.

The spatial blurriness I mentioned in a previous post comes from ILD and ITD discrepancies which only exist across opposite sides of the sound field. In other words, bleed from one mic to another doesn't matter much if the mics are next to each other because the HRTF takes over and sharpens the spatialization to a controllable point in either case. Bleed from one mic to the one on exactly the opposite side of the field IS a problem, but the remedy for that is correct mixing for both A-format and Eigen. Pan hard to the correct side of the field, and the problem is gone. It's gone for A-format. It's gone for Eigen. Results would be very similar either way as long as spatial filters are applied, and failing to apply spatial filters is never correct in VR except for underscore (which only exists outside the field anyway).
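To make the bleed discussion a bit more concrete, here is a minimal sketch of pulling a "virtual microphone" out of a first-order B-format signal. It assumes W is scaled by 1/sqrt(2), X points front and Y points left, and the `pattern` parameter (0 = figure-eight, 0.5 = cardioid, 1 = omni) is a common textbook formulation rather than any specific tool's API.

```python
import math

def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Steer a virtual first-order mic toward (azimuth, elevation).

    Assumes W is scaled by 1/sqrt(2). pattern blends the omni part (W)
    with the directional part (X, Y, Z projected onto the look direction).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    return pattern * math.sqrt(2.0) * w + (1.0 - pattern) * directional

# One sample of a unit source encoded straight ahead (assumed values).
w, x, y, z = 1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0

print(virtual_mic(w, x, y, z, 0.0))     # aimed at the source
print(virtual_mic(w, x, y, z, 90.0))    # aimed 90 degrees away
print(virtual_mic(w, x, y, z, 180.0))   # aimed at the opposite side
```

Under these assumptions, a first-order cardioid aimed at the source returns it at full level, aimed at the opposite side it cancels, and aimed 90 degrees off it still picks up about half the amplitude. That last case is the adjacent bleed being debated here.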

If there are A/B samples of A-format versus Eigen (with HRTFs applied) that can prove me wrong, I want to hear them. I don't want to shoot down any hardware undeservedly. I want everyone to have a fair shot, but let's have every manufacturer bring out the biggest guns they've got, and let's hear everything apples to apples.