
Audio Ideas and Feedback

ajocular
Honored Guest
Lots of good detail in this talk. There's one point I'd like to dig deeper on:

"Keep your audio sources slow, calm, and don't go crazy spatializing everything like we did when colored lighting first came about."

I'm not on board with that approach. My impression is that these recommendations are only due to processor limitations; the aesthetic results do not benefit from these limits. I don't think it's possible to bombard the end user with too much spatial audio. In real life, we are constantly immersed: our brains are practiced at tuning dozens of spatial sounds in and out of the field, moment by moment. The closer we get to that real-life scenario of dozens of spatial sources, the closer we'll get to a sense of audio presence. Spatialize all of it if possible, and the human brain will sort it out, as it does in real life.

Also, not all sound sources naturally move slowly, and I want to spatialize the really fast ones too.

The Morpheus team is pushing the envelope here. I just spoke with their audio guy at GDC (sorry I can't remember your name, dude), and he told me they are able to spatialize 64 sources simultaneously without a performance hiccup. They can get away with that because apparently they have the beefiest APU I've ever heard of. Everybody else is sort of SOL on this front because CPU cores will always be hogged by graphics. No matter how many cores are available, audio will always get the short end of the stick.

Lots of 3D sound instances that are moving quickly will chew up the CPU, kill the frame rate, and create artefacts. Not many people are talking about how to optimize audio performance in order to increase the number of available 3D sound instances. Carmack often talks about how graphics has traditionally been optimized toward fidelity, when for VR we need graphics optimized toward performance instead. The same is true for audio: most teams are optimizing toward fidelity. High fidelity is great, but what's the lowest fidelity that still sounds great? The answer to that question will let us run more simultaneous 3D sound sources without killing the frame rate or causing artefacts, and I believe that will yield a much more immersive sound field at the end of the day.
13 REPLIES

Anonymous
Not applicable
Yeah, I think that part was unclear and we didn't have time to refine the message in a succinct fashion.

The first issue is that in our current implementation we're snapping to the nearest HRTF position every time we process a buffer. This means, for example, that if we're processing 512-sample buffers and using a single HRTF per buffer, we're going to have discontinuities between buffers at roughly 10 ms intervals (depending on sample rate). A fast-moving object may traverse multiple HRTF positions in that span of time (depending on the object's speed and the buffer size). There are workarounds for this that we'll be implementing, but it was worth mentioning.
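
To put rough numbers on that (just an illustrative back-of-the-envelope sketch, not our actual code; the speed and distance are made up):

    #include <cstdio>

    // Hypothetical illustration of why per-buffer snapping breaks down for fast
    // sources (not the actual SDK code). With 512-sample buffers at 48 kHz the
    // source position is only sampled about every ~10.7 ms; whatever the source
    // does inside that window collapses to a single HRTF.
    int main() {
        const double sampleRate = 48000.0;
        const int    bufferSize = 512;
        const double bufferMs   = 1000.0 * bufferSize / sampleRate;   // ~10.7 ms

        const double sourceSpeed = 20.0;   // m/s, e.g. a fast projectile
        const double distance    = 2.0;    // m from the listener

        // Angular sweep per buffer (small-angle approximation, radians to degrees).
        const double metersPerBuffer  = sourceSpeed * bufferMs / 1000.0;
        const double degreesPerBuffer = (metersPerBuffer / distance) * 57.29578;

        std::printf("buffer length: %.1f ms\n", bufferMs);
        std::printf("source moves %.2f m (~%.1f degrees) per buffer\n",
                    metersPerBuffer, degreesPerBuffer);
        // With HRTF measurements spaced every few degrees, ~6 degrees per buffer
        // means the source can skip over several HRTF positions between updates.
        return 0;
    }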

The count doesn't actually have to do with performance (we can handle 64 voices very well :D); it's more about avoiding a busy mix. If you want people to really notice spatialization, you can't drown them in lots of spatialized sources. This is true of real life -- the more noise, the harder it is to pinpoint an individual source.

saviornt
Protege
I understand that the audio is spatialized; however, is there a method for audio "raytracing" based on PBR materials? For example, a sound shouldn't bounce off a soft surface as easily as it would off a hard surface.
Current WIPs using Unreal Engine 4: Agrona - Tales of an Era: Medieval Fantasy MORPG

chris_pike-bbc
Honored Guest
Hi Brian,

It's surprising to hear that you're processing in this way. It's quite common to crossfade between the outputs of convolution with the current and previous HRTFs over the course of a buffer to avoid these discontinuities. Even with a fairly slow moving source you could get clicks/artefacts from your current approach.
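
Roughly what I mean, as a simplified sketch (it assumes the convolutions with both the previous and current HRTFs have already been run for the buffer; it isn't any SDK's actual code):

    #include <cstddef>
    #include <vector>

    // Crossfade between the outputs of convolution with the previous and current
    // HRTFs over the course of one buffer, hiding the jump when the source moves
    // to a new HRTF position. Real implementations also have to carry the filter
    // tails across buffers; that part is omitted here.
    void crossfadeHrtfOutputs(const std::vector<float>& convolvedWithPrevHrtf,
                              const std::vector<float>& convolvedWithCurrHrtf,
                              std::vector<float>& out)
    {
        const std::size_t n = out.size();
        for (std::size_t i = 0; i < n; ++i) {
            // Linear ramp from the old filter's output to the new one across the buffer.
            const float t = (n > 1) ? static_cast<float>(i) / static_cast<float>(n - 1) : 1.0f;
            out[i] = (1.0f - t) * convolvedWithPrevHrtf[i] + t * convolvedWithCurrHrtf[i];
        }
    }

The cost, of course, is two convolutions per source for any buffer where the HRTF changes.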

In a sense I agree with you about creating a busy mix: if you want to draw focus to a particular object, then you don't want too many sound sources masking it. But this is the same with normal stereo production. Actually, in my experience, if you mix stereo and spatialised sound sources you get some spatial masking, which reduces the effect of spatialisation.

Anonymous
Not applicable
"chris.pike-bbc" wrote:
Hi Brian,

It's surprising to hear that you're processing in this way. It's quite common to crossfade between the outputs of convolution with the current and previous HRTFs over the course of a buffer to avoid these discontinuities. Even with a fairly slow moving source you could get clicks/artefacts from your current approach.

In a sense I agree with you about creating a busy mix: if you want to draw focus to a particular object, then you don't want too many sound sources masking it. But this is the same with normal stereo production. Actually, in my experience, if you mix stereo and spatialised sound sources you get some spatial masking, which reduces the effect of spatialisation.


yep, I would agree with that.

You can't hope to spatialise only some aspects of the audio; you have to spatialise each and every track to ensure you are not flooding the three-dimensional image.

What I am working on is a way to virtualise a stereo headphone pair within the three-dimensional space; that way you can still add other elements within the space without ruining the effect.
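
Something along these lines, as a minimal sketch (the names, the speaker positions, and the crude panner standing in for a real HRTF spatialiser are all made up for illustration):

    #include <cmath>
    #include <cstddef>

    // Hypothetical illustration of "virtualising" a stereo pair inside the 3D
    // scene: each channel of a stereo asset drives a spatialised source anchored
    // near the listener's head, so stereo content can coexist with true 3D sources.
    struct Vec3 { float x, y, z; };

    // Stand-in for a real per-source spatialiser (HRTF, distance model, etc.):
    // here it just does a crude constant-power pan from the source's x position
    // and accumulates into the output buffers.
    static void spatializeMono(const float* in, float* outL, float* outR,
                               std::size_t frames, const Vec3& pos)
    {
        const float pan = 0.5f * (pos.x + 1.0f);             // map x in [-1,1] to [0,1]
        const float gL  = std::cos(pan * 1.5707963f);
        const float gR  = std::sin(pan * 1.5707963f);
        for (std::size_t i = 0; i < frames; ++i) {
            outL[i] += gL * in[i];
            outR[i] += gR * in[i];
        }
    }

    // Render a stereo track through two head-relative "virtual speakers" placed
    // roughly in front of the listener (positions are arbitrary).
    void renderVirtualStereoPair(const float* left, const float* right,
                                 float* outL, float* outR, std::size_t frames)
    {
        const Vec3 leftSpeaker  { -1.0f, 0.0f, -1.7f };
        const Vec3 rightSpeaker {  1.0f, 0.0f, -1.7f };
        spatializeMono(left,  outL, outR, frames, leftSpeaker);
        spatializeMono(right, outL, outR, frames, rightSpeaker);
    }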

Anonymous
Not applicable
"chris.pike-bbc" wrote:
Hi Brian,

It's surprising to hear that you're processing in this way. It's quite common to crossfade between the outputs of convolution with the current and previous HRTFs over the course of a buffer to avoid these discontinuities. Even with a fairly slow moving source you could get clicks/artefacts from your current approach.


We are doing that, but with fast enough sources the crossfade still may not be enough (and crossfades introduce their own artifacts, e.g. smearing), so you'll still get jumps between positions. We plan on doing more interpolation or extrapolation within the spatializer so we can get more intermediate positions for fast-moving sounds, but that comes at additional CPU cost.
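
For what it's worth, the kind of approach I mean looks roughly like this (a sketch only, not our actual implementation; spatializeAt is a placeholder for whatever per-position render call the spatializer exposes):

    #include <cstddef>

    struct Vec3 { float x, y, z; };

    // Linear interpolation between two positions.
    static Vec3 lerp(const Vec3& a, const Vec3& b, float t)
    {
        return { a.x + t * (b.x - a.x),
                 a.y + t * (b.y - a.y),
                 a.z + t * (b.z - a.z) };
    }

    // Placeholder for the real call that renders `frames` samples of one source
    // at a fixed position (crossfading HRTFs internally).
    void spatializeAt(const float* in, float* outL, float* outR,
                      std::size_t frames, const Vec3& position);

    // Instead of one position per 512-sample buffer, process in sub-blocks
    // (e.g. 128 samples) and interpolate the position across the buffer:
    // roughly 4x the HRTF work per source for 4x the positional resolution.
    void spatializeWithSubBlocks(const float* in, float* outL, float* outR,
                                 std::size_t frames, std::size_t subBlock,
                                 const Vec3& posAtStart, const Vec3& posAtEnd)
    {
        for (std::size_t offset = 0; offset < frames; offset += subBlock) {
            const std::size_t n = (offset + subBlock <= frames) ? subBlock : frames - offset;
            const float t = static_cast<float>(offset + n) / static_cast<float>(frames);
            spatializeAt(in + offset, outL + offset, outR + offset, n,
                         lerp(posAtStart, posAtEnd, t));
        }
    }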

saviornt
Protege
Any thoughts on the ray-tracing idea? The wave-dynamics maths are out there, and if you expose a "material type" as a public variable, the audio system could use it to calculate the acoustic reflectiveness of different materials (rough sketch below). So I guess there are three questions:

- Potential CPU cost for calculation?
- Difficulty in implementation?
- Would it be worth it?
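
Roughly what I'm picturing, as a hypothetical sketch (the enum, the struct, and every coefficient are made up for illustration, not any engine's actual API or measured data):

    #include <array>
    #include <cstddef>

    // Hypothetical per-material acoustic properties the engine could expose
    // alongside its PBR data. Every value here is illustrative, not measured.
    enum class MaterialType { Carpet, Curtain, Wood, Brick, Glass, Metal, Count };

    struct AcousticProps {
        float absorption;      // fraction of energy lost per bounce (0..1)
        float lowpassCutoffHz; // crude proxy for how much high end a bounce keeps
    };

    constexpr std::array<AcousticProps, static_cast<std::size_t>(MaterialType::Count)>
        kAcoustics {{
            { 0.60f,  2000.0f },  // Carpet: soft, very absorptive
            { 0.50f,  3000.0f },  // Curtain
            { 0.15f,  8000.0f },  // Wood
            { 0.05f, 12000.0f },  // Brick
            { 0.03f, 16000.0f },  // Glass
            { 0.02f, 18000.0f },  // Metal: hard, highly reflective
        }};

    // Gain applied to a reflection after bouncing off a surface of this material.
    inline float reflectionGain(MaterialType m)
    {
        return 1.0f - kAcoustics[static_cast<std::size_t>(m)].absorption;
    }
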
Current WIPs using Unreal Engine 4: Agrona - Tales of an Era: Medieval Fantasy MORPG

Anonymous
Not applicable
saviornt, that's beyond the scope of our SDK, because it would require integrating too closely with the upstream middleware and engine (we don't have access to geometry or material information). Other companies are definitely working on this.

saviornt
Protege
Gotcha, yeah, it does make sense that it wouldn't be in the Oculus SDK, now that I think of it, since it would be highly engine-dependent.

I found an interesting site if anyone wants to do some late night reading:

http://www.ondacorp.com/tecref_acoustictable.shtml
Current WIPs using Unreal Engine 4: Agrona - Tales of an Era: Medieval Fantasy MORPG

Anonymous
Not applicable
I've also been interested in the audio ray-tracing idea, so I'm hoping that FMOD and UE4 will support it in the future. It makes sense for every virtual object to have physics properties assigned to it. In the case of 3D sound, values relating to an object's absorption, reflectiveness, and resonance could add to the accuracy of the early reflections. Most of the time there would be a level decrease and a specific low-pass filter for each bounce before applying the appropriate HRTF.

If the system were scalable, most users could use their CPU to calculate a few early reflections off the most prominent objects around the listener (walls, floor, ceiling, major furniture, etc.). The late reflections could be general approximations, which could then be cross-faded with a reverb tail that takes into account the room's overall acoustic properties. A higher-end system might use a dedicated physics and audio card (using CUDA or OpenCL) to do any highly parallel processing and thus calculate more objects to a greater reflection depth.
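
To sketch the per-bounce part (illustrative only; the struct, the coefficients, and the one-pole filter are my own stand-ins, not FMOD's or UE4's API):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Shade one traced early reflection: accumulate a per-bounce level decrease
    // and a low-pass cutoff from the materials hit along the path, then apply a
    // simple one-pole low-pass. Downstream, the reflection would be delayed by
    // its path length and rendered through the appropriate HRTF, with late
    // energy handed off to a conventional reverb tail instead.
    struct Bounce {
        float reflectionGain;   // 1 - absorption of the surface that was hit
        float lowpassCutoffHz;  // how much high end that surface preserves
    };

    void shadeEarlyReflection(std::vector<float>& signal,
                              const std::vector<Bounce>& path, float sampleRate)
    {
        float gain = 1.0f;
        float cutoffHz = sampleRate * 0.5f;
        for (const Bounce& b : path) {
            gain *= b.reflectionGain;                         // level decrease per bounce
            cutoffHz = std::min(cutoffHz, b.lowpassCutoffHz); // duller with each soft hit
        }

        // One-pole low-pass at the accumulated cutoff, plus the accumulated gain.
        const float a = 1.0f - std::exp(-6.2831853f * cutoffHz / sampleRate);
        float state = 0.0f;
        for (float& s : signal) {
            state += a * (s - state);
            s = gain * state;
        }
    }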

None of these ideas are new, but I think that with the advent of VR they can finally be utilized and appreciated more fully. True presence in VR demands accurate physics and true 3D sound.