
3D Audio for 360 video workflow

Anonymous
Not applicable
I'm looking for the easiest way to create and publish 360 videos with 3D spatialized audio on the Rift, with an emphasis on 3D music authoring.

After reading through the forums, it looks like that entails putting spherical video into a game engine such as Unity or Unreal, using the Oculus Audio SDK plus FMOD or Wwise to incorporate 3D audio, and publishing it via Oculus Share or the Store. Would that be the recommended workflow?
86 REPLIES

Petroza
Level 7
That's great, I'm glad to hear you got it working.

A quick update on this, we got it working with H.264 instead of VP8.

Here are the instructions to convert an MP4 to a format compatible with Oculus Video. To make a 360 video with Ambisonics that is compatible with Oculus Video, you need to transcode the MP4 (with AAC audio) to MKV (with Vorbis audio).

Download FFmpeg: https://ffmpeg.zeranoe.com/builds (Any build will work, I recommend the “64-bit Static” version)

Extract FFmpeg to a directory, e.g. C:\Tools\ffmpeg (there should be ffmpeg.exe inside the “bin” directory)

Then open a command prompt and use ffmpeg.exe to transcode the video file. First you need to strip the audio, then transcode to MKV.

[ Note: you have to call ffmpeg.exe by its full path, so for this example it would be C:\Tools\ffmpeg\ffmpeg-20160318-git-a7b8a6e-win64-static\bin\ffmpeg.exe ]

1. Remove any audio in the MP4 (often video cameras capture mono/stereo themselves so we need to get rid of that)

ffmpeg.exe -i my_video_file.mp4 -c:v copy -an my_video_file_SILENT.mp4


2. a) Prepare the Ambisonics for ffmpeg by converting to .wav (ffmpeg doesn’t recognize the .amb file format)
b) Add the audio to the video and convert to MKV

ffmpeg.exe -i my_video_file_SILENT.mp4 -i my_ambisonic_audio.wav -c:v copy -c:a libvorbis -q:a 10 my_ambisonic_video.mkv
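For anyone scripting this, the two commands above can be sketched in Python as plain argument lists. This is only an illustration: the function names and the FFMPEG path are hypothetical, while the ffmpeg flags are exactly the ones used in steps 1 and 2b. Nothing is executed until you pass a list to subprocess.run.

```python
import subprocess

# Hypothetical install location; point this at wherever ffmpeg.exe was extracted.
FFMPEG = r"C:\Tools\ffmpeg\bin\ffmpeg.exe"


def strip_audio_cmd(video_in, video_out, ffmpeg=FFMPEG):
    """Step 1: copy the video stream untouched and drop all audio (-an)."""
    return [ffmpeg, "-i", video_in, "-c:v", "copy", "-an", video_out]


def mux_ambisonics_cmd(silent_video, ambisonic_wav, mkv_out, ffmpeg=FFMPEG):
    """Step 2b: copy the video, encode the WAV as Vorbis, write an MKV."""
    return [
        ffmpeg,
        "-i", silent_video,
        "-i", ambisonic_wav,
        "-c:v", "copy",
        "-c:a", "libvorbis",
        "-q:a", "10",
        mkv_out,
    ]


def run(cmd):
    """Run one command and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)
```

Typical use would be run(strip_audio_cmd("my_video_file.mp4", "my_video_file_SILENT.mp4")) followed by run(mux_ambisonics_cmd("my_video_file_SILENT.mp4", "my_ambisonic_audio.wav", "my_ambisonic_video.mkv")). Passing a list rather than one long string avoids quoting problems with paths that contain spaces.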

henkSPOOK
Level 3
This is really great! Thank you all who worked on this.

I find that in many productions, apart from spatial audio, there is also the need for a normal stereo track for non-positional audio (background music, voice over, low frequency sfx, etc). Is this something that can also be included?

thanks,

Henk

Petroza
Level 7
Non-spatialized audio is not currently implemented. As a workaround, you can encode it into the ambisonics, which has the added benefit that it will be positioned in the world instead of being locked to the head. To do this, convert your L/R stereo into mid-side format, which is essentially a 1D ambisonic.

To convert L/R to mid-side (in Audacity)
1. Open the stereo track and reduce gain by 6dB [ Effect > Amplify > -6 ]
2. Split the stereo into two mono tracks [click black triangle and select "split stereo to mono"]
3. Mix together into a new mono track. [Track > Mix and Render to New Track]
4. Select right channel (track 2) and invert the phase [ Effect > Invert ]
5. Select tracks 1 and 2 and mix together into a new track again (this is the "side" track)

Tracks 3 and 4 are a mid-side representation of the stereo recording.

6. Select "mid" track (3) and reduce gain by 3dB [ Effect > Amplify > -3 ] This is to match the ambisonic spec that the W channel is 3dB quieter than other channels.

Mix the "mid" track in with your ambisonic W channel.
Mix the "side" track in with your ambisonic Y channel.
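The Audacity steps above boil down to a little arithmetic per sample. Here is a rough sketch in plain Python, assuming the channels are lists of float samples of equal length; the function names are made up for illustration and are not part of any Oculus or Audacity API.

```python
def db_to_gain(db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def stereo_to_mid_side(left, right):
    """Return (mid, side) sample lists following the Audacity steps."""
    pad = db_to_gain(-6.0)     # step 1: -6 dB on both channels
    w_trim = db_to_gain(-3.0)  # step 6: a further -3 dB on the mid track only
    # steps 3 and 6: sum of the channels, then trimmed for the W channel
    mid = [(l + r) * pad * w_trim for l, r in zip(left, right)]
    # steps 4 and 5: left plus phase-inverted right
    side = [(l - r) * pad for l, r in zip(left, right)]
    return mid, side


def mix_into_bformat(w, y, mid, side):
    """Add mid to the W channel and side to the Y channel, sample by sample."""
    new_w = [a + b for a, b in zip(w, mid)]
    new_y = [a + b for a, b in zip(y, side)]
    return new_w, new_y
```

Identical left and right samples end up entirely in the mid track (side is zero), and opposite-polarity samples end up entirely in the side track, which is a quick sanity check after converting a file.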

tumbleandyaw
Level 2
Hello Peter,

Thanks for the update re. H264 video, and the M/S tip. I'll try that later today; in the meantime, a couple of questions for you:

- The ambisonics audio is working fine, but I'm having trouble with 5.1 audio playback. It states "this video cannot be played", although I'm using the same settings for both (VP8 video with Ogg Vorbis audio). Is this a bug?

- Can we upload our videos to our own webserver and have the Oculus Video app read the URL to it, and then stream it, or is it local files only, for now?

Thanks !




ntkeep
Level 3
Nice! Great stuff Pete!

We've had quite a few people using the Spatial Workstation who wanted help getting our .tbe format into B-format and into the Oculus player. Our encoder now supports B-format output too.

Matthew on our team put up a post with step-by-step instructions for iFFmpeg and ffmpeg -- for those who are new to the terminal or encoding apps. It will be of use for whatever tools/workflow you use.

Varun
TBE

tumbleandyaw
Level 2
Hello Peter,


M/S conversion works, but sums to mono at 90 and 270º. Thanks for the tip.

FOA Ogg Vorbis with either H264 or VP8 sounds great!

However, 5.1 audio does not work on my end. All kinds of dropouts and weirdness happening (with H264) or "file cannot be played" (with VP8 video). Is this just me or is it a known issue?

Can you share the specific codec settings for 5.1? I have it set for Ogg Vorbis pass thru.

Happy Easter!

henkSPOOK
Level 3




Hi Peter,

Thanks for the suggestion. I have tried this technique before and, as Albert says, it sums to mono when turning your head to 90 and 270 degrees. Adjusting phase relations on the S channel and also sending it to the X channel gives a somewhat more even spread of the side information when turning your head, but this is all less than ideal. There is so much good sounding stereo audio in existence, and using this technique to include it in the spatial audio mix is a little cruel 🙂
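To see why the mid-side trick collapses to mono at 90 and 270 degrees, here is a toy model in Python. It is a sketch under loud assumptions: X points front, Y points left, yaw is counterclockwise seen from above, and the "decoder" is just left = W + Y', right = W - Y' with no HRTFs or gain scaling. The real Oculus Video renderer is certainly more involved; the function names are made up for illustration.

```python
import math


def rotate_yaw(x, y, yaw_deg):
    """Express the horizontal B-format components relative to a head
    turned yaw_deg to the left (assumed convention: X front, Y left)."""
    psi = math.radians(yaw_deg)
    x_head = x * math.cos(psi) + y * math.sin(psi)
    y_head = y * math.cos(psi) - x * math.sin(psi)
    return x_head, y_head


def naive_stereo_decode(w, x, y, yaw_deg):
    """Toy decode: left = W + Y', right = W - Y' (assumption, not the
    actual player's decoder)."""
    _, y_head = rotate_yaw(x, y, yaw_deg)
    return w + y_head, w - y_head


# Mid in W, side in Y, nothing in X, as in the workaround above.
for yaw in (0, 90, 180, 270):
    left, right = naive_stereo_decode(0.5, 0.0, 0.3, yaw)
    print(yaw, round(left, 3), round(right, 3))
```

In this model the side signal only lives in Y, so the head-relative Y' is Y times cos(yaw). It vanishes at 90 and 270 degrees, leaving only W in both ears, and it flips sign at 180 degrees, which swaps left and right.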

Voice overs, low frequency sound effects, (background) music and other non-diegetic sounds would really benefit from a non head-tracked stereo audio channel. So I was thinking, at some point, it would be nice to implement it so that a 6 channel audio file can have ch1-4 as b-format ambisonics and ch5-6 as non positional stereo.

thanks,

Henk

Sassota
Level 3



5.1 audio sounds fine and plays well, but the transitions from one surround channel to another sound strange. Here is a link with an Ambisonic file decoded to 5.1 (it's audio that completes a 360 cycle in about 10 seconds) and some video files; one has ambisonic sound and the others 5.1.

https://www.dropbox.com/s/hrnyq0zvazd3yde/balltestvrplayer.zip?dl=0

tumbleandyaw
Level 2



Thanks for putting that up, Sassota! Yes, your files work fine in 5.1, and mine don't... 😞

Here are the FFmpeg settings for both Sassota's and my file, my bitrate is a lot higher; is that the issue here or something else?

Any best case settings anyone cares to share?

My file:

General
Unique ID : 208163860557796261321707924932085132199 (0x9C9AE6B79530F2D7F83FFAACDA17ABA7)
Complete name : DANY360_v4_Oculus5.1_360.mkv
Format : Matroska
Format version : Version 4 / Version 2
File size : 84.0 MiB
Duration : 2mn 50s
Overall bit rate mode : Variable
Overall bit rate : 4 140 Kbps
Writing application : Lavf57.25.100
Writing library : Lavf57.25.100
 
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L3.2
Format settings, CABAC : Yes
Format settings, ReFrames : 3 frames
Codec ID : V_MPEG4/ISO/AVC
Duration : 2mn 50s
Bit rate : 2 618 Kbps
Width : 1 440 pixels
Height : 720 pixels
Display aspect ratio : 2.000
Frame rate mode : Constant
Frame rate : 30.000 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.084
Stream size : 53.1 MiB (63%)
Writing library : x264 core 148 r2665M a01e339
Encoding settings : cabac=1 / ref=3 / deblock=1:0:0 / analyse=0x3:0 / me=hex / subme=6 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=-2 / threads=8 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=1 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=60 / keyint_min=30 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=20.0 / qcomp=0.60 / qpmin=3 / qpmax=51 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
Default : Yes
Forced : No
HANDLER_NAME : Core Media Video
DURATION : 00:02:50.100000000
 
Audio
ID : 2
Format : Vorbis
Format settings, Floor : 1 / 8710
Codec ID : A_VORBIS
Duration : 2mn 50s
Bit rate mode : Variable
Bit rate : 1 440 Kbps
Channel(s) : 6 channels
Sampling rate : 48.0 KHz
Compression mode : Lossy
Delay relative to video : 22ms
Stream size : 29.2 MiB (35%)
Writing library : libVorbis (Everywhere) (20100325 (Everywhere))
Default : No
Forced : No
DURATION : 00:02:50.125000000
 
Sassota's file:

General
Unique ID : 222854396547844684723747300207155481453 (0xA7A8320FE4358099E8A686CE53000B6D)
Complete name : ball51_360.mkv
Format : Matroska
Format version : Version 4 / Version 2
File size : 4.00 MiB
Duration : 33s 371ms
Overall bit rate mode : Variable
Overall bit rate : 1 005 Kbps
Writing application : Lavf57.29.100
Writing library : Lavf57.29.100
 
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4
Format settings, CABAC : Yes
Format settings, ReFrames : 4 frames
Codec ID : V_MPEG4/ISO/AVC
Duration : 33s 367ms
Bit rate : 601 Kbps
Nominal bit rate : 750 Kbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 23.976 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.012
Stream size : 2.39 MiB (60%)
Writing library : x264 core 148 r2665 a01e339
Encoding settings : cabac=1 / ref=3 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=6 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=8 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=250 / keyint_min=25 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=abr / mbtree=1 / bitrate=750 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / ip_ratio=1.41 / aq=1:1.00
Default : Yes
Forced : No
DURATION : 00:00:33.370000000
 
Audio
ID : 2
Format : Vorbis
Format settings, Floor : 1 / 16900
Codec ID : A_VORBIS
Duration : 33s 371ms
Bit rate mode : Variable
Bit rate : 384 Kbps
Channel(s) : 6 channels
Sampling rate : 48.0 KHz
Compression mode : Lossy
Delay relative to video : -86ms
Stream size : 1.53 MiB (38%)
Writing application : Lavc57.28.103
Writing library : libVorbis (⛄⛄⛄⛄) (20150105 (⛄⛄⛄⛄))
Default : Yes
Forced : No
DURATION : 00:00:33.371000000
 




Sassota
Level 3
Try another bitrate; AFAIK the maximum Ogg bitrate is 500 Kbps.
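To put numbers on that, here is a small Python sketch comparing the audio bit rates from the two reports above against the 500 Kbps figure. Treat that ceiling as an unconfirmed assumption from this thread, not a documented limit; the names below are made up for illustration.

```python
# Audio figures copied from the two reports earlier in the thread.
FILES = {
    "DANY360_v4_Oculus5.1_360.mkv": {"audio_kbps": 1440, "channels": 6},
    "ball51_360.mkv": {"audio_kbps": 384, "channels": 6},
}

# Unconfirmed ceiling suggested above ("AFAIK"), not a documented limit.
ASSUMED_MAX_KBPS = 500


def per_channel_kbps(audio_kbps, channels):
    """Average bit rate spent on each audio channel."""
    return audio_kbps / channels


def exceeds_assumed_ceiling(audio_kbps, ceiling=ASSUMED_MAX_KBPS):
    """True if the audio stream is above the assumed ceiling."""
    return audio_kbps > ceiling


def report(files=FILES):
    """Return one summary line per file."""
    lines = []
    for name, info in files.items():
        per_ch = per_channel_kbps(info["audio_kbps"], info["channels"])
        flag = "over" if exceeds_assumed_ceiling(info["audio_kbps"]) else "under"
        lines.append(f"{name}: {info['audio_kbps']} Kbps total, "
                     f"{per_ch:.0f} Kbps per channel, {flag} the assumed ceiling")
    return lines


for line in report():
    print(line)
```

The file that fails is at 1440 Kbps (240 per channel) while the working one is at 384 Kbps (64 per channel). If the failing file was encoded with -q:a 10 as in the earlier instructions, one thing to try is swapping that for a fixed target such as -b:a 384k, which is a standard ffmpeg audio bit rate option; whether that actually fixes playback is untested here.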