While software that performs the computation necessary for immersive audio, such as HRTF filtering, already exists (for instance this), it is limited by the lack of knowledge of the user's head-related transfer function (HRTF).
Here are some solutions to this problem:
1. Use some standard HRTF that represents an average head shape.
2. Measure the user's HRTF empirically.
3. Select the HRTF of a subject from an existing database whose anthropometric profile (head size, pinna shape, ...) best matches the user.
4. Based on the user's anthropometric data, do fancy math to obtain the HRTF.
Discussion: 1 is the most common but the worst solution whenever the average head doesn't fit the user; the mismatch mostly leads to front-back confusion. 2 is the best option but unfeasible: it requires special equipment that only a few labs around the globe possess. 3 has been discussed previously in this thread; MS is also using this option for their VR (can't find the link, sorry). 4 is the hardest to achieve, but unlike 3 it doesn't require an external HRTF data set.
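To make 3 concrete, here is a minimal sketch of anthropometric matching: pick the database subject whose measurements are closest to the user's. The subjects, measurements, and feature set here are made up for illustration; a real database has far more subjects and measurements per subject.

```python
import numpy as np

# Hypothetical database: rows = subjects, columns = (head width,
# head depth, pinna height) in cm. Values are invented for the example.
subjects = np.array([
    [14.5, 19.0, 6.2],   # subject 0
    [15.8, 20.5, 6.8],   # subject 1
    [13.9, 18.2, 5.9],   # subject 2
])

# The user's measurements, in the same order and units.
user = np.array([15.5, 20.0, 6.5])

# Normalise each feature by its spread so that head width (cm-scale
# differences) doesn't dominate pinna height (mm-scale differences).
scale = subjects.std(axis=0)
dist = np.linalg.norm((subjects - user) / scale, axis=1)

# Use the HRTF of the nearest subject.
best = int(np.argmin(dist))
print("best-matching subject:", best)
```

A real implementation would also have to decide which measurements matter most for localisation (pinna features tend to matter more than overall head size for elevation), which a plain Euclidean distance ignores.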
While I'm still waiting for the Linux SDK, I took a look at some simple models that go towards 4. Listen to the demos and tell me what you think.
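To give a flavour of what the simplest of these models looks like: Woodworth's spherical-head formula estimates the interaural time difference (ITD) from a single anthropometric parameter, the head radius. This only captures lateral (left-right) cues, not elevation; the default radius below is a common textbook value, not taken from any of the demos.

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """ITD in seconds for a far-field source at the given azimuth
    (0 = straight ahead, 90 = directly to one side), modelling the
    head as a rigid sphere of the given radius; c is the speed of
    sound in m/s."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source at 90 degrees gives the maximum ITD, in the order of
# a few hundred microseconds.
print(itd_woodworth(90.0) * 1e6, "microseconds")
```

Personalising `head_radius_m` from the user's measured head width is already a tiny step towards 4, even though a full HRTF needs much more than the ITD.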
I also plan to look at some models that incorporate the pinna profile and are thus able to create elevation cues. I've also seen some recent open HRTF databases that may be useful for exploring 3.