Partially/fully pre-rendered ray tracing...

msat
Honored Guest
I'm sure you're all aware of the image quality that can be had from ray tracing. While pulling it off with impressive results in real-time is becoming more of a reality, it's still some ways away, and even then it's not quite on par with the quality we see in big budget movies today. But what if we could view pre-rendered scenes with limited 6DOF and in 3D? Well, that's the whole purpose of this post! 😄

I have been interested in the concept of pre-rendered still and animated stereoscopic "panoramas" for the past several months now, and have been thinking about how it could be accomplished. Well, I figured I would share what has been on my mind. It's my understanding that there are no implementations of panorama viewers that allow for both 6DOF and stereoscopic viewing. As far as I know, the method I'm going to describe has not been done before, and while I unfortunately don't have the skills to implement an example myself, I hope someone might find these thoughts interesting and useful enough to give it a shot. 🙂

Off the top of my head, the applications range from something as simple as sitting on an extremely detailed beach, to interactive media with limited or no real-time dynamic visuals (such as certain types of adventure games), to being a "fly on the wall" in a Pixar movie. The primary benefit is the quality of visuals that can be achieved without having to render them in real time - well, at least without having to render the entire scene in real time.

The most appropriate and descriptive name for the method that I can think of is 'light-field cube ray mapping'. Maybe that sounds like nonsense, but bear with me for a moment. Let's say you wish to view a scene from the vantage point of a person sitting on a stool in the middle of a CG room. Now imagine enclosing that person's head in a virtual glass box that's big enough to allow for comfortable but limited head movement in all directions (rotation and position). This virtual box forms the basis of both the light-field cube camera during the pre-rendering of the scene, as well as the region in which we will need to perform ray lookups at run-time.


There's no single way the pre-rendering of the scene and the capture of data into the light-field cube has to be performed, but I'll describe the approach I had in mind. Each face of the cube contains a finite array of elements somewhat similar to pixels, but instead of recording just a single color value, each element captures the angular data of the light rays entering it as well as their color information. Each element would likely need to be able to "capture" more than one ray (though in practice it may sometimes capture none). What you end up with is the 6 faces of the cube (you don't necessarily have to do all 6) with all the various light rays that entered during pre-rendering mapped along the array of surface elements (ray maps). I just want to point out that one consequence of this approach is that the ray tracing engine for the pre-render phase would need to start from the light sources, rather than using the common method of starting from the camera and tracing back toward the sources. As you can probably imagine, capturing video would essentially create constantly changing ray maps.
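
As a rough sketch (illustrative Python, with made-up names, not tied to any particular engine), the ray map data could be organised as 6 faces, each an N x N grid of cells, where every cell stores a list of (direction, color) samples deposited by the rays that crossed it during the pre-render:

from collections import defaultdict

FACES = ("+x", "-x", "+y", "-y", "+z", "-z")

class RayMapCube:
    """Illustrative container for one light-field cube's ray maps."""

    def __init__(self, resolution):
        self.res = resolution
        # faces[face][(row, col)] -> list of (direction, color) samples
        self.faces = {f: defaultdict(list) for f in FACES}

    def deposit(self, face, u, v, direction, color):
        # Record a ray that crossed `face` at normalised coords (u, v) in [0, 1).
        row = min(int(v * self.res), self.res - 1)
        col = min(int(u * self.res), self.res - 1)
        self.faces[face][(row, col)].append((direction, color))

# During pre-rendering (tracing outwards from the light sources), every ray that
# reaches the cube volume would call deposit() on the face it enters through,
# storing the direction the light is travelling (pointing inward).
cube = RayMapCube(resolution=512)
cube.deposit("+z", 0.25, 0.75, direction=(0.0, 0.1, -1.0), color=(1.0, 0.9, 0.8))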


In order to view the scene at run-time, we start with a typical ray tracer, where a ray extends outwards from the camera viewport, but it only ever intersects a single object - an element on the inside surface of the cube - and performs a lookup for the stored ray of the matching angle. The performance of this method will depend heavily on the efficiency of the lookup algorithm, but an optimized system should be substantially faster than a typical ray tracer for a given level of detail. Of course, the drawback is that dynamically drawn elements are pretty much impossible unless you also incorporate aspects of a traditional 3D engine to render certain elements in real time.
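
Continuing the illustrative RayMapCube sketch above (and assuming a unit cube centred on the origin, with stored directions pointing inward), the run-time lookup might reduce to finding which face cell the viewer's ray exits through and then picking the stored sample whose angle best matches it:

import math

def _unit(v):
    m = math.sqrt(sum(c * c for c in v))
    return [c / m for c in v]

def exit_face_and_uv(origin, direction):
    # Find where a ray starting inside a unit cube (faces at +/-1) exits.
    # Returns (face, u, v); purely illustrative, no error handling.
    best_t, best = float("inf"), None
    for axis, sign, face in ((0, 1, "+x"), (0, -1, "-x"),
                             (1, 1, "+y"), (1, -1, "-y"),
                             (2, 1, "+z"), (2, -1, "-z")):
        if sign * direction[axis] <= 0:
            continue  # ray heads away from this face
        t = (sign - origin[axis]) / direction[axis]
        if 0 < t < best_t:
            hit = [origin[i] + t * direction[i] for i in range(3)]
            a, b = [i for i in range(3) if i != axis]
            best_t, best = t, (face, (hit[a] + 1) / 2, (hit[b] + 1) / 2)
    return best

def lookup(cube, origin, direction):
    # Return the color of the stored ray whose angle best matches the view ray.
    face, u, v = exit_face_and_uv(origin, direction)
    row = min(int(v * cube.res), cube.res - 1)
    col = min(int(u * cube.res), cube.res - 1)
    samples = cube.faces[face].get((row, col), [])
    if not samples:
        return None  # this cell captured no rays; a real viewer needs a fallback
    d = _unit(direction)
    # Stored directions point inward (direction of light travel), so the best
    # match is the sample most anti-parallel to the outgoing view ray.
    best = min(samples, key=lambda s: sum(a * b for a, b in zip(_unit(s[0]), d)))
    return best[1]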

You can take this concept a step further and fill an entire scene (or at least the areas the viewport can reach) with light-field cube cameras, and traverse from one light-field cube to the next in real time. This would also make rendering dynamic elements more feasible.
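
A possible sketch of that traversal (again purely illustrative): lay the cube cameras out on a regular grid and, each frame, pick the cube whose volume contains the current head position before doing the ray lookups against it.

# Illustrative traversal between light-field cubes laid out on a regular grid.
# cubes_by_index maps (ix, iy, iz) grid coordinates to pre-rendered RayMapCube data.
def select_cube(cubes_by_index, head_position, spacing):
    index = tuple(int(round(p / spacing)) for p in head_position)
    return cubes_by_index.get(index)  # None means the viewer left the captured area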

For content that wouldn't be affected by these limitations, the visual quality it could produce at a given performance level might be hard to achieve any other way.

mrboggieman
Honored Guest
Many thanks for the reply, but could you expand on baking the eye offset, technically speaking?

If you use two cameras, one for each eye, how do you keep the separation distance between the eyes consistent whilst looking around? I assume you attach them at a fixed distance apart and then pivot them around the midpoint, but doesn't this only really maintain the separation for the middle pixel of the resulting image, for the direction you are facing?

I have thought about ray tracing this way, using sphere maps instead of cube maps, but there would be too much distortion.

There are some 360 stereo demos in this thread: viewtopic.php?f=28&t=5285 but they don't feel right when I view them in the Oculus. I don't know if this is because there is a time delay between the capture for each eye, so the resulting imagery differs and makes you feel sick, or if it is the varying separation distance as you look around.

geekmaster
Protege
"mrboggieman" wrote:
Many thanks for the reply, but could you expand on baking the eye offset, technically speaking? [...] How do you keep the separation distance between the eyes consistent whilst looking around?

You mount two cameras with fisheye lenses side by side, with a vertical pole between them. Rotate the pole while taking perhaps 50 pairs of photos. Use the center strip from each photo, combining them into a stereoscopic pair of panorama photos. After converting to skyboxes, no matter which way you look, the left eye sees the skybox with a left-camera offset and the right eye sees the skybox with a right-camera offset.

You CAN get by with a single camera (offset on the pole), but you need to dewarp a curved strip from the right edge of a photo where the camera was looking left, and a curved strip from the left edge of a photo where the camera was looking right. Easier to use two cameras.

You are only using a single narrow strip from each photo, warped slightly to blend with neighboring strips. The fisheye lens is to give a full 180-degree vertical FoV.
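
As a rough sketch of the strip arithmetic (not necessarily the exact workflow described above; the image size and field of view here are assumptions), with N shot pairs per revolution each photo contributes a central strip covering 360/N degrees, and the strip's pixel width follows from the photo's horizontal field of view:

# Illustrative strip-slicing arithmetic for a rotating two-camera panorama rig.
# Assumed numbers: 50 shot pairs per revolution, 180-degree horizontal FoV photos.
def center_strip_columns(image_width, horizontal_fov_deg, shots_per_revolution):
    degrees_per_strip = 360.0 / shots_per_revolution           # e.g. 7.2 degrees
    strip_width = image_width * degrees_per_strip / horizontal_fov_deg
    centre = image_width / 2.0
    return int(centre - strip_width / 2), int(centre + strip_width / 2)

first_col, last_col = center_strip_columns(image_width=4000,
                                           horizontal_fov_deg=180.0,
                                           shots_per_revolution=50)
# Each of the 50 strips per eye is then warped slightly and blended with its
# neighbours to build that eye's full panorama.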

This method has been discussed in threads here and at MTBS3D, and the interwebz have plenty of info (with photos and diagrams) detailing this method. You can also buy a motorized tripod that will automatically rotate a pair of cameras and take all the photos you need.

My previous post(s) in this thread contained links to pages that had more links. Here is one you may be interested in:
3D Panorama blog, twin camera configurations:
http://www.stereopanoramas.com/blog/twin-camera-configurations-options/
Pay special attention to his "No Parallax Point" comments at that web page:
So the equidistant concept is bound to produce stitching issues. This parallax problem with stereo rigs is the explanation why stereo panos with twin rigs require lots of shots for each camera. More shots means the errors are reduced for adjacent pairs.

An alternative arrangement is to have one camera rotating on axis, in a NPP fashion. And the other image is rotating with a greater parallax error than before. So the first camera can have perfect stitching and the second camera will require more shots (maybe twice as many) in a sequence to get the same quality of stitching as in an equidistant setup. Here is an example of this kind of rig.

Between Peter Murphy (mediavr) and Paul Bourke, you have just about everything you need to know about this subject. This thread discusses rendering stereoscopic 3D virtual panoramas (and more):
viewtopic.php?t=3465&p=48351#p46480

And Peter Murphy is shooting 180-degree stereoscopic video too:
viewtopic.php?f=33&t=5622&p=77895#p77807

mrboggieman
Honored Guest
Thanks a lot for all of the info. I had considered doing the same thing with ray tracing - imagine a normal sphere map being generated by projecting a ray from the center of the sphere outwards in the direction you are facing. The stereo ray traced sphere map would be ray traced in the same manner, except that as the direction changes, the position of the camera would also change, being rotated around a point between the two virtual cameras. This would happen per eye and 'bake in' the eye separation.
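
For illustration only (this shows the general idea rather than a specific implementation, and the 64 mm eye separation is an assumption), the per-texel ray setup for such a baked stereo panorama could look something like this:

import math

def stereo_panorama_ray(yaw, pitch, eye, ipd=0.064):
    # Ray origin and direction for one texel of a baked stereo panorama.
    # eye is -1 for the left eye, +1 for the right; the 64 mm IPD is an assumption.
    direction = (math.cos(pitch) * math.sin(yaw),
                 math.sin(pitch),
                 math.cos(pitch) * math.cos(yaw))
    # Offset the origin sideways, perpendicular to the horizontal view direction,
    # so the eye separation is baked into every column of the panorama.
    origin = (eye * (ipd / 2.0) * math.cos(yaw),
              0.0,
              -eye * (ipd / 2.0) * math.sin(yaw))
    return origin, direction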

However, I have a strong feeling that this would only produce correct results for the center of the viewport, as the surrounding area will have been sampled from different views. If you move your eyes in the Oculus, without moving your head, your 'direction' will not change, but the texels you see were sampled for a different direction and so would look warped.

I think this approach looks more promising and is something that can be easily pre-rendered: http://www.marries.nl/graphics-research/stereoscopic-3d/

Think I may need to jump threads.

geekmaster
Protege
You are correct in assuming that head position changes (affine translation) will cause distortion when using a pair of cube maps (or sphere maps). But adding a depth map to the mix would allow some correction for such distortion, at the expense of artifacts near edges of objects when you have to synthesize pixel data to fill in any gaps.
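
A simplified sketch of that depth-assisted correction, reduced to a single sample (the names are illustrative): the depth value lets you reconstruct the 3D point each texel saw and re-aim it from the translated eye position, which is also what exposes the gaps behind object edges.

import math

def reproject_sample(direction, depth, head_offset):
    # direction: unit view direction the texel was captured along, from the capture point
    # depth: distance to the surface along that direction
    # head_offset: the viewer's new position relative to the capture point
    point = tuple(depth * d for d in direction)             # 3D point the texel saw
    new_dir = tuple(p - o for p, o in zip(point, head_offset))
    length = math.sqrt(sum(c * c for c in new_dir))
    return tuple(c / length for c in new_dir)               # where it appears now

# Where several reprojected samples land on the same output texel, the nearest wins;
# output texels that no sample lands on are the disocclusion gaps that have to be
# synthesized, which is where the edge artifacts come from.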

You have the same capabilities and limitations as with 3D movies and 3D TV, except the virtual viewing screen is wrapped completely around you.

To allow for head roll (sideways tilting), you need an extra cube or sphere map taken from a (physical or virtual) camera with vertical offset, to allow compensation for vertical parallax. For best flexibility, you also need a depth map, as mentioned above (four maps total). There are many sources of information about this on the net, which Google can help you find.

Cube maps are easier to process than sphere maps, with built-in support in DirectX and OpenGL.

mrboggieman
Honored Guest
I have decided to implement a new approach that utilises a single baked cube map and depth map of a ray traced scene (it can easily be an animated scene too) and uses this information to render in stereo.

The way it is going to work is by using relief mapping on the walls of the cube map, using the depth maps during the path tracing process to decide when the rays from each eye intersect the geometry. To cope with hidden front faces, a dual depth peeling approach may be used during the capture of the ray traced scene to extract the layers of the scene's geometry so no geometry information is lost - essentially flattening the rendering onto a cube map whilst preserving the depth for the entire scene.
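
A minimal sketch of that ray march (linear search only, with depth_lookup standing in for sampling the baked depth map, so not a full implementation): step along the eye's ray and stop at the first sample that falls behind the stored depth for that direction.

import math

def march_against_depth_map(eye_origin, direction, depth_lookup,
                            step=0.01, max_distance=20.0):
    # Linear-search relief mapping against a baked panoramic depth map.
    # depth_lookup(unit_dir) stands in for sampling the cube's depth map: it
    # returns the captured scene depth along unit_dir, measured from the capture
    # point (taken here to be the coordinate origin).
    t = step
    while t < max_distance:
        p = [eye_origin[i] + t * direction[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in p))
        if dist > 0.0 and dist >= depth_lookup([c / dist for c in p]):
            return p  # the ray has just passed behind the captured surface
        t += step
    return None  # no hit in range; fall back to the far cube face / sky

A real implementation would presumably refine the hit with a secant or binary search between the last two steps, and fall back to the second depth-peeled layer where the first one is missed, as described above.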

I have implemented a relief mapping demo before so I believe what I am suggesting above is doable. You can play my relief mapping demo here: http://www.aplweb.co.uk/experiments/relief_mapping.html. It only traces one path and does not use any optimizing structures like cone maps or quad trees.

I will provide a link to the pre-rendered ray tracing demo if I can get it working...

geekmaster
Protege
With a single cube map, moving your head sideways or up and down will show artifacts near object edges, when looking a bit behind a foreground object. As I mentioned before, it takes 3 cubemaps to prevent such artifacts, although I have considered using a warped map that combines those 3 maps into a single set of images, with previously occluded pixels squeezed into this combined image (to be excluded for each eye based on their viewpoints). That would probably need an extra displacement map to indicate the position of sometimes-occluded pixels.

However, within certain constraints (minimal positional head translation, no nearby foreground objects, some distortion near object edges), a single (linear) cube map and a depth map could do the job.