Oculus SubmitFrame & glutSwapBuffers

zhtet
Honored Guest
Hi,
I am seeing some weird behavior in my performance numbers and was wondering if anyone has a good explanation or idea of what ovr_SubmitFrame is doing internally. I have this flow (the blits copy the pre-Oculus FBO to the screen, NOT the mirror texture):

Draw (~8ms) > Blit (0.1ms) > SubmitFrame > glutSwapBuffers (1.0ms)

Here glutSwapBuffers takes 1 ms when it runs after SubmitFrame, and the whole app time (pre-SubmitFrame time - post-SubmitFrame time) is about 9.5 ms. But if I do the following:

Draw (~8ms) > Blit (0.1ms) > glutSwapBuffers (0.1ms) > SubmitFrame

glutSwapBuffers only takes 0.1 ms and the whole app time is only 8.5 ms. Why is running SwapBuffers before SubmitFrame faster than running it right after? SubmitFrame shouldn't have any impact on FBO 0, should it? This is on Windows 7, Oculus SDK 1.3.0, CV1 and freeglut (glutSwapBuffers just calls the Windows OpenGL SwapBuffers function internally; I checked the source).

Thanks
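
One way to see which call the extra millisecond is charged to is to time every call of the frame loop on the CPU. The following is a minimal sketch of such a harness, not code from the original post: `Stage`, `StageTime`, and `timeStages` are hypothetical names, and the callables would wrap the real draw, blit, SubmitFrame, and glutSwapBuffers calls.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// One named step of the frame loop, e.g. "Draw" or "SubmitFrame".
// The callable is a placeholder for the real call (hypothetical wiring).
struct Stage {
    std::string name;
    std::function<void()> run;
};

// CPU wall-clock time spent inside one stage, in milliseconds.
struct StageTime {
    std::string name;
    double ms;
};

// Runs the stages in the given order and records how long each call
// took on the CPU. This measures time spent in the call itself, not
// the GPU time of the work that the call queued.
std::vector<StageTime> timeStages(const std::vector<Stage>& stages) {
    using Clock = std::chrono::steady_clock;
    std::vector<StageTime> result;
    result.reserve(stages.size());
    for (const Stage& stage : stages) {
        const Clock::time_point start = Clock::now();
        stage.run();
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - start;
        result.push_back({stage.name, elapsed.count()});
    }
    return result;
}
```

Running it once per ordering gives comparable per-stage numbers. Because these are CPU timings, GPU work that is still pending is charged to whichever call ends up waiting for it; ending a stage with glFinish() would pin that cost to the stage that queued the work instead.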

zhtet
Honored Guest
Oh, one more point: the swap interval (wglSwapIntervalEXT) is set to 0, so SwapBuffers should not be blocking.

zhtet
Honored Guest
Does anyone have any thoughts on this? 🙂

I do see in Nsight that SubmitFrame calls NvAPI_D3D_CudaInteropFunction multiple times. Does that not play well with SwapBuffers?

Thanks

jherico
Adventurer
When you say "pre-oculus fbo", are you talking about a framebuffer constructed from the Oculus-created texture and used to draw the scene? If so, here's what I suggest... don't do that. Create an entirely native OpenGL FBO (with a native OpenGL texture) to draw your scene, then blit that to the Oculus FBO and the default framebuffer. The Oculus-provided OpenGL textures are actually Direct3D textures using the D3D/OpenGL interop extensions. Once you submit them to the runtime, they're going to be handed off to a background thread, which is going to lock them while the runtime uses D3D to run the compositor processing.

Your blit command to the screen is probably blocked from executing on the GPU because the source texture is locked for use in D3D. That delays the swap buffer call, which won't complete until all the commands have executed (even if you're not using v-sync, it's still roughly equivalent to a glFinish call).

In other words, if you're using OpenGL, don't ever use the Oculus-provided texture as anything but a target for a blit from another framebuffer where you rendered the scene and have full control over the texture.

thewhiteambit
Adventurer
Thank god we have the API manage the texture and no capability of giving the API an external texture(set), everything is so much easier now!

jherico
Adventurer

thewhiteambit said:

Thank god we have the API manage the texture and no capability of giving the API an external texture(set), everything is so much easier now!

It is, actually, your sarcasm notwithstanding. Sharing texture data with another process is non-trivial and requires synchronization to ensure that I'm not writing to a texture while you're reading from it and vice versa. If the API allowed you to provide your own textures, it would end up being either less powerful and/or efficient (by not returning until it had copied the texture content to its own surface) or more complicated (by including a signalling mechanism to allow a program to know when a texture it had provided was no longer in use by the runtime).

But the texture provided by the runtime is really intended for exactly one purpose... a draw operation that puts the resulting scene into it.  Using it as a general purpose texture for both drawing and reading and expecting the same performance as a texture you created yourself is naive.  

jimbo00000
Explorer

jherico said:

Sharing texture data with another process is non-trivial


I didn't even realize it was possible until I saw it in the Oculus runtime. Do you have any idea how it's done?

galopin
Heroic Explorer
The lock should not be an issue whatever you do with the surface you are currently given. The Oculus runtime creates a swap chain with three buffers, which is enough to never hit a bad lock. I guess they have more internally too, because they run a dispatch with the soon-to-be-displayed image and the previously displayed one to perform overdrive (a way to help the screen reach the new color faster and reduce ghosting and motion blur).

But it is true that they use inter-process shared surfaces, and how the OS flushes and parallelizes command queues can be tricky.

thewhiteambit
Adventurer
jherico: How would the capability of giving the OVR API an external texture set make the noob option any harder? I am talking about options in the OVR API for professionals who know what they are doing! This would offer a lot of room for improvement for them, but if you prefer an API targeted exclusively at beginners... that was the point of my criticism!

And the fact that you found an absolutely stupid way this could be handled (by not returning until it had copied the texture content to its own surface) is no proof that there is no better way of handling it! At least two come to mind that do not "end up being either less powerful and/or efficient".