
RenderDoc for Oculus render stage trace with DrawProceduralIndirect() / glDrawArraysIndirect()

WilliamI-H
Level 3

Hi all,

 

I've recently been working on adding indirect draw calls to our app (using Unity 2020.3.25f1). I've noticed some unexpected results in render stage traces from RenderDoc for Oculus, and I'm wondering if anyone has insight into this.

 

As part of this, I've been switching code from Unity's Graphics.DrawProcedural() function to Graphics.DrawProceduralIndirect(). (On OpenGL, this corresponds to glDrawArraysIndirect().)
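For reference, Unity documents that the args buffer for Graphics.DrawProceduralIndirect must hold four integers: vertex count per instance, instance count, start vertex location and start instance location. The OpenGL ES 3.1 specification describes each glDrawArraysIndirect command as the same four 32-bit values (count, instanceCount, first, and a reserved field that must be zero on GLES). The C sketch below shows how a direct call's parameters map onto one command. The struct is declared by hand because the GL headers don't provide it, and the helper name `make_indirect_command` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One glDrawArraysIndirect command, laid out as in the OpenGL ES 3.1 spec.
   On GLES the last field must be zero; desktop GL 4.2+ uses it as
   baseInstance (Unity's "start instance location"). */
typedef struct {
    uint32_t count;              /* vertices per instance */
    uint32_t instanceCount;      /* number of instances */
    uint32_t first;              /* first vertex */
    uint32_t reservedMustBeZero; /* must be 0 on OpenGL ES */
} DrawArraysIndirectCommand;

static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "command must be four tightly packed 32-bit values");

/* Hypothetical helper: the indirect equivalent of a direct
   glDrawArraysInstanced(mode, first, count, instanceCount) call. */
static DrawArraysIndirectCommand make_indirect_command(uint32_t first,
                                                       uint32_t count,
                                                       uint32_t instanceCount) {
    DrawArraysIndirectCommand cmd;
    cmd.count = count;
    cmd.instanceCount = instanceCount;
    cmd.first = first;
    cmd.reservedMustBeZero = 0; /* required by GLES */
    return cmd;
}
```

For example, a single triangle drawn once would be `make_indirect_command(0, 3, 1)`.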

 

As a baseline, here's what a render stage trace from our app looks like using direct rendering (DrawProcedural). There are two surfaces, corresponding to the output for each eye. This trace looks correct and logical to me.

[Screenshot: render stage trace with direct drawing (DrawProcedural)]

 

The next screenshot below is from indirect drawing. The app logic is the same as for the first screenshot, other than that the draw call parameters are uploaded from the CPU to indirect arguments buffers and then rendered with indirect draw calls using Unity's DrawProceduralIndirect(). Here, the first surface appears to be taking much longer than the second surface, which doesn't make much sense to me.
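In an indirect setup like this, each draw's parameters become one 16-byte command in a buffer, and the `indirect` argument of glDrawArraysIndirect is interpreted as a byte offset into the buffer bound to GL_DRAW_INDIRECT_BUFFER. A minimal sketch of the CPU-side packing, assuming all of a frame's draws share one buffer; `DrawParams`, `pack_indirect_commands` and `indirect_offset` are hypothetical names, and the upload itself (for example with glBufferSubData) is omitted because it needs a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command layout from the OpenGL ES 3.1 spec (not declared in GL headers). */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t first;
    uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

/* Hypothetical per-draw parameters recorded on the CPU each frame. */
typedef struct {
    uint32_t firstVertex;
    uint32_t vertexCount;
    uint32_t instanceCount;
} DrawParams;

/* Pack n draws into a staging array that would then be uploaded to the
   indirect buffer. Returns the number of bytes to upload. */
static size_t pack_indirect_commands(const DrawParams *draws, size_t n,
                                     DrawArraysIndirectCommand *out) {
    for (size_t i = 0; i < n; ++i) {
        out[i].count = draws[i].vertexCount;
        out[i].instanceCount = draws[i].instanceCount;
        out[i].first = draws[i].firstVertex;
        out[i].reservedMustBeZero = 0;
    }
    return n * sizeof(DrawArraysIndirectCommand);
}

/* Byte offset of draw i within the buffer; in GL this value is cast to
   (const void *) and passed as glDrawArraysIndirect's `indirect` argument. */
static uintptr_t indirect_offset(size_t i) {
    return (uintptr_t)(i * sizeof(DrawArraysIndirectCommand));
}
```

One design consideration: rewriting a buffer the GPU may still be reading from a previous frame can make the driver stall or make an internal copy, so cycling through one buffer per frame in flight is a common precaution. Nothing in this thread confirms that this is the cause of the behaviour above.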

[Screenshot: render stage trace with indirect drawing (DrawProceduralIndirect)]

 

Interestingly, if I use the same RenderDoc capture from indirect rendering as the screenshot above, but click the "Time durations for render stages" button in RenderDoc a second time, I get a different render stage trace. This trace has more equal durations for the two surfaces / eyes, which makes sense. However, surface 0x9 appears twice in the timeline, whereas surface 0x8 only appears once, and I'm not sure why this is. There's also a gap in the middle of the timeline.

[Screenshot: second render stage trace from the same indirect-drawing capture]

 

I have a few questions - any help with the following would be much appreciated!

  • It's not obvious to me why direct vs indirect drawing would cause this difference. I think it could be that there is some sort of synchronization issue with the indirect draw call arguments buffer that could cause this. Is this difference expected?
  • Why does clicking the "Time durations for render stages" button twice give such a different result with indirect drawing? If I click the button twice with direct drawing, the new trace is substantially the same.
  • What does the gap in the middle of the last screenshot indicate? Does this indicate the GPU idling / stalling or something else?

Many thanks.

9 replies

thisisjimmylee
Level 3

There are indirect vs direct draw performance differences on Vulkan that we are investigating right now, so RenderDoc might actually be correct here. RenderDoc measures the scene by playing back the command streams, so it is normal to get two different measurements if some unknown condition changed between them.
The gap between the two surfaces could be due to sync points during command playback. One way to double-check this is to perform a render stage trace with Perfetto. A Perfetto trace is captured while the app is running (as opposed to RenderDoc's replayed measurements).

WilliamI-H
Level 3

Thanks for the insight - that's interesting re the performance differences on Vulkan. Are those differences on OpenGL too? I'd be keen to hear more about what type of performance differences you're observing with direct vs indirect draw calls, if that's something you are able to share?

 

I've taken a trace using Perfetto - appreciate the suggestion.

[Screenshot: Perfetto render stage trace]

This actually appears to match what RenderDoc was showing - the same surface, in this case #7, appears twice, with a gap in the middle. So it looks like this is an actual representation of the app's behaviour, not an artifact from RenderDoc's playback - good to know.

 

I have noticed something else odd since I posted - the CPU time in Unity's render thread appears to be much higher when using indirect draw calls (about 13 ms in one case, vs around 2-3 ms with direct draw calls). I might look into that separately as it might be a Unity specific issue.

 

I would be interested to know if you have any further thoughts on the gap between the surfaces based on the Perfetto trace, and whether it might relate to the performance differences you mentioned you're investigating?

 

Thanks again.

thisisjimmylee
Level 3

I don't have the details on the perf differences, but we should be able to fix it in a future OS update. 

As for the gap, I would recommend performing a low-overhead render stage trace (disable high-precision GPU render stage tracing in the Performance tab of Oculus Developer Hub). The gap could simply be the profiling overhead of gathering and accumulating all the stats, so I would check whether the gap disappears in that case.

WilliamI-H
Level 3

Thanks for that! I've done a low-overhead trace. Interestingly, the gap still appears in this one, with the same result of one surface appearing both before and after the gap.

[Screenshot: low-overhead render stage trace]

 

You might also want to check the number of draw calls for the surface that has a gap in it. When the number of draw calls is huge and generates a large number of primitives, the driver will split the render pass in two, because there will not be enough memory to store the primitives in one or more bins. If this is the case, you'll want to combine your assets into fewer draw calls.
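Since the concern here is the number of primitives rather than draw calls as such, one rough check is to total the triangles each pass submits, using the same values written to the indirect buffer. A sketch assuming GL_TRIANGLES topology and the 16-byte command layout from the OpenGL ES 3.1 spec; the function names are hypothetical, and the driver's actual bin-memory limit is not known, so the total is only useful for comparing builds (for example direct vs indirect).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command layout from the OpenGL ES 3.1 spec (not declared in GL headers). */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t first;
    uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

/* Triangles produced by one GL_TRIANGLES draw: every 3 vertices form one
   triangle (leftover vertices are ignored), repeated for each instance. */
static uint64_t triangles_in_draw(const DrawArraysIndirectCommand *cmd) {
    return (uint64_t)(cmd->count / 3u) * cmd->instanceCount;
}

/* Total triangles submitted by all draws recorded for one render pass. */
static uint64_t triangles_in_pass(const DrawArraysIndirectCommand *cmds,
                                  size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += triangles_in_draw(&cmds[i]);
    }
    return total;
}
```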

 

WilliamI-H
Level 3

Interesting - that's good to know re splitting render passes when there's too many draw calls.

 

In this case the number of draw calls seems okay, though. I just took another RenderDoc capture and there were 16 draw calls per eye, so 32 draw calls in total per frame (not using multiview at the moment).

 

I'm intrigued by how the gap appears after changing some direct draw calls to indirect draw calls. Is there anything else that might be the cause of this?

WilliamI-H
Level 3

An update on this - I've done an additional test and am still getting an issue where indirect drawing with OpenGL seems to take far more CPU time than expected.

 

When I add an extra direct draw call (glDrawArrays) into our app, the CPU time taken for the draw call is about 0.1 ms. If I switch it to an indirect draw call (glDrawArraysIndirect), this seems to take about 13 ms of CPU time (or more).

 

This draw call is using custom OpenGL code with a native rendering plugin in Unity, and using the Unity profiler for these measurements.
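To narrow down where the extra CPU time goes in a native plugin, one option is to time the call directly on the render thread with both a monotonic clock and the thread's CPU-time clock. A sketch assuming a POSIX environment such as Android; `time_call` and `count_calls` are hypothetical names, with `count_calls` standing in for a wrapper around glDrawArraysIndirect so the timer can be exercised without a GL context.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Elapsed times for one call, in milliseconds. */
typedef struct {
    double wall_ms;       /* monotonic wall-clock time on the calling thread */
    double thread_cpu_ms; /* CPU time actually consumed by the calling thread */
} CallTiming;

/* Read the given clock and return its value in nanoseconds. */
static int64_t clock_ns(clockid_t clk) {
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Time a single invocation of issue_draw(ctx). In a native rendering plugin,
   issue_draw would be a (hypothetical) wrapper around glDrawArraysIndirect,
   called on Unity's render thread. */
static CallTiming time_call(void (*issue_draw)(void *), void *ctx) {
    int64_t wall0 = clock_ns(CLOCK_MONOTONIC);
    int64_t cpu0 = clock_ns(CLOCK_THREAD_CPUTIME_ID);
    issue_draw(ctx);
    int64_t cpu1 = clock_ns(CLOCK_THREAD_CPUTIME_ID);
    int64_t wall1 = clock_ns(CLOCK_MONOTONIC);
    CallTiming t;
    t.wall_ms = (double)(wall1 - wall0) / 1e6;
    t.thread_cpu_ms = (double)(cpu1 - cpu0) / 1e6;
    return t;
}

/* Stand-in callback for testing without a GL context: counts invocations. */
static void count_calls(void *ctx) {
    ++*(int *)ctx;
}
```

If the wall time is large but the thread CPU time stays small, the thread is probably waiting (for example on a driver-side sync) rather than doing work. GL drivers may also defer work, so some of the cost can surface in a later call rather than in the draw itself.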

 

Does anyone know if it's expected that OpenGL indirect draw calls would be so slow (13 ms of CPU time for a single draw call) on Oculus Quest?

The CPU cost of the draw call is indeed suspiciously long. Do you mind sending me the RenderDoc capture so that I can have Qualcomm take a look at it?

That would be great! How should I send you a capture? (The forum doesn't seem to let me upload .rdc files).

 

I've taken another RenderDoc capture that demonstrates the issue. This capture isn't from our app, but I've created a simple test project using indirect drawing, and this reproduces the issue with an unexpectedly slow draw call.

 

Details:

  • Unity app using version 2020.3.25f1
  • Testing on Oculus Quest 2
  • Using multi-pass rendering (a draw call per eye)
  • This test project just draws a single triangle using indirect drawing, so there are only two draw calls in total (one per eye).

Here's a screenshot from the Unity profiler. You can see that the rendering (draw call) for one eye is much faster than the other (0.16 ms vs 1.46 ms). The first time (0.16 ms) seems reasonable for one draw call, but the second (1.46 ms) seems too slow.

[Screenshot: Unity profiler showing the two per-eye draw calls (0.16 ms vs 1.46 ms)]

 

Here's the render stage trace from RenderDoc. I'm getting the same behavior with a gap between the surfaces, and one surface appearing twice.

[Screenshot: RenderDoc render stage trace for the test project]