I've recently been working on adding indirect draw calls to our app (using Unity version 2020.3.25f1). I've noticed some unexpected results from render stage traces in RenderDoc for Oculus and I'm wondering if anyone has insight into this.
As part of this, I've been switching code from using Unity's Graphics.DrawProcedural() function to Graphics.DrawProceduralIndirect(). (On OpenGL, this corresponds to the glDrawArraysIndirect() function.)
As a baseline, here's what a render stage trace from our app looks like using direct rendering (DrawProcedural). There are two surfaces, corresponding to the outputs for each eye. This screenshot looks correct to me, and appears logical.
The next screenshot below is from indirect drawing. The app logic is the same as for the first screenshot, other than that the draw call parameters are uploaded from the CPU to indirect arguments buffers and then rendered with indirect draw calls using Unity's DrawProceduralIndirect(). Here, the first surface appears to be taking much longer than the second surface, which doesn't make much sense to me.
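For reference, the arguments that glDrawArraysIndirect() reads from the indirect buffer are four consecutive 32-bit unsigned integers. A minimal C++ sketch of that layout, assuming the OpenGL ES 3.1 definition; the struct name follows the spec's example, and makeIndirectArgs is a hypothetical helper for illustration, not part of Unity or OpenGL:

```cpp
#include <cstddef>
#include <cstdint>

// Layout read by glDrawArraysIndirect() from the buffer bound to
// GL_DRAW_INDIRECT_BUFFER: four tightly packed 32-bit unsigned integers.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices per instance
    std::uint32_t instanceCount;  // number of instances to draw
    std::uint32_t first;          // index of the first vertex
    std::uint32_t baseInstance;   // reserved on OpenGL ES 3.1, must be zero
};

// The GPU expects exactly 16 bytes per command, with no padding.
static_assert(sizeof(DrawArraysIndirectCommand) == 16,
              "indirect command must be 4 x 32-bit values");

// Hypothetical helper: builds the arguments for a single indirect draw.
inline DrawArraysIndirectCommand makeIndirectArgs(std::uint32_t vertexCount,
                                                  std::uint32_t instanceCount = 1,
                                                  std::uint32_t firstVertex = 0) {
    return DrawArraysIndirectCommand{vertexCount, instanceCount, firstVertex, 0u};
}
```

As far as I understand, the buffer passed to Unity's DrawProceduralIndirect() holds the same four values in the same order (vertex count per instance, instance count, start vertex, start instance), so the data uploaded from the CPU should be identical in both paths.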
Interestingly, if I use the same RenderDoc capture from indirect rendering as the screenshot above, but click the "Time durations for render stages" button in RenderDoc a second time, I get a different render stage trace. This trace has more equal durations for the two surfaces / eyes, which makes sense. However, surface 0x9 appears twice in the timeline, whereas surface 0x8 only appears once, and I'm not sure why this is. There's also a gap in the middle of the timeline.
I have a few questions - any help with the following would be much appreciated!
There are indirect vs direct draw performance differences on Vulkan that we are investigating right now. So RenderDoc might actually be correct here. RenderDoc measures the scene by playing back the command streams, so it is normal that we can get two different measurements if some unknown condition changed between the measurements.
The gap between the two surfaces could be due to sync points during command playback. One way to double-check this is to perform a renderstage trace with Perfetto. A Perfetto trace is captured while the app is running (as opposed to RenderDoc's replayed measurements).
Thanks for the insight - that's interesting re the performance differences on Vulkan. Are those differences on OpenGL too? I'd be keen to hear more about what type of performance differences you're observing with direct vs indirect draw calls, if that's something you are able to share?
I've taken a trace using Perfetto - appreciate the suggestion.
This actually appears to match what RenderDoc was showing - the same surface, in this case #7, appears twice, with a gap in the middle. So it looks like this is an actual representation of the app's behaviour, not an artifact from RenderDoc's playback - good to know.
I have noticed something else odd since I posted - the CPU time in Unity's render thread appears to be much higher when using indirect draw calls (about 13 ms in one case, vs around 2-3 ms with direct draw calls). I might look into that separately as it might be a Unity specific issue.
I would be interested to know if you have any further thoughts on the gap between the surfaces based on the Perfetto trace, and whether it might relate to the performance differences you mentioned you're investigating?
I don't have the details on the perf differences, but we should be able to fix it in a future OS update.
As to the gap, I would recommend you perform a low overhead renderstage trace (disable high precision gpu renderstage tracing in the Oculus Developer Hub's Performance tab). The gap could be simply the profiling overhead of gathering and accumulating all the stats, so I would check that to see if the gap disappears in that case.
You might also want to check the number of drawcalls for the surface that has a gap in it. When the number of drawcalls is huge and generates a large number of primitives, the driver will split the renderpass into two because there will not be enough memory to store the primitives in one or more bins. If this is the case, you'll want to combine your assets into fewer drawcalls.
Interesting - that's good to know re splitting render passes when there are too many draw calls.
In this case the number of draw calls seems okay though. I just took another RenderDoc capture and there were 16 draw calls per eye, so 32 draw calls total in one frame (not using multiview at the moment).
I'm intrigued by how the gap appears after changing some direct draw calls to indirect draw calls. Is there anything else that might be the cause of this?
An update on this - I've done an additional test and am still getting an issue where indirect drawing with OpenGL seems to take far more CPU time than expected.
When I add an extra direct draw call (glDrawArrays) into our app, the CPU time taken for the draw call is about 0.1 ms. If I switch it to an indirect draw call (glDrawArraysIndirect), this seems to take about 13 ms of CPU time (or more).
This draw call uses custom OpenGL code in a native rendering plugin in Unity, and I'm using the Unity profiler for these measurements.
Does anyone know if it's expected that OpenGL indirect draw calls would be so slow (13 ms of CPU time for a single draw call) on Oculus Quest?
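To cross-check the Unity profiler numbers from inside the native plugin, the draw call could be timed directly on the CPU. A minimal C++ sketch, assuming the plugin is built as C++; measureCallMs is a hypothetical helper, not an existing Unity or OpenGL API:

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: returns the wall-clock time, in milliseconds, that the
// calling thread spends inside `call`. It measures CPU-side submission only,
// not how long the GPU takes to execute the work.
template <typename F>
double measureCallMs(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(call)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Intended use inside the plugin (requires a current GL context and a buffer
// bound to GL_DRAW_INDIRECT_BUFFER, so it is shown here only as a comment):
//
//   double ms = measureCallMs([] {
//       glDrawArraysIndirect(GL_TRIANGLES, nullptr);  // offset 0 into the bound buffer
//   });
```

Timing glDrawArrays() and glDrawArraysIndirect() the same way would show whether the extra time is spent inside the GL call itself or elsewhere on the render thread.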
That would be great! How should I send you a capture? (The forum doesn't seem to let me upload .rdc files).
I've taken another RenderDoc capture that demonstrates the issue. This capture isn't from our app; instead, I've created a simple test project using indirect drawing, and it reproduces the issue with an unexpectedly slow draw call.
Here's a screenshot from the Unity profiler. You can see that the rendering (draw call) for one eye is much faster than the other (0.16 ms vs 1.46 ms). The first time (0.16 ms) seems reasonable for one draw call, but the second (1.46 ms) seems too slow.
Here's the render stage trace from RenderDoc. I'm getting the same behavior with a gap between the surfaces, and one surface appearing twice.