Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use multi_draw_indirect_count where available, in preparation for two-phase occlusion culling. #17211

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Jan 7, 2025

This commit allows Bevy to use multi_draw_indirect_count for drawing meshes. The multi_draw_indirect_count feature works just like multi_draw_indirect, but it takes the number of indirect parameters from a GPU buffer rather than specifying it on the CPU.

Currently, the CPU constructs the list of indirect draw parameters with the instance count for each batch set to zero, uploads the resulting buffer to the GPU, and dispatches a compute shader that bumps the instance count for each mesh that survives culling. Unfortunately, this is inefficient when we support multi_draw_indirect_count. Draw commands corresponding to meshes for which all instances were culled will remain present in the list when calling
multi_draw_indirect_count, causing overhead. Proper use of multi_draw_indirect_count requires eliminating these empty draw commands.

To address this inefficiency, this PR makes Bevy fully construct the indirect draw commands on the GPU instead of on the CPU. Instead of writing instance counts to the draw command buffer, the mesh preprocessing shader now writes them to a separate indirect metadata buffer. A second compute dispatch known as the build indirect parameters shader runs after mesh preprocessing and converts the indirect draw metadata into actual indirect draw commands for the GPU. The build indirect parameters shader operates on a batch at a time, rather than an instance at a time, and as such each thread writes only 0 or 1 indirect draw parameters, simplifying the current logic in mesh_preprocessing, which currently has to have special cases for the first mesh in each batch. The build indirect parameters shader emits draw commands in a tightly packed manner, enabling maximally efficient use of multi_draw_indirect_count.

Along the way, this patch switches mesh preprocessing to dispatch one compute invocation per render phase per view, instead of dispatching one compute invocation per view. This is preparation for two-phase occlusion culling, in which we will have two mesh preprocessing stages. In that scenario, the first mesh preprocessing stage must only process opaque and alpha tested objects, so the work items must be separated into those that are opaque or alpha tested and those that aren't. Thus this PR splits out the work items into a separate buffer for each phase. As this patch rewrites so much of the mesh preprocessing infrastructure, it was simpler to just fold the change into this patch instead of deferring it to the forthcoming occlusion culling PR.

Finally, this patch changes mesh preprocessing so that it runs separately for indexed and non-indexed meshes. This is because draw commands for indexed and non-indexed meshes have different sizes and layouts. The existing code is actually broken for non-indexed meshes, as it attempts to overlay the indirect parameters for non-indexed meshes on top of those for indexed meshes. Consequently, right now the parameters will be read incorrectly when multiple non-indexed meshes are multi-drawn together. This is a bug fix and, as with the change to dispatch phases separately noted above, was easiest to include in this patch as opposed to separately.

@pcwalton pcwalton added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jan 7, 2025
@pcwalton pcwalton added this to the 0.16 milestone Jan 7, 2025
two-phase occlusion culling.

This commit allows Bevy to use `multi_draw_indirect_count` for
drawing meshes. The `multi_draw_indirect_count` feature works just like
`multi_draw_indirect`, but it takes the number of indirect parameters
from a GPU buffer rather than specifying it on the CPU.

Currently, the CPU constructs the list of indirect draw parameters with
the instance count for each batch set to zero, uploads the resulting
buffer to the GPU, and dispatches a compute shader that bumps the
instance count for each mesh that survives culling. Unfortunately, this
is inefficient when we support `multi_draw_indirect_count`. Draw
commands corresponding to meshes for which all instances were culled
will remain present in the list when calling
`multi_draw_indirect_count`, causing overhead. Proper use of
`multi_draw_indirect_count` requires eliminating these empty draw
commands.

To address this inefficiency, this PR makes Bevy fully construct the
indirect draw commands on the GPU instead of on the CPU. Instead of
writing instance counts to the draw command buffer, the mesh
preprocessing shader now writes them to a separate *indirect metadata
buffer*. A second compute dispatch known as the *build indirect
parameters* shader runs after mesh preprocessing and converts the
indirect draw metadata into actual indirect draw commands for the GPU.
The build indirect parameters shader operates on a batch at a time,
rather than an instance at a time, and as such each thread writes only 0
or 1 indirect draw parameters, simplifying the current logic in
`mesh_preprocessing`, which has to have special cases for the first mesh
in each batch. The build indirect parameters shader emits draw commands
in a tightly packed manner, enabling maximally efficient use of
`multi_draw_indirect_count`.

Along the way, this patch switches mesh preprocessing to dispatch one
compute invocation per render phase per view, instead of dispatching one
compute invocation per view. This is preparation for two-phase occlusion
culling, in which we will have two mesh preprocessing stages. In that
scenario, the first mesh preprocessing stage must only process opaque
and alpha tested objects, so the work items must be separated into those
that are opaque or alpha tested and those that aren't. Thus this PR
splits out the work items into a separate buffer for each phase. As this
patch rewrites so much of the mesh preprocessing infrastructure, it was
simpler to just fold the change into this patch instead of deferring it
to the forthcoming occlusion culling PR.

Finally, this patch changes mesh preprocessing so that it runs
separately for indexed and non-indexed meshes. This is because draw
commands for indexed and non-indexed meshes have different sizes and
layouts. *The existing code is actually broken for non-indexed meshes*,
as it attempts to overlay the indirect parameters for non-indexed meshes
on top of those for indexed meshes. Consequently, right now the
parameters will be read incorrectly when multiple non-indexed meshes are
multi-drawn together. *This is a bug fix* and, as with the change to
dispatch phases separately noted above, was easiest to include in this
patch as opposed to separately.
@pcwalton pcwalton force-pushed the multidraw-indirect-count branch from 22c95b3 to bbebd62 Compare January 7, 2025 06:25
@BenjaminBrienen BenjaminBrienen added the D-Complex Quite challenging from either a design or technical perspective. Ask for help! label Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times D-Complex Quite challenging from either a design or technical perspective. Ask for help! S-Needs-Review Needs reviewer attention (from anyone!) to move forward
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants