Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make setting a stride of the ImageView data possible #150

Merged
merged 14 commits into from
Jan 25, 2025

Conversation

st0rmbtw
Copy link
Contributor

@st0rmbtw st0rmbtw commented Jan 22, 2025

I was implementing the chunked lightmap rendering in my project.
The idea is that instead of one huge lightmap texture, I create multiple smaller ones and render them only if they are visible.
But I ran into a problem when I needed to update the lightmap on the intersection of multiple chunks.
Because I had to calculate proper offsets and size for each chunk's texture.

As the size of the texture of a chunk and the size of the lightmap buffer on CPU are different, I had to do the following:

  1. Allocate a buffer with the size of the chunk's texture.
  2. Copy the specific portion of data from the CPU lightmap to the bufer.
  3. Update the GPU texture with the data from the buffer.
  4. Deallocate the buffer.

It was quite slow to do all of that. I could have only the third step if there was an option to set custom stride for the source data of the image view.

But then I noticed that ID3D11DeviceContext::UpdateSubresource function has a parameter just for what I need, so I made this PR to expose that parameter in the public interface.

Copy link
Owner

@LukasBanana LukasBanana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see you getting acquainted with the inner workings of LLGL :-) I started with a concept for rowStride and layerStride in this struct earlier of last year but never took the time to start on the implementation because I had no need for this feature so far.

I'm happy to help with the Metal backend, but we can add this a bit later. It would make sense to also add this field to MutableImageView at some point, but that's maybe something for another time.

include/LLGL/ImageFlags.h Outdated Show resolved Hide resolved
include/LLGL-C/LLGLWrapper.h Outdated Show resolved Hide resolved
sources/Renderer/TextureUtils.h Outdated Show resolved Hide resolved
{
const FormatAttributes& formatAttribs = GetFormatAttribs(format);
if (formatAttribs.blockWidth > 0 && formatAttribs.blockHeight > 0)
{
outLayout.rowStride = (extent.width * formatAttribs.bitSize) / formatAttribs.blockWidth / 8;
outLayout.rowStride = ((srcDataStride > 0 ? srcDataStride : extent.width) * formatAttribs.bitSize) / formatAttribs.blockWidth / 8;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should LLGL behave when the row stride is smaller than the width of the source image? Should this be undefined behavior or be defined as an invalid argument? Either way, it should be documented in the ImageView struct for this field. If we consider it an invalid argument we should probably also add max(srcDataStride, extent.width) here to clamp it to the largest value or add an assertion LLGL_ASSERT(srcDataStride == 0 || srcDataStride >= extent.width).

Copy link
Contributor Author

@st0rmbtw st0rmbtw Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why it is an issue. Isn't the extent there describes the size of the destination subresource? If so then it doesn't matter if the row stride is bigger or smaller than extent.width. Correct me if I'm wrong, please.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the row stride is smaller than the extent, which is used for both source and destination subresource, copy operations will have overlapping ranges of memory they are trying to copy to. This will lead to undefined behavior when used in parallel either with the multi-threaded image conversion functions or GPU operations.
I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width) or just clamp it via max(stride, width).

CommandBuffer::CopyBufferFromTexture() for instance has this contract with its rowStride parameter:

If \c rowStride is 0, the source data is considered to be tightly packed for each array layer and the required alignment is managed automatically.
If \c rowStride is not 0, it \b must be greater than or equal to the size (in bytes) of each row in the texture region with respect to the texture's format.

I might have forgotten to put any safe guards in the code for this parameter, but it should at least be documented what the expected behavior is for these cases. I think we should consider a stride smaller than the width of a copy operation to be an invalid parameter and therefore put in an assertion. This way, the programmer will be confronted with this issue right away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width) or just clamp it via max(stride, width).

I think an assertion would be better (maybe even put it in DbgRenderSystem::WriteTexture?)

sources/Renderer/Vulkan/VKRenderSystem.cpp Outdated Show resolved Hide resolved
sources/Renderer/Vulkan/VKRenderSystem.cpp Outdated Show resolved Hide resolved
sources/Renderer/OpenGL/Texture/GLTexSubImage.cpp Outdated Show resolved Hide resolved
@LukasBanana LukasBanana self-assigned this Jan 23, 2025
@LukasBanana LukasBanana added the feature request Requested features and TODO lists label Jan 23, 2025
@@ -477,7 +477,7 @@ void VKRenderSystem::WriteTexture(Texture& texture, const TextureRegion& texture
VK_BUFFER_USAGE_TRANSFER_SRC_BIT // <-- TODO: support read/write mapping //GetStagingVkBufferUsageFlags(bufferDesc.cpuAccessFlags)
);

VKDeviceBuffer stagingBuffer = CreateStagingBufferAndInitialize(stagingCreateInfo, imageData, imageDataSize);
VKDeviceBuffer stagingBuffer = CreateTextureStagingBufferAndInitialize(stagingCreateInfo, srcImageView, extent);
Copy link
Owner

@LukasBanana LukasBanana Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this change ignores imageData. This function must not use the input parameter directly in case the image format and data type had to be converted (see intermediateData). That's what all the code above is for. Oh, and the convert functions such as ConvertImageBuffer() will also have to handle the new stride. Yeah, this features goes wide I'm afraid 😬

Copy link
Contributor Author

@st0rmbtw st0rmbtw Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the ignoring of imageData, but I'm not sure about ConvertImageBuffer because I can't figure out how to take the stride into account in ConvertImageBufferFormatWorker

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see to also update the image conversion functions over the weekend. The work distribution likely needs to be adjusted.

@st0rmbtw
Copy link
Contributor Author

st0rmbtw commented Jan 23, 2025

It was a quick and a low effort solution just to get it working, and also I was sleepy 😅. I just wanted to get your feedback on the idea and implementation.

I'm glad to see you getting acquainted with the inner workings of LLGL :-)

Yeah, now I understand it a little better :). I wish I had more knowledge of graphics programming to help you with the bigger problems.

I'm happy to help with the Metal backend

Yes, please help me if you have the time, I'm scared of Objective-C. If you don't have time I can try to do it myself.

@st0rmbtw st0rmbtw requested a review from LukasBanana January 23, 2025 08:42
@LukasBanana
Copy link
Owner

I'll take another look over the weekend since this is a bit more involved and I'll come up with a unit test for this feature.

Copy link
Owner

@LukasBanana LukasBanana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few small change requests. Besides that, I think this is good to go but we will have to skip D3D11 and D3D12 backends until I updated the ConvertImageBuffer() functions. I hope you don't mind me pulling you into a new discussion, though, if some future tests fail because of this implementation 😄

{
const FormatAttributes& formatAttribs = GetFormatAttribs(format);
if (formatAttribs.blockWidth > 0 && formatAttribs.blockHeight > 0)
{
outLayout.rowStride = (extent.width * formatAttribs.bitSize) / formatAttribs.blockWidth / 8;
outLayout.rowStride = ((srcDataStride > 0 ? srcDataStride : extent.width) * formatAttribs.bitSize) / formatAttribs.blockWidth / 8;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the row stride is smaller than the extent, which is used for both source and destination subresource, copy operations will have overlapping ranges of memory they are trying to copy to. This will lead to undefined behavior when used in parallel either with the multi-threaded image conversion functions or GPU operations.
I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width) or just clamp it via max(stride, width).

CommandBuffer::CopyBufferFromTexture() for instance has this contract with its rowStride parameter:

If \c rowStride is 0, the source data is considered to be tightly packed for each array layer and the required alignment is managed automatically.
If \c rowStride is not 0, it \b must be greater than or equal to the size (in bytes) of each row in the texture region with respect to the texture's format.

I might have forgotten to put any safe guards in the code for this parameter, but it should at least be documented what the expected behavior is for these cases. I think we should consider a stride smaller than the width of a copy operation to be an invalid parameter and therefore put in an assertion. This way, the programmer will be confronted with this issue right away.

VKDeviceMemory* deviceMemory = region->GetParentChunk();
if (void* memory = deviceMemory->Map(device_, region->GetOffset(), dataSize))
{
const char* src = static_cast<const char*>(data);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I learned something new here :-) I was using reinterpret_cast for these types of conversions, but this static_cast seems just fine and is also the better choice (found this on Stackoverflow).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make a PR to replace reinterpret_cast with static_cast when converting from void* :)

@@ -278,7 +278,7 @@ HRESULT D3D11Texture::UpdateSubresource(
dstBox.back - dstBox.front
};

const SubresourceCPUMappingLayout dataLayout = CalcSubresourceCPUMappingLayout(format, extent, numArrayLayers, imageView.format, imageView.dataType);
const SubresourceCPUMappingLayout dataLayout = CalcSubresourceCPUMappingLayout(format, extent, numArrayLayers, imageView.format, imageView.dataType, imageView.rowStride);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work until ConvertImageBuffer() below implements this parameter, too. So I suggest we skip D3D11 and D3D12 for this PR. I'll implement this afterwards and then we can add those backends, too. Please also update the documentation of what backends will be supported initially in the comment for ImageView::rowStride.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, okay :(

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes weren't that big and we can get them in next in a separate CL, but this would be half-implemented only, so I rather have it properly supported there or not (for now).

Copy link
Owner

@LukasBanana LukasBanana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this in my last review, but we need to handle the case for compressed formats, because they are all block-compressions, i.e. the value of "bytes per pixel" makes no sense in that context since they are never sampled for a single pixel. For those functions, we should skip rowStride and use the old function to just copy the whole input data.

@@ -477,7 +479,7 @@ void VKRenderSystem::WriteTexture(Texture& texture, const TextureRegion& texture
VK_BUFFER_USAGE_TRANSFER_SRC_BIT // <-- TODO: support read/write mapping //GetStagingVkBufferUsageFlags(bufferDesc.cpuAccessFlags)
);

VKDeviceBuffer stagingBuffer = CreateStagingBufferAndInitialize(stagingCreateInfo, imageData, imageDataSize);
VKDeviceBuffer stagingBuffer = CreateTextureStagingBufferAndInitialize(stagingCreateInfo, extent, imageData, imageDataSize, srcImageView.rowStride, bytesPerPixel);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetMemoryFootprint() will return 0 if numTexels is smaller than the block size, so bytesPerPixel will be 0 in this code path for block compression formats such as BC1UNorm (also see FormatAttributes::blockWidth).

For now, we should keep the old function when a compressed format is specified, because it doesn't rely on bytesPerPixel. You can use IsCompressedFormat(format) for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay

@LukasBanana
Copy link
Owner

Thank you, I'll submit as soon as the CI builds go through.

@LukasBanana LukasBanana merged commit cc9ea8a into LukasBanana:master Jan 25, 2025
46 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requested features and TODO lists
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants