Make setting a stride of the ImageView data possible #150

st0rmbtw · 2025-01-22T23:24:00Z

I was implementing chunked lightmap rendering in my project.
The idea is that instead of one huge lightmap texture, I create multiple smaller ones and render them only if they are visible.
But I ran into a problem when I needed to update the lightmap at the intersection of multiple chunks, because I had to calculate the proper offsets and size for each chunk's texture.

As the size of the texture of a chunk and the size of the lightmap buffer on CPU are different, I had to do the following:

1. Allocate a buffer with the size of the chunk's texture.
2. Copy the specific portion of data from the CPU lightmap to the buffer.
3. Update the GPU texture with the data from the buffer.
4. Deallocate the buffer.

It was quite slow to do all of that. Only the third step would be needed if there was an option to set a custom stride for the source data of the image view.
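
For illustration, the extra copy described above can be sketched as follows. This is only a minimal sketch, not code from the PR: `ExtractRegion` and its parameters are hypothetical names, and the source stride is assumed to be given in pixels.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of LLGL): copies a width x height region that
// starts at (x, y) out of a larger CPU image into a tightly packed buffer.
// srcRowStride is the number of pixels per row in the large source image.
std::vector<std::uint8_t> ExtractRegion(
    const std::uint8_t* src, std::size_t srcRowStride, std::size_t bytesPerPixel,
    std::size_t x, std::size_t y, std::size_t width, std::size_t height)
{
    // Steps 1 and 2 of the workaround: allocate the temporary buffer and fill it row by row.
    std::vector<std::uint8_t> packed(width * height * bytesPerPixel);
    for (std::size_t row = 0; row < height; ++row)
    {
        const std::uint8_t* srcRow = src + ((y + row) * srcRowStride + x) * bytesPerPixel;
        std::memcpy(packed.data() + row * width * bytesPerPixel, srcRow, width * bytesPerPixel);
    }
    return packed; // Step 4 (deallocation) happens when the vector goes out of scope.
}
```

With a row stride on the source image view, the pointer to the first pixel of the region plus the stride of the large image would describe the same data without this intermediate copy.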

But then I noticed that ID3D11DeviceContext::UpdateSubresource function has a parameter just for what I need, so I made this PR to expose that parameter in the public interface.

LukasBanana

I'm glad to see you getting acquainted with the inner workings of LLGL :-) I started with a concept for rowStride and layerStride in this struct early last year but never took the time to start on the implementation because I have had no need for this feature so far.

I'm happy to help with the Metal backend, but we can add this a bit later. It would make sense to also add this field to MutableImageView at some point, but that's maybe something for another time.

include/LLGL/ImageFlags.h

include/LLGL-C/LLGLWrapper.h

sources/Renderer/TextureUtils.h

LukasBanana · 2025-01-23T03:32:26Z

sources/Renderer/TextureUtils.cpp

 {
    const FormatAttributes& formatAttribs = GetFormatAttribs(format);
    if (formatAttribs.blockWidth > 0 && formatAttribs.blockHeight > 0)
    {
-        outLayout.rowStride         = (extent.width * formatAttribs.bitSize) / formatAttribs.blockWidth / 8;
+        outLayout.rowStride         = ((srcDataStride > 0 ? srcDataStride : extent.width) * formatAttribs.bitSize) / formatAttribs.blockWidth / 8;


How should LLGL behave when the row stride is smaller than the width of the source image? Should this be undefined behavior or be defined as an invalid argument? Either way, it should be documented in the ImageView struct for this field. If we consider it an invalid argument we should probably also add max(srcDataStride, extent.width) here to clamp it to the largest value or add an assertion LLGL_ASSERT(srcDataStride == 0 || srcDataStride >= extent.width).
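
As a rough illustration of the changed line in the diff above, the calculation can be isolated like this. `FormatInfo` and `RowStrideInBytes` are hypothetical stand-ins, not LLGL types, and the stride is assumed to be in the same unit as the extent width, as the diff suggests.

```cpp
#include <cstdint>

// Hypothetical stand-in for the format attributes used in the diff.
struct FormatInfo
{
    std::uint32_t bitSize;    // Size in bits of one block (one pixel when blockWidth is 1)
    std::uint32_t blockWidth; // Width of one block in pixels
};

// Returns the row stride in bytes. A stride of 0 means "tightly packed",
// in which case the extent width determines the row size.
std::uint32_t RowStrideInBytes(const FormatInfo& fmt, std::uint32_t extentWidth, std::uint32_t rowStrideInPixels)
{
    const std::uint32_t pixelsPerRow = (rowStrideInPixels > 0 ? rowStrideInPixels : extentWidth);
    return (pixelsPerRow * fmt.bitSize) / fmt.blockWidth / 8;
}
```

For a 32-bit format with 1x1 blocks, a 64 pixel wide region yields 256 bytes per row when tightly packed and 1024 bytes per row when read from a 256 pixel wide source.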

I'm not sure why this is an issue. Doesn't the extent there describe the size of the destination subresource? If so, then it doesn't matter whether the row stride is bigger or smaller than extent.width. Correct me if I'm wrong, please.

If the row stride is smaller than the extent, which is used for both source and destination subresource, copy operations will have overlapping ranges of memory they are trying to copy to. This will lead to undefined behavior when used in parallel either with the multi-threaded image conversion functions or GPU operations.
I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width) or just clamp it via max(stride, width).

CommandBuffer::CopyBufferFromTexture() for instance has this contract with its rowStride parameter:

If \c rowStride is 0, the source data is considered to be tightly packed for each array layer and the required alignment is managed automatically. If \c rowStride is not 0, it \b must be greater than or equal to the size (in bytes) of each row in the texture region with respect to the texture's format.

I might have forgotten to put any safeguards in the code for this parameter, but it should at least be documented what the expected behavior is for these cases. I think we should consider a stride smaller than the width of a copy operation to be an invalid parameter and therefore put in an assertion. This way, the programmer will be confronted with this issue right away.
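
The two options discussed here could look roughly like this. Both functions are hypothetical sketches rather than LLGL code, and they assume the stride and the width are expressed in the same unit.

```cpp
#include <algorithm>
#include <cstdint>

// Option 1: treat a too-small stride as an invalid argument.
// Returns true if the stride is 0 (tightly packed) or at least as large as the row width.
// A caller could wrap this in an assertion so the error shows up immediately.
bool IsRowStrideValid(std::uint32_t rowStride, std::uint32_t width)
{
    return rowStride == 0 || rowStride >= width;
}

// Option 2: silently clamp the stride to the row width.
// A stride of 0 also maps to the width, i.e. to tightly packed rows.
std::uint32_t ClampRowStride(std::uint32_t rowStride, std::uint32_t width)
{
    return std::max(rowStride, width);
}
```

The assertion variant surfaces the mistake at the call site, while the clamp variant hides it and reads the data with a different layout than the caller specified.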

I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width) or just clamp it via max(stride, width).

I think an assertion would be better (maybe even put it in DbgRenderSystem::WriteTexture?)

sources/Renderer/Vulkan/VKRenderSystem.cpp

sources/Renderer/OpenGL/Texture/GLTexSubImage.cpp

LukasBanana · 2025-01-23T04:00:25Z

sources/Renderer/Vulkan/VKRenderSystem.cpp

@@ -477,7 +477,7 @@ void VKRenderSystem::WriteTexture(Texture& texture, const TextureRegion& texture
        VK_BUFFER_USAGE_TRANSFER_SRC_BIT // <-- TODO: support read/write mapping //GetStagingVkBufferUsageFlags(bufferDesc.cpuAccessFlags)
    );

-    VKDeviceBuffer stagingBuffer = CreateStagingBufferAndInitialize(stagingCreateInfo, imageData, imageDataSize);
+    VKDeviceBuffer stagingBuffer = CreateTextureStagingBufferAndInitialize(stagingCreateInfo, srcImageView, extent);


It looks like this change ignores imageData. This function must not use the input parameter directly in case the image format and data type had to be converted (see intermediateData). That's what all the code above is for. Oh, and the convert functions such as ConvertImageBuffer() will also have to handle the new stride. Yeah, this feature goes wide I'm afraid 😬

I fixed the ignoring of imageData, but I'm not sure about ConvertImageBuffer because I can't figure out how to take the stride into account in ConvertImageBufferFormatWorker

I'll see to also update the image conversion functions over the weekend. The work distribution likely needs to be adjusted.

st0rmbtw · 2025-01-23T06:15:27Z

It was a quick, low-effort solution just to get it working, and I was also sleepy 😅. I just wanted to get your feedback on the idea and implementation.

I'm glad to see you getting acquainted with the inner workings of LLGL :-)

Yeah, now I understand it a little better :). I wish I had more knowledge of graphics programming to help you with the bigger problems.

I'm happy to help with the Metal backend

Yes, please help me if you have the time, I'm scared of Objective-C. If you don't have time I can try to do it myself.

LukasBanana · 2025-01-24T00:30:21Z

I'll take another look over the weekend since this is a bit more involved and I'll come up with a unit test for this feature.

LukasBanana

I added a few small change requests. Besides that, I think this is good to go but we will have to skip the D3D11 and D3D12 backends until I have updated the ConvertImageBuffer() functions. I hope you don't mind me pulling you into a new discussion, though, if some future tests fail because of this implementation 😄

LukasBanana · 2025-01-25T16:18:57Z

LukasBanana · 2025-01-25T16:23:37Z

sources/Renderer/Vulkan/VKRenderSystem.cpp

+            VKDeviceMemory* deviceMemory = region->GetParentChunk();
+            if (void* memory = deviceMemory->Map(device_, region->GetOffset(), dataSize))
+            {
+                const char* src = static_cast<const char*>(data);


I guess I learned something new here :-) I was using reinterpret_cast for these types of conversions, but this static_cast seems just fine and is also the better choice (found this on Stack Overflow).

I can make a PR to replace reinterpret_cast with static_cast when converting from void* :)
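
A small sketch of the pattern in question, combined with a strided row copy similar in spirit to the hunk above. `CopyRowsPacked` is a hypothetical name and all sizes are assumed to be in bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies 'rows' rows of 'rowSize' bytes each from a strided
// source into a tightly packed destination. srcRowStride is in bytes.
void CopyRowsPacked(void* dst, const void* src, std::size_t rowSize, std::size_t srcRowStride, std::size_t rows)
{
    // Converting from void* to a byte pointer only needs static_cast;
    // reinterpret_cast is not required for this conversion.
    char* dstBytes = static_cast<char*>(dst);
    const char* srcBytes = static_cast<const char*>(src);

    for (std::size_t row = 0; row < rows; ++row)
        std::memcpy(dstBytes + row * rowSize, srcBytes + row * srcRowStride, rowSize);
}
```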

LukasBanana · 2025-01-25T16:33:10Z

sources/Renderer/Direct3D11/Texture/D3D11Texture.cpp

@@ -278,7 +278,7 @@ HRESULT D3D11Texture::UpdateSubresource(
        dstBox.back   - dstBox.front
    };

-    const SubresourceCPUMappingLayout dataLayout = CalcSubresourceCPUMappingLayout(format, extent, numArrayLayers, imageView.format, imageView.dataType);
+    const SubresourceCPUMappingLayout dataLayout = CalcSubresourceCPUMappingLayout(format, extent, numArrayLayers, imageView.format, imageView.dataType, imageView.rowStride);


This won't work until ConvertImageBuffer() below implements this parameter, too. So I suggest we skip D3D11 and D3D12 for this PR. I'll implement this afterwards and then we can add those backends, too. Please also document which backends will be supported initially in the comment for ImageView::rowStride.

Ohh, okay :(

Those changes weren't that big and we can get them in next in a separate CL, but this would only be half-implemented, so I'd rather have it properly supported there or not at all (for now).

LukasBanana

Sorry I missed this in my last review, but we need to handle the case of compressed formats, because they are all block-compressed, i.e. the value of "bytes per pixel" makes no sense in that context since they are never sampled for a single pixel. For those formats, we should skip rowStride and use the old function to just copy the whole input data.

LukasBanana · 2025-01-25T17:47:46Z

sources/Renderer/Vulkan/VKRenderSystem.cpp

@@ -477,7 +479,7 @@ void VKRenderSystem::WriteTexture(Texture& texture, const TextureRegion& texture
        VK_BUFFER_USAGE_TRANSFER_SRC_BIT // <-- TODO: support read/write mapping //GetStagingVkBufferUsageFlags(bufferDesc.cpuAccessFlags)
    );

-    VKDeviceBuffer stagingBuffer = CreateStagingBufferAndInitialize(stagingCreateInfo, imageData, imageDataSize);
+    VKDeviceBuffer stagingBuffer = CreateTextureStagingBufferAndInitialize(stagingCreateInfo, extent, imageData, imageDataSize, srcImageView.rowStride, bytesPerPixel);


GetMemoryFootprint() will return 0 if numTexels is smaller than the block size, so bytesPerPixel will be 0 in this code path for block compression formats such as BC1UNorm (also see FormatAttributes::blockWidth).

For now, we should keep the old function when a compressed format is specified, because it doesn't rely on bytesPerPixel. You can use IsCompressedFormat(format) for this.
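
To illustrate why a per-pixel size breaks down for block-compressed formats, here is a hypothetical sketch. `BlockFormat`, `FootprintInBytes`, and `CanUseStridedUpload` are illustrative names that only mirror the behavior described above; they are not the actual LLGL functions.

```cpp
#include <cstdint>

// Hypothetical format description, loosely modeled on the block attributes mentioned above.
struct BlockFormat
{
    std::uint32_t bitSize;     // Size in bits of one block
    std::uint32_t blockWidth;  // Block width in pixels
    std::uint32_t blockHeight; // Block height in pixels
};

// Mirrors the described behavior: fewer texels than one block yield a footprint of 0.
std::uint32_t FootprintInBytes(const BlockFormat& fmt, std::uint32_t numTexels)
{
    const std::uint32_t texelsPerBlock = fmt.blockWidth * fmt.blockHeight;
    return (numTexels / texelsPerBlock) * fmt.bitSize / 8;
}

// Assumption for this sketch: any format with blocks larger than 1x1 is block-compressed.
bool IsBlockCompressed(const BlockFormat& fmt)
{
    return fmt.blockWidth > 1 || fmt.blockHeight > 1;
}

// Only take the strided path when a per-pixel byte size is meaningful.
bool CanUseStridedUpload(const BlockFormat& fmt)
{
    return !IsBlockCompressed(fmt) && FootprintInBytes(fmt, 1) > 0;
}
```

A BC1-style format (64 bits per 4x4 block) gets a footprint of 0 for a single texel here, so the sketch falls back to the non-strided path for it, while a 32-bit 1x1 format reports 4 bytes per pixel.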

LukasBanana · 2025-01-25T17:58:01Z

Thank you, I'll submit as soon as the CI builds go through.

Allow setting a stride of the ImageView data

e37b74a

LukasBanana requested changes Jan 23, 2025

LukasBanana self-assigned this Jan 23, 2025

LukasBanana added the feature request Requested features and TODO lists label Jan 23, 2025

LukasBanana reviewed Jan 23, 2025

st0rmbtw added 9 commits January 23, 2025 09:25

Rename ImageView::stride to ImageView::rowStride

33e3dcc

Update comment

48b298f

Update the language bindings properly

3d7fc3a

Add the std namespace to the uint32_t type

ae61619

Use Allman indentation

1b318b1

Use the BitBlit function

a29bc9e

Use GLStateManager::SetPixelStoreUnpack

e80af83

Don't ignore intermediate image data

81d8bc8

Call SetPixelStoreUnpack only if initialImage is not null

a5441ff

st0rmbtw requested a review from LukasBanana January 23, 2025 08:42

Add the rowStride parameter to the ImageView constructor

804f8a1

LukasBanana requested changes Jan 25, 2025

LukasBanana reviewed Jan 25, 2025

Remove the implementation for the D3D11 and D3D12 backends for now

308239f

LukasBanana requested changes Jan 25, 2025

st0rmbtw added 2 commits January 25, 2025 20:54

Use the CreateStagingBuffer function for compressed formats

fc48545

Oops. Use the CreateStagingBufferAndInitialize function

74ce9f7

LukasBanana merged commit cc9ea8a into LukasBanana:master Jan 25, 2025
46 of 47 checks passed
