-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make setting a stride of the ImageView data possible #150
Make setting a stride of the ImageView data possible #150
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad to see you getting acquainted with the inner workings of LLGL :-) I started with a concept for rowStride
and layerStride
in this struct earlier of last year but never took the time to start on the implementation because I had no need for this feature so far.
I'm happy to help with the Metal backend, but we can add this a bit later. It would make sense to also add this field to MutableImageView
at some point, but that's maybe something for another time.
sources/Renderer/TextureUtils.cpp
Outdated
{ | ||
const FormatAttributes& formatAttribs = GetFormatAttribs(format); | ||
if (formatAttribs.blockWidth > 0 && formatAttribs.blockHeight > 0) | ||
{ | ||
outLayout.rowStride = (extent.width * formatAttribs.bitSize) / formatAttribs.blockWidth / 8; | ||
outLayout.rowStride = ((srcDataStride > 0 ? srcDataStride : extent.width) * formatAttribs.bitSize) / formatAttribs.blockWidth / 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should LLGL behave when the row stride is smaller than the width of the source image? Should this be undefined behavior or be defined as an invalid argument? Either way, it should be documented in the ImageView
struct for this field. If we consider it an invalid argument we should probably also add max(srcDataStride, extent.width)
here to clamp it to the largest value or add an assertion LLGL_ASSERT(srcDataStride == 0 || srcDataStride >= extent.width)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why it is an issue. Isn't the extent there describes the size of the destination subresource? If so then it doesn't matter if the row stride is bigger or smaller than extent.width
. Correct me if I'm wrong, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the row stride is smaller than the extent, which is used for both source and destination subresource, copy operations will have overlapping ranges of memory they are trying to copy to. This will lead to undefined behavior when used in parallel either with the multi-threaded image conversion functions or GPU operations.
I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width
) or just clamp it via max(stride, width)
.
CommandBuffer::CopyBufferFromTexture()
for instance has this contract with its rowStride
parameter:
If \c rowStride is 0, the source data is considered to be tightly packed for each array layer and the required alignment is managed automatically.
If \c rowStride is not 0, it \b must be greater than or equal to the size (in bytes) of each row in the texture region with respect to the texture's format.
I might have forgotten to put any safe guards in the code for this parameter, but it should at least be documented what the expected behavior is for these cases. I think we should consider a stride smaller than the width of a copy operation to be an invalid parameter and therefore put in an assertion. This way, the programmer will be confronted with this issue right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just wondering if we should have an assertion here (e.g.
stride == 0 || stride >= width
) or just clamp it viamax(stride, width)
.
I think an assertion would be better (maybe even put it in DbgRenderSystem::WriteTexture
?)
@@ -477,7 +477,7 @@ void VKRenderSystem::WriteTexture(Texture& texture, const TextureRegion& texture | |||
VK_BUFFER_USAGE_TRANSFER_SRC_BIT // <-- TODO: support read/write mapping //GetStagingVkBufferUsageFlags(bufferDesc.cpuAccessFlags) | |||
); | |||
|
|||
VKDeviceBuffer stagingBuffer = CreateStagingBufferAndInitialize(stagingCreateInfo, imageData, imageDataSize); | |||
VKDeviceBuffer stagingBuffer = CreateTextureStagingBufferAndInitialize(stagingCreateInfo, srcImageView, extent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this change ignores imageData
. This function must not use the input parameter directly in case the image format and data type had to be converted (see intermediateData
). That's what all the code above is for. Oh, and the convert functions such as ConvertImageBuffer()
will also have to handle the new stride. Yeah, this features goes wide I'm afraid 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the ignoring of imageData
, but I'm not sure about ConvertImageBuffer
because I can't figure out how to take the stride into account in ConvertImageBufferFormatWorker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see to also update the image conversion functions over the weekend. The work distribution likely needs to be adjusted.
It was a quick and a low effort solution just to get it working, and also I was sleepy 😅. I just wanted to get your feedback on the idea and implementation.
Yeah, now I understand it a little better :). I wish I had more knowledge of graphics programming to help you with the bigger problems.
Yes, please help me if you have the time, I'm scared of Objective-C. If you don't have time I can try to do it myself. |
I'll take another look over the weekend since this is a bit more involved and I'll come up with a unit test for this feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few small change requests. Besides that, I think this is good to go but we will have to skip D3D11 and D3D12 backends until I updated the ConvertImageBuffer()
functions. I hope you don't mind me pulling you into a new discussion, though, if some future tests fail because of this implementation 😄
sources/Renderer/TextureUtils.cpp
Outdated
{ | ||
const FormatAttributes& formatAttribs = GetFormatAttribs(format); | ||
if (formatAttribs.blockWidth > 0 && formatAttribs.blockHeight > 0) | ||
{ | ||
outLayout.rowStride = (extent.width * formatAttribs.bitSize) / formatAttribs.blockWidth / 8; | ||
outLayout.rowStride = ((srcDataStride > 0 ? srcDataStride : extent.width) * formatAttribs.bitSize) / formatAttribs.blockWidth / 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the row stride is smaller than the extent, which is used for both source and destination subresource, copy operations will have overlapping ranges of memory they are trying to copy to. This will lead to undefined behavior when used in parallel either with the multi-threaded image conversion functions or GPU operations.
I am just wondering if we should have an assertion here (e.g. stride == 0 || stride >= width
) or just clamp it via max(stride, width)
.
CommandBuffer::CopyBufferFromTexture()
for instance has this contract with its rowStride
parameter:
If \c rowStride is 0, the source data is considered to be tightly packed for each array layer and the required alignment is managed automatically.
If \c rowStride is not 0, it \b must be greater than or equal to the size (in bytes) of each row in the texture region with respect to the texture's format.
I might have forgotten to put any safe guards in the code for this parameter, but it should at least be documented what the expected behavior is for these cases. I think we should consider a stride smaller than the width of a copy operation to be an invalid parameter and therefore put in an assertion. This way, the programmer will be confronted with this issue right away.
VKDeviceMemory* deviceMemory = region->GetParentChunk(); | ||
if (void* memory = deviceMemory->Map(device_, region->GetOffset(), dataSize)) | ||
{ | ||
const char* src = static_cast<const char*>(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I learned something new here :-) I was using reinterpret_cast
for these types of conversions, but this static_cast
seems just fine and is also the better choice (found this on Stackoverflow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make a PR to replace reinterpret_cast
with static_cast
when converting from void*
:)
@@ -278,7 +278,7 @@ HRESULT D3D11Texture::UpdateSubresource( | |||
dstBox.back - dstBox.front | |||
}; | |||
|
|||
const SubresourceCPUMappingLayout dataLayout = CalcSubresourceCPUMappingLayout(format, extent, numArrayLayers, imageView.format, imageView.dataType); | |||
const SubresourceCPUMappingLayout dataLayout = CalcSubresourceCPUMappingLayout(format, extent, numArrayLayers, imageView.format, imageView.dataType, imageView.rowStride); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work until ConvertImageBuffer()
below implements this parameter, too. So I suggest we skip D3D11 and D3D12 for this PR. I'll implement this afterwards and then we can add those backends, too. Please also update the documentation of what backends will be supported initially in the comment for ImageView::rowStride
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, okay :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those changes weren't that big and we can get them in next in a separate CL, but this would be half-implemented only, so I rather have it properly supported there or not (for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this in my last review, but we need to handle the case for compressed formats, because they are all block-compressions, i.e. the value of "bytes per pixel" makes no sense in that context since they are never sampled for a single pixel. For those functions, we should skip rowStride
and use the old function to just copy the whole input data.
@@ -477,7 +479,7 @@ void VKRenderSystem::WriteTexture(Texture& texture, const TextureRegion& texture | |||
VK_BUFFER_USAGE_TRANSFER_SRC_BIT // <-- TODO: support read/write mapping //GetStagingVkBufferUsageFlags(bufferDesc.cpuAccessFlags) | |||
); | |||
|
|||
VKDeviceBuffer stagingBuffer = CreateStagingBufferAndInitialize(stagingCreateInfo, imageData, imageDataSize); | |||
VKDeviceBuffer stagingBuffer = CreateTextureStagingBufferAndInitialize(stagingCreateInfo, extent, imageData, imageDataSize, srcImageView.rowStride, bytesPerPixel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetMemoryFootprint()
will return 0 if numTexels
is smaller than the block size, so bytesPerPixel
will be 0 in this code path for block compression formats such as BC1UNorm
(also see FormatAttributes::blockWidth
).
For now, we should keep the old function when a compressed format is specified, because it doesn't rely on bytesPerPixel
. You can use IsCompressedFormat(format)
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay
Thank you, I'll submit as soon as the CI builds go through. |
I was implementing the chunked lightmap rendering in my project.
The idea is that instead of one huge lightmap texture, I create multiple smaller ones and render them only if they are visible.
But I ran into a problem when I needed to update the lightmap on the intersection of multiple chunks.
Because I had to calculate proper offsets and size for each chunk's texture.
As the size of the texture of a chunk and the size of the lightmap buffer on CPU are different, I had to do the following:
It was quite slow to do all of that. I could have only the third step if there was an option to set custom stride for the source data of the image view.
But then I noticed that ID3D11DeviceContext::UpdateSubresource function has a parameter just for what I need, so I made this PR to expose that parameter in the public interface.