Re-enable AVX2 convolutional decoder #741

Merged: 4 commits into gnuradio:main, Jan 12, 2024

Conversation

@argilo (Member) commented on Jan 8, 2024

Reverts #458.

Now that the volk_8u_conv_k7_r2puppet_8u kernel has a working test (fixed in #736), we can safely make changes and be confident that the various protokernels are producing identical output. Here I've re-enabled the broken AVX2 convolutional decoder which was commented out in #458. To get identical output to the other protokernels, I made the following changes (each in a separate commit, for easier review):

  • Re-normalize the branch metrics on every iteration, to avoid integer overflow. (#736 did the same for the spiral and neonspiral protokernels, which was necessary to get identical output to the generic protokernel.)
  • Remove an extraneous permutation that was executed at the beginning of each iteration.
  • During re-normalization, compute the minimum branch metric over both AVX2 register lanes (see the sketch after this list).
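
Below is a minimal sketch of the re-normalization step described in the list above. It is not the actual kernel code: the function name, the use of a single 32-byte metric register, and the saturating subtraction are illustrative assumptions. It shows how the minimum 8-bit branch metric can be taken across both 128-bit lanes of an AVX2 register and then subtracted from every metric, so the metrics cannot overflow on subsequent additions.

```c
#include <immintrin.h>
#include <stdint.h>

/* Illustrative sketch only (not the VOLK kernel): re-normalize one AVX2
 * register holding 32 unsigned 8-bit branch metrics by subtracting the
 * minimum metric, computed over BOTH 128-bit lanes of the register. */
static inline __m256i renormalize_metrics(__m256i metrics)
{
    /* Fold the two 128-bit lanes together so the reduction sees all 32 bytes. */
    __m128i lo = _mm256_castsi256_si128(metrics);
    __m128i hi = _mm256_extracti128_si256(metrics, 1);
    __m128i m = _mm_min_epu8(lo, hi);

    /* Horizontal minimum of the remaining 16 bytes via shift-and-min. */
    m = _mm_min_epu8(m, _mm_srli_si128(m, 8));
    m = _mm_min_epu8(m, _mm_srli_si128(m, 4));
    m = _mm_min_epu8(m, _mm_srli_si128(m, 2));
    m = _mm_min_epu8(m, _mm_srli_si128(m, 1));

    /* Broadcast the scalar minimum to all 32 bytes and subtract it. */
    uint8_t min_metric = (uint8_t)_mm_cvtsi128_si32(m);
    __m256i bias = _mm256_set1_epi8((char)min_metric);
    return _mm256_subs_epu8(metrics, bias); /* saturating: metrics stay >= 0 */
}
```

For constraint length 7 the decoder tracks 64 path metrics, which occupy two such 32-byte registers, so a full implementation would first take the element-wise minimum of both registers before the horizontal reduction shown here.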

I tested with many different vector lengths (for v in {0..2048}; do echo $v; apps/volk_profile -n -R k7_r2puppet -v $v -i 1 2>&1 | grep fail; done), and did not observe any test failures.

Performance of the AVX2 protokernel is slightly worse than that of the spiral protokernel at the default vector length of 131071, but better at shorter vector lengths (e.g. 16384). I suspect that with some further tweaks, AVX2 performance could be improved, but I'll leave that for a future PR. In particular, increasing the metric shift (as was done in #475 but reverted in #736) may reduce the number of expensive re-normalizations that need to be performed. And perhaps the minimum calculation (which is not SIMD-friendly) could be removed from the re-normalization as well.

/cc @Aang23

@jdemel (Contributor) left a comment

LGTM. Let's give this a try.

Thanks for your performance comparisons. I would suspect that most decoding runs on shorter frames (< 2^15 bits), which would make the AVX2 kernel the implementation of choice.

@jdemel merged commit 5447d06 into gnuradio:main on Jan 12, 2024
33 checks passed
Alesha72003 pushed a commit to Alesha72003/volk referencing this pull request on May 15, 2024: "Re-enable AVX2 convolutional decoder"