Re-enable AVX2 convolutional decoder #741

Merged: 4 commits into gnuradio:main, Jan 12, 2024

Conversation

@argilo (Member) commented on Jan 8, 2024

Reverts #458.

Now that the volk_8u_conv_k7_r2puppet_8u kernel has a working test (fixed in #736), we can safely make changes and be confident that the various protokernels are producing identical output. Here I've re-enabled the broken AVX2 convolutional decoder which was commented out in #458. To get identical output to the other protokernels, I made the following changes (each in a separate commit, for easier review):

  • Re-normalize the branch metrics on every iteration, to avoid integer overflow. (#736 did the same for the spiral and neonspiral protokernels, which was necessary to get identical output to the generic protokernel.)
  • Remove an extraneous permutation that was executed at the beginning of each iteration.
  • During re-normalization, compute the minimum branch metric over both AVX2 register lanes (see the sketch after this list).
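
Below is a minimal sketch of the re-normalization step described in the list above. It is not the actual kernel code: the function name, the use of a single 32-byte metric register, and the saturating subtraction are illustrative assumptions. It shows how the minimum 8-bit branch metric can be taken across both 128-bit lanes of an AVX2 register and then subtracted from every metric, so the metrics cannot overflow on subsequent additions.

```c
#include <immintrin.h>
#include <stdint.h>

/* Illustrative sketch only (not the VOLK kernel): re-normalize one AVX2
 * register holding 32 unsigned 8-bit branch metrics by subtracting the
 * minimum metric, computed over BOTH 128-bit lanes of the register. */
static inline __m256i renormalize_metrics(__m256i metrics)
{
    /* Fold the two 128-bit lanes together so the reduction sees all 32 bytes. */
    __m128i lo = _mm256_castsi256_si128(metrics);
    __m128i hi = _mm256_extracti128_si256(metrics, 1);
    __m128i m = _mm_min_epu8(lo, hi);

    /* Horizontal minimum of the remaining 16 bytes via shift-and-min. */
    m = _mm_min_epu8(m, _mm_srli_si128(m, 8));
    m = _mm_min_epu8(m, _mm_srli_si128(m, 4));
    m = _mm_min_epu8(m, _mm_srli_si128(m, 2));
    m = _mm_min_epu8(m, _mm_srli_si128(m, 1));

    /* Broadcast the scalar minimum to all 32 bytes and subtract it. */
    uint8_t min_metric = (uint8_t)_mm_cvtsi128_si32(m);
    __m256i bias = _mm256_set1_epi8((char)min_metric);
    return _mm256_subs_epu8(metrics, bias); /* saturating: metrics stay >= 0 */
}
```

For constraint length 7 the decoder tracks 64 path metrics, which occupy two such 32-byte registers, so a full implementation would first take the element-wise minimum of both registers before the horizontal reduction shown here.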

I tested with many different vector lengths (for v in {0..2048}; do echo $v; apps/volk_profile -n -R k7_r2puppet -v $v -i 1 2>&1 | grep fail; done), and did not observe any test failures.

Performance of the AVX2 protokernel is slightly worse than that of the spiral protokernel at the default vector length of 131071, but better at shorter vector lengths (e.g. 16384). I suspect that with some further tweaks, AVX2 performance could be improved, but I'll leave that for a future PR. In particular, increasing the metric shift (as was done in #475 but reverted in #736) may reduce the number of expensive re-normalizations that need to be performed. And perhaps the minimum calculation (which is not SIMD-friendly) could be removed from the re-normalization as well.

/cc @Aang23

@jdemel (Contributor) left a comment

LGTM. Let's give this a try.

Thanks for your performance comparisons. I would suspect that most decoding runs on shorter frames (< 2^15 bits), which would make the AVX2 kernel the implementation of choice.

@jdemel merged commit 5447d06 into gnuradio:main on Jan 12, 2024
33 checks passed
Alesha72003 pushed a commit to Alesha72003/volk referencing this pull request on May 15, 2024: "Re-enable AVX2 convolutional decoder"