From e831b5019a22209a71305d858b0857a316e56b46 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20Est=C3=A9vez?= Date: Mon, 6 Nov 2023 20:41:31 +0100 Subject: [PATCH] volk_32fc_x2_divide_32fc: add documentation about numerical accuracy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This adds some documentation to the complex divide kernel about the potential pitfall in some architectures of obtaining inf or nan with a small denominator (even if the numerator is zero) due to reduced range because of -fcx-limited-range and -freciprocal-math. This closes #608. Signed-off-by: Daniel Estévez --- kernels/volk/volk_32fc_x2_divide_32fc.h | 37 +++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/kernels/volk/volk_32fc_x2_divide_32fc.h b/kernels/volk/volk_32fc_x2_divide_32fc.h index f15e702cd..c8ed6d960 100644 --- a/kernels/volk/volk_32fc_x2_divide_32fc.h +++ b/kernels/volk/volk_32fc_x2_divide_32fc.h @@ -54,6 +54,43 @@ * volk_free(input_vector); * volk_free(out); * \endcode + * + * \b Numerical accuracy + * + * The division of complex numbers (a + bj)/(c + dj) is calculated as + * (a*c + b*d)/(c*c + d*d) + (b*c - a*d)/(c*c + d*d) j. + * + * Since Volk is built using unsafe math optimizations + * (-fcx-limited-range -funsafe-math-optimizations with gcc), after computing + * the two numerators and the denominator in the above formula, the compiler + * can either: + * + * a) compute the two divisions; + * b) compute the inverse of the denominator, 1.0/(c*c + d*d), and then + * multiply this number by each of the numerators. + * + * Under gcc, b) is allowed by -freciprocal-math, which is included in + * -funsafe-math-optimizations. + * + * Depending on the architecture and the estimated cost of multiply and divide + * instructions, the compiler can perform a) or b). gcc performs a) under x86_64 + * and armv7, and b) under aarch64. + * + * To avoid obtaining inf or nan in some architectures, care should be taken + * that c*c + d*d is not too small. In particular, if c*c + d*d < FLT_MAX, then + * the calculation of 1.0/(c*c + d*d) will yield inf. + * + * For more information about numerical accuracy of complex division, see the + * following: + * + * - https://godbolt.org/z/z9z5b9oM5 Compiler Explorer example showing how + * complex division is done in different architectures and with different + * optimization options. + * + * - https://github.com/gcc-mirror/gcc/blob/7e3c58bfc1d957e48faf0752758da0fed811ed58/libgcc/libgcc2.c#L2707 + * gcc's implementation of the __divsc3 function, which implements C99 complex + * division semantics and handles several potential numerical problems. This + * function is used whenever -fcx-limited-range is not enabled. */ #ifndef INCLUDED_volk_32fc_x2_divide_32fc_u_H