From e831b5019a22209a71305d858b0857a316e56b46 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20Est=C3=A9vez?= <daniel@destevez.net>
Date: Mon, 6 Nov 2023 20:41:31 +0100
Subject: [PATCH] volk_32fc_x2_divide_32fc: add documentation about numerical
 accuracy
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds some documentation to the complex divide kernel about the
potential pitfall in some architectures of obtaining inf or nan
with a small denominator (even if the numerator is zero) due to
reduced range because of -fcx-limited-range and -freciprocal-math.

This closes #608.

Signed-off-by: Daniel Estévez <daniel@destevez.net>
---
 kernels/volk/volk_32fc_x2_divide_32fc.h | 37 +++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/kernels/volk/volk_32fc_x2_divide_32fc.h b/kernels/volk/volk_32fc_x2_divide_32fc.h
index f15e702cd..c8ed6d960 100644
--- a/kernels/volk/volk_32fc_x2_divide_32fc.h
+++ b/kernels/volk/volk_32fc_x2_divide_32fc.h
@@ -54,6 +54,43 @@
  *   volk_free(input_vector);
  *   volk_free(out);
  * \endcode
+ *
+ * \b Numerical accuracy
+ *
+ * The division of complex numbers (a + bj)/(c + dj) is calculated as
+ * (a*c + b*d)/(c*c + d*d) + (b*c - a*d)/(c*c + d*d) j.
+ *
+ * Since Volk is built using unsafe math optimizations
+ * (-fcx-limited-range -funsafe-math-optimizations with gcc), after computing
+ * the two numerators and the denominator in the above formula, the compiler
+ * can either:
+ *
+ * a) compute the two divisions;
+ * b) compute the inverse of the denominator, 1.0/(c*c + d*d), and then
+ * multiply this number by each of the numerators.
+ *
+ * Under gcc, b) is allowed by -freciprocal-math, which is included in
+ * -funsafe-math-optimizations.
+ *
+ * Depending on the architecture and the estimated cost of multiply and divide
+ * instructions, the compiler can perform a) or b). gcc performs a) under x86_64
+ * and armv7, and b) under aarch64.
+ *
+ * To avoid obtaining inf or nan in some architectures, care should be taken
+ * that c*c + d*d is not too small. In particular, if c*c + d*d < FLT_MAX, then
+ * the calculation of 1.0/(c*c + d*d) will yield inf.
+ *
+ * For more information about numerical accuracy of complex division, see the
+ * following:
+ *
+ * - https://godbolt.org/z/z9z5b9oM5 Compiler Explorer example showing how
+ *   complex division is done in different architectures and with different
+ *   optimization options.
+ *
+ * - https://github.com/gcc-mirror/gcc/blob/7e3c58bfc1d957e48faf0752758da0fed811ed58/libgcc/libgcc2.c#L2707
+ *   gcc's implementation of the __divsc3 function, which implements C99 complex
+ *   division semantics and handles several potential numerical problems. This
+ *   function is used whenever -fcx-limited-range is not enabled.
  */
 
 #ifndef INCLUDED_volk_32fc_x2_divide_32fc_u_H