Floating point error propagation in polynomial multiplication with the Fast Fourier Transform

  • Simple question: when using the FFT (or any Fourier-transform-based method) to multiply two polynomials, how should we deal with floating point error propagation?

In this Jupyter notebook, I will try to understand this phenomenon.
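To make the question concrete, here is a minimal sketch of the FFT-based scheme: evaluate both polynomials at roots of unity with the FFT, multiply the values pointwise, and transform back to recover the coefficients of the product. The function name fft_polymul and its implementation are mine (this is not how numpy.polymul works internally); it assumes integer coefficients listed with the highest degree first.

import numpy as np

def fft_polymul(P, Q):
    """Multiply two polynomials (coefficients listed with the highest degree first)
    by evaluating them at roots of unity with the FFT, multiplying pointwise,
    and transforming back."""
    d = len(P) + len(Q) - 1             # number of coefficients of the product
    fP = np.fft.rfft(P, d)              # values of P at the roots of unity (zero-padded)
    fQ = np.fft.rfft(Q, d)              # values of Q at the same points
    coeffs = np.fft.irfft(fP * fQ, d)   # interpolate back: coefficients of PQ, as floats
    return np.asarray(np.round(coeffs), dtype=int)  # round away the floating point noise

The floating point noise appears in the inverse transform; as long as the exact coefficients are small integers, the final rounding removes it, e.g. fft_polymul([1, 0, 1, 1], [1, 0, 0, 1, 0, 0]) should give array([1, 0, 1, 2, 0, 1, 1, 0, 0]).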


Requirements

In [96]:
import numpy as np
np.version.full_version
Out[96]:
'1.12.1'

Examples

Let us consider the polynomials $P(X) = 1 + X + X^3$ and $Q(X) = X^2 + X^5$, of degrees $n=3$ and $m=5$ (coefficients are listed with the highest degree first, following NumPy's convention):

In [116]:
P = [      1, 0, 1, 1]
n = len(P) - 1
P, n
Out[116]:
([1, 0, 1, 1], 3)
In [117]:
Q = [1, 0, 0, 1, 0, 0]
m = len(Q) - 1
Q, m
Out[117]:
([1, 0, 0, 1, 0, 0], 5)

Their product is $(PQ)(X)$, of degree $n+m=8$: $$ \begin{align*}(PQ)(X) &= (1 + X + X^3) (X^2 + X^5) \\ &= X^2 + X^5 + X^3 + X^6 + X^5 + X^8 \\ &= X^2 + X^3 + 2 X^5 + X^6 + X^8 \end{align*} $$

In [36]:
PQ = np.polymul(P, Q)
d = len(PQ) - 1
PQ, d
Out[36]:
(array([1, 0, 1, 2, 0, 1, 1, 0, 0]), 8)

If we evaluate both $P(X)$ and $Q(X)$ on $n+m+1$ distinct points, $\lambda_i \in \mathbb{R}$ or $\in\mathbb{N}$, then we can fit a polynomial of degree $n+m = \deg(PQ)$ on these sample points, and by uniqueness of polynomial interpolation (a polynomial of degree $d$ is entirely determined by its values on $d+1$ distinct points, since a nonzero polynomial of degree at most $d$ has at most $d$ roots), it will be equal to $(PQ)(X)$.

The fit can be obtained, for instance, by Lagrange interpolation, which is not so efficient but easy to implement. Here, I will simply use the numpy.polyfit function.
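For reference, here is what such a naive Lagrange interpolation could look like (a minimal sketch, using the same highest-degree-first convention; the helper lagrange_interpolation is mine and is not used in the rest of the notebook):

def lagrange_interpolation(xs, ys):
    """Coefficients (highest degree first) of the unique polynomial of degree
    at most len(xs) - 1 taking the values ys on the distinct points xs."""
    xs = np.asarray(xs, dtype=float)
    d = len(xs) - 1
    coeffs = np.zeros(d + 1)
    for k in range(d + 1):
        others = np.delete(xs, k)                       # all points except xs[k]
        # k-th Lagrange basis polynomial: product over j != k of (X - xs[j]) / (xs[k] - xs[j])
        L_k = np.poly(others) / np.prod(xs[k] - others)
        coeffs += ys[k] * L_k                           # add ys[k] * L_k(X)
    return coeffs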

In [39]:
assert d == n + m

Naive interpolation values

Let us consider the points $\lambda_i = i$, $i=0,\dots,n+m$.

In [67]:
lambdas = np.arange(0, d + 1)
lambdas
Out[67]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [68]:
values_P = np.polyval(P, lambdas)
values_P
Out[68]:
array([  1,   3,  11,  31,  69, 131, 223, 351, 521])
In [69]:
values_Q = np.polyval(Q, lambdas)
values_Q
Out[69]:
array([    0,     2,    36,   252,  1040,  3150,  7812, 16856, 32832])
In [70]:
values_PQ = values_P * values_Q
values_PQ
Out[70]:
array([       0,        6,      396,     7812,    71760,   412650,
        1742076,  5916456, 17105472])
In [71]:
PQ_sampled = np.polyfit(lambdas, values_PQ, d)
PQ_sampled
Out[71]:
array([  1.00000000e+00,  -5.06843863e-12,   1.00000000e+00,
         2.00000000e+00,   2.43215915e-09,   9.99999993e-01,
         1.00000001e+00,  -9.77014560e-09,   5.62310258e-09])
In [72]:
PQ
Out[72]:
array([1, 0, 1, 2, 0, 1, 1, 0, 0])
In [74]:
np.asarray(np.round(PQ_sampled), dtype=int)
Out[74]:
array([1, 0, 1, 2, 0, 1, 1, 0, 0])

Ok, at least it seems to work!

But we saw that even with very small degrees ($n=3$, $m=5$), the floating point errors are not that small on these naively chosen points $\lambda_i = i$, $i=0,\dots,n+m$. The largest coefficient that should have been exactly $0$ (i.e., $\simeq 0$) was:

In [80]:
np.max(np.abs(PQ_sampled)[np.abs(PQ_sampled) < 0.9])
Out[80]:
9.7701456003317433e-09

Chebyshev nodes as interpolation values

Let us consider the points $\lambda_k = \cos\left(\frac{2k-1}{2d} \pi\right)$, $k=1,\dots,d+1=n+m+1$, inspired by the Chebyshev nodes. (Note that the standard Chebyshev nodes for $d+1$ points use $2(d+1)$ in the denominator; with $2d$, as below, the last two points coincide, since $\cos\left(\frac{2d+1}{2d}\pi\right) = \cos\left(\frac{2d-1}{2d}\pi\right)$.)

In [85]:
lambdas = np.cos(np.pi * (2 * np.arange(1, 2 + d) - 1) / (2 * d))
lambdas
Out[85]:
array([ 0.98078528,  0.83146961,  0.55557023,  0.19509032, -0.19509032,
       -0.55557023, -0.83146961, -0.98078528, -0.98078528])
In [86]:
values_P = np.polyval(P, lambdas)
values_P
Out[86]:
array([ 2.92424164,  2.40629924,  1.72705159,  1.20251551,  0.79748449,
        0.27294841, -0.40629924, -0.92424164, -0.92424164])
In [87]:
values_Q = np.polyval(Q, lambdas)
values_Q
Out[87]:
array([ 1.86948796,  1.08874542,  0.36158742,  0.03834284,  0.03777763,
        0.25572914,  0.29393801,  0.05439157,  0.05439157])
In [88]:
values_PQ = values_P * values_Q
values_PQ
Out[88]:
array([ 5.46683454,  2.61984727,  0.62448014,  0.04610786,  0.03012707,
        0.06980086, -0.11942679, -0.05027096, -0.05027096])
In [91]:
PQ_sampled2 = np.polyfit(lambdas, values_PQ, d)
PQ_sampled2
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:1: RankWarning: Polyfit may be poorly conditioned
  """Entry point for launching an IPython kernel.
Out[91]:
array([  1.21880049e+00,   1.55720018e-14,   5.62399019e-01,
         2.00000000e+00,   2.73500613e-01,   1.00000000e+00,
         9.45299877e-01,  -2.11823147e-16,   1.70937883e-03])
In [92]:
PQ
Out[92]:
array([1, 0, 1, 2, 0, 1, 1, 0, 0])
In [93]:
np.asarray(np.round(PQ_sampled2), dtype=int)
Out[93]:
array([1, 0, 1, 2, 0, 1, 1, 0, 0])

Ok, at least it seems to work!

But even with very small degrees ($n=3$, $m=5$), the errors are much larger on these points: the two coinciding sample points make the interpolation problem rank-deficient (hence the RankWarning above), and some coefficients end up far from their exact values. The largest coefficient that should have been exactly $0$ (i.e., $\simeq 0$) was:

In [94]:
np.max(np.abs(PQ_sampled2)[np.abs(PQ_sampled2) < 0.9])
Out[94]:
0.56239901918986446
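As a side note, the standard Chebyshev nodes for $d+1$ points use $2(d+1)$ in the denominator, which gives $d+1$ pairwise distinct points. A quick sketch of these corrected nodes (not re-run through the whole fit here):

# Standard Chebyshev nodes for d + 1 = 9 points: note the 2 * (d + 1) denominator
lambdas_cheb = np.cos(np.pi * (2 * np.arange(1, 2 + d) - 1) / (2 * (d + 1)))
# Unlike the nodes used above, these are all pairwise distinct
assert len(np.unique(lambdas_cheb)) == d + 1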

Benchmark

Naively, let us check whether our implementation of $(P, Q) \mapsto PQ$ by evaluation-and-interpolation is more or less efficient than numpy.polymul:

In [123]:
def mypolymul(P, Q):
    n = len(P) - 1   # degree of P
    m = len(Q) - 1   # degree of Q
    d = n + m        # degree of the product PQ
    # Sample points (with 2 * d in the denominator the last node is repeated;
    # standard Chebyshev nodes for d + 1 points would use 2 * (d + 1))
    lambdas = np.cos(np.pi * (2 * np.arange(1, 2 + d) - 1) / (2 * d))
    values_P = np.polyval(P, lambdas)    # evaluate P on the sample points
    values_Q = np.polyval(Q, lambdas)    # evaluate Q on the sample points
    values_PQ = values_P * values_Q      # pointwise product = values of PQ
    PQ_sampled = np.polyfit(lambdas, values_PQ, d)  # interpolate back to coefficients
    # return PQ_sampled
    return np.asarray(np.round(PQ_sampled), dtype=int)
In [124]:
np.polymul(P, Q)
Out[124]:
array([1, 0, 1, 2, 0, 1, 1, 0, 0])
In [125]:
mypolymul(P, Q)
Out[125]:
array([1, 0, 1, 2, 0, 1, 1, 0, 0])
In [121]:
import warnings
warnings.simplefilter('ignore', np.RankWarning)

%timeit np.polymul(P, Q)
%timeit mypolymul(P, Q)
33 µs ± 995 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
181 µs ± 7.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Of course, our implementation is slower. But on small polynomials, it is not that much slower.

What about larger polynomials?

In [126]:
def random_polynomial(d=10, maxcoef=1):
    return np.random.randint(low=-maxcoef, high=maxcoef+1, size=d+1)
In [134]:
P = random_polynomial()
Q = random_polynomial()
P, Q
%timeit np.polymul(P, Q)
np.polymul(P, Q)
%timeit mypolymul(P, Q)
mypolymul(P, Q)
assert np.all(np.polymul(P, Q) == mypolymul(P, Q))
Out[134]:
(array([ 1,  1, -1, -1,  0,  1,  1,  0,  1,  1,  0]),
 array([-1,  0, -1,  0,  0, -1,  1,  0, -1,  1,  1]))
27.4 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Out[134]:
array([-1, -1,  0,  0,  1, -1, -1,  1, -3, -2,  1,  0, -1, -3,  0,  3,  0,
        0,  2,  1,  0])
350 µs ± 7.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Out[134]:
array([-1, -1,  0,  0,  1, -1, -1,  1, -3, -2,  1,  0, -1, -3,  0,  3,  0,
        0,  2,  1,  0])

On a larger example:

In [138]:
d = 100
maxcoef = 1
%timeit np.polymul(random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef))
%timeit mypolymul(random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef))

P, Q = random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef)
assert np.all(np.polymul(P, Q) == mypolymul(P, Q))
48.6 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
11.9 ms ± 1.7 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:7: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
  import sys
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-138-c9d7506c5aaa> in <module>()
      5 
      6 P, Q = random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef)
----> 7 assert np.all(np.polymul(P, Q) == mypolymul(P, Q))

AssertionError: 
In [139]:
d = 10
maxcoef = 3
%timeit np.polymul(random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef))
%timeit mypolymul(random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef))

P, Q = random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef)
assert np.all(np.polymul(P, Q) == mypolymul(P, Q))
38.9 µs ± 2.57 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
369 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-139-49289ccd0af4> in <module>()
      5 
      6 P, Q = random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef)
----> 7 assert np.all(np.polymul(P, Q) == mypolymul(P, Q))

AssertionError: 
In [140]:
d = 10
maxcoef = 50
%timeit np.polymul(random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef))
%timeit mypolymul(random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef))

P, Q = random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef)
assert np.all(np.polymul(P, Q) == mypolymul(P, Q))
36 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
452 µs ± 62.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-140-f5f8ae306925> in <module>()
      5 
      6 P, Q = random_polynomial(d=d, maxcoef=maxcoef), random_polynomial(d=d, maxcoef=maxcoef)
----> 7 assert np.all(np.polymul(P, Q) == mypolymul(P, Q))

AssertionError: 

Our method is slower. And wrong: the interpolation step is badly conditioned (remember the repeated sample point and the RankWarning), so rounding no longer recovers the exact integer coefficients.

That's sad.


Conclusion

Naively implementing the multiplication of 1D polynomials by evaluation-and-interpolation does not work:

  • It is slower than numpy.polymul (which relies on a direct convolution via numpy.convolve) and than the FFT-based methods available in any numerical computation environment (see the sketch below).
  • And it does not even give correct results beyond tiny examples. Boom!
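For completeness, here is one way to actually get an FFT-based product, assuming SciPy is available (multiplying two polynomials amounts to convolving their coefficient arrays, and scipy.signal.fftconvolve does that convolution with the FFT):

from scipy.signal import fftconvolve   # FFT-based convolution (assumes SciPy is installed)

def fft_polymul_scipy(P, Q):
    """FFT-based polynomial product: convolve the coefficient arrays, then round."""
    coeffs = fftconvolve(P, Q)                      # floating point coefficients of PQ
    return np.asarray(np.round(coeffs), dtype=int)  # round away the floating point noise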

Thanks for reading!

See this repo on GitHub for more notebooks, or on nbviewer.jupyter.org.

That's it for this demo! See you, folks!