The Bernoulli distribution of mean $p\in[0,1]$ is defined as the distribution on $\{0,1\}$ such that $\mathbb{P}(X=1) = p$ and $\mathbb{P}(X=0) = 1-p$.
If $X$ follows a Binomial distribution of parameters $n\in\mathbb{N}^*$ and $p\in[0,1]$, then $X$ is defined as the sum of $n$ independent and identically distributed (iid) samples from a Bernoulli distribution of mean $p$; that is, $X\in\{0,\dots,n\}$ and $\forall k\in\{0,\dots,n\}, \mathbb{P}(X=k) = {n \choose k} p^k (1-p)^{n-k}$.
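As a quick sanity check of this formula, take $n = 2$ and $p = 1/2$ (two fair coin flips): $\mathbb{P}(X=0) = 1/4$, $\mathbb{P}(X=1) = 1/2$, $\mathbb{P}(X=2) = 1/4$, which indeed sums to $1$.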
Let's import the modules required for this notebook.
import numpy as np
import matplotlib.pyplot as plt
%load_ext cython
%load_ext watermark
%watermark -a "Lilian Besson (Naereen)" -i -v -p numpy,matplotlib,cython
Using the pseudo-random generator of (float) random numbers in $[0,1)$ from the random or numpy.random module, we can easily generate a sample from a Bernoulli distribution.
import random
def uniform_01() -> float:
    return random.random()
[ uniform_01() for _ in range(5) ]
Sampling from a Bernoulli distribution is now straightforward:
def bernoulli(p: float) -> int:
    return 1 if uniform_01() <= p else 0
[ bernoulli(0) for _ in range(5) ]
[ bernoulli(0.12345) for _ in range(5) ]
[ bernoulli(1) for _ in range(5) ]
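As a quick sanity check (a small snippet written just for this notebook), the empirical mean of many iid samples should be close to $p$, by the law of large numbers:
p = 0.12345
samples = [ bernoulli(p) for _ in range(10000) ]
sum(samples) / len(samples)  # should be close to p = 0.12345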
So we can naively generate samples from a Binomial distribution by summing $n$ iid samples generated with this bernoulli function.
def naive_binomial(n: int, p: float) -> int:
    result = 0
    for k in range(n):  # sum of n iid samples from Bernoulli(p)
        result += bernoulli(p)
    return result
For example:
[ naive_binomial(10, 0.1) for _ in range(5) ]
[ naive_binomial(10, 0.5) for _ in range(5) ]
[ naive_binomial(10, 0.9) for _ in range(5) ]
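Similarly (another small check written for this notebook), the empirical mean of many samples should be close to the theoretical mean $np$ of the Binomial distribution:
samples = [ naive_binomial(10, 0.5) for _ in range(10000) ]
sum(samples) / len(samples)  # should be close to n * p = 5.0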
We can quickly illustrate the generated distribution, to check it has the correct "shape":
m = 1000
n = 10
p = 0.12345
X = [ naive_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.5
X = [ naive_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.98765
X = [ naive_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
We can also directly use the Binomial sampler provided by the numpy.random module:
def numpy_binomial(n: int, p: float) -> int:
    return np.random.binomial(n, p)
Let's try this out:
[ numpy_binomial(10, 0.1) for _ in range(5) ]
[ numpy_binomial(10, 0.5) for _ in range(5) ]
[ numpy_binomial(10, 0.9) for _ in range(5) ]
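Note that numpy can also draw many samples in one vectorized call, thanks to the optional size argument of np.random.binomial; this is much faster than calling numpy_binomial in a Python loop:
X = np.random.binomial(10, 0.5, size=1000)  # 1000 samples in one call
X[:10], X.mean()  # the empirical mean should be close to n * p = 5.0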
Let's also plot this.
m = 1000
n = 10
p = 0.12345
X = [ numpy_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.5
X = [ numpy_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.98765
X = [ numpy_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
def binomial_coefficient(n: int, k: int) -> int:
    """From https://en.wikipedia.org/wiki/Binomial_coefficient#Binomial_coefficient_in_programming_languages"""
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    k = min(k, n - k)  # take advantage of symmetry
    c = 1
    for i in range(k):
        c = (c * (n - i)) // (i + 1)  # exact integer division: (i + 1) always divides c * (n - i)
    return c
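As a quick check of this helper (assuming Python 3.8+, which provides math.comb in the standard library), we can compare it against math.comb:
from math import comb
all( binomial_coefficient(10, k) == comb(10, k) for k in range(11) )  # should be True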
def proba_binomial(n: int, p: float, k: int) -> float:
    r"""Compute {n \choose k} p^k (1-p)^(n-k)"""
    q = 1.0 - p
    return binomial_coefficient(n, k) * p**k * q**(n-k)
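As a sanity check, these probabilities should sum to $1$ (up to floating-point rounding):
sum( proba_binomial(10, 0.12345, k) for k in range(11) )  # should be very close to 1.0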
This first function is a generic implementation of the discrete inverse transform method. For more details, see the Wikipedia page.
Inverse transformation sampling takes uniform samples of a number $u$ between $0$ and $1$, interpreted as a probability, and then returns the largest number $x$ from the domain of the distribution $\mathbb{P}(X)$ such that $\mathbb{P}(-\infty < X < x) \leq u$.
# a generic function
from typing import Callable
def inversion_method(compute_proba: Callable[[int], float], xmax: int, xmin: int = 0) -> int:
    probas = [ compute_proba(x) for x in range(xmin, xmax + 1) ]
    result = xmin
    current_proba = 0
    one_uniform_sample = uniform_01()
    while current_proba <= one_uniform_sample:
        current_proba += probas[result - xmin]  # shift the index so that xmin > 0 also works
        result += 1
    return result - 1
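For instance (a toy example written for this notebook), we can sample from a fair six-sided die, whose probabilities are all equal to $1/6$:
[ inversion_method(lambda x: 1/6, xmax=6, xmin=1) for _ in range(10) ]  # uniform on {1,...,6}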
def first_inversion_binomial(n: int, p: float) -> int:
    def compute_proba(x):
        return proba_binomial(n, p, x)
    xmax = n
    xmin = 0
    return inversion_method(compute_proba, xmax, xmin=xmin)
Let's try it out.
[ first_inversion_binomial(10, 0.1) for _ in range(5) ]
[ first_inversion_binomial(10, 0.5) for _ in range(5) ]
[ first_inversion_binomial(10, 0.9) for _ in range(5) ]
It seems to work as expected!
The previous function has a few weaknesses: it stores all $n+1$ values of $\mathbb{P}(X=k)$ beforehand, and it computes all of them even though the loop of the inversion method stops, on average, before the end (it takes $np$ steps on average, which can be much smaller than $n$ for small $p$).
Furthermore, the computations of both the binomial coefficients and the values $p^k (1-p)^{n-k}$ use powers instead of iterative multiplications, leading to more rounding errors.
We can solve all these issues by inlining all the computations.
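The key identity is the ratio of two consecutive probabilities, which allows updating $\mathbb{P}(X=k)$ with a single multiplication at each step, starting from $\mathbb{P}(X=0) = (1-p)^n$:
$$ \frac{\mathbb{P}(X=k+1)}{\mathbb{P}(X=k)} = \frac{{n \choose k+1}}{{n \choose k}} \times \frac{p^{k+1} (1-p)^{n-k-1}}{p^k (1-p)^{n-k}} = \frac{n-k}{k+1} \times \frac{p}{1-p}. $$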
def inversion_binomial(n: int, p: float) -> int:
    if p <= 1e-10:
        return 0
    if p >= 1 - 1e-10:
        return n
    if p > 0.5:  # speed up by computing for q and then subtracting
        return n - inversion_binomial(n, 1.0 - p)
    result = 0
    q = 1.0 - p
    current_proba = q**n
    cum_proba = current_proba
    one_uniform_sample = uniform_01()
    while cum_proba <= one_uniform_sample:
        current_proba *= (p * (n - result)) / (q * (result + 1))
        cum_proba += current_proba
        result += 1
    return result
Let's try it out.
[ inversion_binomial(10, 0.1) for _ in range(5) ]
[ inversion_binomial(10, 0.5) for _ in range(5) ]
[ inversion_binomial(10, 0.9) for _ in range(5) ]
It seems to work as expected!
And now the storage is indeed $O(1)$, and the computation time is $O(x)$ if the return value is $x$, so the mean computation time is $O(np)$.
Note that if $p = 1/2$, then $O(np) = O(n/2) = O(n)$, so this improved inversion method is (asymptotically) as costly as the naive method (the first method, which sums $n$ iid samples from a Bernoulli distribution of mean $p$).
Let's also plot this.
m = 1000
n = 10
p = 0.12345
X = [ inversion_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.5
X = [ inversion_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.98765
X = [ inversion_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
%%cython --annotate
import random
def cython_inversion_binomial(int n, double p) -> int:
    if p <= 1e-9:
        return 0
    if p >= 1 - 1e-9:
        return n
    if p > 0.5:  # speed up by computing for q and then subtracting
        return n - cython_inversion_binomial(n, 1.0 - p)
    cdef int result = 0
    cdef double q = 1.0 - p
    cdef double current_proba = q**n
    cdef double cum_proba = current_proba
    cdef double one_uniform_sample = random.random()
    while cum_proba < one_uniform_sample:
        current_proba *= (p * (n - result)) / (q * (result + 1))
        cum_proba += current_proba
        result += 1
    return result
Let's try it out.
[ cython_inversion_binomial(10, 0.1) for _ in range(5) ]
[ cython_inversion_binomial(10, 0.5) for _ in range(5) ]
[ cython_inversion_binomial(10, 0.9) for _ in range(5) ]
It seems to work as expected!
Let's also plot this.
m = 1000
n = 10
p = 0.12345
X = [ cython_inversion_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.5
X = [ cython_inversion_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
m = 1000
n = 10
p = 0.98765
X = [ cython_inversion_binomial(n, p) for _ in range(m) ]
plt.figure()
plt.hist(X)
plt.title(f"{m} samples from a Binomial distribution with n = {n} and p = {p}.")
plt.show()
n = 100
naive_binomial
first_inversion_binomial
inversion_binomial
cython_inversion_binomial
numpy_binomial
We can use the %timeit magic to check the (mean) computation time of all the previously mentioned functions:
%timeit naive_binomial(n, 0.123456)
%timeit first_inversion_binomial(n, 0.123456)
%timeit inversion_binomial(n, 0.123456)
%timeit cython_inversion_binomial(n, 0.123456)
%timeit numpy_binomial(n, 0.123456)
Apparently, our Cython method is faster than the function from numpy!
We also confirm that our first implementation of the inversion method was suboptimal, as announced, because of its precomputation of all the values of $\mathbb{P}(X=k)$. And we see that the naive method, summing $n$ iid Bernoulli samples, is comparably efficient to the pure-Python inversion-based method (for this small $n = 100$).
%timeit naive_binomial(n, 0.5)
%timeit first_inversion_binomial(n, 0.5)
%timeit inversion_binomial(n, 0.5)
%timeit cython_inversion_binomial(n, 0.5)
%timeit numpy_binomial(n, 0.5)
%timeit naive_binomial(n, 0.987654)
%timeit first_inversion_binomial(n, 0.987654)
%timeit inversion_binomial(n, 0.987654)
%timeit cython_inversion_binomial(n, 0.987654)
%timeit numpy_binomial(n, 0.987654)
It's quite awesome to see that our inversion-based method is more efficient than the numpy function, both in the pure-Python and the Cython versions! But it's surprising, as the numpy function is... itself based on the inversion method, and written in C!
See the source code, numpy/distributions.c line 426 (on the 28th February 2019, commit 7c41164).
But the trick is that the numpy implementation uses the inversion method (running in $\Omega(np)$) only when $np < 30$, and an algorithm denoted "BTPE" otherwise. I need to work on this method! The BTPE algorithm is much more complicated, and it is described in the following paper:
Kachitvichyanukul, V.; Schmeiser, B. W. (1988). "Binomial random variate generation". Communications of the ACM. 31 (2): 216–222. doi:10.1145/42372.42381.
See the source code, numpy/distributions.c line 263 (on the 28th February 2019, commit 7c41164).
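In pseudo-Python, numpy's dispatch logic looks like this (a simplified, hedged sketch based on the source linked above, not numpy's actual code):
def numpy_style_binomial(n: int, p: float) -> int:
    if n * p < 30:
        return inversion_binomial(n, p)  # inversion: cheap when n * p is small
    else:
        # here numpy runs the BTPE algorithm (Kachitvichyanukul & Schmeiser, 1988),
        # which we have not implemented; we fall back on numpy itself
        return np.random.binomial(n, p)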
n = 100
%timeit naive_binomial(n, random.random())
%timeit inversion_binomial(n, random.random())
%timeit cython_inversion_binomial(n, random.random())
%timeit numpy_binomial(n, random.random())
n = 1000
%timeit naive_binomial(n, random.random())
%timeit inversion_binomial(n, random.random())
%timeit cython_inversion_binomial(n, random.random())
%timeit numpy_binomial(n, random.random())
n = 10000
%timeit naive_binomial(n, random.random())
%timeit inversion_binomial(n, random.random())
%timeit cython_inversion_binomial(n, random.random())
%timeit numpy_binomial(n, random.random())
As we can see, our inversion method (whatever the implementation) runs in $O(n)$ (for $p$ drawn uniformly in $[0,1]$, hence $1/2$ on average, in the trials above), while numpy's implementation uses the BTPE method, which runs in $O(1)$ with respect to $n$.
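To see this scaling concretely, here is a minimal hedged sketch (using only the standard timeit module and the functions defined above) that measures both methods for increasing $n$:
import timeit
for n in [100, 1000, 10000]:
    t_inv = timeit.timeit(lambda: inversion_binomial(n, 0.5), number=100)
    t_np = timeit.timeit(lambda: np.random.binomial(n, 0.5), number=100)
    print(f"n = {n:6d}: inversion {t_inv:.4f}s, numpy {t_np:.4f}s (for 100 calls)")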