This notebooks studies and plot the ratio between a sum like the following $$ \sum_{i=0}^{L_T} (T_i - T_{i-1})^\gamma \ln(T_i - T_{i-1})^\delta $$ and the quantity $T^\gamma (\ln(T))^\delta$, where $T \in\mathbb{N}$ is a time horizon of some multi-armed bandit problem, and $\gamma,\delta \geq 0$ but not simultaneously zero.
The successive horizon (in a doubling trick scheme) are defined by $\forall i\in\mathbb{N},\; T_i := \lfloor \exp(\alpha \times \exp(f(i))) \rfloor$, for some function $f: i \mapsto f(i)$.
We study a generic form of functions $f$, with parameters $c,d,e \geq 0$: $f(i) = c (i^d) (\log(i))^e$.
Moreover, we are especially interested in these two cases:
To conclude these quick explanation, the notation $L_T$ is the "last term operator", for a fixed increasing sequence $(T_i)_{i\in\mathbb{N})$:
$$ L_T = \min\{ i \in\mathbb{N}, T_{i-1} < T \leq T_{i} \}.$$
For more details, please check this page on SMPyBandits documentation or this article: [What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018].
!pip install watermark
%load_ext watermark
%watermark -v -m -p numpy,matplotlib,seaborn -a "Lilian Besson and Emilie Kaufmann"
from __future__ import division, print_function # Python 2 compatibility
__author__ = "Lilian Besson and Emilie Kaufmann"
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
mpl.rcParams['figure.figsize'] = (16, 9)
Change here if needed, use 128 bits floats instead of 64 bits.
_f = np.float
_f = np.float128
Let's start by defining the functions.
This is some experimental code to plot some doubling sequences and check numerically some inequalities : like controlling a sum $\Sigma_i=0^n u_i$ by a constant times to last term $u_n$ and controlling the last term $u_{L_T}$ as a function of $T$.
#: The constant c in front of the function f.
constant_c_for_the_functions_f = _f(1.0)
constant_c_for_the_functions_f = _f(0.1)
constant_c_for_the_functions_f = _f(0.5)
We recall that we are interested by sequences $(T_i)_i$ that grows about $$T_i = \mathcal{O}(\exp(\alpha \exp(f(i)),$$ and $$f(i) = c \, i^d \, (\log i)^e, \forall i\in\mathbb{N}$$ for $c > 0, d \geq 0, e \geq 0$ and $d, e$ not zero simultaneously.
I don't want to have $\log(T_i - T_{i-1}) = -\infty$ if $T_i = T_{i-1}$ but I want $\log(0) = 0$. Let's hack it!
def mylog(x):
res = np.log(x)
if np.shape(res):
res[np.isinf(res)] = _f(0.0)
else:
if np.isinf(res):
res = _f(0.0)
return res
$$ f(i) = \log(i) \Leftrightarrow T_i = \mathcal{O}(2^i). $$
def function_f__for_geometric_sequences(i, c=constant_c_for_the_functions_f):
r""" For the *geometric* doubling sequences, :math:`f(i) = c \times \log(i)`."""
if i <= 0: return _f(0.0)
return c * mylog(i)
$$ f(i) = i \Leftrightarrow T_i = \mathcal{O}(2^{2^i}). $$
def function_f__for_exponential_sequences(i, c=constant_c_for_the_functions_f):
r""" For the *exponential* doubling sequences, :math:`f(i) = c \times i`."""
return c * i
As soon as $0 < d < 1$, no matter the value of $e$, $T_i$ will essentially faster than any geometric sequence and slower than any exponential sequence.
def function_f__for_generic_sequences(i, c=constant_c_for_the_functions_f, d=_f(0.5), e=_f(0.0)):
r""" For a certain *generic* family of doubling sequences, :math:`f(i) = c \times i^{d} \times (\log(i))^{e}`.
- ``d, e = 0, 1`` gives :func:`function_f__for_geometric_sequences`,
- ``d, e = 1, 0`` gives :func:`function_f__for_geometric_sequences`,
- ``d, e = 0.5, 0`` gives an intermediate sequence, growing faster than any geometric sequence and slower than any exponential sequence,
- any other combination has not been studied yet.
.. warning:: ``d`` should most probably be smaller than 1.
"""
i = float(i)
if i <= 0: return _f(0.0)
if e == 0:
assert d > 0, "Error: invalid value of d = {} for function_f__for_generic_sequences, cannot be <= 0.".format(d) # DEBUG
return c * (i ** d)
assert e > 0, "Error: invalid value of e = {} for function_f__for_generic_sequences, cannot be <= 0.".format(e) # DEBUG
if d == 0:
return c * ((mylog(i)) ** e)
return c * (i ** d) * ((mylog(i)) ** e)
Analytically (mathematically) we started to study $f(i) = \sqrt{i}$.
def function_f__for_intermediate_sequences(i):
return function_f__for_generic_sequences(i, c=constant_c_for_the_functions_f, d=_f(0.5), e=_f(0.0))
And empirically, I'm curious about other sequences, including $f(i) = i^{1/3}$, $f(i) = i^{2/3}$ and $f(i) = \sqrt{i} \sqrt{\log(i)}$.
def function_f__for_intermediate2_sequences(i):
return function_f__for_generic_sequences(i, c=constant_c_for_the_functions_f, d=_f(0.3333), e=_f(0.0))
def function_f__for_intermediate3_sequences(i):
return function_f__for_generic_sequences(i, c=constant_c_for_the_functions_f, d=_f(0.6667), e=_f(0.0))
def function_f__for_intermediate4_sequences(i):
return function_f__for_generic_sequences(i, c=constant_c_for_the_functions_f, d=_f(0.5), e=_f(0.5))
#: Value of the parameter :math:`\alpha` for the :func:`Ti_from_f` function.
alpha_for_Ti = _f(0.1)
alpha_for_Ti = _f(1.0)
alpha_for_Ti = _f(0.5)
We need to define a function that take $f$ and give the corresponding sequence $(T_i)$, given as a function $Ti: i \mapsto T_i$.
def Ti_from_f(f, alpha=alpha_for_Ti, *args, **kwargs):
r""" For any non-negative and increasing function :math:`f: i \mapsto f(i)`, the corresponding sequence is defined by:
.. math:: \forall i\in\mathbb{N},\; T_i := \lfloor \exp(\alpha \times \exp(f(i))) \rfloor.
.. warning:: :math:`f(i)` can need other parameters, see the examples above. They can be given as ``*args`` or ``**kwargs`` to :func:`Ti_from_f`.
.. warning:: it should be computed otherwise, I should give :math:`i \mapsto \exp(f(i))` instead of :math:`f: i \mapsto f(i)`. I need to try as much as possible to reduce the risk of overflow errors!
"""
# WARNING don't forget the floor!
def Ti(i):
this_Ti = np.floor(np.exp(alpha * np.exp(f(float(i), *args, **kwargs))))
if not (np.isinf(this_Ti) or np.isnan(this_Ti)):
this_Ti = int(this_Ti)
# print(" For f = {}, i = {} gives Ti = {}".format(f, i, this_Ti)) # DEBUG
return this_Ti
return Ti
Then we can define the "last term operator", by a naive search (and not an analytic derivation). I don't care if this is not efficient, it works and at least we are sure that $L_T$ satisfies its definition.
def last_term_operator_LT(Ti, max_i=10000):
r""" For a certain function representing a doubling sequence, :math:`T: i \mapsto T_i`, this :func:`last_term_operator_LT` function returns the function :math:`L: T \mapsto L_T`, defined as:
.. math:: \forall T\in\mathbb{N},\; L_T := \min\{ i \in\mathbb{N},\; T \leq T_i \}.
:math:`L_T` is the only integer which satisfies :math:`T_{L_T - 1} < T \leq T_{L_T}`.
"""
def LT(T, max_i=max_i):
i = 0
while Ti(i) < T: # very naive loop!
i += 1
if i >= max_i:
raise ValueError("LT(T={T}) was unable to find a i <= {max_i} such that T_i >= T.".format(T, max_i)) # DEBUG
assert Ti(i - 1) < T <= Ti(i), "Error: i = {} was computed as LT for T = {} and Ti = {} but does not satisfy T_(i-1) < T <= T(i)".format(i, T, Ti) # DEBUG
# print(" For LT: i = {} was computed as LT for T = {} and Ti = {} and satisfies T(i-1) = {} < T <= T(i) = {}".format(i, T, Ti, Ti(i-1), Ti(i))) # DEBUG
return i
return LT
def markers_colors(nb):
"""Make unique markers and colors for nb plots."""
allmarkers = ['o', 'D', 'v', 'p', '<', 's', '^', '*', 'h', '>']
longlist = allmarkers * (1 + int(nb / float(len(allmarkers)))) # Cycle the good number of time
markers = longlist[:nb] # Truncate
colors = sns.hls_palette(nb + 1)[:nb]
return markers, colors
By default, we will study the following doubling sequences:
First, we want to check how much do these sequences increase.
def plot_doubling_sequences(
i_min=1, i_max=30,
list_of_f=(
function_f__for_geometric_sequences,
function_f__for_intermediate_sequences,
function_f__for_intermediate2_sequences,
function_f__for_intermediate3_sequences,
function_f__for_intermediate4_sequences,
function_f__for_exponential_sequences,
),
label_of_f=(
"Geometric doubling (d=0, e=1)",
"Intermediate doubling (d=1/2, e=0)",
"Intermediate doubling (d=1/3, e=0)",
"Intermediate doubling (d=2/3, e=0)",
"Intermediate doubling (d=1/2, e=1/2)",
"Exponential doubling (d=1, e=0)",
),
*args, **kwargs
):
r""" Display a plot to illustrate the values of the :math:`T_i` as a function of :math:`i` for some i.
- Can accept many functions f (and labels).
"""
markers, colors = markers_colors(len(list_of_f))
fig = plt.figure()
i_s = np.arange(i_min, i_max, dtype=np.float128)
# now for each function f
for num_f, (f, la) in enumerate(zip(list_of_f, label_of_f)):
print("\n\nThe {}th function is referred to as {} and is {}".format(num_f, la, f)) # DEBUG
Ti = Ti_from_f(f)
values_of_Ti = [Ti(i) for i in i_s]
plt.plot(i_s, values_of_Ti, label=la, lw=4, ms=7, color=colors[num_f], marker=markers[num_f])
plt.legend()
plt.xlabel(r"Value of the time horizon $i = {},...,{}$".format(i_min, i_max))
plt.title(r"Comparison of the values of $T_i$")
plt.show()
# return fig
def plot_quality_first_upper_bound(
Tmin=2, Tmax=int(1e8), nbTs=100,
gamma=_f(0.0), delta=_f(1.0), # XXX bound in RT <= log(T)
# gamma=_f(0.5), delta=_f(0.0), # XXX bound in RT <= sqrt(T)
# gamma=_f(0.5), delta=_f(0.5), # XXX bound in RT <= sqrt(T * log(T))
# gamma=_f(0.66667), delta=_f(1.0), # XXX another weird bound in RT <= T^2/3 * log(T)
list_of_f=(
function_f__for_geometric_sequences,
function_f__for_intermediate_sequences,
function_f__for_intermediate2_sequences,
function_f__for_intermediate3_sequences,
function_f__for_intermediate4_sequences,
function_f__for_exponential_sequences,
),
label_of_f=(
"Geometric doubling (d=0, e=1)",
"Intermediate doubling (d=1/2, e=0)",
"Intermediate doubling (d=1/3, e=0)",
"Intermediate doubling (d=2/3, e=0)",
"Intermediate doubling (d=1/2, e=1/2)",
"Exponential doubling (d=1, e=0)",
),
show_Ti_m_Tim1=True,
*args, **kwargs
):
r""" Display a plot to compare numerically between the following sum :math:`S` and the upper-bound we hope to have, :math:`T^{\gamma} (\log T)^{\delta}`, as a function of :math:`T` for some values between :math:`T_{\min}` and :math:`T_{\max}`:
.. math:: S := \sum_{i=0}^{L_T} (T_i - T_{i-1})^{\gamma} (\log (T_i - T_{i-1}))^{\delta}.
- Can accept many functions f (and labels).
- Can use :math:`T_i` instead of :math:`T_i - T_{i-1}` if ``show_Ti_m_Tim1=False`` (default is to use the smaller possible bound, with difference of sequence lengths, :math:`T_i - T_{i-1}`).
.. warning:: This is still ON GOING WORK.
"""
markers, colors = markers_colors(len(list_of_f))
fig = plt.figure()
Ts = np.floor(np.linspace(_f(Tmin), _f(Tmax), num=nbTs, dtype=np.float128))
the_bound_we_want = (Ts ** gamma) * (mylog(Ts) ** delta)
# now for each function f
for num_f, (f, la) in enumerate(zip(list_of_f, label_of_f)):
print("\n\nThe {}th function is referred to as {} and is {}".format(num_f, la, f)) # DEBUG
Ti = Ti_from_f(f)
LT = last_term_operator_LT(Ti)
the_sum_we_have = np.zeros_like(Ts, dtype=np.float128)
for j, Tj in enumerate(Ts):
LTj = LT(Tj)
if show_Ti_m_Tim1:
the_sum_we_have[j] = sum(
((Ti(i)-Ti(i-1)) ** gamma) * (mylog((Ti(i)-Ti(i-1))) ** delta)
for i in range(1, LTj + 1)
)
else:
the_sum_we_have[j] = sum(
(Ti(i) ** gamma) * (mylog(Ti(i)) ** delta)
for i in range(0, LTj + 1)
)
# print("For j = {}, Tj = {}, gives LTj = {}, and the value of the sum from i=0 to LTj is = \n{}.".format(j, Tj, LTj, the_sum_we_have[j])) # DEBUG
plt.plot(Ts, the_sum_we_have / the_bound_we_want, label=la, lw=4, ms=4, color=colors[num_f], marker=markers[num_f])
plt.legend()
plt.xlabel(r"Value of the time horizon $T = {},...,{}$".format(Tmin, Tmax))
str_of_Tj_or_dTj = "T_i - T_{i-1}" if show_Ti_m_Tim1 else "T_i"
plt.title(r"Ratio of the sum $\sum_{i=0}^{L_T} (%s)^{\gamma} (\log(%s))^{\delta}$ and the upper-bound $T^{\gamma} \log(T)^{\delta}$, for $\gamma=%.3g$, $\delta=%.3g$." % (str_of_Tj_or_dTj, str_of_Tj_or_dTj, gamma, delta)) # DEBUG
plt.show()
# return fig
We check that the exponential sequence is growing WAY faster than all the others.
plot_doubling_sequences()
And without this faster sequence, just for the first $10$ first sequences:
plot_doubling_sequences(
i_max=10,
list_of_f=(
function_f__for_geometric_sequences,
function_f__for_intermediate_sequences,
function_f__for_intermediate2_sequences,
function_f__for_intermediate3_sequences,
function_f__for_intermediate4_sequences,
),
label_of_f=(
"Geometric doubling (d=0, e=1)",
"Intermediate doubling (d=1/2, e=0)",
"Intermediate doubling (d=1/3, e=0)",
"Intermediate doubling (d=2/3, e=0)",
"Intermediate doubling (d=1/2, e=1/2)",
)
)
And for the three slower sequences, an interesting observation is that the "intermediate" sequence with $f(i) = i^{1/3}$ is comparable to $f(i) = \log(i)$ (geometric one) for small values!
plot_doubling_sequences(
i_max=14,
list_of_f=(
function_f__for_geometric_sequences,
function_f__for_intermediate_sequences,
function_f__for_intermediate2_sequences,
),
label_of_f=(
"Geometric doubling (d=0, e=1)",
"Intermediate doubling (d=1/2, e=0)",
"Intermediate doubling (d=1/3, e=0)",
)
)
gamma, delta = (_f(0.0), _f(1.0))
plot_quality_first_upper_bound(Tmax=int(1e3), gamma=gamma, delta=delta, show_Ti_m_Tim1=True)
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=False)
Conclusion:
gamma, delta = (_f(0.5), _f(0.0))
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=True)
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=False)
Conclusion:
But the proof has to be careful and use a smart bound on $T_i - T_{i-1} \leq \text{cste} \times b^{i-1}$ and not with a $b^i$ (see difference between first and second plot)
gamma, delta = (_f(0.5), _f(0.5))
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=True)
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=False)
Conclusion:
But the proof has to be careful and use a smart bound on $T_i - T_{i-1} \leq \text{cste} \times b^{i-1}$ and not with a $b^i$ (see difference between first and second plot)
gamma, delta = (_f(0.66667), _f(1.0))
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=True)
plot_quality_first_upper_bound(gamma=gamma, delta=delta, show_Ti_m_Tim1=False)
Conclusion:
But the proof has to be careful and use a smart bound on $T_i - T_{i-1} \leq \text{cste} \times b^{i-1}$ and not with a $b^i$ (see difference between first and second plot)
The take home messages are the following:
That's it for today!