Policies.DoublingTrickWrapper module¶
A policy that acts as a wrapper on another policy P, assumed to be horizon dependent (has to known \(T\)), by implementing a “doubling trick”:
starts to assume that \(T=T_0=1000\), and run the policy \(P(T_0)\), from \(t=1\) to \(t=T_0\),
if \(t > T_0\), then the “doubling trick” is performed, by either re-initializing or just changing the parameter horizon of the policy P, for instance with \(T_2 = 10 \times T_0\),
and keep doing this until \(t = T\).
Note
This is implemented in a very generic way, with simply a function next_horizon(horizon) that gives the next horizon to try when crossing the current guess. It can be a simple linear function (next_horizon(horizon) = horizon + 100), a geometric growth to have the “real” doubling trick (next_horizon(horizon) = horizon * 10), or even functions growing exponentially fast (next_horizon(horizon) = horizon ** 1.1, next_horizon(horizon) = horizon ** 1.5, next_horizon(horizon) = horizon ** 2).
Note
My guess is that this “doubling trick” wrapping policy can only be efficient (for stochastic problems) if:
the underlying policy P is a very efficient horizon-dependent algorithm, e.g., the
Policies.ApproximatedFHGittins,the growth function next_horizon is growing faster than any geometric rate, so that the number of refresh is \(o(\log T)\) and not \(O(\log T)\).
See also
Reference: [[What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357), to be presented soon.
Warning
Interface: If FULL_RESTART=False (default), the underlying algorithm is recreated at every breakpoint, instead its attribute horizon or _horizon is updated. Be sure that this is enough to really change the internal value used by the policy. Some policy use T only once to compute others parameters, which should be updated as well. A manual implementation of the __setattr__ method can help.
-
Policies.DoublingTrickWrapper.default_horizonDependent_policy¶ alias of
Policies.UCBH.UCBH
-
Policies.DoublingTrickWrapper.FULL_RESTART= False¶ Default constant to know what to do when restarting the underlying policy with a new horizon parameter.
True means that a new policy, initialized from scratch, will be created at every breakpoint.
False means that the same policy object is used but just its attribute horizon is updated (default).
-
Policies.DoublingTrickWrapper.DEFAULT_FIRST_HORIZON= 200¶ Default horizon, used for the first step.
-
Policies.DoublingTrickWrapper.ARITHMETIC_STEP= 200¶ Default stepsize for the arithmetic horizon progression.
-
Policies.DoublingTrickWrapper.next_horizon__arithmetic[source]¶ The arithmetic horizon progression function:
\[\begin{split}T &\mapsto T + 100,\\ T_i &:= T_0 + 100 \times i.\end{split}\]
-
Policies.DoublingTrickWrapper.GEOMETRIC_STEP= 2¶ Default multiplicative constant for the geometric horizon progression.
-
Policies.DoublingTrickWrapper.next_horizon__geometric[source]¶ The geometric horizon progression function:
\[\begin{split}T &\mapsto T \times 2,\\ T_i &:= T_0 2^i.\end{split}\]
-
Policies.DoublingTrickWrapper.EXPONENTIAL_STEP= 1.5¶ Default exponential constant for the exponential horizon progression.
-
Policies.DoublingTrickWrapper.next_horizon__exponential[source]¶ The exponential horizon progression function:
\[\begin{split}T &\mapsto \left\lfloor T^{1.5} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.5^i} \right\rfloor.\end{split}\]
-
Policies.DoublingTrickWrapper.SLOW_EXPONENTIAL_STEP= 1.1¶ Default exponential constant for the slow exponential horizon progression.
-
Policies.DoublingTrickWrapper.next_horizon__exponential_slow[source]¶ The exponential horizon progression function:
\[\begin{split}T &\mapsto \left\lfloor T^{1.1} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.1^i} \right\rfloor.\end{split}\]
-
Policies.DoublingTrickWrapper.FAST_EXPONENTIAL_STEP= 2¶ Default exponential constant for the fast exponential horizon progression.
-
Policies.DoublingTrickWrapper.next_horizon__exponential_fast[source]¶ The exponential horizon progression function:
\[\begin{split}T &\mapsto \lfloor T^{2} \rfloor,\\ T_i &:= \lfloor T_0^{2^i} \rfloor.\end{split}\]
-
Policies.DoublingTrickWrapper.ALPHA= 2¶ Default constant \(\alpha\) for the generic exponential sequence.
-
Policies.DoublingTrickWrapper.BETA= 2¶ Default constant \(\beta\) for the generic exponential sequence.
-
Policies.DoublingTrickWrapper.next_horizon__exponential_generic(i, horizon)[source]¶ The generic exponential horizon progression function:
\[T_i := \left\lfloor \frac{T_0}{a} a^{b^i} \right\rfloor.\]
-
Policies.DoublingTrickWrapper.default_next_horizon¶ The exponential horizon progression function:
\[\begin{split}T &\mapsto \left\lfloor T^{1.1} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.1^i} \right\rfloor.\end{split}\]
-
Policies.DoublingTrickWrapper.breakpoints(next_horizon, first_horizon, horizon, debug=False)[source]¶ Return the list of restart point (breakpoints), if starting from
first_horizontohorizonwith growth functionnext_horizon.Also return the gap between the last guess for horizon and the true horizon. This gap should not be too large.
Nicely print all the values if
debug=True.First examples:
>>> first_horizon = 1000 >>> horizon = 30000 >>> breakpoints(next_horizon__arithmetic, first_horizon, horizon) ([1000, 1200, 1400, ..., 29800, 30000], 0) >>> breakpoints(next_horizon__geometric, first_horizon, horizon) ([1000, 2000, 4000, 8000, 16000, 32000], 2000) >>> breakpoints(next_horizon__exponential, first_horizon, horizon) ([1000, 31622], 1622) >>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon) ([1000, 1995, 4265, 9838, 24671, 67827], 37827) >>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon) ([1000, 1000000], 970000)
Second examples:
>>> first_horizon = 5000 >>> horizon = 1000000 >>> breakpoints(next_horizon__arithmetic, first_horizon, horizon) ([5000, 5200, ..., 999600, 999800, 1000000], 0) >>> breakpoints(next_horizon__geometric, first_horizon, horizon) ([5000, 10000, 20000, 40000, 80000, 160000, 320000, 640000, 1280000], 280000) >>> breakpoints(next_horizon__exponential, first_horizon, horizon) ([5000, 353553, 210223755], 209223755) >>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon) ([5000, 11718, 29904, 83811, 260394, 906137, 3572014], 2572014) >>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon) ([5000, 25000000], 24000000)
Third examples:
>>> first_horizon = 10 >>> horizon = 1123456 >>> breakpoints(next_horizon__arithmetic, first_horizon, horizon) ([10, 210, 410, ..., 1123210, 1123410, 1123610], 154) >>> breakpoints(next_horizon__geometric, first_horizon, horizon) ([10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 20480, 40960, 81920, 163840, 327680, 655360, 1310720], 187264) >>> breakpoints(next_horizon__exponential, first_horizon, horizon) ([10, 31, 172, 2255, 107082, 35040856], 33917400) >>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon) ([10, 12, 15, 19, 25, 34, 48, 70, 107, 170, 284, 499, 928, 1837, 3895, 8903, 22104, 60106, 180638, 606024, 2294768], 1171312) >>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon) ([10, 100, 10000, 100000000], 98876544)
-
Policies.DoublingTrickWrapper.constant_c_for_the_functions_f= 0.5¶ The constant c in front of the function f.
-
Policies.DoublingTrickWrapper.function_f__for_geometric_sequences(i, c=0.5)[source]¶ For the geometric doubling sequences, \(f(i) = c \times \log(i)\).
-
Policies.DoublingTrickWrapper.function_f__for_exponential_sequences(i, c=0.5)[source]¶ For the exponential doubling sequences, \(f(i) = c \times i\).
-
Policies.DoublingTrickWrapper.function_f__for_generic_sequences(i, c=0.5, d=0.5, e=0.0)[source]¶ For a certain generic family of doubling sequences, \(f(i) = c \times i^{d} \times (\log(i))^{e}\).
d, e = 0, 1givesfunction_f__for_geometric_sequences(),d, e = 1, 0givesfunction_f__for_geometric_sequences(),d, e = 0.5, 0gives an intermediate sequence, growing faster than any geometric sequence and slower than any exponential sequence,any other combination has not been studied yet.
Warning
dshould most probably be smaller than 1.
-
Policies.DoublingTrickWrapper.alpha_for_Ti= 0.5¶ Value of the parameter \(\alpha\) for the
Ti_from_f()function.
-
Policies.DoublingTrickWrapper.Ti_from_f(f, alpha=0.5, *args, **kwargs)[source]¶ For any non-negative and increasing function \(f: i \mapsto f(i)\), the corresponding sequence is defined by:
\[\forall i\in\mathbb{N},\; T_i := \lfloor \exp(\alpha \times \exp(f(i))) \rfloor.\]Warning
\(f(i)\) can need other parameters, see the examples above. They can be given as
*argsor**kwargstoTi_from_f().Warning
it should be computed otherwise, I should give \(i \mapsto \exp(f(i))\) instead of \(f: i \mapsto f(i)\). I need to try as much as possible to reduce the risk of overflow errors!
-
Policies.DoublingTrickWrapper.Ti_geometric(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_geometric_sequences().
-
Policies.DoublingTrickWrapper.Ti_exponential(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_exponential_sequences().
-
Policies.DoublingTrickWrapper.Ti_intermediate_sqrti(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_intermediate_sequences().
-
Policies.DoublingTrickWrapper.Ti_intermediate_i13(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_intermediate2_sequences().
-
Policies.DoublingTrickWrapper.Ti_intermediate_i23(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_intermediate3_sequences().
-
Policies.DoublingTrickWrapper.Ti_intermediate_i12_logi12(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_intermediate4_sequences().
-
Policies.DoublingTrickWrapper.Ti_intermediate_i_by_logi(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]¶ Sequence \(T_i\) generated from the function \(f\) =
function_f__for_intermediate5_sequences().
-
Policies.DoublingTrickWrapper.last_term_operator_LT(Ti, max_i=10000)[source]¶ For a certain function representing a doubling sequence, \(T: i \mapsto T_i\), this
last_term_operator_LT()function returns the function \(L: T \mapsto L_T\), defined as:\[\forall T\in\mathbb{N},\; L_T := \min\{ i \in\mathbb{N},\; T \leq T_i \}.\]\(L_T\) is the only integer which satisfies \(T_{L_T - 1} < T \leq T_{L_T}\).
-
Policies.DoublingTrickWrapper.plot_doubling_sequences(i_min=1, i_max=30, list_of_f=(<function function_f__for_geometric_sequences>, <function function_f__for_intermediate_sequences>, <function function_f__for_intermediate2_sequences>, <function function_f__for_intermediate3_sequences>, <function function_f__for_intermediate4_sequences>, <function function_f__for_exponential_sequences>), label_of_f=('Geometric doubling (d=0, e=1)', 'Intermediate doubling (d=1/2, e=0)', 'Intermediate doubling (d=1/3, e=0)', 'Intermediate doubling (d=2/3, e=0)', 'Intermediate doubling (d=1/2, e=1/2)', 'Exponential doubling (d=1, e=0)'), *args, **kwargs)[source]¶ Display a plot to illustrate the values of the \(T_i\) as a function of \(i\) for some i.
Can accept many functions f (and labels).
-
Policies.DoublingTrickWrapper.plot_quality_first_upper_bound(Tmin=10, Tmax=100000000, nbTs=100, gamma=0.0, delta=1.0, list_of_f=(<function function_f__for_geometric_sequences>, <function function_f__for_intermediate_sequences>, <function function_f__for_intermediate2_sequences>, <function function_f__for_intermediate3_sequences>, <function function_f__for_intermediate4_sequences>, <function function_f__for_exponential_sequences>), label_of_f=('Geometric doubling (d=0, e=1)', 'Intermediate doubling (d=1/2, e=0)', 'Intermediate doubling (d=1/3, e=0)', 'Intermediate doubling (d=2/3, e=0)', 'Intermediate doubling (d=1/2, e=1/2)', 'Exponential doubling (d=1, e=0)'), show_Ti_m_Tim1=True, *args, **kwargs)[source]¶ Display a plot to compare numerically between the following sum \(S\) and the upper-bound we hope to have, \(T^{\gamma} (\log T)^{\delta}\), as a function of \(T\) for some values between \(T_{\min}\) and \(T_{\max}\):
\[S := \sum_{i=0}^{L_T} (T_i - T_{i-1})^{\gamma} (\log (T_i - T_{i-1}))^{\delta}.\]Can accept many functions f (and labels).
Can use \(T_i\) instead of \(T_i - T_{i-1}\) if
show_Ti_m_Tim1=False(default is to use the smaller possible bound, with difference of sequence lengths, \(T_i - T_{i-1}\)).
Warning
This is still ON GOING WORK.
-
Policies.DoublingTrickWrapper.MAX_NB_OF_TRIALS= 500¶ If the sequence Ti does not grow enough, artificially increase i until T_inext > T_i
-
class
Policies.DoublingTrickWrapper.DoublingTrickWrapper(nbArms, full_restart=False, policy=<class 'Policies.UCBH.UCBH'>, next_horizon=CPUDispatcher(<function next_horizon__exponential_slow>), first_horizon=200, *args, **kwargs)[source]¶ Bases:
Policies.BaseWrapperPolicy.BaseWrapperPolicyA policy that acts as a wrapper on another policy P, assumed to be horizon dependent (has to known \(T\)), by implementing a “doubling trick”.
Reference: [[What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357), to be presented soon.
-
__init__(nbArms, full_restart=False, policy=<class 'Policies.UCBH.UCBH'>, next_horizon=CPUDispatcher(<function next_horizon__exponential_slow>), first_horizon=200, *args, **kwargs)[source]¶ New policy.
-
full_restart= None¶ Constant to know how to refresh the underlying policy.
-
next_horizon_name= None¶ Pretty string of the name of this growing function
-
__module__= 'Policies.DoublingTrickWrapper'¶
-
horizon= None¶ Last guess for the horizon