Policies.Experimentals.ThompsonRobust module¶
The Thompson (Bayesian) index policy, using the average of several indexes sampled from the posterior (10 by default, see AVERAGEON). By default, it uses a Beta posterior. Reference: [Thompson - Biometrika, 1933].
-
Policies.Experimentals.ThompsonRobust.AVERAGEON= 10¶ Default number of indexes sampled from the posterior and averaged by the ThompsonRobust variant.
-
class
Policies.Experimentals.ThompsonRobust.ThompsonRobust(nbArms, posterior=<class 'Posterior.Beta.Beta'>, averageOn=10, lower=0.0, amplitude=1.0)[source]¶ Bases: Thompson.Thompson
The Thompson (Bayesian) index policy, using the average of averageOn indexes sampled from the posterior (10 by default). By default, it uses a Beta posterior. Reference: [Thompson - Biometrika, 1933].
-
__init__(nbArms, posterior=<class 'Posterior.Beta.Beta'>, averageOn=10, lower=0.0, amplitude=1.0)[source]¶ Create a new Bayesian policy, by creating a default posterior on each arm.
-
averageOn= None¶ Number of indexes sampled from the posterior before averaging.
-
computeIndex(arm)[source]¶ Compute the current index for this arm by sampling the posterior averageOn times and returning the average of the sampled indexes.
At time t, after \(N_k(t)\) pulls of arm k giving \(S_k(t)\) rewards of 1, the index is obtained by drawing averageOn samples from the Beta posterior and averaging them:
\[\begin{split}I_k(t) &= \frac{1}{\mathrm{averageOn}} \sum_{i=1}^{\mathrm{averageOn}} I_k^{(i)}(t), \\ I_k^{(i)}(t) &\sim \mathrm{Beta}(1 + S_k(t), 1 + N_k(t) - S_k(t)).\end{split}\]
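The formula above can be illustrated with a minimal, standalone sketch using NumPy. It is not the module's actual implementation: the function name `averaged_thompson_index` and its parameters `successes`, `pulls`, `average_on` and `rng` are hypothetical names chosen for this example, and it assumes binary rewards with a uniform Beta(1, 1) prior.

```python
import numpy as np


def averaged_thompson_index(successes, pulls, average_on=10, rng=None):
    """Average `average_on` samples from Beta(1 + S, 1 + N - S).

    Hypothetical helper for illustration only; `successes` plays the role
    of S_k(t) and `pulls` the role of N_k(t) in the formula above.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = 1 + successes          # first Beta parameter: 1 + S_k(t)
    b = 1 + pulls - successes  # second Beta parameter: 1 + N_k(t) - S_k(t)
    samples = rng.beta(a, b, size=average_on)  # the I_k^{(i)}(t)
    return float(samples.mean())               # their average, I_k(t)


# Example: an arm pulled 10 times with 7 rewards of 1.
rng = np.random.default_rng(0)
index = averaged_thompson_index(successes=7, pulls=10, average_on=10, rng=rng)
```

Each sample has mean \((1 + S_k(t)) / (2 + N_k(t))\), so the averaged index keeps that mean while its variance shrinks by a factor of averageOn. A larger averageOn therefore moves the policy away from plain Thompson sampling (averageOn = 1) and toward a greedy choice on the posterior mean.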
-
__module__= 'Policies.Experimentals.ThompsonRobust'¶
-