Policies.AdBandits module¶
The AdBandits bandit algorithm, mixing Thompson Sampling and BayesUCB.
- Reference: [AdBandit: A New Algorithm For Multi-Armed Bandits, F.S.Truzzi, V.F.da Silva, A.H.R.Costa, F.G.Cozman](http://sites.poli.usp.br/p/fabio.cozman/Publications/Article/truzzi-silva-costa-cozman-eniac2013.pdf) 
- Code inspired from: https://github.com/flaviotruzzi/AdBandits/ 
Warning
This policy is not very well known, but for stochastic bandits it usually works very well! It is not an anytime policy, though (the horizon \(T\) must be known).
- 
Policies.AdBandits.random() → x in the interval [0, 1).¶
- 
class Policies.AdBandits.AdBandits(nbArms, horizon=1000, alpha=1, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0)[source]¶
- Bases: Policies.BasePolicy.BasePolicy
- The AdBandits bandit algorithm, mixing Thompson Sampling and BayesUCB.
- Reference: [AdBandit: A New Algorithm For Multi-Armed Bandits, F.S.Truzzi, V.F.da Silva, A.H.R.Costa, F.G.Cozman](http://sites.poli.usp.br/p/fabio.cozman/Publications/Article/truzzi-silva-costa-cozman-eniac2013.pdf)
- Code inspired from: https://github.com/flaviotruzzi/AdBandits/

Warning
This policy is not very well known, but for stochastic bandits it usually works very well! It is not an anytime policy, though (the horizon \(T\) must be known).
- 
__init__(nbArms, horizon=1000, alpha=1, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0)[source]¶
- New policy, for nbArms arms, with a known horizon \(T\) and parameter alpha. 
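For illustration, here is a minimal usage sketch of how such a policy object is typically driven, using the `startGame` / `choice` / `getReward` interface of the `Policies.BasePolicy.BasePolicy` parent class. The Bernoulli arms and the simulation loop below are assumptions made for the example, not part of this module:

```python
import numpy as np
from Policies.AdBandits import AdBandits

# Hypothetical experiment: 3 Bernoulli arms with these means (assumed for the example).
means = [0.1, 0.5, 0.9]
horizon = 1000
rng = np.random.default_rng(42)

policy = AdBandits(nbArms=len(means), horizon=horizon, alpha=1)
policy.startGame()  # reset the internal state (inherited from BasePolicy)

for t in range(horizon):
    arm = policy.choice()                      # pick an arm (Thompson Sampling or Bayes-UCB step)
    reward = float(rng.random() < means[arm])  # simulated Bernoulli reward
    policy.getReward(arm, reward)              # update the posterior of the played arm
```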
 - 
alpha = None¶
- Parameter alpha, used to define the time-varying parameter \(\varepsilon(t)\) that balances the Thompson Sampling and Bayes-UCB steps. 
 - 
horizon = None¶
- Parameter \(T\) = known horizon of the experiment. Default value is 1000. 
 - 
posterior = None¶
- Posterior distribution for each arm, stored in a list rather than a dict, for quicker access. 
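A plausible sketch of how this per-arm layout can be used, assuming the `Beta` posterior named in the constructor default above exposes `update` and `sample` methods (the exact interface and initialization in the source may differ):

```python
from Policies.Posterior.Beta import Beta

nbArms = 3
# One independent posterior object per arm, indexed directly by the arm number.
# A list gives O(1) integer indexing with no hashing, hence "quicker access" than a dict.
posteriors = [Beta() for _ in range(nbArms)]

posteriors[0].update(1)          # record a success on arm 0 (assumed method name)
theta = posteriors[0].sample()   # draw one sample, e.g. for a Thompson Sampling step
```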
 - 
property epsilon¶
- Time-varying parameter \(\varepsilon(t)\). 
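The documentation does not spell out the formula here. Below is a minimal sketch assuming, for illustration only, \(\varepsilon(t) = t / (\alpha T)\) clipped to \([0, 1]\); the exact expression may differ, so check the [source] link for the real definition:

```python
def epsilon(t, horizon, alpha=1.0):
    """Hedged sketch: grows from 0 towards 1 as t approaches alpha * horizon, clipped to [0, 1]."""
    return max(0.0, min(1.0, t / (alpha * horizon)))

# Early rounds favour the Thompson Sampling step (probability 1 - epsilon(t) is large),
# later rounds favour the Bayes-UCB step.
print(epsilon(0, 1000), epsilon(500, 1000), epsilon(2000, 1000))  # 0.0 0.5 1.0
```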
 - 
choice()[source]¶
- With probability \(1 - \varepsilon(t)\), use a Thompson Sampling step, otherwise use a BayesUCB step, to choose one arm. 
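A hedged sketch of that decision rule, assuming each per-arm posterior exposes `sample()` for the Thompson step and `quantile(p)` for the Bayes-UCB step (method names borrowed from `Policies.Posterior.Beta`; the epsilon formula and the quantile level below are assumptions, not the module's exact choices):

```python
import numpy as np

def choice_sketch(posteriors, t, horizon, alpha=1.0, rng=np.random.default_rng()):
    """Pick an arm: Thompson Sampling with probability 1 - epsilon(t), else a Bayes-UCB-like step."""
    eps = max(0.0, min(1.0, t / (alpha * horizon)))
    if rng.random() < 1 - eps:
        # Thompson Sampling step: sample one value from each posterior, play the argmax.
        samples = [p.sample() for p in posteriors]
        return int(np.argmax(samples))
    else:
        # Bayes-UCB-like step: play the arm with the largest upper quantile of its posterior.
        # (A common level is 1 - 1/t; the level actually used by AdBandits may differ.)
        level = 1 - 1.0 / max(1, t)
        quantiles = [p.quantile(level) for p in posteriors]
        return int(np.argmax(quantiles))
```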
 - 
choiceWithRank(rank=1)[source]¶
- With probability \(1 - \varepsilon(t)\), use a Thompson Sampling step, otherwise use a BayesUCB step, to choose one arm of a certain rank. 
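A short sketch of how the `rank` argument can be interpreted in the Thompson step, under the same hypothetical posterior interface as above: instead of the best arm, return the arm whose sampled value is ranked `rank`-th (rank=1 being the best):

```python
import numpy as np

def choice_with_rank_sketch(posteriors, rank=1):
    """Thompson-Sampling-style step returning the arm whose sampled value has the given rank (1 = best)."""
    samples = np.array([p.sample() for p in posteriors])
    order = np.argsort(samples)   # arm indices, sorted by ascending sampled value
    return int(order[-rank])      # rank=1 -> argmax, rank=2 -> second best, ...
```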
 - 
__module__ = 'Policies.AdBandits'¶