List of research publications using Lilian Besson’s SMPyBandits project¶
I (Lilian Besson) have started my PhD in October 2016, and this project is a part of my on going research since December 2016.
1st article, about policy aggregation algorithm (aka model selection)¶
I designed and added the Aggregator policy, in order to test its validity and performance.
It is a “simple” voting algorithm to combine multiple bandit algorithms into one.
Basically, it behaves like a simple MAB bandit just based on empirical means (even simpler than UCB), where arms are the child algorithms A_1 .. A_N, each running in “parallel”.
For more details, refer to this file: Aggregation.md and this research article.
PDF : BKM_IEEEWCNC_2018.pdf | HAL notice : BKM_IEEEWCNC_2018 | BibTeX : BKM_IEEEWCNC_2018.bib | Source code and documentation


2nd article, about Multi-players Multi-Armed Bandits¶
There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting.
The basic difference is about collisions : at each time t, if two or more user chose to sense the same channel, there is a collision. Collisions can be handled in different way from the base station point of view, and from each player point of view.
For more details, refer to this file: MultiPlayers.md and this research article.
PDF : BK__ALT_2018.pdf | HAL notice : BK__ALT_2018 | BibTeX : BK__ALT_2018.bib | Source code and documentation


3rd article, using Doubling Trick for Multi-Armed Bandits¶
I studied what Doubling Trick can and can’t do to obtain efficient anytime version of non-anytime optimal Multi-Armed Bandits algorithms.
For more details, refer to this file: DoublingTrick.md and this research article.
PDF : BK__DoublingTricks_2018.pdf | HAL notice : BK__DoublingTricks_2018 | BibTeX : BK__DoublingTricks_2018.bib | Source code and documentation


4th article, about Piece-Wise Stationary Multi-Armed Bandits¶
With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection for piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate its performance with extensive numerical experiments.
For more details, refer to this file: NonStationaryBandits.md and this research article.
PDF : BK__COLT_2019.pdf | HAL notice : BK__COLT_2019 | BibTeX : BK__COLT_2019.bib | Source code and documentation


Other interesting things¶
Single-player Policies¶
- More than 65 algorithms, including all known variants of the - UCB, kl-UCB,- MOSSand Thompson Sampling algorithms, as well as other less known algorithms (https://smpybandits.github.io/docs/- OCUCB,- BESA,- OSSBetc).
- SparseWrapperis a generalization of the SparseUCB from this article.
- Implementation of very recent Multi-Armed Bandits algorithms, e.g., - kl-UCB++(from this article),- UCB-dagger(from this article), or- MOSS-anytime(from this article).
- Experimental policies: - BlackBoxOptor- UnsupervisedLearning(using Gaussian processes to learn the arms distributions).
Arms and problems¶
- My framework mainly targets stochastic bandits, with arms following - Bernoulli, bounded (truncated) or unbounded- Gaussian,- Exponential,- Gammaor- Poissondistributions.
- The default configuration is to use a fixed problem for N repetitions (e.g. 1000 repetitions, use - MAB.MAB), but there is also a perfect support for “Bayesian” problems where the mean vector µ1,…,µK change at every repetition (see- MAB.DynamicMAB).
- There is also a good support for Markovian problems, see - MAB.MarkovianMAB, even though I didn’t implement any policies tailored for Markovian problems.
- I’m actively working on adding a very clean support for non-stationary MAB problems, and - MAB.PieceWiseStationaryMABis already working well. Use it with policies designed for piece-wise stationary problems, like Discounted-Thompson, CD-UCB, M-UCB, SW-UCB#.
📜 License ?  ¶
¶
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson.
Note: I have worked on other topics during my PhD, you can find my research articles on my website, or have a look to my Google Scholar profile or résumé on HAL.