| [Agr95] | R. Agrawal (1995).
 Sample mean based index policies with O(log n)
  regret for the Multi-Armed Bandit problem.
 Advances in Applied Probability, 27(4):1054--1078. [ bib ] | 
| [ALK19] | P. Alatur, K. Y. Levy, and A. Krause (2019).
 Multi-Player Bandits: The Adversarial Case.
 arXiv preprint
  arXiv:1902.08036.
 https://arxiv.org/abs/1902.08036. [ bib | http ] | 
| [AFM17] | R. Allesiardo, R. Féraud, and O.-A. Maillard (2017).
 The Non-Stationary Stochastic Multi-Armed Bandit Problem.
 International Journal of Data Science and Analytics,
  3(4):267--283. [ bib ] | 
| [AMTA11] | A. Anandkumar, N. Michael, A. K. Tang, and S. Agrawal (2011).
 Distributed Algorithms for Learning and Cognitive Medium
  Access with Logarithmic Regret.
 Journal on Selected Areas in Communications, 29(4):731--745. [ bib ] | 
| [AVW87a] | V. Anantharam, P. Varaiya, and J. Walrand (1987).
 Asymptotically efficient allocation rules for the Multi-Armed
  Bandit problem with multiple plays - Part I: IID rewards.
 Transactions on Automatic Control, 32(11):968--976. [ bib ] | 
| [AVW87b] | V. Anantharam, P. Varaiya, and J. Walrand (1987).
 Asymptotically efficient allocation rules for the Multi-Armed
  Bandit problem with multiple plays - Part II: Markovian rewards.
 Transactions on Automatic Control, 32(11):977--982. [ bib ] | 
| [AHK12] | S. Arora, E. Hazan, and S. Kale (2012).
 The Multiplicative Weights Update Method: a Meta-Algorithm and
  Applications.
 Theory of Computing, 8(1):121--164. [ bib ] | 
| [AE61] | K. J. Arrow and A. C. Enthoven (1961).
 Quasi-Concave Programming.
 Econometrica, 29(4):779--800. [ bib ] | 
| [ACBF02] | P. Auer, N. Cesa-Bianchi, and P. Fischer (2002).
 Finite-time Analysis of the Multi-armed Bandit Problem.
 Machine Learning, 47(2):235--256. [ bib | DOI ] | 
| [ACBFS02] | P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire (2002).
 The Non-Stochastic Multi-Armed Bandit Problem.
 SIAM Journal on Computing, 32(1):48--77. [ bib ] | 
| [AO10] | P. Auer and R. Ortner (2010).
 UCB Revisited: Improved Regret Bounds For The Stochastic
  Multi-Armed Bandit Problem.
 Periodica Mathematica Hungarica, 61(1-2):55--65. [ bib ] | 
| [AGO18] | P. Auer, P. Gajane, and R. Ortner (2018).
 Adaptively Tracking the Best Arm with an Unknown Number of
  Distribution Changes.
 European Workshop on Reinforcement Learning.
 
  https://ewrl.files.wordpress.com/2018/09/ewrl_14_2018_paper_28.pdf. [ bib | .pdf ] | 
| [AM15] | O. Avner and S. Mannor (2015).
 Learning to Coordinate Without Communication in Multi-User
  Multi-Armed Bandit Problems.
 arXiv preprint
  arXiv:1504.08167.
 https://arxiv.org/abs/1504.08167. [ bib | http ] | 
| [AM18] | O. Avner and S. Mannor (2018).
 Multi-User Communication Networks: A Coordinated Multi-Armed
  Bandit Approach.
 arXiv preprint
  arXiv:1808.04875.
 https://arxiv.org/abs/1808.04875. [ bib | http ] | 
| [BMM14] | A. Baransi, O.-A. Maillard, and S. Mannor (2014).
 Sub-sampling for Multi-armed Bandits.
 Proceedings of the European Conference on Machine Learning.
 https://hal.archives-ouvertes.fr/hal-01025651. [ bib | http ] | 
| [BK19a] | L. Besson and E. Kaufmann (August 2019).
 Analyse non asymptotique d'un test séquentiel de détection
  de ruptures et application aux bandits non stationnaires [Non-asymptotic
  analysis of a sequential change-point detection test with application to
  non-stationary bandits].
 GRETSI.
 https://hal.archives-ouvertes.fr/hal-02152243. [ bib | http | .pdf ] | 
| [BMP18] | R. Bonnefoi, C. Moy, and J. Palicot (2018).
 Improvement of the LPWAN AMI backhaul's latency thanks to
  reinforcement learning algorithms.
 EURASIP Journal on Wireless Communications and Networking,
  2018(1):34. [ bib | DOI ] | 
| [BP18] | E. Boursier and V. Perchet (2018).
 SIC-MMAB: Synchronisation Involves Communication in
  Multiplayer Multi-Armed Bandits.
 arXiv preprint
  arXiv:1809.08151.
 https://arxiv.org/abs/1809.08151. [ bib | http ] | 
| [BCB12] | S. Bubeck and N. Cesa-Bianchi (2012).
 Regret Analysis of Stochastic and Non-Stochastic Multi-Armed
  Bandit Problems.
 Foundations and Trends in Machine Learning,
  5(1):1--122. [ bib ] | 
| [BK96] | A. N. Burnetas and M. N. Katehakis (1996).
 Optimal Adaptive Policies for Sequential Allocation
  Problems.
 Advances in Applied Mathematics, 17(2):122--142. [ bib ] | 
| [CVZZ16] | M. Centenaro, L. Vangelista, A. Zanella, and M. Zorzi (2016).
 Long-range communications in unlicensed bands: the rising stars
  in the IoT and smart city scenarios.
 Wireless Communications, 23(5):60--67. [ bib | DOI ] | 
| [CMR14] | O. Chapelle, E. Manavoglu, and R. Rosales (2014).
 Simple and Scalable Response Prediction For Display
  Advertising.
 Transactions on Intelligent Systems and Technology. [ bib ] | 
| [DMP16] | S. J. Darak, C. Moy, and J. Palicot (2016).
 Proof-of-Concept System for Opportunistic Spectrum Access in
  Multi-user Decentralized Networks.
 EAI Endorsed Transactions on Cognitive Communications,
  2:1--10. [ bib ] | 
| [DH18] | S. J. Darak and M. K. Hanawal (2018).
 Distributed Learning and Stable Orthogonalization in Ad-Hoc
  Networks with Heterogeneous Channels.
 arXiv preprint
  arXiv:1812.11651.
 https://arxiv.org/abs/1812.11651. [ bib | http ] | 
| [GBV18] | G. Gautier, R. Bardenet, and M. Valko (2018).
 DPPy: Sampling Determinantal Point Processes with Python.
 arXiv preprint
  arXiv:1809.07258.
 https://arxiv.org/abs/1809.07258, code at
  https://github.com/guilgautier/DPPy. Documentation at
 https://dppy.readthedocs.io. [ bib | http ] | 
| [GMS16] | A. Garivier, P. Ménard, and G. Stoltz (2016).
 Explore First, Exploit Next: The True Shape of Regret in
  Bandit Problems.
 arXiv preprint
  arXiv:1602.07182.
 https://arxiv.org/abs/1602.07182. [ bib | http ] | 
| [GHMS18] | A. Garivier, H. Hadiji, P. Menard, and G. Stoltz (2018).
 KL-UCB-switch: optimal regret bounds for stochastic bandits
  from both a distribution-dependent and a distribution-free viewpoints.
 arXiv preprint
  arXiv:1805.05071.
 https://arxiv.org/abs/1805.05071. [ bib | http ] | 
| [Hay05] | S. Haykin (2005).
 Cognitive Radio: Brain-Empowered Wireless Communications.
 Journal on Selected Areas in Communications, 23(2):201--220. [ bib ] | 
| [H+16] | E. Hazan et al. (2016).
 Introduction to Online Convex Optimization.
 Foundations and Trends in Optimization,
  2(3-4):157--325. [ bib ] | 
| [Hon19] | J. Honda (2019).
 A Note on KL-UCB+ Policy for the Stochastic Bandit.
 arXiv preprint
  arXiv:1903.07839.
 https://arxiv.org/abs/1903.07839. [ bib | http ] | 
| [JMP12] | W. Jouini, C. Moy, and J. Palicot (2012).
 Decision Making for Cognitive Radio Equipment: Analysis of the
  First 10 Years of Exploration.
 EURASIP Journal on Wireless Communications and Networking,
  2012(1). [ bib ] | 
| [KK18] | E. Kaufmann and W. M. Koolen (2018).
 Mixture Martingales Revisited with Applications to Sequential
  Tests and Confidence Intervals.
 arXiv preprint
  arXiv:1811.11419.
 https://arXiv.org/abs/1811.11419. [ bib | http ] | 
| [CGM+13] | O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz (2013).
 Kullback-Leibler Upper Confidence Bounds For Optimal
  Sequential Allocation.
 Annals of Statistics, 41(3):1516--1541. [ bib ] | 
| [KG17] | E. Kaufmann and A. Garivier (2017).
 Learning The Distribution With Largest Mean: Two Bandit
  Frameworks.
 arXiv preprint
  arXiv:1702.00001.
 https://arxiv.org/abs/1702.00001. [ bib | http ] | 
| [KM19] | E. Kaufmann and A. Mehrabian (2019).
 New Algorithms for Multiplayer Bandits when Arm Means Vary
  Among Players.
 arXiv preprint
  arXiv:1902.01239.
 https://arxiv.org/abs/1902.01239. [ bib | http ] | 
| [KDI18] | N. Keriven, D. Garreau, and I. Poli (2018).
 NEWMA: a new method for scalable model-free online
  change-point detection.
 arXiv preprint
  arXiv:1805.08061.
 https://arxiv.org/abs/1805.08061, code at
  https://github.com/lightonai/newma. [ bib | http ] | 
| [KT19] | B. Kim and A. Tewari (2019).
 On the Optimality of Perturbations in Stochastic and
  Adversarial Multi-Armed Bandit Problems.
 arXiv preprint
  arXiv:1902.00610.
 https://arxiv.org/abs/1902.00610. [ bib | http ] | 
| [KL51] | S. Kullback and R.A. Leibler (1951).
 On Information and Sufficiency.
 The Annals of Mathematical Statistics, 22(1):79--86. [ bib ] | 
| [KDH+19] | R. Kumar, S. J. Darak, M. K. Hanawal, A. K. Sharma, and R. K. Tripathi (2019).
 Distributed Algorithm for Learning to Coordinate in
  Infrastructure-Less Network.
 IEEE Communications Letters, 23(2):362--365.
 ISSN 1089-7798. [ bib | DOI ] | 
| [LR85] | T. L. Lai and H. Robbins (1985).
 Asymptotically Efficient Adaptive Allocation Rules.
 Advances in Applied Mathematics, 6(1):4--22. [ bib ] | 
| [LX10] | T. L. Lai and H. Xing (2010).
 Sequential change-point detection when the pre- and post-change
  parameters are unknown.
 Sequential Analysis, 29(2):162--175. [ bib ] | 
| [Lat16b] | T. Lattimore (2016).
 Regret Analysis of the Anytime Optimally Confident UCB
  Algorithm.
 arXiv preprint
  arXiv:1603.08661.
 https://arxiv.org/abs/1603.08661. [ bib | http ] | 
| [Lat18] | T. Lattimore (2018).
 Refining the confidence level for optimistic bandit
  strategies.
 The Journal of Machine Learning Research, 19(1):765--796. [ bib ] | 
| [LJ18] | L. Li, K. Jamieson, et al. (2018).
 Hyperband: A Novel Bandit-Based Approach to Hyperparameter
  Optimization.
 Journal of Machine Learning Research, 18:1--52.
 https://arxiv.org/abs/1603.06560. [ bib | http ] | 
| [LKC17] | A. Luedtke, E. Kaufmann, and A. Chambaz (2017).
 Asymptotically Optimal Algorithms for Budgeted Multiple Play
  Bandits.
 Machine Learning, pages 1--31.
 https://arxiv.org/abs/1606.09388. [ bib | http ] | 
| [Lue68] | D. G. Luenberger (1968).
 Quasi-Convex Programming.
 SIAM Journal on Applied Mathematics, 16(5):1090--1095. [ bib ] | 
| [LM18] | G. Lugosi and A. Mehrabian (2018).
 Multiplayer Bandits Without Observing Collision Information.
 arXiv preprint
  arXiv:1808.08416.
 https://arxiv.org/abs/1808.08416. [ bib | http ] | 
| [MH16] | S. Maghsudi and E. Hossain (2016).
 Multi-Armed Bandits with application to 5G small cells.
 Wireless Communications, 23(3):64--73. [ bib | DOI ] | 
| [MGMM+15] | L. Melián-Gutiérrez, N. Modi, C. Moy, F. Bader,
  I. Pérez-Álvarez, and S. Zazo (2015).
 Hybrid UCB-HMM: A Machine Learning Strategy for Cognitive
  Radio in HF Band.
 IEEE Transactions on Cognitive Communications and Networking,
  1(3):347--358. [ bib ] | 
| [MM99] | J. Mitola and G. Q. Maguire (1999).
 Cognitive Radio: making software radios more personal.
 Personal Communications, 6(4):13--18. [ bib ] | 
| [MMM17] | N. Modi, P. Mary, and C. Moy (2017).
 QoS driven Channel Selection Algorithm for Cognitive Radio
  Network: Multi-User Multi-Armed Bandit Approach.
 Transactions on Cognitive Communications and Networking,
  3(1):49--66. [ bib ] | 
| [Nie11] | F. Nielsen (2011).
 Chernoff Information of Exponential Families.
 arXiv preprint
  arXiv:1102.2684.
 https://arxiv.org/abs/1102.2684. [ bib | http ] | 
| [PGNN19] | V. Patil, G. Ghalme, V. Nair, and Y. Narahari (2019).
 Stochastic Multi-Armed Bandits with Arm-specific Fairness
  Guarantees.
 arXiv preprint
  arXiv:1905.11260.
 https://arxiv.org/abs/1905.11260. [ bib | http ] | 
| [RKS17] | U. Raza, P. Kulkarni, and M. Sooriyabandara (2017).
 Low Power Wide Area Networks (LPWAN): An Overview.
 Communications Surveys Tutorials, 19(2):855--873. [ bib | DOI ] | 
| [Rob52] | H. Robbins (1952).
 Some Aspects of the Sequential Design of Experiments.
 Bulletin of the American Mathematical Society,
  58(5):527--535. [ bib ] | 
| [Rob75] | L. G. Roberts (1975).
 ALOHA Packet System With and Without Slots and Capture.
 SIGCOMM Computer Communication Review, 5(2):28--42. [ bib ] | 
| [SLC+19] | J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, and M. Valko (2019).
 Rotting Bandits Are No Harder Than Stochastic Ones.
 International Conference on Artificial Intelligence and
  Statistics.
 https://arxiv.org/abs/1811.11043. [ bib | http ] | 
| [AHK17] | A. Singla, H. Hassani, and A. Krause (2017).
 Learning to Use Learners' Advice.
 arXiv preprint
  arXiv:1702.04825.
 https://arxiv.org/abs/1702.04825. [ bib | http ] | 
| [Sli19] | A. Slivkins (June 2019).
 Introduction to Multi-Armed Bandits.
 arXiv preprint
  arXiv:1904.07272v3.
 https://arxiv.org/abs/1904.07272v3. [ bib | http ] | 
| [Tho33] | W. R. Thompson (1933).
 On the Likelihood that One Unknown Probability Exceeds Another
  in View of the Evidence of Two Samples.
 Biometrika, 25(3-4):285--294. [ bib ] | 
| [Z. 19] | Z. Tian, J. Wang, J. Wang, and J. Song (2019).
 Distributed NOMA-Based Multi-Armed Bandit Approach for Channel
  Access in Cognitive Radio Networks.
 IEEE Wireless Communications Letters, pages 1--4.
 ISSN 2162-2337. [ bib | DOI ] | 
| [TdSCC13] | F. S. Truzzi, V. F. da Silva, A. H. Reali Costa, and F. Gagliardi Cozman
  (2013).
 AdBandit: a New Algorithm for Multi-Armed Bandits.
 ENIAC, 2013(1). [ bib ] | 
| [Wal45] | A. Wald (1945).
 Some Generalizations of the Theory of Cumulative Sums of
  Random Variables.
 The Annals of Mathematical Statistics, 16(3):287--293. [ bib ] | 
| [Whi88] | P. Whittle (1988).
 Restless bandits: Activity allocation in a changing world.
 Journal of Applied Probability, 25(A):287--298. [ bib ] | 
| [WCN+19] | F. Wilhelmi, C. Cano, G. Neu, B. Bellalta, A. Jonsson, and
  S. Barrachina-Muñoz (2019).
 Collaborative Spatial Reuse In Wireless Networks Via Selfish
  Multi-Armed Bandits.
 Ad Hoc Networks. [ bib ] | 
| [WBMB+19] | F. Wilhelmi, S. Barrachina-Muñoz, B. Bellalta, C. Cano, A. Jonsson, and
  G. Neu (2019).
 Potential and Pitfalls of Multi-Armed Bandits for Decentralized
  Spatial Reuse in WLANs.
 Journal of Network and Computer Applications, 127:26--42. [ bib ] | 
| [Wil38] | S. S. Wilks (1938).
 The large-sample distribution of the likelihood ratio for
  testing composite hypotheses.
 The Annals of Mathematical Statistics, 9(1):60--62. [ bib ] | 
| [Yaa77] | M. E. Yaari (1977).
 A Note on Separability and Quasiconcavity.
 Econometrica, 45(5):1183--1186. [ bib ] | 
| [ZS07] | Q. Zhao and B. M. Sadler (2007).
 A Survey of Dynamic Spectrum Access.
 Signal Processing Magazine, 24(3):79--89. [ bib ] | 
| [LZ10] | K. Liu and Q. Zhao (2010).
 Distributed Learning in Multi-Armed Bandit with Multiple
  Players.
 Transactions on Signal Processing, 58(11):5667--5681. [ bib ] | 
| [ALVM06] | I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty (2006).
 NeXt Generation, Dynamic Spectrum Access, Cognitive Radio
  Wireless Networks: A Survey.
 Computer Networks, 50(13):2127--2159. [ bib ] | 
| [AB10] | J.-Y. Audibert and S. Bubeck (2010).
 Regret Bounds And Minimax Policies Under Partial Monitoring.
 Journal of Machine Learning Research, 11:2785--2836. [ bib ] | 
| [Bar59] | G.A. Barnard (1959).
 Control charts and stochastic processes.
 Journal of the Royal Statistical Society. Series B
  (Methodological), pages 239--271. [ bib ] | 
| [BR19] | D. Bouneffouf and I. Rish (2019).
 A Survey on Practical Applications of Multi-Armed and
  Contextual Bandits.
 arXiv preprint
  arXiv:1904.10040, under review by
  IJCAI 2019 Survey.
 https://arxiv.org/abs/1904.10040. [ bib | http ] | 
| [C+52] | H. Chernoff et al. (1952).
 A Measure of Asymptotic Efficiency for Tests of a Hypothesis
  Based on The Sum of Observations.
 The Annals of Mathematical Statistics, 23(4):493--507. [ bib ] | 
| [Che81] | H. Chernoff (1981).
 A Note on an Inequality Involving the Normal Distribution.
 The Annals of Probability, pages 533--535. [ bib ] | 
| [GHRZ19] | Z. Gao, Y. Han, Z. Ren, and Z. Zhou (2019).
 Batched Multi-armed Bandits Problem.
 arXiv preprint
  arXiv:1904.01763.
 https://arxiv.org/abs/1904.01763. [ bib | http ] | 
| [GB12] | A. Garhwal and P. P. Bhattacharya (2012).
 A Survey on Dynamic Spectrum Access Techniques for Cognitive
  Radio.
 International Journal of Next-Generation Networks (IJNGN),
  3(4).
 https://arxiv.org/abs/1201.1964. [ bib | DOI | http ] | 
| [HR90] | T. Hagerup and C. Rüb (1990).
 A Guided Tour of Chernoff Bounds.
 Information Processing Letters, 33(6):305--308. [ bib ] | 
| [Hoe63] | W. Hoeffding (1963).
 Probability Inequalities for Sums of Bounded Random Variables.
 Journal of the American Statistical Association,
  58(301):13--30. [ bib ] | 
| [PG07] | F. Pérez and B. E. Granger (May 2007).
 IPython: a System for Interactive Scientific Computing.
 Computing in Science and Engineering, 9(3):21--29.
 https://ipython.org. [ bib | http ] | 
| [KG19] | A. Kolnogorov and S. Garbar (2019).
 Multi-Armed Bandit Problem and Batch UCB Rule.
 arXiv preprint
  arXiv:1902.00214.
 https://arxiv.org/abs/1902.00214. [ bib | http ] | 
| [KDY+16] | R. Kumar, S. J. Darak, A. Yadav, A. K. Sharma, and R. K. Tripathi (2016).
 Two-stage Decision Making Policy for Opportunistic Spectrum
  Access and Validation on USRP Testbed.
 Wireless Networks, pages 1--15. [ bib ] | 
| [KDY+17] | R. Kumar, S. J. Darak, A. Yadav, A. K. Sharma, and R. K. Tripathi (2017).
 Channel Selection for Secondary Users in Decentralized Network
  of Unknown Size.
 Communications Letters, 21(10):2186--2189. [ bib ] | 
| [LLL19] | H. Li, J. Luo, and C. Liu (2019).
 Selfish Bandit based Cognitive Anti-jamming Strategy for
  Aeronautic Swarm Network in Presence of Multiple Jammers.
 IEEE Access. [ bib ] | 
| [MM12] | J. Marinho and E. Monteiro (2012).
 Cognitive Radio: Survey on Communication Protocols, Spectrum
  Decision Issues, and Future Research Directions.
 Wireless Networks, 18(2):147--164. [ bib ] | 
| [Hun07] | J. D. Hunter (2007).
 Matplotlib: a 2D Graphics Environment.
 Computing In Science & Engineering, 9(3):90--95. [ bib | DOI ] | 
| [vdWCV11] | S. van der Walt, S. C. Colbert, and G. Varoquaux (March 2011).
 The NumPy Array: A Structure for Efficient Numerical
  Computation.
 Computing in Science & Engineering, 13(2):22--30. [ bib | DOI ] | 
| [VPSE16] | V. Perchet, P. Rigollet, S. Chassang, and E. Snowberg (April 2016).
 Batched Bandit Problems.
 The Annals of Statistics, 44(2):660--681.
 https://doi.org/10.1214/15-AOS1381. [ bib | DOI | http ] | 
| [SV95] | D. Siegmund and E.S. Venkatraman (1995).
 Using the Generalized Likelihood Ratio Statistic for
  Sequential Detection of a Change Point.
 The Annals of Statistics, pages 255--271. [ bib ] | 
| [SB11] | M. Subhedar and G. Birajdar (2011).
 Spectrum Sensing Techniques in Cognitive Radio Networks: a
  Survey.
 International Journal of Next-Generation Networks,
  3(2):37--51. [ bib ] | 
| [TZZ19] | C. Tao, Q. Zhang, and Y. Zhou (2019).
 Collaborative Learning with Limited Interaction: Tight Bounds
  for Distributed Exploration in Multi-Armed Bandits.
 arXiv preprint
  arXiv:1904.03293.
 https://arxiv.org/abs/1904.03293. [ bib | http ] | 
| [WHCW19] | Y. Wang, J. Hu, X. Chen, and L. Wang (2019).
 Distributed Bandit Learning: How Much Communication is Needed
  to Achieve (Near) Optimal Regret.
 arXiv preprint
  arXiv:1904.06309.
 https://arxiv.org/abs/1904.06309. [ bib | http ] | 
| [YA09] | T. Yucek and H. Arslan (2009).
 A Survey of Spectrum Sensing Algorithms for Cognitive Radio
  Applications.
 IEEE Communications Surveys & Tutorials, 11(1):116--130. [ bib ] | 
| [YRJW17] | F. Yang, A. Ramdas, K. Jamieson, and M. Wainwright (2017).
 A framework for Multi-A(rmed)/B(andit) Testing with Online FDR
  Control.
 In Advances in Neural Information Processing Systems, pages
  5957--5966. Curran Associates, Inc. [ bib ] | 
| [CL11] | O. Chapelle and L. Li (2011).
 An Empirical Evaluation of Thompson Sampling.
 In Advances in Neural Information Processing Systems, pages
  2249--2257. Curran Associates, Inc. [ bib ] | 
| [Abr70] | N. Abramson (1970).
 The ALOHA System: Another Alternative for Computer
  Communications.
 In Proceedings of the November 17-19, 1970, Fall Joint Computer
  Conference, AFIPS '70 (Fall), pages 281--285. ACM, New York, NY, USA. [ bib | DOI ] | 
| [ACE09] | D. Agarwal, B. Chen, and P. Elango (2009).
 Explore Exploit Schemes For Web Content Optimization.
 In International Conference on Data Mining. IEEE. [ bib ] | 
| [ALNS17] | A. Agarwal, H. Luo, B. Neyshabur, and R. E. Schapire (2017).
 Corralling a Band of Bandit Algorithms.
 In Conference on Learning Theory, pages 12--38. PMLR. [ bib ] | 
| [AG12] | S. Agrawal and N. Goyal (2012).
 Analysis of Thompson sampling for the Multi-Armed Bandit
  problem.
 In Conference on Learning Theory, pages 36--65. PMLR. [ bib ] | 
| [AMF17] | R. Alami, O.-A. Maillard, and R. Féraud (2017).
 Memory Bandits: Towards the Switching Bandit Problem Best
  Resolution.
 In Conference on Neural Information Processing Systems. [ bib ] | 
| [AF15] | R. Allesiardo and R. Féraud (2015).
 Exp3 with Drift Detection for the Switching Bandit Problem.
 In International Conference on Data Science and Advanced
  Analytics, pages 1--7. IEEE. [ bib ] | 
| [AMT10] | A. Anandkumar, N. Michael, and A. K. Tang (2010).
 Opportunistic Spectrum Access with multiple users: Learning
  under competition.
 In International Conference on Computer Communications.
  IEEE. [ bib ] | 
| [AMS07] | J.-Y. Audibert, R. Munos, and C. Szepesvári (2007).
 Tuning Bandit Algorithms in Stochastic Environments.
 In Algorithmic Learning Theory, pages 150--165. Springer,
  Sendai, Japan. [ bib ] | 
| [AB09] | J.-Y. Audibert and S. Bubeck (2009).
 Minimax Policies for Adversarial and Stochastic Bandits.
 In Conference on Learning Theory, pages 217--226. PMLR. [ bib ] | 
| [AC16] | P. Auer and C.-K. Chiang (2016).
 An Algorithm with Nearly Optimal Pseudo Regret for Both
  Stochastic and Adversarial Bandits.
 In Conference on Learning Theory, pages 116--120. PMLR. [ bib ] | 
| [AM16] | O. Avner and S. Mannor (2016).
 Multi-User Lax Communications: a Multi-Armed Bandit
  Approach.
 In International Conference on Computer Communications.
  IEEE. [ bib ] | 
| [AC18] | A. Azari and C. Cavdar (December 2018).
 Self-organized Low-power IoT Networks: A Distributed Learning
  Approach.
 In Global Communications Conference. IEEE, Abu Dhabi, UAE. [ bib ] | 
| [BGZ14] | O. Besbes, Y. Gur, and A. Zeevi (2014).
 Stochastic Multi-Armed Bandit Problem with Non-Stationary
  Rewards.
 In Advances in Neural Information Processing Systems, pages
  199--207. [ bib ] | 
| [BK18a] | L. Besson and E. Kaufmann (2018).
 Multi-Player Bandits Revisited.
 In M. Mohri and K. Sridharan, editors, Algorithmic Learning
  Theory. Lanzarote, Spain.
 https://hal.archives-ouvertes.fr/hal-01629733. [ bib | http ] | 
| [BKM18] | L. Besson, E. Kaufmann, and C. Moy (2018).
 Aggregation of Multi-Armed Bandits Learning Algorithms for
  Opportunistic Spectrum Access.
 In Wireless Communications and Networking Conference. IEEE,
  Barcelona, Spain.
 https://hal.archives-ouvertes.fr/hal-01705292. [ bib | http ] | 
| [BBM19] | L. Besson, R. Bonnefoi, and C. Moy (April 2019).
 GNU Radio Implementation of MALIN: “Multi-Armed bandits
  Learning for Internet-of-things Networks”.
 In Wireless Communications and Networking Conference. IEEE,
  Marrakech, Morocco.
 https://hal.archives-ouvertes.fr/hal-02006825,
  following a Demonstration presented at International Conference on
  Telecommunications (ICT) 2018. [ bib | http ] | 
| [BL18] | I. Bistritz and A. Leshem (2018).
 Distributed Multi-Player Bandits: a Game Of Thrones
  Approach.
 In Advances in Neural Information Processing Systems, pages
  7222--7232. [ bib ] | 
| [BMP16] | R. Bonnefoi, C. Moy, and J. Palicot (2016).
 Advanced metering infrastructure backhaul reliability
  improvement with Cognitive Radio.
 In International Conference on Communications, Control, and
  Computing Technologies for Smart Grids, pages 230--236. [ bib | DOI ] | 
| [BBM+17] | R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, and J. Palicot (2017).
 Multi-Armed Bandit Learning in IoT Networks: Learning Helps
  Even in Non-Stationary Settings.
 In 12th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication. Lisboa, Portugal. [ bib ] | 
| [BBMVM19] | R. Bonnefoi, L. Besson, J. C. Manco-Vasquez, and C. Moy (April 2019).
 Upper-Confidence Bound for Channel Selection in LPWA Networks
  with Retransmissions.
 In MOTIoN Workshop. IEEE, Marrakech, Morocco.
 https://hal.archives-ouvertes.fr/hal-02049824. [ bib | http ] | 
| [BS12] | S. Bubeck and A. Slivkins (2012).
 The Best of Both Worlds: Stochastic and Adversarial Bandits.
 In Conference on Learning Theory, pages 42--1. PMLR. [ bib ] | 
| [CN18] | C. Cano and G. Neu (2018).
 Wireless Optimisation via Convex Bandits: Unlicensed LTE/WiFi
  Coexistence.
 In Proceedings of the 2018 Workshop on Network Meets AI & ML,
  NetAI'18, pages 41--47. ACM, New York, NY, USA.
 ISBN 978-1-4503-5911-5. [ bib | DOI ] | 
| [CZKX19] | Y. Cao, W. Zheng, B. Kveton, and Y. Xie (2019).
 Nearly Optimal Adaptive Procedure for Piecewise-Stationary
  Bandit: a Change-Point Detection Approach.
 In International Conference on Artificial Intelligence and
  Statistics. Okinawa, Japan. [ bib ] | 
| [CLLW19] | Y. Chen, C. Lee, H. Luo, and C. Wei (2019).
 A New Algorithm for Non-stationary Contextual Bandits:
  Efficient, Optimal, and Parameter-free.
 In A. Beygelzimer and D. Hsu, editors, Conference on Learning
  Theory, volume 99, pages 1--30. PMLR.
 https://arxiv.org/abs/1902.00980. [ bib | http ] | 
| [CMP17] | R. Combes, S. Magureanu, and A. Proutiere (2017).
 Minimal Exploration in Structured Stochastic Bandits.
 In Advances in Neural Information Processing Systems, pages
  1761--1769. [ bib ] | 
| [CGH+96] | R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth (1996).
 On the Lambert W Function.
 In Advances in Computational Mathematics, pages 329--359. [ bib ] | 
| [DP16] | R. Degenne and V. Perchet (2016).
 Anytime Optimal Algorithms In Stochastic Multi Armed
  Bandits.
 In International Conference on Machine Learning, pages
  1587--1595. [ bib ] | 
| [GC11] | A. Garivier and O. Cappé (2011).
 The KL-UCB Algorithm for Bounded Stochastic Bandits and
  Beyond.
 In Conference on Learning Theory, pages 359--376. PMLR. [ bib ] | 
| [GM11] | A. Garivier and E. Moulines (2011).
 On Upper-Confidence Bound Policies For Switching Bandit
  Problems.
 In Algorithmic Learning Theory, pages 174--188. PMLR. [ bib ] | 
| [GK16] | A. Garivier and E. Kaufmann (2016).
 Optimal Best Arm Identification with Fixed Confidence.
 In Conference on Learning Theory, volume 49. PMLR. [ bib ] | 
| [GKL16] | A. Garivier, E. Kaufmann, and T. Lattimore (2016).
 On Explore-Then-Commit Strategies.
 In Advances in Neural Information Processing Systems, volume 29. [ bib ] | 
| [GGCA11] | N. Gupta, O.-C. Granmo, and A. Agrawala (2011).
 Thompson Sampling for Dynamic Multi Armed Bandits.
 In International Conference on Machine Learning and
  Applications Workshops, pages 484--489. IEEE. [ bib ] | 
| [HGB+06] | C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud, and M. Sebag (2006).
 Multi-Armed Bandit, Dynamic Environments and Meta-Bandits.
 In NeurIPS 2006 Workshop, Online Trading Between Exploration
  And Exploitation. [ bib ] | 
| [HT10] | J. Honda and A. Takemura (2010).
 An Asymptotically Optimal Bandit Algorithm for Bounded Support
  Models.
 In Conference on Learning Theory, pages 67--79. PMLR. [ bib ] | 
| [JKYD18] | H. Joshi, R. Kumar, A. Yadav, and S. J. Darak (2018).
 Distributed Algorithm for Dynamic Spectrum Access in
  Infrastructure-Less Cognitive Radio Network.
 In 2018 IEEE Wireless Communications and Networking Conference
  (WCNC), pages 1--6.
 ISSN 1558-2612. [ bib | DOI ] | 
| [JEMP09] | W. Jouini, D. Ernst, C. Moy, and J. Palicot (2009).
 Multi-Armed Bandit Based Policies for Cognitive Radio's
  Decision Making Issues.
 In International Conference Signals, Circuits and Systems.
  IEEE. [ bib ] | 
| [JEMP10] | W. Jouini, D. Ernst, C. Moy, and J. Palicot (2010).
 Upper Confidence Bound Based Decision Making Strategies and
  Dynamic Spectrum Access.
 In International Conference on Communications, pages 1--5.
  IEEE. [ bib | DOI ] | 
| [KNJ12] | D. Kalathil, N. Nayyar, and R. Jain (2012).
 Decentralized Learning for Multi-Player Multi-Armed Bandits.
 In Conference on Decision and Control. IEEE. [ bib ] | 
| [KCG12] | E. Kaufmann, O. Cappé, and A. Garivier (2012).
 On Bayesian Upper Confidence Bounds for Bandit Problems.
 In International Conference on Artificial Intelligence and
  Statistics, pages 592--600. [ bib ] | 
| [KKM12] | E. Kaufmann, N. Korda, and R. Munos (2012).
 Thompson Sampling: an Asymptotically Optimal Finite-Time
  Analysis.
 In Algorithmic Learning Theory, pages 199--213. PMLR. [ bib ] | 
| [KCG14] | E. Kaufmann, O. Cappé, and A. Garivier (2014).
 On the Complexity of A/B Testing.
 In Conference on Learning Theory, pages 461--481. PMLR. [ bib ] | 
| [KAF+18] | R. Kerkouche, R. Alami, R. Féraud, N. Varsier, and P. Maillé (2018).
 Node-based optimization of LoRa transmissions with Multi-Armed
  Bandit algorithms.
 In International Conference on Telecommunications. J. Palicot
  and R. Pyndiah, Saint-Malo, France. [ bib ] | 
| [KS06] | L. Kocsis and C. Szepesvári (2006).
 Discounted UCB.
 In 2nd PASCAL Challenges Workshop. [ bib ] | 
| [KHN15] | J. Komiyama, J. Honda, and H. Nakagawa (2015).
 Optimal Regret Analysis of Thompson Sampling in Stochastic
  Multi-Armed Bandit Problem with Multiple Plays.
 In International Conference on Machine Learning, volume 37,
  pages 1152--1161. PMLR. [ bib ] | 
| [TRY17] | T. Koren, R. Livni, and Y. Mansour (2017).
 Bandits with Movement Costs and Adaptive Pricing.
 In Conference on Learning Theory, volume 65, pages
  1242--1268. PMLR. [ bib ] | 
| [KYDH18] | R. Kumar, A. Yadav, S. J. Darak, and M. K. Hanawal (2018).
 Trekking Based Distributed Algorithm for Opportunistic
  Spectrum Access in Infrastructure-Less Network.
 In 2018 16th International Symposium on Modeling and
  Optimization in Mobile, Ad-Hoc, and Wireless Networks (WiOpt), pages 1--8. [ bib | DOI ] | 
| [KSGB19] | B. Kveton, C. Szepesvari, M. Ghavamzadeh, and C. Boutilier (2019).
 Perturbed-History Exploration in Stochastic Multi-Armed
  Bandits.
 In 28th International Joint Conference on Artificial
  Intelligence (IJCAI 2019).
 https://arxiv.org/abs/1902.10089. [ bib | http ] | 
| [KPV17] | J. Kwon, V. Perchet, and C. Vernade (2017).
 Sparse Stochastic Bandits.
 In Conference on Learning Theory, pages 1269--1270. [ bib ] | 
| [LVC16] | P. Lagrée, C. Vernade, and O. Cappé (2016).
 Multiple-Play Bandits in the Position-Based Model.
 In Advances in Neural Information Processing Systems, pages
  1597--1605. [ bib ] | 
| [Lat16c] | T. Lattimore (2016).
 Regret Analysis Of The Finite Horizon Gittins Index Strategy
  For Multi Armed Bandits.
 In Conference on Learning Theory, pages 1214--1245. PMLR. [ bib ] | 
| [LM09] | A. Lazaric and R. Munos (2009).
 Hybrid Stochastic-Adversarial On-Line Learning.
 In Conference on Learning Theory. [ bib ] | 
| [LCLS10] | L. Li, W. Chu, J. Langford, and R. E. Schapire (2010).
 A Contextual-Bandit Approach to Personalized News Article
  Recommendation.
 In International Conference on World Wide Web, pages 661--670.
  ACM. [ bib ] | 
| [LPSY18] | D. Liau, E. Price, Z. Song, and G. Yang (2018).
 Stochastic Multi-Armed Bandits in Constant Space.
 In International Conference on Artificial Intelligence and
  Statistics. [ bib ] | 
| [LZ08] | K. Liu and Q. Zhao (2008).
 A Restless Bandit Formulation of Opportunistic Access:
  Indexability and Index Policy.
 In Annual Communications Society Conference on Sensor, Mesh
  and Ad-Hoc Communications and Networks Workshops. IEEE. [ bib ] | 
| [LLS18] | F. Liu, J. Lee, and N. Shroff (2018).
 A Change-Detection based Framework for Piecewise-stationary
  Multi-Armed Bandit Problem.
 In The Thirty-Second AAAI Conference on Artificial
  Intelligence (AAAI 2018). [ bib ] | 
| [LBCU+09] | M. López-Benítez, F. Casadevall, A. Umbert, J. Pérez-Romero,
  R. Hachemani, J. Palicot, and C. Moy (2009).
 Spectral Occupation Measurements and Blind Standard
  Recognition Sensor for Cognitive Radio Networks.
 In 2009 4th International Conference on Cognitive Radio
  Oriented Wireless Networks and Communications, pages 1--9. IEEE. [ bib ] | 
| [LRC+16] | J. Louëdec, L. Rossi, M. Chevalier, A. Garivier, and J. Mothe (2016).
 Algorithme de bandit et obsolescence : un modèle pour la
  recommandation [Bandit algorithm and obsolescence: a model for
  recommendation].
 In 18ème Conférence francophone sur l'Apprentissage
  Automatique, 2016 (Marseille, France). [ bib ] | 
| [LWAL18] | H. Luo, C. Wei, A. Agarwal, and J. Langford (2018).
 Efficient Contextual Bandits in Non-stationary Worlds.
 In S. Bubeck, V. Perchet, and P. Rigollet, editors, Proceedings
  of the 31st Conference On Learning Theory, volume 75 of Proceedings of
  Machine Learning Research, pages 1739--1776. PMLR.
 http://proceedings.mlr.press/v75/luo18a.html. [ bib | .html | .pdf ] | 
| [MM11] | O.-A. Maillard and R. Munos (2011).
 Adaptive Bandits: Towards the best history-dependent
  strategy.
 In International Conference on Artificial Intelligence and
  Statistics, pages 570--578. [ bib ] | 
| [Mai19] | O.-A. Maillard (2019).
 Sequential change-point detection: Laplace concentration of
  scan statistics and non-asymptotic delay bounds.
 In Algorithmic Learning Theory. [ bib ] | 
| [MS13] | J. Mellor and J. Shapiro (2013).
 Thompson Sampling in Switching Environments with Bayesian
  Online Change Detection.
 In Artificial Intelligence and Statistics, pages 442--450. [ bib ] | 
| [MG17] | P. Ménard and A. Garivier (2017).
 A Minimax and Asymptotically Optimal Algorithm for Stochastic
  Bandits.
 In Algorithmic Learning Theory, volume 76, pages 223--237.
  PMLR. [ bib ] | 
| [MTC+16] | A. Maskooki, V. Toldov, L. Clavier, V. Loscrí, and N. Mitton (February
  2016).
 Competition: Channel Exploration/Exploitation Based on a
  Thompson Sampling Approach in a Radio Cognitive Environment.
 In International Conference on Embedded Wireless Systems and
  Networks (dependability competition). Graz, Austria. [ bib ] | 
| [MM17] | J. Mourtada and O.-A. Maillard (2017).
 Efficient Tracking of a Growing Number of Experts.
 In Algorithmic Learning Theory, volume 76 of
  Proceedings of Machine Learning Research, pages 1--23. Tokyo, Japan. [ bib ] | 
| [MB19] | C. Moy and L. Besson (May 2019).
 Decentralized Spectrum Learning for IoT Wireless Networks
  Collision Mitigation.
 In ISIoT workshop. Santorin, Greece.
 https://sites.google.com/view/ISIoT2019/. [ bib | http | .pdf ] | 
| [Moy14] | C. Moy (2014).
 Reinforcement Learning Real Experiments for Opportunistic
  Spectrum Access.
 In WSR'14, page 10. Karlsruhe, Germany.
 
  https://hal-supelec.archives-ouvertes.fr/hal-00994975. [ bib | http ] | 
| [NC17] | O. Naparstek and K. Cohen (2017).
 Deep Multi-User Reinforcement Learning for Dynamic Spectrum
  Access in Multichannel Wireless Networks.
 In GLOBECOM 2017 - 2017 IEEE Global Communications Conference,
  pages 1--7. [ bib | DOI ] | 
| [RMZ14] | C. Robert, C. Moy, and H. Zhang (2014).
 Opportunistic Spectrum Access Learning Proof of Concept.
 In SDR-WinnComm'14, page 8. Schaumburg, United States.
 
  https://hal-supelec.archives-ouvertes.fr/hal-00994940. [ bib | http ] | 
| [RSS16] | J. Rosenski, O. Shamir, and L. Szlak (2016).
 Multi-Player Bandits -- A Musical Chairs Approach.
 In International Conference on Machine Learning, pages
  155--163. PMLR. [ bib ] | 
| [SLM12] | A. Sani, A. Lazaric, and R. Munos (2012).
 Risk-Aversion In Multi-Armed Bandits.
 In Advances in Neural Information Processing Systems, pages
  3275--3283. [ bib ] | 
| [SKHD18] | S. Sawant, R. Kumar, M. K. Hanawal, and S. J. Darak (2018).
 Learning to Coordinate in a Decentralized Cognitive Radio
  Network in Presence of Jammers.
 In 16th International Symposium on Modeling and Optimization
  in Mobile, Ad Hoc, and Wireless Networks (WiOpt). IEEE, Shanghai, China.
 https://arxiv.org/abs/1803.06810. [ bib | DOI | http ] | 
| [SL17] | Y. Seldin and G. Lugosi (2017).
 An Improved Parametrization and Analysis of the EXP3++
  Algorithm for Stochastic and Adversarial Bandits.
 In Conference on Learning Theory, volume 65, pages 1--17.
  PMLR. [ bib ] | 
| [TL11] | C. Tekin and M. Liu (2011).
 Performance and Convergence of Multi-User Online Learning.
 In International Conference on Game Theory for Networks,
  pages 321--336. Springer Berlin Heidelberg. [ bib ] | 
| [TL12] | C. Tekin and M. Liu (2012).
 Online Learning in Decentralized Multi-User Spectrum Access
  with Synchronized Explorations.
 In Military Communications Conference. IEEE. [ bib ] | 
| [TPHD19] | H. Tibrewal, S. Patchala, M. K. Hanawal, and S. J. Darak (2019).
 Distributed Learning and Optimal Assignment in Multiplayer
  Heterogeneous Networks.
 In IEEE Conference on Computer Communications (INFOCOM
  2019), pages 1693--1701. IEEE.
 https://arxiv.org/abs/1901.03868. [ bib | http ] | 
| [TCLM16] | V. Toldov, L. Clavier, V. Loscrí, and N. Mitton (September 2016).
 A Thompson Sampling Approach to Channel Exploration
  Exploitation Problem in Multihop Cognitive Radio Networks.
 In PIMRC, pages 1--6. Valencia, Spain. [ bib | DOI ] | 
| [WS18b] | L. Wei and V. Srivastava (2018).
 On Abruptly-Changing And Slowly-Varying Multi-Armed Bandit
  Problems.
 In American Control Conference, pages 6291--6296. IEEE. [ bib ] | 
| [WS18a] | L. Wei and V. Srivastava (2018).
 On Distributed Multi-player Multi-Armed Bandit Problems in
  Abruptly-Changing Environment.
 In Conference on Decision and Control, pages 5783--5788. IEEE. [ bib ] | 
| [YFE12] | X. Yang, A. Fapojuwo, and E. Egbogah (September 2012).
 Performance Analysis and Parameter Optimization of Random
  Access Backoff Algorithm in LTE.
 In Vehicular Technology Conference, pages 1--5. IEEE. [ bib | DOI ] | 
| [YM09] | J. Y. Yu and S. Mannor (2009).
 Piecewise-Stationary Bandit Problems with Side Observations.
 In International Conference on Machine Learning, pages
  1177--1184. ACM. [ bib ] | 
| [ZBLN19] | S. M. Zafaruddin, I. Bistritz, A. Leshem, and D. Niyato (2019).
 Distributed Learning for Channel Allocation Over a Shared
  Spectrum.
 In 20th IEEE International Workshop on Signal Processing
  Advances in Wireless Communications (SPAWC). Cannes, France.
 https://arxiv.org/abs/1902.06353. [ bib | http ] | 
| [ZS19] | J. Zimmert and Y. Seldin (2019).
 An Optimal Algorithm for Stochastic and Adversarial Bandits.
 In K. Chaudhuri and M. Sugiyama, editors, International Conference
  on Artificial Intelligence and Statistics, volume 89 of Proceedings of
  Machine Learning Research, pages 467--475. PMLR.
 http://proceedings.mlr.press/v89/zimmert19a.html. [ bib | .html | .pdf ] | 
| [ABM10] | J.-Y. Audibert, S. Bubeck, and R. Munos (2010).
 Best Arm Identification in Multi-Armed Bandits.
 In Conference on Learning Theory, page 13. PMLR. [ bib ] | 
| [ACBFS95] | P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire (1995).
 Gambling in a Rigged Casino: The Adversarial Multi-Armed
  Bandit Problem.
 In Annual Symposium on Foundations of Computer Science, pages
  322--331. IEEE. [ bib ] | 
| [BV19] | M. Bande and V. V. Veeravalli (2019).
 Adversarial Multi-User Bandits for Uncoordinated Spectrum
  Access.
 In IEEE International Conference on Acoustics, Speech and
  Signal Processing (ICASSP), pages 4514--4518. IEEE. [ bib ] | 
| [DNMP16] | S. J. Darak, A. Nafkha, C. Moy, and J. Palicot (2016).
 Is Bayesian Multi Armed Bandit Algorithm Superior? Proof of
  Concept for Opportunistic Spectrum Access in Decentralized Networks.
 In 11th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication. Grenoble, France. [ bib ] | 
| [K+16] | T. Kluyver et al. (2016).
 Jupyter Notebooks -- a publishing format for reproducible
  computational workflows.
 In F. Loizides and B. Schmidt, editors, Positioning and Power
  in Academic Publishing: Players, Agents and Agendas, pages 87--90. IOS
  Press. [ bib ] | 
| [DMNM16] | S. J. Darak, N. Modi, A. Nafkha, and C. Moy (2016).
 Spectrum Utilization and Reconfiguration Cost Comparison of
  Various Decision Making Policies for Opportunistic Spectrum Access Using Real
  Radio Signals.
 In 11th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication. Grenoble, France. [ bib ] | 
| [PPS11] | K. Patil, R. Prasad, and K. Skouby (2011).
 A Survey of Worldwide Spectrum Occupancy Measurement Campaigns
  for Cognitive Radio.
 In 2011 International Conference on Devices and Communications
  (ICDeCom), pages 1--5. IEEE. [ bib ] | 
| [VMB+10] | V. Valenta, R. Maršálek, G. Baudoin, M. Villegas, M. Suarez, and
  F. Robert (2010).
 Survey on spectrum utilization in Europe: Measurements,
  analyses and observations.
 In 5th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication, pages 1--5. IEEE. [ bib ] | 
| [Chi18] | F. Chiusano (2018).
 Breakpoint Prediction for the Abruptly-Switching
  Non-Stationary Multi-Armed Bandit Problem.
 Master's thesis, Politecnico Di Milano, AI & R Lab, Laboratorio di
  Intelligenza Artificiale e Robotica del Politecnico di Milano. [ bib ] | 
| [Wei17] | E. W. Weisstein (2017).
 Exponential Integral.
 Online at
  https://mathworld.wolfram.com/ExponentialIntegral.html.
 From MathWorld -- A Wolfram Web Resource. [ bib ] | 
| [Col17] | Collective (2017).
 Exponential Integral.
 Online at
  https://en.wikipedia.org/wiki/Exponential_integral.
 From Wikipedia, The Free Encyclopedia. [ bib ] | 
| [GNUa] | GNU Radio Companion Documentation and Website.
 Online at
  https://wiki.gnuradio.org/index.php/GNURadioCompanion.
 Accessed: 2018-09-25. [ bib ] | 
| [GNUb] | GNU Radio Documentation and Website.
 Online at https://www.gnuradio.org/about/.
 Accessed: 2018-09-25. [ bib ] | 
| [Jor10] | M. Jordan (2010).
 Stat 260/CS 294.
 Online at
  https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/.
 Chapter 8 covers the exponential family. [ bib ] | 
| [Oct] | OctoClock Clock Distribution Module with GPSDO - Ettus Research.
 Online at
  https://www.ettus.com/product/details/OctoClock-G.
 Accessed: 2018-09-25. [ bib ] | 
| [Bes19] | L. Besson (2016--2019).
 SMPyBandits: an Open-Source Research Framework for Single and
  Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python.
 Code at https://GitHub.com/SMPyBandits/SMPyBandits/,
  documentation at https://SMPyBandits.GitHub.io/. [ bib ] | 
| [Tol14a] | J. Toledano (June 2014).
 Executive Summary: Dynamic Spectrum Management For Innovation
  And Growth.
 Online at
  https://www.economie.gouv.fr/files/files/PDF/french-spectrum-mission-executive-summary-2014-06-25.pdf.
 
  https://www.ladocumentationfrancaise.fr/rapports-publics/144000381/index.shtml. [ bib | http | .pdf ] | 
| [Tol14b] | J. Toledano (June 2014).
 Une gestion dynamique du spectre pour l'innovation et la
  croissance [Dynamic spectrum management for innovation and growth].
 Online at
  https://www.economie.gouv.fr/files/files/PDF/rapport-gestion-dynamique-spectre-2014-06-30.pdf.
 
  https://www.ladocumentationfrancaise.fr/rapports-publics/144000381/index.shtml. [ bib | http | .pdf ] | 
| [Lat16a] | T. Lattimore (2016).
 Library for Multi-Armed Bandit Algorithms.
 Online at: https://github.com/tor/libbandit. [ bib | http ] | 
| [Ett] | Ettus.
 USRP Hardware Driver and USRP Manual.
 Online at
  https://files.ettus.com/manual/page_usrp2.html.
 Accessed: 2018-09-25. [ bib ] | 
| [Raj17] | V. Raj (2017).
 A Julia Package for providing Multi Armed Bandit
  Experiments.
 Online at: https://github.com/v-i-s-h/MAB.jl. [ bib | http ] | 
| [BBS+19] | S. Behnel, R. Bradshaw, Dag S. Seljebotn, G. Ewing, W. Stein, G. Gellner,
  et al. (2019).
 Cython: C-Extensions for Python.
 Online at: https://cython.org. [ bib | http ] | 
| [C+18] | A. Collette et al. (2018).
 h5py: HDF5 for Python.
 Online at: https://www.h5py.org. [ bib | http ] | 
| [Var17] | G. Varoquaux (March 2017).
 Joblib: running Python functions as pipeline jobs.
 Online at: https://joblib.readthedocs.io. [ bib | http ] | 
| [I+17] | Anaconda Inc. et al. (2017).
 Numba, NumPy aware dynamic Python compiler using LLVM.
 Online at: https://numba.pydata.org. [ bib | http ] | 
| [CGK12] | O. Cappé, A. Garivier, and E. Kaufmann (2012).
 pymaBandits.
 Online at: https://mloss.org/software/view/415. [ bib | http ] | 
| [Fou17] | Python Software Foundation (October 2017).
 Python Language Reference, version 3.6.
 Online at: https://www.python.org. [ bib | http ] | 
| [JOP+01] | E. Jones, T. E. Oliphant, P. Peterson, et al. (2001).
 SciPy: Open source scientific tools for Python.
 Online at: https://www.scipy.org. [ bib | http ] | 
| [W+17] | M. Waskom et al. (September 2017).
 Seaborn: statistical data visualization.
 Online at: https://seaborn.pydata.org. [ bib | DOI | http ] | 
| [B+18] | G. Brandl et al. (2018).
 Sphinx: Python documentation generator.
 Online at: https://sphinx-doc.org. [ bib | http ] | 
| [BP+16] | I. Bicking, PyPA, et al. (November 2016).
 Virtualenv: a tool to create isolated Python environments.
 Online at: https://virtualenv.pypa.io. [ bib | http ] | 
| [Bod17] | Q. Bodinier (2017).
 Coexistence of Communication Systems Based on Enhanced
  Multi-Carrier Waveforms with Legacy OFDM Networks.
 Ph.D. thesis, CentraleSupélec.
 https://www.theses.fr/2017REN1S091. [ bib | http ] | 
| [Jou17] | W. Jouini (2017).
 Contribution to Learning and Decision Making under Uncertainty
  for Cognitive Radio.
 Ph.D. thesis, CentraleSupélec, IETR, Rennes.
 https://www.theses.fr/2012SUPL0010. [ bib | http ] | 
| [Kau14] | E. Kaufmann (2014).
 Analysis of Bayesian and Frequentist Strategies for Sequential
  Resource Allocation.
 Ph.D. thesis, Telecom ParisTech.
 https://www.theses.fr/2014ENST0056. [ bib | http ] | 
| [Mod17] | N. Modi (2017).
 Machine Learning and Statistical Decision Making for Green
  Radio.
 Ph.D. thesis, CentraleSupélec, IETR, Rennes.
 https://www.theses.fr/2017SUPL0002. [ bib | http ] | 
| [Tol17] | V. Toldov (2017).
 Adaptive MAC Layer for Interference Limited WSN.
 Ph.D. thesis, Université Lille 1 Sciences et technologies.
 https://www.theses.fr/2017LIL10002. [ bib | http ] | 
| [Val16] | M. Valko (2016).
 Bandits on Graphs and Structures.
 Habilitation thesis to supervise research, École normale
  supérieure de Cachan. [ bib ] We investigate the structural properties of certain sequential decision-making problems with limited feedback (bandits) in order to bring the known algorithmic solutions closer to a practical use. In the first part, we put a special emphasis on structures that can be represented as graphs on actions, in the second part we study the large action spaces that can be of exponential size in the number of base actions or even infinite. We show how to take advantage of structures over the actions and (provably) learn faster. | 
| [BK18b] | L. Besson and E. Kaufmann (February 2018).
 What Doubling Trick Can and Can't Do for Multi-Armed
  Bandits.
 Preprint, https://hal.archives-ouvertes.fr/hal-01736357. [ bib | http | www: ] | 
| [BBM18] | L. Besson, R. Bonnefoi, and C. Moy (June 2018).
 Multi-Arm Bandit Algorithms for Internet of Things Networks: A
  TestBed Implementation and Demonstration that Learning Helps.
 https://ict-2018.org/demos, youtu.be/HospLNQhcMk,
  Demonstration presented at International Conference on Telecommunications. [ bib | http ] | 
| [BK19b] | L. Besson and E. Kaufmann (February 2019).
 Combining the Generalized Likelihood Ratio Test and kl-UCB for
  Non-Stationary Bandits.
 Preprint, https://hal.archives-ouvertes.fr/hal-02006471, arXiv
  preprint arXiv:1902.01575. [ bib | http ] | 
| [MBDT19] | C. Moy, L. Besson, G. Delbarre, and L. Toutain (July 2019).
 Decentralized Spectrum Learning for Radio Collision Mitigation
  in Ultra-Dense IoT Networks: LoRaWAN Case Study and Measurements.
 https://hal.inria.fr/hal-XXX, Submitted for a
  special volume on Machine Learning for Intelligent Wireless Communications
  and Networking, Annals of Telecommunications. [ bib | http | .pdf ] | 
| [MM19] | S. Mukherjee and O.-A. Maillard (2019).
 Improved Changepoint Detection for Piecewise i.i.d Bandits.
 https://subhojyoti.github.io/pdf/aistats_2019.pdf,
  working paper or preprint. [ bib | .pdf ] | 
| [RK17] | V. Raj and S. Kalyani (2017).
 Taming Non-Stationary Bandits: a Bayesian Approach.
 https://arxiv.org/abs/1707.09727. [ bib | http ] | 
| [Bes18] | L. Besson (2018).
 SMPyBandits: an Experimental Framework for Single and
  Multi-Players Multi-Arms Bandits Algorithms in Python.
 Preprint, submitted to JMLR MLOSS,
  https://hal.archives-ouvertes.fr/hal-01840022. [ bib | http | www: ] | 
| [BN93] | M. Basseville and I. Nikiforov (1993).
 Detection of Abrupt Changes: Theory And Application, volume
  104.
 Prentice Hall, Englewood Cliffs. [ bib ] | 
| [Bón11] | M. Bóna (2011).
 A Walk Through Combinatorics: an Introduction to Enumeration
  and Graph Theory.
 World Scientific. [ bib ] | 
| [BLM13] | S. Boucheron, G. Lugosi, and P. Massart (2013).
 Concentration Inequalities: A Nonasymptotic Theory of
  Independence.
 Oxford University Press. [ bib ] | 
| [BV04] | S. Boyd and L. Vandenberghe (2004).
 Convex Optimization.
 Cambridge University Press. [ bib ] | 
| [CBL06] | N. Cesa-Bianchi and G. Lugosi (2006).
 Prediction, Learning, and Games.
 Cambridge University Press. [ bib ] | 
| [LS19] | T. Lattimore and C. Szepesvári (2019).
 Bandit Algorithms.
 Cambridge University Press.
 Draft of Wednesday 1st of May, 2019,
  https://tor-lattimore.com/downloads/book/book.pdf. [ bib | .pdf ] | 
| [Nor98] | J. R. Norris (1998).
 Markov Chains, volume 2 of Cambridge Series in
  Statistical and Probabilistic Mathematics.
 Cambridge University Press, Cambridge. [ bib ] | 
| [Sie13] | D. Siegmund (2013).
 Sequential Analysis: Tests and Confidence Intervals.
 Springer Science & Business Media. [ bib ] | 
| [SB18] | R. S. Sutton and A. G. Barto (2018).
 Reinforcement Learning: An Introduction.
  MIT Press. [ bib ] | 
| [TNB15] | A. Tartakovsky, I. Nikiforov, and M. Basseville (2015).
 Sequential Analysis. Hypothesis Testing and Changepoint
  Detection.
 CRC Press. [ bib ] | 