| [Agr95] | R. Agrawal (1995).
 Sample mean based index policies with O(log n)
  regret for the Multi-Armed Bandit problem.
 Advances in Applied Probability, 27(4):1054--1078. [ bib ] | 
| [ALK19] | P. Alatur, K. Y. Levy, and A. Krause (2019).
 Multi-Player Bandits: The Adversarial Case.
 arXiv preprint
  arXiv:1902.08036.
 https://arxiv.org/abs/1902.08036. [ bib | http ] | 
| [AFM17] | R. Allesiardo, R. Féraud, and O.-A. Maillard (2017).
 The Non-Stationary Stochastic Multi-Armed Bandit Problem.
 International Journal of Data Science and Analytics,
  3(4):267--283. [ bib ] | 
| [AMTA11] | A. Anandkumar, N. Michael, A. K. Tang, and S. Agrawal (2011).
 Distributed Algorithms for Learning and Cognitive Medium
  Access with Logarithmic Regret.
 Journal on Selected Areas in Communications, 29(4):731--745. [ bib ] | 
| [AVW87a] | V. Anantharam, P. Varaiya, and J. Walrand (1987).
 Asymptotically efficient allocation rules for the Multi-Armed
  Bandit problem with multiple plays - Part I: IID rewards.
 Transactions on Automatic Control, 32(11):968--976. [ bib ] | 
| [AVW87b] | V. Anantharam, P. Varaiya, and J. Walrand (1987).
 Asymptotically efficient allocation rules for the Multi-Armed
  Bandit problem with multiple plays - Part II: Markovian rewards.
 Transactions on Automatic Control, 32(11):977--982. [ bib ] | 
| [AHK12] | S. Arora, E. Hazan, and S. Kale (2012).
 The Multiplicative Weights Update Method: a Meta-Algorithm and
  Applications.
 Theory of Computing, 8(1):121--164. [ bib ] | 
| [AE61] | K. J. Arrow and A. C. Enthoven (1961).
 Quasi-Concave Programming.
 Econometrica, 29(4):779--800. [ bib ] | 
| [ACBF02] | P. Auer, N. Cesa-Bianchi, and P. Fischer (2002).
 Finite-time Analysis of the Multi-armed Bandit Problem.
 Machine Learning, 47(2):235--256. [ bib | DOI ] | 
| [ACBFS02] | P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire (2002).
 The Non-Stochastic Multi-Armed Bandit Problem.
 SIAM Journal on Computing, 32(1):48--77. [ bib ] | 
| [AO10] | P. Auer and R. Ortner (2010).
 UCB Revisited: Improved Regret Bounds For The Stochastic
  Multi-Armed Bandit Problem.
 Periodica Mathematica Hungarica, 61(1-2):55--65. [ bib ] | 
| [AGO18] | P. Auer, P. Gajane, and R. Ortner (2018).
 Adaptively Tracking the Best Arm with an Unknown Number of
  Distribution Changes.
 European Workshop on Reinforcement Learning.
 
  https://ewrl.files.wordpress.com/2018/09/ewrl_14_2018_paper_28.pdf. [ bib | .pdf ] | 
| [AM15] | O. Avner and S. Mannor (2015).
 Learning to Coordinate Without Communication in Multi-User
  Multi-Armed Bandit Problems.
 arXiv preprint
  arXiv:1504.08167.
 https://arxiv.org/abs/1504.08167. [ bib | http ] | 
| [AM18] | O. Avner and S. Mannor (2018).
 Multi-User Communication Networks: A Coordinated Multi-Armed
  Bandit Approach.
 arXiv preprint
  arXiv:1808.04875.
 https://arxiv.org/abs/1808.04875. [ bib | http ] | 
| [BMM14] | A. Baransi, O.-A. Maillard, and S. Mannor (2014).
 Sub-sampling for Multi-armed Bandits.
 Proceedings of the European Conference on Machine Learning.
 https://hal.archives-ouvertes.fr/hal-01025651. [ bib | http ] | 
| [BK19a] | L. Besson and E. Kaufmann (August 2019).
 Analyse non asymptotique d'un test séquentiel de détection
  de ruptures et application aux bandits non stationnaires [Non-asymptotic
  analysis of a sequential change-point detection test with application to
  non-stationary bandits].
 GRETSI.
 https://hal.archives-ouvertes.fr/hal-02152243. [ bib | http | .pdf ] | 
| [BMP18] | R. Bonnefoi, C. Moy, and J. Palicot (2018).
 Improvement of the LPWAN AMI backhaul's latency thanks to
  reinforcement learning algorithms.
 EURASIP Journal on Wireless Communications and Networking,
  2018(1):34. [ bib | DOI ] | 
| [BP18] | E. Boursier and V. Perchet (2018).
 SIC-MMAB: Synchronisation Involves Communication in
  Multiplayer Multi-Armed Bandits.
 arXiv preprint
  arXiv:1809.08151.
 https://arxiv.org/abs/1809.08151. [ bib | http ] | 
| [BCB12] | S. Bubeck and N. Cesa-Bianchi (2012).
 Regret Analysis of Stochastic and Non-Stochastic Multi-Armed
  Bandit Problems.
 Foundations and Trends in Machine Learning,
  5(1):1--122. [ bib ] | 
| [BK96] | A. N. Burnetas and M. N. Katehakis (1996).
 Optimal Adaptive Policies for Sequential Allocation
  Problems.
 Advances in Applied Mathematics, 17(2):122--142. [ bib ] | 
| [CVZZ16] | M. Centenaro, L. Vangelista, A. Zanella, and M. Zorzi (2016).
 Long-range communications in unlicensed bands: the rising stars
  in the IoT and smart city scenarios.
 Wireless Communications, 23(5):60--67. [ bib | DOI ] | 
| [CMR14] | O. Chapelle, E. Manavoglu, and R. Rosales (2014).
 Simple and Scalable Response Prediction For Display
  Advertising.
 Transactions on Intelligent Systems and Technology. [ bib ] | 
| [DMP16] | S. J. Darak, C. Moy, and J. Palicot (2016).
 Proof-of-Concept System for Opportunistic Spectrum Access in
  Multi-user Decentralized Networks.
 EAI Endorsed Transactions on Cognitive Communications,
  2:1--10. [ bib ] | 
| [DH18] | S. J. Darak and M. K. Hanawal (2018).
 Distributed Learning and Stable Orthogonalization in Ad-Hoc
  Networks with Heterogeneous Channels.
 arXiv preprint
  arXiv:1812.11651.
 https://arxiv.org/abs/1812.11651. [ bib | http ] | 
| [GBV18] | G. Gautier, R. Bardenet, and M. Valko (2018).
 DPPy: Sampling Determinantal Point Processes with Python.
 arXiv preprint
  arXiv:1809.07258.
 https://arxiv.org/abs/1809.07258, code at
  https://github.com/guilgautier/DPPy. Documentation at
 https://dppy.readthedocs.io. [ bib | http ] | 
| [GMS16] | A. Garivier, P. Ménard, and G. Stoltz (2016).
 Explore First, Exploit Next: The True Shape of Regret in
  Bandit Problems.
 arXiv preprint
  arXiv:1602.07182.
 https://arxiv.org/abs/1602.07182. [ bib | http ] | 
| [GHMS18] | A. Garivier, H. Hadiji, P. Menard, and G. Stoltz (2018).
 KL-UCB-switch: optimal regret bounds for stochastic bandits
  from both a distribution-dependent and a distribution-free viewpoints.
 arXiv preprint
  arXiv:1805.05071.
 https://arxiv.org/abs/1805.05071. [ bib | http ] | 
| [Hay05] | S. Haykin (2005).
 Cognitive Radio: Brain-Empowered Wireless Communications.
 Journal on Selected Areas in Communications, 23(2):201--220. [ bib ] | 
| [H+16] | E. Hazan et al. (2016).
 Introduction to Online Convex Optimization.
 Foundations and Trends in Optimization,
  2(3-4):157--325. [ bib ] | 
| [Hon19] | J. Honda (2019).
 A Note on KL-UCB+ Policy for the Stochastic Bandit.
 arXiv preprint
  arXiv:1903.07839.
 https://arxiv.org/abs/1903.07839. [ bib | http ] | 
| [JMP12] | W. Jouini, C. Moy, and J. Palicot (2012).
 Decision Making for Cognitive Radio Equipment: Analysis of the
  First 10 Years of Exploration.
 EURASIP Journal on Wireless Communications and Networking,
  2012(1). [ bib ] | 
| [KK18] | E. Kaufmann and W. M. Koolen (2018).
 Mixture Martingales Revisited with Applications to Sequential
  Tests and Confidence Intervals.
 arXiv preprint
  arXiv:1811.11419.
 https://arXiv.org/abs/1811.11419. [ bib | http ] | 
| [CGM+13] | O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz (2013).
 Kullback-Leibler Upper Confidence Bounds For Optimal
  Sequential Allocation.
 Annals of Statistics, 41(3):1516--1541. [ bib ] | 
| [KG17] | E. Kaufmann and A. Garivier (2017).
 Learning The Distribution With Largest Mean: Two Bandit
  Frameworks.
 arXiv preprint
  arXiv:1702.00001.
 https://arxiv.org/abs/1702.00001. [ bib | http ] | 
| [KM19] | E. Kaufmann and A. Mehrabian (2019).
 New Algorithms for Multiplayer Bandits when Arm Means Vary
  Among Players.
 arXiv preprint
  arXiv:1902.01239.
 https://arxiv.org/abs/1902.01239. [ bib | http ] | 
| [KDI18] | N. Keriven, D. Garreau, and I. Poli (2018).
 NEWMA: a new method for scalable model-free online
  change-point detection.
 arXiv preprint
  arXiv:1805.08061.
 https://arxiv.org/abs/1805.08061, code at
  https://github.com/lightonai/newma. [ bib | http ] | 
| [KT19] | B. Kim and A. Tewari (2019).
 On the Optimality of Perturbations in Stochastic and
  Adversarial Multi-Armed Bandit Problems.
 arXiv preprint
  arXiv:1902.00610.
 https://arxiv.org/abs/1902.00610. [ bib | http ] | 
| [KL51] | S. Kullback and R.A. Leibler (1951).
 On Information and Sufficiency.
 The Annals of Mathematical Statistics, 22(1):79--86. [ bib ] | 
| [KDH+19] | R. Kumar, S. J. Darak, M. K. Hanawal, A. K. Sharma, and R. K. Tripathi (2019).
 Distributed Algorithm for Learning to Coordinate in
  Infrastructure-Less Network.
 IEEE Communications Letters, 23(2):362--365.
 ISSN 1089-7798. [ bib | DOI ] | 
| [LR85] | T. L. Lai and H. Robbins (1985).
 Asymptotically Efficient Adaptive Allocation Rules.
 Advances in Applied Mathematics, 6(1):4--22. [ bib ] | 
| [LX10] | T. L. Lai and H. Xing (2010).
 Sequential change-point detection when the pre- and post-change
  parameters are unknown.
 Sequential Analysis, 29(2):162--175. [ bib ] | 
| [Lat16b] | T. Lattimore (2016).
 Regret Analysis of the Anytime Optimally Confident UCB
  Algorithm.
 arXiv preprint
  arXiv:1603.08661.
 https://arxiv.org/abs/1603.08661. [ bib | http ] | 
| [Lat18] | T. Lattimore (2018).
 Refining the confidence level for optimistic bandit
  strategies.
 The Journal of Machine Learning Research, 19(1):765--796. [ bib ] | 
| [LJ18] | L. Li, K. Jamieson, et al. (2018).
 Hyperband: A Novel Bandit-Based Approach to Hyperparameter
  Optimization.
 Journal of Machine Learning Research, 18:1--52.
 https://arxiv.org/abs/1603.06560. [ bib | http ] | 
| [LKC17] | A. Luedtke, E. Kaufmann, and A. Chambaz (2017).
 Asymptotically Optimal Algorithms for Budgeted Multiple Play
  Bandits.
 Machine Learning, pages 1--31.
 https://arxiv.org/abs/1606.09388. [ bib | http ] | 
| [Lue68] | D. G. Luenberger (1968).
 Quasi-Convex Programming.
 SIAM Journal on Applied Mathematics, 16(5):1090--1095. [ bib ] | 
| [LM18] | G. Lugosi and A. Mehrabian (2018).
 Multiplayer Bandits Without Observing Collision Information.
 arXiv preprint
  arXiv:1808.08416.
 https://arxiv.org/abs/1808.08416. [ bib | http ] | 
| [MH16] | S. Maghsudi and E. Hossain (2016).
 Multi-Armed Bandits with application to 5G small cells.
 Wireless Communications, 23(3):64--73. [ bib | DOI ] | 
| [MGMM+15] | L. Melián-Gutiérrez, N. Modi, C. Moy, F. Bader,
  I. Pérez-Álvarez, and S. Zazo (2015).
 Hybrid UCB-HMM: A Machine Learning Strategy for Cognitive
  Radio in HF Band.
 IEEE Transactions on Cognitive Communications and Networking,
  1(3):347--358. [ bib ] | 
| [MM99] | J. Mitola and G. Q. Maguire (1999).
 Cognitive Radio: making software radios more personal.
 Personal Communications, 6(4):13--18. [ bib ] | 
| [MMM17] | N. Modi, P. Mary, and C. Moy (2017).
 QoS driven Channel Selection Algorithm for Cognitive Radio
  Network: Multi-User Multi-Armed Bandit Approach.
 Transactions on Cognitive Communications and Networking,
  3(1):49--66. [ bib ] | 
| [Nie11] | F. Nielsen (2011).
 Chernoff Information of Exponential Families.
 arXiv preprint
  arXiv:1102.2684.
 https://arxiv.org/abs/1102.2684. [ bib | http ] | 
| [PGNN19] | V. Patil, G. Ghalme, V. Nair, and Y. Narahari (2019).
 Stochastic Multi-Armed Bandits with Arm-specific Fairness
  Guarantees.
 arXiv preprint
  arXiv:1905.11260.
 https://arxiv.org/abs/1905.11260. [ bib | http ] | 
| [RKS17] | U. Raza, P. Kulkarni, and M. Sooriyabandara (2017).
 Low Power Wide Area Networks (LPWAN): An Overview.
 Communications Surveys Tutorials, 19(2):855--873. [ bib | DOI ] | 
| [Rob52] | H. Robbins (1952).
 Some Aspects of the Sequential Design of Experiments.
 Bulletin of the American Mathematical Society,
  58(5):527--535. [ bib ] | 
| [Rob75] | L. G. Roberts (1975).
 ALOHA Packet System With and Without Slots and Capture.
 SIGCOMM Computer Communication Review, 5(2):28--42. [ bib ] | 
| [SLC+19] | J. Seznec, A. Locatelli, A. Carpentier, A. Lazaric, and M. Valko (2019).
 Rotting Bandits Are No Harder Than Stochastic Ones.
 International Conference on Artificial Intelligence and
  Statistics.
 https://arxiv.org/abs/1811.11043. [ bib | http ] | 
| [AHK17] | A. Singla, H. Hassani, and A. Krause (2017).
 Learning to Use Learners' Advice.
 arXiv preprint
  arXiv:1702.04825.
 https://arxiv.org/abs/1702.04825. [ bib | http ] | 
| [Sli19] | A. Slivkins (June 2019).
 Introduction to Multi-Armed Bandits.
 arXiv preprint
  arXiv:1904.07272v3.
 https://arxiv.org/abs/1904.07272v3. [ bib | http ] | 
| [Tho33] | W. R. Thompson (1933).
 On the Likelihood that One Unknown Probability Exceeds Another
  in View of the Evidence of Two Samples.
 Biometrika, 25(3-4):285--294. [ bib ] | 
| [Z. 19] | Z. Tian, J. Wang, J. Wang, and J. Song (2019).
 Distributed NOMA-Based Multi-Armed Bandit Approach for Channel
  Access in Cognitive Radio Networks.
 IEEE Wireless Communications Letters, pages 1--4.
 ISSN 2162-2337. [ bib | DOI ] | 
| [TdSCC13] | F. S. Truzzi, V. F. da Silva, A. H. Reali Costa, and F. Gagliardi Cozman
  (2013).
 AdBandit: a New Algorithm for Multi-Armed Bandits.
 ENIAC, 2013(1). [ bib ] | 
| [Wal45] | A. Wald (1945).
 Some Generalizations of the Theory of Cumulative Sums of
  Random Variables.
 The Annals of Mathematical Statistics, 16(3):287--293. [ bib ] | 
| [Whi88] | P. Whittle (1988).
 Restless bandits: Activity allocation in a changing world.
 Journal of Applied Probability, 25(A):287--298. [ bib ] | 
| [WCN+19] | F. Wilhelmi, C. Cano, G. Neu, B. Bellalta, A. Jonsson, and
  S. Barrachina-Muñoz (2019).
 Collaborative Spatial Reuse In Wireless Networks Via Selfish
  Multi-Armed Bandits.
 Ad Hoc Networks. [ bib ] | 
| [WBMB+19] | F. Wilhelmi, S. Barrachina-Muñoz, B. Bellalta, C. Cano, A. Jonsson, and
  G. Neu (2019).
 Potential and Pitfalls of Multi-Armed Bandits for Decentralized
  Spatial Reuse in WLANs.
 Journal of Network and Computer Applications, 127:26--42. [ bib ] | 
| [Wil38] | S. S. Wilks (1938).
 The large-sample distribution of the likelihood ratio for
  testing composite hypotheses.
 The Annals of Mathematical Statistics, 9(1):60--62. [ bib ] | 
| [Yaa77] | M. E. Yaari (1977).
 A Note on Separability and Quasiconcavity.
 Econometrica, 45(5):1183--1186. [ bib ] | 
| [ZS07] | Q. Zhao and B. M. Sadler (2007).
 A Survey of Dynamic Spectrum Access.
 Signal Processing Magazine, 24(3):79--89. [ bib ] | 
| [LZ10] | K. Liu and Q. Zhao (2010).
 Distributed Learning in Multi-Armed Bandit with Multiple
  Players.
 Transactions on Signal Processing, 58(11):5667--5681. [ bib ] | 
| [ALVM06] | I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty (2006).
 NeXt Generation, Dynamic Spectrum Access, Cognitive Radio
  Wireless Networks: A Survey.
 Computer Networks, 50(13):2127--2159. [ bib ] | 
| [AB10] | J.-Y. Audibert and S. Bubeck (2010).
 Regret Bounds And Minimax Policies Under Partial Monitoring.
 Journal of Machine Learning Research, 11:2785--2836. [ bib ] | 
| [Bar59] | G.A. Barnard (1959).
 Control charts and stochastic processes.
 Journal of the Royal Statistical Society. Series B
  (Methodological), pages 239--271. [ bib ] | 
| [BR19] | D. Bouneffouf and I. Rish (2019).
 A Survey on Practical Applications of Multi-Armed and
  Contextual Bandits.
 arXiv preprint
  arXiv:1904.10040, under review by
  IJCAI 2019 Survey.
 https://arxiv.org/abs/1904.10040. [ bib | http ] | 
| [C+52] | H. Chernoff et al. (1952).
 A Measure of Asymptotic Efficiency for Tests of a Hypothesis
  Based on The Sum of Observations.
 The Annals of Mathematical Statistics, 23(4):493--507. [ bib ] | 
| [Che81] | H. Chernoff (1981).
 A Note on an Inequality Involving the Normal Distribution.
 The Annals of Probability, pages 533--535. [ bib ] | 
| [GHRZ19] | Z. Gao, Y. Han, Z. Ren, and Z. Zhou (2019).
 Batched Multi-armed Bandits Problem.
 arXiv preprint
  arXiv:1904.01763.
 https://arxiv.org/abs/1904.01763. [ bib | http ] | 
| [GB12] | A. Garhwal and P. P. Bhattacharya (2012).
 A Survey on Dynamic Spectrum Access Techniques for Cognitive
  Radio.
 International Journal of Next-Generation Networks (IJNGN),
  3(4).
 https://arxiv.org/abs/1201.1964. [ bib | DOI | http ] | 
| [HR90] | T. Hagerup and C. Rüb (1990).
 A Guided Tour of Chernoff Bounds.
 Information Processing Letters, 33(6):305--308. [ bib ] | 
| [Hoe63] | W. Hoeffding (1963).
 Probability Inequalities for Sums of Bounded Random Variables.
 Journal of the American Statistical Association,
  58(301):13--30. [ bib ] | 
| [PG07] | F. Pérez and B. E. Granger (May 2007).
 IPython: a System for Interactive Scientific Computing.
 Computing in Science and Engineering, 9(3):21--29.
 https://ipython.org. [ bib | http ] | 
| [KG19] | A. Kolnogorov and S. Garbar (2019).
 Multi-Armed Bandit Problem and Batch UCB Rule.
 arXiv preprint
  arXiv:1902.00214.
 https://arxiv.org/abs/1902.00214. [ bib | http ] | 
| [KDY+16] | R. Kumar, S. J. Darak, A. Yadav, A. K. Sharma, and R. K. Tripathi (2016).
 Two-stage Decision Making Policy for Opportunistic Spectrum
  Access and Validation on USRP Testbed.
 Wireless Networks, pages 1--15. [ bib ] | 
| [KDY+17] | R. Kumar, S. J. Darak, A. Yadav, A. K. Sharma, and R. K. Tripathi (2017).
 Channel Selection for Secondary Users in Decentralized Network
  of Unknown Size.
 Communications Letters, 21(10):2186--2189. [ bib ] | 
| [LLL19] | H. Li, J. Luo, and C. Liu (2019).
 Selfish Bandit based Cognitive Anti-jamming Strategy for
  Aeronautic Swarm Network in Presence of Multiple Jammers.
 IEEE Access. [ bib ] | 
| [MM12] | J. Marinho and E. Monteiro (2012).
 Cognitive Radio: Survey on Communication Protocols, Spectrum
  Decision Issues, and Future Research Directions.
 Wireless Networks, 18(2):147--164. [ bib ] | 
| [Hun07] | J. D. Hunter (2007).
 Matplotlib: a 2D Graphics Environment.
 Computing In Science & Engineering, 9(3):90--95. [ bib | DOI ] | 
| [vdWCV11] | S. van der Walt, S. C. Colbert, and G. Varoquaux (March 2011).
 The NumPy Array: A Structure for Efficient Numerical
  Computation.
 Computing in Science & Engineering, 13(2):22--30. [ bib | DOI ] | 
| [VPSE16] | V. Perchet, P. Rigollet, S. Chassang, and E. Snowberg (April 2016).
 Batched Bandit Problems.
 The Annals of Statistics, 44(2):660--681.
 https://doi.org/10.1214/15-AOS1381. [ bib | DOI | http ] | 
| [SV95] | D. Siegmund and E.S. Venkatraman (1995).
 Using the Generalized Likelihood Ratio Statistic for
  Sequential Detection of a Change Point.
 The Annals of Statistics, pages 255--271. [ bib ] | 
| [SB11] | M. Subhedar and G. Birajdar (2011).
 Spectrum Sensing Techniques in Cognitive Radio Networks: a
  Survey.
 International Journal of Next-Generation Networks,
  3(2):37--51. [ bib ] | 
| [TZZ19] | C. Tao, Q. Zhang, and Y. Zhou (2019).
 Collaborative Learning with Limited Interaction: Tight Bounds
  for Distributed Exploration in Multi-Armed Bandits.
 arXiv preprint
  arXiv:1904.03293.
 https://arxiv.org/abs/1904.03293. [ bib | http ] | 
| [WHCW19] | Y. Wang, J. Hu, X. Chen, and L. Wang (2019).
 Distributed Bandit Learning: How Much Communication is Needed
  to Achieve (Near) Optimal Regret.
 arXiv preprint
  arXiv:1904.06309.
 https://arxiv.org/abs/1904.06309. [ bib | http ] | 
| [YA09] | T. Yucek and H. Arslan (2009).
 A Survey of Spectrum Sensing Algorithms for Cognitive Radio
  Applications.
 IEEE Communications Surveys & Tutorials, 11(1):116--130. [ bib ] | 
| [YRJW17] | F. Yang, A. Ramdas, K. Jamieson, and M. Wainwright (2017).
 A framework for Multi-A(rmed)/B(andit) Testing with Online FDR
  Control.
 In Advances in Neural Information Processing Systems, pages
  5957--5966. Curran Associates, Inc. [ bib ] | 
| [CL11] | O. Chapelle and L. Li (2011).
 An Empirical Evaluation of Thompson Sampling.
 In Advances in Neural Information Processing Systems, pages
  2249--2257. Curran Associates, Inc. [ bib ] | 
| [Abr70] | N. Abramson (1970).
 The ALOHA System: Another Alternative for Computer
  Communications.
 In Proceedings of the November 17-19, 1970, Fall Joint Computer
  Conference, AFIPS '70 (Fall), pages 281--285. ACM, New York, NY, USA. [ bib | DOI ] | 
| [ACE09] | D. Agarwal, B. Chen, and P. Elango (2009).
 Explore Exploit Schemes For Web Content Optimization.
 In International Conference on Data Mining. IEEE. [ bib ] | 
| [ALNS17] | A. Agarwal, H. Luo, B. Neyshabur, and R. E. Schapire (2017).
 Corralling a Band of Bandit Algorithms.
 In Conference on Learning Theory, pages 12--38. PMLR. [ bib ] | 
| [AG12] | S. Agrawal and N. Goyal (2012).
 Analysis of Thompson sampling for the Multi-Armed Bandit
  problem.
 In Conference on Learning Theory, pages 36--65. PMLR. [ bib ] | 
| [AMF17] | R. Alami, O.-A. Maillard, and R. Féraud (2017).
 Memory Bandits: Towards the Switching Bandit Problem Best
  Resolution.
 In Conference on Neural Information Processing Systems. [ bib ] | 
| [AF15] | R. Allesiardo and R. Féraud (2015).
 Exp3 with Drift Detection for the Switching Bandit Problem.
 In International Conference on Data Science and Advanced
  Analytics, pages 1--7. IEEE. [ bib ] | 
| [AMT10] | A. Anandkumar, N. Michael, and A. K. Tang (2010).
 Opportunistic Spectrum Access with multiple users: Learning
  under competition.
 In International Conference on Computer Communications.
  IEEE. [ bib ] | 
| [AMS07] | J.-Y. Audibert, R. Munos, and C. Szepesvári (2007).
 Tuning Bandit Algorithms in Stochastic Environments.
 In Algorithmic Learning Theory, pages 150--165. Springer,
  Sendai, Japan. [ bib ] | 
| [AB09] | J.-Y. Audibert and S. Bubeck (2009).
 Minimax Policies for Adversarial and Stochastic Bandits.
 In Conference on Learning Theory, pages 217--226. PMLR. [ bib ] | 
| [AC16] | P. Auer and C.-K. Chiang (2016).
 An Algorithm with Nearly Optimal Pseudo Regret for Both
  Stochastic and Adversarial Bandits.
 In Conference on Learning Theory, pages 116--120. PMLR. [ bib ] | 
| [AM16] | O. Avner and S. Mannor (2016).
 Multi-User Lax Communications: a Multi-Armed Bandit
  Approach.
 In International Conference on Computer Communications.
  IEEE. [ bib ] | 
| [AC18] | A. Azari and C. Cavdar (December 2018).
 Self-organized Low-power IoT Networks: A Distributed Learning
  Approach.
 In Global Communications Conference. IEEE, Abu Dhabi, UAE. [ bib ] | 
| [BGZ14] | O. Besbes, Y. Gur, and A. Zeevi (2014).
 Stochastic Multi-Armed Bandit Problem with Non-Stationary
  Rewards.
 In Advances in Neural Information Processing Systems, pages
  199--207. [ bib ] | 
| [BK18a] | L. Besson and E. Kaufmann (2018).
 Multi-Player Bandits Revisited.
 In M. Mohri and K. Sridharan, editors, Algorithmic Learning
  Theory. Lanzarote, Spain.
 https://hal.archives-ouvertes.fr/hal-01629733. [ bib | http ] | 
| [BKM18] | L. Besson, E. Kaufmann, and C. Moy (2018).
 Aggregation of Multi-Armed Bandits Learning Algorithms for
  Opportunistic Spectrum Access.
 In Wireless Communications and Networking Conference. IEEE,
  Barcelona, Spain.
 https://hal.archives-ouvertes.fr/hal-01705292. [ bib | http ] | 
| [BBM19] | L. Besson, R. Bonnefoi, and C. Moy (April 2019).
 GNU Radio Implementation of MALIN: “Multi-Armed bandits
  Learning for Internet-of-things Networks”.
 In Wireless Communications and Networking Conference. IEEE,
  Marrakech, Morocco.
 https://hal.archives-ouvertes.fr/hal-02006825,
  following a Demonstration presented at International Conference on
  Telecommunications (ICT) 2018. [ bib | http ] | 
| [BL18] | I. Bistritz and A. Leshem (2018).
 Distributed Multi-Player Bandits: a Game Of Thrones
  Approach.
 In Advances in Neural Information Processing Systems, pages
  7222--7232. [ bib ] | 
| [BMP16] | R. Bonnefoi, C. Moy, and J. Palicot (2016).
 Advanced metering infrastructure backhaul reliability
  improvement with Cognitive Radio.
 In International Conference on Communications, Control, and
  Computing Technologies for Smart Grids, pages 230--236. [ bib | DOI ] | 
| [BBM+17] | R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, and J. Palicot (2017).
 Multi-Armed Bandit Learning in IoT Networks: Learning Helps
  Even in Non-Stationary Settings.
 In 12th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication. Lisboa, Portugal. [ bib ] | 
| [BBMVM19] | R. Bonnefoi, L. Besson, J. C. Manco-Vasquez, and C. Moy (April 2019).
 Upper-Confidence Bound for Channel Selection in LPWA Networks
  with Retransmissions.
 In MOTIoN Workshop. IEEE, Marrakech, Morocco.
 https://hal.archives-ouvertes.fr/hal-02049824. [ bib | http ] | 
| [BS12] | S. Bubeck and A. Slivkins (2012).
 The Best of Both Worlds: Stochastic and Adversarial Bandits.
 In Conference on Learning Theory, pages 42--1. PMLR. [ bib ] | 
| [CN18] | C. Cano and G. Neu (2018).
 Wireless Optimisation via Convex Bandits: Unlicensed LTE/WiFi
  Coexistence.
 In Proceedings of the 2018 Workshop on Network Meets AI & ML,
  NetAI'18, pages 41--47. ACM, New York, NY, USA.
 ISBN 978-1-4503-5911-5. [ bib | DOI ] | 
| [CZKX19] | Y. Cao, W. Zheng, B. Kveton, and Y. Xie (2019).
 Nearly Optimal Adaptive Procedure for Piecewise-Stationary
  Bandit: a Change-Point Detection Approach.
 In International Conference on Artificial Intelligence and
  Statistics. Okinawa, Japan. [ bib ] | 
| [CLLW19] | Y. Chen, C. Lee, H. Luo, and C. Wei (2019).
 A New Algorithm for Non-stationary Contextual Bandits:
  Efficient, Optimal, and Parameter-free.
 In A. Beygelzimer and D. Hsu, editors, Conference on Learning
  Theory, volume 99, pages 1--30. PMLR.
 https://arxiv.org/abs/1902.00980. [ bib | http ] | 
| [CMP17] | R. Combes, S. Magureanu, and A. Proutiere (2017).
 Minimal Exploration in Structured Stochastic Bandits.
 In Advances in Neural Information Processing Systems, pages
  1761--1769. [ bib ] | 
| [CGH+96] | R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth (1996).
 On the Lambert W Function.
 In Advances in Computational Mathematics, pages 329--359. [ bib ] | 
| [DP16] | R. Degenne and V. Perchet (2016).
 Anytime Optimal Algorithms In Stochastic Multi Armed
  Bandits.
 In International Conference on Machine Learning, pages
  1587--1595. [ bib ] | 
| [GC11] | A. Garivier and O. Cappé (2011).
 The KL-UCB Algorithm for Bounded Stochastic Bandits and
  Beyond.
 In Conference on Learning Theory, pages 359--376. PMLR. [ bib ] | 
| [GM11] | A. Garivier and E. Moulines (2011).
 On Upper-Confidence Bound Policies For Switching Bandit
  Problems.
 In Algorithmic Learning Theory, pages 174--188. PMLR. [ bib ] | 
| [GK16] | A. Garivier and E. Kaufmann (2016).
 Optimal Best Arm Identification with Fixed Confidence.
 In Conference on Learning Theory, volume 49. PMLR. [ bib ] | 
| [GKL16] | A. Garivier, E. Kaufmann, and T. Lattimore (2016).
 On Explore-Then-Commit Strategies.
 In Advances in Neural Information Processing Systems, volume 29. [ bib ] | 
| [GGCA11] | N. Gupta, O.-C. Granmo, and A. Agrawala (2011).
 Thompson Sampling for Dynamic Multi Armed Bandits.
 In International Conference on Machine Learning and
  Applications Workshops, pages 484--489. IEEE. [ bib ] | 
| [HGB+06] | C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud, and M. Sebag (2006).
 Multi-Armed Bandit, Dynamic Environments and Meta-Bandits.
 In NeurIPS 2006 Workshop, Online Trading Between Exploration
  And Exploitation. [ bib ] | 
| [HT10] | J. Honda and A. Takemura (2010).
 An Asymptotically Optimal Bandit Algorithm for Bounded Support
  Models.
 In Conference on Learning Theory, pages 67--79. PMLR. [ bib ] | 
| [JKYD18] | H. Joshi, R. Kumar, A. Yadav, and S. J. Darak (2018).
 Distributed Algorithm for Dynamic Spectrum Access in
  Infrastructure-Less Cognitive Radio Network.
 In 2018 IEEE Wireless Communications and Networking Conference
  (WCNC), pages 1--6.
 ISSN 1558-2612. [ bib | DOI ] | 
| [JEMP09] | W. Jouini, D. Ernst, C. Moy, and J. Palicot (2009).
 Multi-Armed Bandit Based Policies for Cognitive Radio's
  Decision Making Issues.
 In International Conference Signals, Circuits and Systems.
  IEEE. [ bib ] | 
| [JEMP10] | W. Jouini, D. Ernst, C. Moy, and J. Palicot (2010).
 Upper Confidence Bound Based Decision Making Strategies and
  Dynamic Spectrum Access.
 In International Conference on Communications, pages 1--5.
  IEEE. [ bib | DOI ] | 
| [KNJ12] | D. Kalathil, N. Nayyar, and R. Jain (2012).
 Decentralized Learning for Multi-Player Multi-Armed Bandits.
 In Conference on Decision and Control. IEEE. [ bib ] | 
| [KCG12] | E. Kaufmann, O. Cappé, and A. Garivier (2012).
 On Bayesian Upper Confidence Bounds for Bandit Problems.
 In International Conference on Artificial Intelligence and
  Statistics, pages 592--600. [ bib ] | 
| [KKM12] | E. Kaufmann, N. Korda, and R. Munos (2012).
 Thompson Sampling: an Asymptotically Optimal Finite-Time
  Analysis.
 In Algorithmic Learning Theory, pages 199--213. PMLR. [ bib ] | 
| [KCG14] | E. Kaufmann, O. Cappé, and A. Garivier (2014).
 On the Complexity of A/B Testing.
 In Conference on Learning Theory, pages 461--481. PMLR. [ bib ] | 
| [KAF+18] | R. Kerkouche, R. Alami, R. Féraud, N. Varsier, and P. Maillé (2018).
 Node-based optimization of LoRa transmissions with Multi-Armed
  Bandit algorithms.
 In International Conference on Telecommunications. J. Palicot
  and R. Pyndiah, Saint-Malo, France. [ bib ] | 
| [KS06] | L. Kocsis and C. Szepesvári (2006).
 Discounted UCB.
 In 2nd PASCAL Challenges Workshop. [ bib ] | 
| [KHN15] | J. Komiyama, J. Honda, and H. Nakagawa (2015).
 Optimal Regret Analysis of Thompson Sampling in Stochastic
  Multi-Armed Bandit Problem with Multiple Plays.
 In International Conference on Machine Learning, volume 37,
  pages 1152--1161. PMLR. [ bib ] | 
| [TRY17] | T. Koren, R. Livni, and Y. Mansour (2017).
 Bandits with Movement Costs and Adaptive Pricing.
 In Conference on Learning Theory, volume 65, pages
  1242--1268. PMLR. [ bib ] | 
| [KYDH18] | R. Kumar, A. Yadav, S. J. Darak, and M. K. Hanawal (2018).
 Trekking Based Distributed Algorithm for Opportunistic
  Spectrum Access in Infrastructure-Less Network.
 In 2018 16th International Symposium on Modeling and
  Optimization in Mobile, Ad-Hoc, and Wireless Networks (WiOpt), pages 1--8. [ bib | DOI ] | 
| [KSGB19] | B. Kveton, C. Szepesvari, M. Ghavamzadeh, and C. Boutilier (2019).
 Perturbed-History Exploration in Stochastic Multi-Armed
  Bandits.
 In 28th International Joint Conference on Artificial
  Intelligence (IJCAI 2019).
 https://arxiv.org/abs/1902.10089. [ bib | http ] | 
| [KPV17] | J. Kwon, V. Perchet, and C. Vernade (2017).
 Sparse Stochastic Bandits.
 In Conference on Learning Theory, pages 1269--1270. [ bib ] | 
| [LVC16] | P. Lagrée, C. Vernade, and O. Cappé (2016).
 Multiple-Play Bandits in the Position-Based Model.
 In Advances in Neural Information Processing Systems, pages
  1597--1605. [ bib ] | 
| [Lat16c] | T. Lattimore (2016).
 Regret Analysis Of The Finite Horizon Gittins Index Strategy
  For Multi Armed Bandits.
 In Conference on Learning Theory, pages 1214--1245. PMLR. [ bib ] | 
| [LM09] | A. Lazaric and R. Munos (2009).
 Hybrid Stochastic-Adversarial On-Line Learning.
 In Conference on Learning Theory. [ bib ] | 
| [LCLS10] | L. Li, W. Chu, J. Langford, and R. E. Schapire (2010).
 A Contextual-Bandit Approach to Personalized News Article
  Recommendation.
 In International Conference on World Wide Web, pages 661--670.
  ACM. [ bib ] | 
| [LPSY18] | D. Liau, E. Price, Z. Song, and G. Yang (2018).
 Stochastic Multi-Armed Bandits in Constant Space.
 In International Conference on Artificial Intelligence and
  Statistics. [ bib ] | 
| [LZ08] | K. Liu and Q. Zhao (2008).
 A Restless Bandit Formulation of Opportunistic Access:
  Indexability and Index Policy.
 In Annual Communications Society Conference on Sensor, Mesh
  and Ad-Hoc Communications and Networks Workshops. IEEE. [ bib ] | 
| [LLS18] | F. Liu, J. Lee, and N. Shroff (2018).
 A Change-Detection based Framework for Piecewise-stationary
  Multi-Armed Bandit Problem.
 In The Thirty-Second AAAI Conference on Artificial
  Intelligence (AAAI 2018). [ bib ] | 
| [LBCU+09] | M. López-Benítez, F. Casadevall, A. Umbert, J. Pérez-Romero,
  R. Hachemani, J. Palicot, and C. Moy (2009).
 Spectral Occupation Measurements and Blind Standard
  Recognition Sensor for Cognitive Radio Networks.
 In 2009 4th International Conference on Cognitive Radio
  Oriented Wireless Networks and Communications, pages 1--9. IEEE. [ bib ] | 
| [LRC+16] | J. Louëdec, L. Rossi, M. Chevalier, A. Garivier, and J. Mothe (2016).
 Algorithme de bandit et obsolescence : un modèle pour la
  recommandation [Bandit algorithm and obsolescence: a model for
  recommendation].
 In 18ème Conférence francophone sur l'Apprentissage
  Automatique, 2016 (Marseille, France). [ bib ] | 
| [LWAL18] | H. Luo, C. Wei, A. Agarwal, and J. Langford (2018).
 Efficient Contextual Bandits in Non-stationary Worlds.
 In S. Bubeck, V. Perchet, and P. Rigollet, editors, Proceedings
  of the 31st Conference On Learning Theory, volume 75 of Proceedings of
  Machine Learning Research, pages 1739--1776. PMLR.
 http://proceedings.mlr.press/v75/luo18a.html. [ bib | .html | .pdf ] | 
| [MM11] | O.-A. Maillard and R. Munos (2011).
 Adaptive Bandits: Towards the best history-dependent
  strategy.
 In International Conference on Artificial Intelligence and
  Statistics, pages 570--578. [ bib ] | 
| [Mai19] | O.-A. Maillard (2019).
 Sequential change-point detection: Laplace concentration of
  scan statistics and non-asymptotic delay bounds.
 In Algorithmic Learning Theory. [ bib ] | 
| [MS13] | J. Mellor and J. Shapiro (2013).
 Thompson Sampling in Switching Environments with Bayesian
  Online Change Detection.
 In Artificial Intelligence and Statistics, pages 442--450. [ bib ] | 
| [MG17] | P. Ménard and A. Garivier (2017).
 A Minimax and Asymptotically Optimal Algorithm for Stochastic
  Bandits.
 In Algorithmic Learning Theory, volume 76, pages 223--237.
  PMLR. [ bib ] | 
| [MTC+16] | A. Maskooki, V. Toldov, L. Clavier, V. Loscrí, and N. Mitton (February
  2016).
 Competition: Channel Exploration/Exploitation Based on a
  Thompson Sampling Approach in a Radio Cognitive Environment.
 In International Conference on Embedded Wireless Systems and
  Networks (dependability competition). Graz, Austria. [ bib ] | 
| [MM17] | J. Mourtada and O.-A. Maillard (2017).
 Efficient Tracking of a Growing Number of Experts.
 In Algorithmic Learning Theory, volume 76 of
  Proceedings of Machine Learning Research, pages 1--23. Tokyo, Japan. [ bib ] | 
| [MB19] | C. Moy and L. Besson (May 2019).
 Decentralized Spectrum Learning for IoT Wireless Networks
  Collision Mitigation.
 In ISIoT workshop. Santorin, Greece.
 https://sites.google.com/view/ISIoT2019/. [ bib | http | .pdf ] | 
| [Moy14] | C. Moy (2014).
 Reinforcement Learning Real Experiments for Opportunistic
  Spectrum Access.
 In WSR'14, page 10. Karlsruhe, Germany.
 
  https://hal-supelec.archives-ouvertes.fr/hal-00994975. [ bib | http ] | 
| [NC17] | O. Naparstek and K. Cohen (2017).
 Deep Multi-User Reinforcement Learning for Dynamic Spectrum
  Access in Multichannel Wireless Networks.
 In GLOBECOM 2017 - 2017 IEEE Global Communications Conference,
  pages 1--7. [ bib | DOI ] | 
| [RMZ14] | C. Robert, C. Moy, and H. Zhang (2014).
 Opportunistic Spectrum Access Learning Proof of Concept.
 In SDR-WinnComm'14, page 8. Schaumburg, United States.
 
  https://hal-supelec.archives-ouvertes.fr/hal-00994940. [ bib | http ] | 
| [RSS16] | J. Rosenski, O. Shamir, and L. Szlak (2016).
 Multi-Player Bandits -- A Musical Chairs Approach.
 In International Conference on Machine Learning, pages
  155--163. PMLR. [ bib ] | 
| [SLM12] | A. Sani, A. Lazaric, and R. Munos (2012).
 Risk-Aversion In Multi-Armed Bandits.
 In Advances in Neural Information Processing Systems, pages
  3275--3283. [ bib ] | 
| [SKHD18] | S. Sawant, R. Kumar, M. K. Hanawal, and S. J. Darak (2018).
 Learning to Coordinate in a Decentralized Cognitive Radio
  Network in Presence of Jammers.
 In 16th International Symposium on Modeling and Optimization
  in Mobile, Ad Hoc, and Wireless Networks (WiOpt). IEEE, Shanghai, China.
 https://arxiv.org/abs/1803.06810. [ bib | DOI | http ] | 
| [SL17] | Y. Seldin and G. Lugosi (2017).
 An Improved Parametrization and Analysis of the EXP3++
  Algorithm for Stochastic and Adversarial Bandits.
 In Conference on Learning Theory, volume 65, pages 1--17.
  PMLR. [ bib ] | 
| [TL11] | C. Tekin and M. Liu (2011).
 Performance and Convergence of Multi-User Online Learning.
 In International Conference on Game Theory for Networks,
  pages 321--336. Springer Berlin Heidelberg. [ bib ] | 
| [TL12] | C. Tekin and M. Liu (2012).
 Online Learning in Decentralized Multi-User Spectrum Access
  with Synchronized Explorations.
 In Military Communications Conference. IEEE. [ bib ] | 
| [TPHD19] | H. Tibrewal, S. Patchala, M. K. Hanawal, and S. J. Darak (2019).
 Distributed Learning and Optimal Assignment in Multiplayer
  Heterogeneous Networks.
 In IEEE Conference on Computer Communications (INFOCOM
  2019), pages 1693--1701. IEEE.
 https://arxiv.org/abs/1901.03868. [ bib | http ] | 
| [TCLM16] | V. Toldov, L. Clavier, V. Loscrí, and N. Mitton (September 2016).
 A Thompson Sampling Approach to Channel Exploration
  Exploitation Problem in Multihop Cognitive Radio Networks.
 In PIMRC, pages 1--6. Valencia, Spain. [ bib | DOI ] | 
| [WS18b] | L. Wei and V. Srivastava (2018).
 On Abruptly-Changing And Slowly-Varying Multi-Armed Bandit
  Problems.
 In American Control Conference, pages 6291--6296. IEEE. [ bib ] | 
| [WS18a] | L. Wei and V. Srivastava (2018).
 On Distributed Multi-player Multi-Armed Bandit Problems in
  Abruptly-Changing Environment.
 In Conference on Decision and Control, pages 5783--5788. IEEE. [ bib ] | 
| [YFE12] | X. Yang, A. Fapojuwo, and E. Egbogah (September 2012).
 Performance Analysis and Parameter Optimization of Random
  Access Backoff Algorithm in LTE.
 In Vehicular Technology Conference, pages 1--5. IEEE. [ bib | DOI ] | 
| [YM09] | J. Y. Yu and S. Mannor (2009).
 Piecewise-Stationary Bandit Problems with Side Observations.
 In International Conference on Machine Learning, pages
  1177--1184. ACM. [ bib ] | 
| [ZBLN19] | S. M. Zafaruddin, I. Bistritz, A. Leshem, and D. Niyato (2019).
 Distributed Learning for Channel Allocation Over a Shared
  Spectrum.
 In 20th IEEE International Workshop on Signal Processing
  Advances in Wireless Communications (SPAWC). Cannes, France.
 https://arxiv.org/abs/1902.06353. [ bib | http ] | 
| [ZS19] | J. Zimmert and Y. Seldin (2019).
 An Optimal Algorithm for Stochastic and Adversarial Bandits.
 In K. Chaudhuri and M. Sugiyama, editors, International Conference
  on Artificial Intelligence and Statistics, volume 89 of Proceedings of
  Machine Learning Research, pages 467--475. PMLR.
 http://proceedings.mlr.press/v89/zimmert19a.html. [ bib | .html | .pdf ] | 
| [ABM10] | J.-Y. Audibert, S. Bubeck, and R. Munos (2010).
 Best Arm Identification in Multi-Armed Bandits.
 In Conference on Learning Theory, page 13. PMLR. [ bib ] | 
| [ACBFS95] | P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire (1995).
 Gambling in a Rigged Casino: The Adversarial Multi-Armed
  Bandit Problem.
 In Annual Symposium on Foundations of Computer Science, pages
  322--331. IEEE. [ bib ] | 
| [BV19] | M. Bande and V. V. Veeravalli (2019).
 Adversarial Multi-User Bandits for Uncoordinated Spectrum
  Access.
 In IEEE International Conference on Acoustics, Speech and
  Signal Processing (ICASSP), pages 4514--4518. IEEE. [ bib ] | 
| [DNMP16] | S. J. Darak, A. Nafkha, C. Moy, and J. Palicot (2016).
 Is Bayesian Multi Armed Bandit Algorithm Superior? Proof of
  Concept for Opportunistic Spectrum Access in Decentralized Networks.
 In 11th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication. Grenoble, France. [ bib ] | 
| [K+16] | T. Kluyver et al. (2016).
 Jupyter Notebooks -- a publishing format for reproducible
  computational workflows.
 In F. Loizides and B. Schmidt, editors, Positioning and Power
  in Academic Publishing: Players, Agents and Agendas, pages 87--90. IOS
  Press. [ bib ] | 
| [DMNM16] | S. J. Darak, N. Modi, A. Nafkha, and C. Moy (2016).
 Spectrum Utilization and Reconfiguration Cost Comparison of
  Various Decision Making Policies for Opportunistic Spectrum Access Using Real
  Radio Signals.
 In 11th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication. Grenoble, France. [ bib ] | 
| [PPS11] | K. Patil, R. Prasad, and K. Skouby (2011).
 A Survey of Worldwide Spectrum Occupancy Measurement Campaigns
  for Cognitive Radio.
 In 2011 International Conference on Devices and Communications
  (ICDeCom), pages 1--5. IEEE. [ bib ] | 
| [VMB+10] | V. Valenta, R. Maršálek, G. Baudoin, M. Villegas, M. Suarez, and
  F. Robert (2010).
 Survey on spectrum utilization in Europe: Measurements,
  analyses and observations.
 In 5th EAI Conference on Cognitive Radio Oriented Wireless
  Network and Communication, pages 1--5. IEEE. [ bib ] | 
| [Chi18] | F. Chiusano (2018).
 Breakpoint Prediction for the Abruptly-Switching
  Non-Stationary Multi-Armed Bandit Problem.
 Master's thesis, Politecnico Di Milano, AI & R Lab, Laboratorio di
  Intelligenza Artificiale e Robotica del Politecnico di Milano. [ bib ] | 
| [Wei17] | E. W. Weisstein (2017).
 Exponential Integral.
 Online at
  https://mathworld.wolfram.com/ExponentialIntegral.html.
 From MathWorld -- A Wolfram Web Resource. [ bib ] | 
| [Col17] | Collective (2017).
 Exponential Integral.
 Online at
  https://en.wikipedia.org/wiki/Exponential_integral.
 From Wikipedia, The Free Encyclopedia. [ bib ] | 
| [GNUa] | GNU Radio Companion Documentation and Website.
 Online at
  https://wiki.gnuradio.org/index.php/GNURadioCompanion.
 Accessed: 2018-09-25. [ bib ] | 
| [GNUb] | GNU Radio Documentation and Website.
 Online at https://www.gnuradio.org/about/.
 Accessed: 2018-09-25. [ bib ] | 
| [Jor10] | M. Jordan (2010).
 Stat 260/CS 294.
 Online at
  https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/.
 Chapter 8 covers the exponential family. [ bib ] | 
| [Oct] | OctoClock Clock Distribution Module with GPSDO - Ettus Research.
 Online at
  https://www.ettus.com/product/details/OctoClock-G.
 Accessed: 2018-09-25. [ bib ] | 
| [Bes19] | L. Besson (2016--2019).
 SMPyBandits: an Open-Source Research Framework for Single and
  Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python.
 Code at https://GitHub.com/SMPyBandits/SMPyBandits/,
  documentation at https://SMPyBandits.GitHub.io/. [ bib ] | 
| [Tol14a] | J. Toledano (June 2014).
 Executive Summary: Dynamic Spectrum Management For Innovation
  And Growth.
 Online at
  https://www.economie.gouv.fr/files/files/PDF/french-spectrum-mission-executive-summary-2014-06-25.pdf.
 
  https://www.ladocumentationfrancaise.fr/rapports-publics/144000381/index.shtml. [ bib | http | .pdf ] | 
| [Tol14b] | J. Toledano (June 2014).
 Une gestion dynamique du spectre pour l'innovation et la
  croissance [Dynamic spectrum management for innovation and growth].
 Online at
  https://www.economie.gouv.fr/files/files/PDF/rapport-gestion-dynamique-spectre-2014-06-30.pdf.
 
  https://www.ladocumentationfrancaise.fr/rapports-publics/144000381/index.shtml. [ bib | http | .pdf ] | 
| [Lat16a] | T. Lattimore (2016).
 Library for Multi-Armed Bandit Algorithms.
 Online at: https://github.com/tor/libbandit. [ bib | http ] | 
| [Ett] | Ettus.
 USRP Hardware Driver and USRP Manual.
 Online at
  https://files.ettus.com/manual/page_usrp2.html.
 Accessed: 2018-09-25. [ bib ] | 
| [Raj17] | V. Raj (2017).
 A Julia Package for providing Multi Armed Bandit
  Experiments.
 Online at: https://github.com/v-i-s-h/MAB.jl. [ bib | http ] | 
| [BBS+19] | S. Behnel, R. Bradshaw, Dag S. Seljebotn, G. Ewing, W. Stein, G. Gellner,
  et al. (2019).
 Cython: C-Extensions for Python.
 Online at: https://cython.org. [ bib | http ] | 
| [C+18] | A. Collette et al. (2018).
 h5py: HDF5 for Python.
 Online at: https://www.h5py.org. [ bib | http ] | 
| [Var17] | G. Varoquaux (March 2017).
 Joblib: running Python functions as pipeline jobs.
 Online at: https://joblib.readthedocs.io. [ bib | http ] | 
| [I+17] | Anaconda Inc. et al. (2017).
 Numba, NumPy aware dynamic Python compiler using LLVM.
 Online at: https://numba.pydata.org. [ bib | http ] | 
| [CGK12] | O. Cappé, A. Garivier, and E. Kaufmann (2012).
 pymaBandits.
 Online at: https://mloss.org/software/view/415. [ bib | http ] | 
| [Fou17] | Python Software Foundation (October 2017).
 Python Language Reference, version 3.6.
 Online at: https://www.python.org. [ bib | http ] | 
| [JOP+01] | E. Jones, T. E. Oliphant, P. Peterson, et al. (2001).
 SciPy: Open source scientific tools for Python.
 Online at: https://www.scipy.org. [ bib | http ] | 
| [W+17] | M. Waskom et al. (September 2017).
 Seaborn: statistical data visualization.
 Online at: https://seaborn.pydata.org. [ bib | DOI | http ] | 
| [B+18] | G. Brandl et al. (2018).
 Sphinx: Python documentation generator.
 Online at: https://sphinx-doc.org. [ bib | http ] | 
| [BP+16] | I. Bicking, PyPA, et al. (November 2016).
 Virtualenv: a tool to create isolated Python environments.
 Online at: https://virtualenv.pypa.io. [ bib | http ] | 
| [Bod17] | Q. Bodinier (2017).
 Coexistence of Communication Systems Based on Enhanced
  Multi-Carrier Waveforms with Legacy OFDM Networks.
 Ph.D. thesis, CentraleSupélec.
 https://www.theses.fr/2017REN1S091. [ bib | http ] | 
| [Jou17] | W. Jouini (2017).
 Contribution to Learning and Decision Making under Uncertainty
  for Cognitive Radio.
 Ph.D. thesis, CentraleSupélec, IETR, Rennes.
 https://www.theses.fr/2012SUPL0010. [ bib | http ] | 
| [Kau14] | E. Kaufmann (2014).
 Analysis of Bayesian and Frequentist Strategies for Sequential
  Resource Allocation.
 Ph.D. thesis, Telecom ParisTech.
 https://www.theses.fr/2014ENST0056. [ bib | http ] | 
| [Mod17] | N. Modi (2017).
 Machine Learning and Statistical Decision Making for Green
  Radio.
 Ph.D. thesis, CentraleSupélec, IETR, Rennes.
 https://www.theses.fr/2017SUPL0002. [ bib | http ] | 
| [Tol17] | V. Toldov (2017).
 Adaptive MAC Layer for Interference Limited WSN.
 Ph.D. thesis, Université Lille 1 Sciences et technologies.
 https://www.theses.fr/2017LIL10002. [ bib | http ] | 
| [Val16] | M. Valko (2016).
 Bandits on Graphs and Structures.
 Habilitation thesis to supervise research, École normale
  supérieure de Cachan. [ bib ] We investigate the structural properties of certain sequential decision-making problems with limited feedback (bandits) in order to bring the known algorithmic solutions closer to a practical use. In the first part, we put a special emphasis on structures that can be represented as graphs on actions, in the second part we study the large action spaces that can be of exponential size in the number of base actions or even infinite. We show how to take advantage of structures over the actions and (provably) learn faster. | 
| [BK18b] | L. Besson and E. Kaufmann (February 2018).
 What Doubling Trick Can and Can't Do for Multi-Armed
  Bandits.
 Preprint, https://hal.archives-ouvertes.fr/hal-01736357. [ bib | http | www: ] | 
| [BBM18] | L. Besson, R. Bonnefoi, and C. Moy (June 2018).
 Multi-Arm Bandit Algorithms for Internet of Things Networks: A
  TestBed Implementation and Demonstration that Learning Helps.
 https://ict-2018.org/demos, youtu.be/HospLNQhcMk,
  Demonstration presented at International Conference on Telecommunications. [ bib | http ] | 
| [BK19b] | L. Besson and E. Kaufmann (February 2019).
 Combining the Generalized Likelihood Ratio Test and kl-UCB for
  Non-Stationary Bandits.
 Preprint, https://hal.archives-ouvertes.fr/hal-02006471, arXiv
  preprint arXiv:1902.01575. [ bib | http ] | 
| [MBDT19] | C. Moy, L. Besson, G. Delbarre, and L. Toutain (July 2019).
 Decentralized Spectrum Learning for Radio Collision Mitigation
  in Ultra-Dense IoT Networks: LoRaWAN Case Study and Measurements.
 https://hal.inria.fr/hal-XXX, Submitted for a
  special volume on Machine Learning for Intelligent Wireless Communications
  and Networking, Annals of Telecommunications. [ bib | http | .pdf ] | 
| [MM19] | S. Mukherjee and O.-A. Maillard (2019).
 Improved Changepoint Detection for Piecewise i.i.d Bandits.
 https://subhojyoti.github.io/pdf/aistats_2019.pdf,
  working paper or preprint. [ bib | .pdf ] | 
| [RK17] | V. Raj and S. Kalyani (2017).
 Taming Non-Stationary Bandits: a Bayesian Approach.
 https://arxiv.org/abs/1707.09727. [ bib | http ] | 
| [Bes18] | L. Besson (2018).
 SMPyBandits: an Experimental Framework for Single and
  Multi-Players Multi-Arms Bandits Algorithms in Python.
 Preprint, submitted to JMLR MLOSS,
  https://hal.archives-ouvertes.fr/hal-01840022. [ bib | http | www: ] | 
| [BN93] | M. Basseville and I. Nikiforov (1993).
 Detection of Abrupt Changes: Theory And Application, volume
  104.
 Prentice Hall, Englewood Cliffs. [ bib ] | 
| [Bón11] | M. Bóna (2011).
 A Walk Through Combinatorics: an Introduction to Enumeration
  and Graph Theory.
 World Scientific. [ bib ] | 
| [BLM13] | S. Boucheron, G. Lugosi, and P. Massart (2013).
 Concentration Inequalities: A Nonasymptotic Theory of
  Independence.
 Oxford University Press. [ bib ] | 
| [BV04] | S. Boyd and L. Vandenberghe (2004).
 Convex Optimization.
 Cambridge University Press. [ bib ] | 
| [CBL06] | N. Cesa-Bianchi and G. Lugosi (2006).
 Prediction, Learning, and Games.
 Cambridge University Press. [ bib ] | 
| [LS19] | T. Lattimore and C. Szepesvári (2019).
 Bandit Algorithms.
 Cambridge University Press.
 Draft of Wednesday 1st of May, 2019,
  https://tor-lattimore.com/downloads/book/book.pdf. [ bib | .pdf ] | 
| [Nor98] | J. R. Norris (1998).
 Markov Chains, volume 2 of Cambridge Series in
  Statistical and Probabilistic Mathematics.
 Cambridge University Press, Cambridge. [ bib ] | 
| [Sie13] | D. Siegmund (2013).
 Sequential Analysis: Tests and Confidence Intervals.
 Springer Science & Business Media. [ bib ] | 
| [SB18] | R. S. Sutton and A. G. Barto (2018).
 Reinforcement Learning: An Introduction.
  MIT Press. [ bib ] | 
| [TNB15] | A. Tartakovsky, I. Nikiforov, and M. Basseville (2015).
 Sequential Analysis. Hypothesis Testing and Changepoint
  Detection.
 CRC Press. [ bib ] | 