
title: Aggregation of MAB Learning Algorithms for OSA
author: Lilian Besson, advised by Christophe Moy and Émilie Kaufmann
institute: PhD Student, Team SCEE, IETR, CentraleSupélec, Rennes & Team SequeL, CRIStAL, Inria, Lille
date: IEEE WCNC - 16th April 2018
lang: english

IEEE WCNC: « Aggregation of Multi-Armed Bandits Learning Algorithms for Opportunistic Spectrum Access »

Christophe Moy @ CentraleSupélec & IETR, Rennes
Émilie Kaufmann @ Inria, Lille

See our paper HAL.Inria.fr/hal-01705292


Introduction

$\Longrightarrow$ We propose to use an online learning algorithm to also decide which algorithm to use, in order to be more robust and adaptive to unknown environments.


⏲ Outline

  1. Opportunistic Spectrum Access
  2. Multi-Armed Bandits
  3. MAB algorithms
  4. Aggregation of MAB algorithms
  5. Illustration



Please 🙏

Ask questions at the end if you want!


1. Opportunistic Spectrum Access


Communication & interaction model


2. Multi-Armed Bandits

Model

Why is it famous?

A simple but good model of the exploration/exploitation dilemma.
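For reference, the standard stochastic formulation (as used in our paper): $K$ arms, where arm $k$ yields rewards from an unknown distribution of mean $\mu_k$. At each time $t = 1, \ldots, T$, the player picks an arm $A(t)$ and observes a reward $r(t)$. The goal is to minimize the regret with respect to the best arm $\mu^* = \max_k \mu_k$:

$$ R_T = T \mu^* - \mathbb{E}\left[ \sum_{t=1}^{T} r(t) \right]. $$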


3. MAB algorithms


3.1 Multi-Armed Bandit algorithms

Often index-based

Example: "Follow the Leader"


3.2 First example of algorithm
Upper Confidence Bound algorithm (UCB)

Parameter $\alpha$: trade-off between exploration and exploitation

💥 Problem: how to choose the right $\alpha$ for a given problem?
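A minimal sketch of one common form of the UCB index, with the exploration parameter $\alpha$ (reusing `means` and `counts` from the previous sketch):

```python
import numpy as np

def ucb_index(means, counts, t, alpha=0.5):
    """UCB index: empirical mean plus an exploration bonus.
    Larger alpha explores more; smaller alpha exploits more."""
    return means + np.sqrt(alpha * np.log(t) / (2.0 * counts))

# At each time t (after pulling each arm once, so counts > 0):
#     arm = int(np.argmax(ucb_index(means, counts, t)))
```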


3.3 Second example of algorithm
Thompson sampling (TS)

Example with Beta prior, for binary rewards

💥 How to choose the right prior for a given problem?
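A minimal sketch of Thompson sampling with a Beta prior on binary rewards (standard formulation; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def thompson_choice(successes, failures):
    """Thompson sampling with Beta(1, 1) priors on binary rewards:
    sample a mean from each arm's posterior, play the best sample."""
    samples = rng.beta(1 + successes, 1 + failures)  # one draw per arm
    return int(np.argmax(samples))

# After observing a binary reward r in {0, 1} on `arm`:
#     successes[arm] += r
#     failures[arm] += 1 - r
```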


4. Aggregation of MAB algorithms

Problem

Solutions

$\hookrightarrow$ Aggregation?

Not a new idea: it has been studied since the 1990s in the ML community.


4.1 Basic idea for online aggregation

Suppose you have $N$ different algorithms $\mathcal{A}_1, \ldots, \mathcal{A}_N$: at each time step, the aggregation maintains trust probabilities $\pi^t = (\pi^t_1, \ldots, \pi^t_N)$ and trusts one algorithm $a^t \sim \pi^t$.

Problems

  1. How to increase the trust probability $\pi^{t+1}_{a^t}$?
  2. What information should we give to which algorithms?
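A minimal sketch of the decision step of such an aggregation, assuming each underlying algorithm exposes a `choice()` method proposing an arm (an illustrative interface; the update step is sketched in the next part):

```python
import numpy as np

rng = np.random.default_rng()

def aggregation_decision(algorithms, trusts):
    """Decision step: sample one algorithm a^t according to the
    trust probabilities pi^t, then play the arm it proposes."""
    a_t = rng.choice(len(algorithms), p=trusts)   # trusted algorithm a^t
    arm = algorithms[a_t].choice()                # arm proposed by A_{a^t}
    return a_t, arm
```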

4.2 Overview of the Exp4 aggregation algorithm

For rewards $r(t) \in [-1, 1]$.


Use an unbiased estimate of the rewards

Using $r(t)$ directly to update the trust probabilities yields a biased estimator. Instead, use the importance-weighted estimate $\hat{r}(t) := r(t) / \pi^t_{a^t}$, given only to the trusted algorithm $a^t$. Since $\mathbb{P}(a^t = a) = \pi^t_{a}$, the estimate received by an algorithm $a$ is unbiased:

$$\mathbb{E}\left[\hat{r}(t) \, \mathbb{1}(a^t = a)\right] = \mathbb{P}(a^t = a) \, \frac{\mathbb{E}[r(t)]}{\pi^t_{a}} = \mathbb{E}[r(t)].$$
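Continuing the sketch above, an Exp4-style exponential-weight update of the trusts using this unbiased estimate (the learning rate $\eta$ is an illustrative assumption; the exact update rules are in the paper):

```python
import numpy as np

def update_trusts(trusts, a_t, reward, eta=0.1):
    """Exp4-style exponential-weight update of the trusts, using the
    importance-weighted (unbiased) reward estimate."""
    r_hat = reward / trusts[a_t]          # hat r(t) = r(t) / pi^t_{a^t}
    log_w = np.log(trusts)                # current log-weights
    log_w[a_t] += eta * r_hat             # boost the trusted algorithm
    w = np.exp(log_w - np.max(log_w))     # stabilized exponentiation
    return w / w.sum()                    # renormalize to probabilities
```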


4.3 Our Aggregator aggregation algorithm

It improves on Exp4 with the following ideas:


5. Some illustrations


On a simple Bernoulli problem



On a "hard" Bernoulli problem



On a mixed problem



Conclusion (1/2)


Conclusion (2/2)

See our paper

HAL.Inria.fr/hal-01705292

See our code for experimenting with bandit algorithms

Python library, open source at SMPyBandits.GitHub.io

Thanks for listening!