Index of /publis/slides/2017_09__Presentation_article_CrownCom_Conference

Name	Last modified	Size	Description
Parent Directory		-
.vscode/	2021-02-23 11:50	-
slides_169.pdf	2021-03-04 16:26	263K
slides.pdf	2021-03-04 16:26	295K
perf_learning-eps-converted-to.pdf	2021-03-04 16:26	15K
100intelligent-eps-converted-to.pdf	2021-03-04 16:26	11K
50intelligent-eps-converted-to.pdf	2021-03-04 16:26	12K
30intelligent-eps-converted-to.pdf	2021-03-04 16:26	12K
10intelligent-eps-converted-to.pdf	2021-03-04 16:26	12K
protocol-eps-converted-to.pdf	2021-03-04 16:26	21K
slides_169.tex	2021-03-04 16:26	20K
slides_169.pdfpc	2021-03-04 16:26	142
slides.tex	2021-03-04 16:26	20K
slides.pdfpc	2021-03-04 16:26	138
slides.md	2021-03-04 16:26	12K
slides.html	2021-03-04 16:26	9.5K	MAB Learning in IoT Networks. Learning helps even in non-stationary settings! \| 20 Sept 2017 \| Lilian Besson
protocol.png	2021-03-04 16:26	36K
protocol.eps	2021-03-04 16:26	23K
preprocess_tex.sh	2021-03-04 16:26	119
perf_learning.png	2021-03-04 16:26	38K
perf_learning.eps	2021-03-04 16:26	141K
README.md	2021-03-04 16:26	8.9K
Makefile	2021-03-04 16:26	1.3K
50intelligent.png	2021-03-04 16:26	27K
50intelligent.eps	2021-03-04 16:26	112K
30intelligent.png	2021-03-04 16:26	30K
30intelligent.eps	2021-03-04 16:26	113K
10intelligent.png	2021-03-04 16:26	30K
10intelligent.eps	2021-03-04 16:26	113K
100intelligent.png	2021-03-04 16:26	24K
100intelligent.eps	2021-03-04 16:26	111K
Expose_GdR_ISIS__Lilian_Besson__Multi-Armed_Bandit_for_IoT_Networks__17-11-17__43.pdf	2021-03-04 16:26	295K

---
author:
  __Lilian Besson__ and Rémi Bonnefoi
  and Christophe Moy and Émilie Kaufmann and Jacques Palicot
title: MAB Learning in IoT Networks. Learning helps even in non-stationary settings!
institute:
  PhD Student in France
  Team SCEE, IETR, CentraleSupélec, Rennes
  \& Team SequeL, CRIStAL, Inria, Lille
conference: CROWNCOM 2017
date: 20-21 sept 2017
lang: english
---

> Hey, we got the **Best Paper Award** during the [CrownCom 2017](http://crowncom.org/2017/) conference for [this article](https://hal.inria.fr/hal-01575419).
> This document is a raw version of the slides [I (Lilian Besson)](http://perso.crans.org/besson/) used to present our article.
> Please see the [PDF (4:3)](http://perso.crans.org/besson/publis/slides/2017_09__Presentation_article_CrownCom_Conference/slides.pdf) or [PDF (16:9)](http://perso.crans.org/besson/publis/slides/2017_09__Presentation_article_CrownCom_Conference/slides_169.pdf) for a nice version of these slides.

# We want
A *lot* of IoT devices want to access to a gateway of base station.

- Insert them in a **crowded wireless network**.
- With a protocol **slotted in time and frequency**.
- Each device has a **low duty cycle**
  (a few messages per day).

## Goal
- Maintain a **good Quality of Service**.
- **Without** centralized supervision!

## How?
- Use **learning algorithms**:
  devices will learn on which frequency they should talk!

----

# Outline
1. Introduction and motivation
2. Model and hypotheses
3. Baseline algorithms :
   to compare against naive and efficient centralized approaches
4. Two Multi-Armed Bandit algorithms :
   UCB, Thompson sampling
5. Experimental results
6. Perspectives and future works
7. Conclusion

----

# Model
- Discrete time $t\geq1$ and $N_c$ radio channels (*e.g.*, 10)
  (*known*)

![Protocol in time and frequency, with an Acknowledgement.](protocol.png)

- $D$ **dynamic** devices try to access the network *independently*
- $S=S_1+\dots+S_{N_c}$ **static** devices occupy the network :
  $S_1,\dots,S_{N_c}$ in each channel
  (*unknown*).

----

# Hypotheses
## Emission model
- Each device has the same *low* emission probability:
  each step, each device sends a packet with probability $p$.
  (this gives a duty cycle proportional to $1/p$)

## Background traffic
- Each static device uses only one channel.
- Their repartition is fixed in time.

> $\implies$ Background traffic, bothering the dynamic devices!

## Dynamic radio reconfiguration
- Each **dynamic device decides the channel it uses to send every packet**.
- It has memory and computational capacity to implement basic decision algorithm.

## Problem
- *Goal* : *maximize packet loss ratio ($=$ number of received `Ack`)
  in a *finite-space discrete-time Decision Making Problem*.
- *Solution ?* **Multi-Armed Bandit algorithms**,
  **decentralized** and used **independently** by each device.

----

# A naive strategy : uniformly random access
- **Uniformly random access**:
  dynamic devices choose uniformly their channel in the pull of $N_c$ channels.
- Natural strategy, dead simple to implement.

- Simple analysis, in term of **successful transmission probability** (for every message from dynamic devices) :
  $$\mathbb{P}(\text{success}|\text{sent}) = \sum_{i=1}^{N_c} \underbrace{(1 - p / N_c)^{D-1}}_{\text{No other dynamic device}} \times \underbrace{(1-p)^{S_i}}_{\text{No static device}} \times\; \frac{1}{N_c}.$$

- Works fine only if all channels are similarly occupied,\newline
  but **it cannot learn** to exploit the best (more free) channels.

----

# Optimal centralized strategy
- If an oracle can decide to affect $D_i$ dynamic devices to channel $i$,
  the **successful transmission probability** is:
  $$\mathbb{P}(\text{success}|\text{sent}) = \sum_{i=1}^{N_c} \underbrace{(1 - p)^{D_i - 1}}_{\;\;D_i - 1 \;\text{others}\;\;} \times \underbrace{(1 - p)^{S_i}}_{\;\;\text{No static device}\;\;} \times \underbrace{ D_i / D }_{\;\;\text{Sent in channel}\; i}.$$

- The oracle has to solve this **optimization problem**:
  $$\underset{D_1,\dots,D_{N_c}}{\arg\max}\;\;\; & \sum_{i=1}^{N_c} D_i (1 - p)^{S_i + D_i -1}\\
  \text{such that}\;\;\; & \sum_{i=1}^{N_c} D_i = D \; \text{and} \; D_i \geq 0, \; \; \forall 1 \leq i \leq N_c .$$

- We solved this quasi-convex optimization problem with *Lagrange multipliers*, only numerically.
- $\implies$ Very good performance, maximizing the transmission rate of all the $D$ dynamic devices

## But unrealistic
But **not achievable in practice**: no centralized oracle!

## Let see *realistic decentralized approaches*
- $\hookrightarrow$ Machine Learning ?
- $\hookrightarrow$ Reinforcement Learning ?
- $\hookrightarrow$ *Multi-Armed Bandit* !

----

# Multi-Armed Bandit formulation
A dynamic device tries to collect *rewards* when transmitting :

- at each time step $\tau$ when its Bernoulli emission process tells it to communicate,
- chooses a channel $A(\tau) \in \{1,\dots,N_c\}$,
- if `Ack` (no collision)
  $\implies$ reward $r_{A(\tau)} = 1$,
- if collision (no `Ack`)
  $\implies$ reward $r_{A(\tau)} = 0$.

## Reinforcement Learning interpretation
Maximize transmission rate $\equiv$ **maximize cumulated rewards**
$$\max_{\text{algorithm}\;A} \;\; \sum_{\tau=1}^{\text{horizon}} r_{A(\tau)}.$$

# Upper Confidence Bound algorithm ($\mathrm{UCB}_1$)
A dynamic device keeps $\tau$ number of sent packets, $T_k(t)$ selections of channel $k$, $X_k(t)$ successful transmission in channel $k$.

1. For the first $N_c$ steps ($\tau=1,\dots,N_c$), try each channel *once*.
2. Then for the next steps $t \geq N_c$ :
    - Compute the index $g_k(\tau) := \underbrace{\frac{X_k(\tau)}{N_k(\tau)}}_{\text{Mean}\; \widehat{\mu_k}(\tau)} + \underbrace{\sqrt{\frac{\log(\tau)}{2 N_k(\tau)}}.}_{\text{Upper Confidence Bound}}$
    - Choose channel $A(\tau) = \mathop{\arg\max}\limits_{k} \; g_k(\tau)$,
    - Update $T_k(\tau+1)$ and $X_k(\tau+1)$.

> *References*: [Lai \& Robbins, 1985], [Auer et al, 2002], [Bubeck \& Cesa-Bianchi, 2012].

----

# Thompson Sampling : Bayesian approach
A dynamic device assumes a stochastic hypothesis on the background traffic, modeled as Bernoulli distributions.

- Rewards $r_k(\tau)$ are assumed to be *i.i.d.* samples from a Bernoulli distribution $\mathrm{Bern}(\mu_k)$.

- A **binomial Bayesian posterior** is kept on the mean availability $\mu_k$ : $\mathrm{Bin}(1 + X_k(\tau), 1 + N_k(\tau) - X_k(\tau))$.
- Starts with a *uniform prior* : $\mathrm{Bin}(1, 1) \sim \mathcal{U}([0,1])$.

1. Each step $\tau \geq 1$, a sample is drawn from each posterior
  $i_k(t) \sim \mathrm{Bin}(a_k(\tau), b_k(\tau))$,
2. Choose channel $A(\tau) = \mathop{\arg\max}\limits_k \; i_k(\tau)$,
3. Update the posterior after receiving `Ack` or if collision.

> *References*: [Thompson, 1935], [Kaufmann et al, 2012].

----

# Experimental setting

## Simulation parameters
- $N_c = 10$ channels,
- $S + D = 10000$ devices in total,
- $p = 10^{-3}$ probability of emission,
- $\text{horizon} = 10^5$ time slots ($\simeq 100$ messages $/$ device),
- The proportion of dynamic devices $D/(S+D)$ varies,
- Various settings for $(S_1,\dots,S_{N_c})$ static devices repartition.

## What do we show
- After a short learning time, MAB algorithm are almost as efficient as the oracle solution.
- Never worse than the naive solution.
- Thompson sampling is even more efficient than UCB.

----

# $10\%$ of dynamic devices

![$10\%$ of dynamic devices. $7\%$ of gain.](10intelligent.png)

----

# $30\%$ of dynamic devices

![$30\%$ of dynamic devices. $3\%$ of gain but not much is possible.](30intelligent.png)

----

# $50\%$ of dynamic devices

![$50\%$ of dynamic devices.](50intelligent.png)

----

# $100\%$ of dynamic devices (extreme case)

![$100\%$ of dynamic devices.](100intelligent.png)

----

# Dependence on $D/(S+D)$

![Almost optimal, for any proportion of dynamic devices, after a short learning time. Up-to $16\%$ gain over the naive approach!](perf_learning.png)

----

# Perspectives
## Theoretical results
- MAB algorithms have performance guarantees for *stochastic settings*,
- But here the collisions cancel the *i.i.d.* hypothesis,
- Not easy to obtain guarantees in this mixed setting
  (*i.i.d.* emission process, game theoretic collisions).

## Real-world experimental validation ?
- Real-world radio experiments will help to validate this.
  *In progress*...

----

# Other direction of future work
- *More realistic emission model*:
  maybe driven by number of packets in a whole day,
  instead of emission probability.

- Validate this on a *larger experimental scale*.

----

## We showed numerically...
- After a learning period, MAB algorithms are almost as efficient as the oracle solution,
- Never worse than the naive solution.
- Thompson sampling is even more efficient than UCB.
- Simple algorithms are up-to $16\%$ more efficient than the naive approach.

## But more work is still needed...
- **Theoretical guarantees** are still missing.
- Maybe study **other emission models**.
- And also implement this on **real-world radio devices**.

> **Thanks!** *Question?*