---
title: Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions
subtitle: MoTION Workshop @ IEEE WCNC 2019
author: Lilian Besson
institute: SCEE Team, IETR, CentraleSupélec, Rennes
date: Monday 15th of April, 2019
lang: english
---
### *1st MoTION Workshop - 2019*: "**Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions**"
- *Date* :date: : $15$th of April $2019$
- *Who:* [Lilian Besson](https://GitHub.com/Naereen/slides/) :wave: , PhD Student in France, co-advised by
| *Christophe Moy* <br> @ IETR, Rennes | *Emilie Kaufmann* <br> @ CNRS & Inria, Lille |
|:---:|:---:|
| ![8%](../common/LogoCS.png) ![14%](../common/LogoIETR.png) | ![12%](../common/LogoInria.jpg) ![16%](../common/LogoCNRS.jpg) |
> See our paper at [`HAL.Inria.fr/hal-02049824`](https://hal.inria.fr/hal-02049824)
---
# :timer_clock: Outline
## 1. Motivations
## 2. System model
## 3. Multi-armed bandit (MAB) model and algorithms
## 4. Proposed heuristics
## 5. Numerical simulations and results
#### Please :pray: ask questions *at the end* if you want!
> By R. Bonnefoi, L. Besson, J. Manco-Vasquez and C. Moy.
---
# 1. Motivations
- IoT networks are interesting and will be more and more present,
- More and more IoT objects
- $\Longrightarrow$ networks will be more and more occupied
But...
- Heterogeneous spectrum occupancy in most IoT networks standards
- Maybe IoT objects can improve their communication by *learning* to access the network more efficiently (e.g., by using the least occupied spectrum channel)
- Simple but efficient learning algorithms can give great improvements in terms of successful communication rates
- $\Longrightarrow$ can fit more objects in the existing IoT networks :tada: !
---
# 2. System model
### Wireless network
- In the ISM band, centered at $433.5$ MHz (in Europe)
- $K=4$ (or more) orthogonal channels
### One gateway, many IoT devices
- One gateway, handling different objects
- Communications with ALOHA protocol **with retransmission**
- Objects send data for $1$s in one channel, wait for an *acknowledgement* for $1$s in the same channel,
use the Ack as feedback: success / failure
---
### Transmission and retransmission model
- Each object communicates from time to time (e.g., every $10$ s)
$\Longleftrightarrow$ probability $p$ of transmission at each time slot (Bernoulli process)
- Retransmit at most $M$ times if first transmission failed
(until Ack is received)
- Retransmissions can use a different channel than the one used for the first transmission
- Retransmissions happen after a random back-off time
back-off time $\sim\mathcal{U}(0,\cdots,m-1)$
### The goal of each object
Is to *max*imize its successful communication rate
$\Longleftrightarrow$ *max*imize its number of received Acks.
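As a rough illustration, here is a minimal Python sketch of this traffic model for one device (not the authors' simulator; the `channel_is_free` callback and the uniform channel choice are placeholders):

```python
import random

def simulate_device(T, p, M, m, channel_is_free, K=4):
    """Simulate one device for T slots.
    channel_is_free(t, channel) is a hypothetical callback modelling the surrounding traffic."""
    acks, losses = 0, 0
    t = 0
    while t < T:
        if random.random() < p:              # a new packet is generated (Bernoulli(p) process)
            sent = False
            for attempt in range(1 + M):     # first transmission + at most M retransmissions
                channel = random.randrange(K)          # naive uniform channel choice
                if channel_is_free(t, channel):        # Ack received -> success
                    acks += 1
                    sent = True
                    break
                t += 1 + random.randint(0, m - 1)      # random back-off before retrying
            if not sent:
                losses += 1                  # packet dropped after M failed retransmissions
        t += 1
    return acks, losses
```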
---
# Do we need learning for transmission? Yes!
#### First hypothesis
The surrounding traffic is not uniformly occupying the $K$ channels.
#### Consequence
- Then it is always sub-optimal to use a (naive) uniformly random channel access
- $\Longrightarrow$ we can use online machine learning to let each IoT device learn, on its own and in an automatic and decentralized way, which channel is the best one (= least occupied) in its current environment.
- Learning is actually *needed* to achieve (close to) optimal performance.
---
# Do we need learning for *re*transmission?
#### Second hypothesis
Imagine a set of IoT devices that learned to transmit efficiently
(in the least occupied channel), in one IoT network.
#### Question
- Then if two devices collide, do they have a higher probability of colliding again *if retransmissions happen in the same channel* ?
---
# Mathematical intuition and illustration
Consider one IoT device and one channel; we define two probabilities:
- $p_c$ : suffering a collision at first transmission,
- $p_{c1}$ : collision at the first retransmission (if it uses the same channel).
In an example network with...
- a small transmission probability $p=10^{-3}$,
- from $N=50$ to $N=400$ IoT devices,
- $\Longrightarrow$ we ran simulations showing that
$p_{c1}$ can be more than twice $p_c$ (from $5\%$ to $15\%$!)
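To give a feel for this effect, here is a rough single-channel Monte Carlo sketch of my own (it only approximates the model, and is not the simulation behind the figure on the next slide): condition on a first collision, then check how often a same-channel retransmission collides again.

```python
import random

def estimate_collision_probs(N=200, p=1e-3, m=10, trials=200_000):
    """Rough estimates of p_c and p_c1 for a same-channel retransmission."""
    first_collisions, retry_collisions = 0, 0
    for _ in range(trials):
        colliders = sum(random.random() < p for _ in range(N - 1))   # other devices transmitting now
        if colliders == 0:
            continue                          # no collision at the first transmission
        first_collisions += 1
        my_slot = random.randrange(m)         # my back-off, uniform in {0, ..., m-1}
        same_slot = sum(random.randrange(m) == my_slot for _ in range(colliders))
        fresh = sum(random.random() < p for _ in range(N - 1 - colliders))   # new background traffic
        if same_slot > 0 or fresh > 0:
            retry_collisions += 1
    p_c = first_collisions / trials
    p_c1 = retry_collisions / first_collisions if first_collisions else float("nan")
    return p_c, p_c1
```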
---
![75%](plots/Approximation_m10.png)
---
# Do we need learning for *re*transmission? Yes we do!
#### Consequence
- Then if two devices collide, they have a higher probability of colliding again *if retransmissions happen in the same channel*
- $\Longrightarrow$ we can also use online machine learning to let each IoT device learn, on its own and in an automatic and decentralized way, which channel is the best one (= least occupied)
to retransmit a packet which failed due to a collision.
- Learning is again *needed* to achieve (close to) optimal performance.
---
# 3. Multi-Armed Bandits (MAB)
## 3.1. Model
## 3.2. Algorithms
---
# 3.1. Multi-Armed Bandits Model
- $K \geq 2$ resources (*e.g.*, channels), called **arms**
- Each time slot $t=1,\ldots,T$, you must choose one arm, denoted $C(t)\in\{1,\ldots,K\}$
- You receive some reward $r(t) \sim \nu_k$ when playing $k = C(t)$
- **Goal:** maximize your sum reward $\sum\limits_{t=1}^{T} r(t)$, or expected $\sum\limits_{t=1}^{T} \mathbb{E}[r(t)]$
- Hypothesis: rewards are stochastic, of mean $\mu_k$.
Example: Bernoulli distributions.
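A minimal sketch of this model in Python (the `policy(t, history)` interface is an assumption of mine, not notation from the paper):

```python
import random

def play_bandit(mus, policy, T):
    """mus[k] = mean of arm k; policy(t, history) -> arm index (assumed interface)."""
    history, total = [], 0
    for t in range(1, T + 1):
        arm = policy(t, history)                          # C(t)
        reward = 1 if random.random() < mus[arm] else 0   # r(t) ~ Bernoulli(mu_{C(t)})
        history.append((arm, reward))
        total += reward
    return total       # the sum of rewards, the quantity to maximize
```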
## Why is it famous?
Simple but good model for the **exploration/exploitation** dilemma.
---
# 3.2. Multi-Armed Bandits Algorithms
### Often "*index* based"
- Keep *index* $U_k(t) \in \mathbb{R}$ for each arm $k=1,\ldots,K$
- Always use channel $C(t) = \arg\max_k U_k(t)$
- $U_k(t)$ should represent our belief of the *quality* of arm $k$ at time $t$
### Example: "Follow the Leader"
- $X_k(t) := \sum\limits_{s < t} r(s) \mathbf{1}(C(s)=k)$ sum of rewards from arm $k$
- $N_k(t) := \sum\limits_{s < t} \mathbf{1}(C(s)=k)$ number of samples of arm $k$
- And use $U_k(t) = \hat{\mu}_k(t) := \frac{X_k(t)}{N_k(t)}$.
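A sketch of this rule, under the same assumed `policy(t, history)` interface as above; playing each arm once before comparing means is my convention:

```python
def follow_the_leader(t, history, K=4):
    X = [0.0] * K          # X_k(t): sum of rewards obtained from arm k
    N = [0] * K            # N_k(t): number of times arm k was played
    for arm, reward in history:
        X[arm] += reward
        N[arm] += 1
    for k in range(K):     # play every arm once before trusting the empirical means
        if N[k] == 0:
            return k
    return max(range(K), key=lambda k: X[k] / N[k])   # argmax_k of X_k(t) / N_k(t)
```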
---
# *Upper Confidence Bounds* algorithm (UCB)
- Instead of using $U_k(t) = \frac{X_k(t)}{N_k(t)}$, add an *exploration term*
$$ U_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\alpha \frac{\log(t)}{N_k(t)}} $$
### Parameter $\alpha$: tradeoff exploration *vs* exploitation
- Small $\alpha$: focus more on **exploitation**,
- Large $\alpha$: focus more on **exploration**,
- Typically $\alpha=1$ works fine empirically and theoretically.
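The same sketch with the UCB index above (again under the assumed `policy(t, history)` interface):

```python
from math import log, sqrt

def ucb_policy(t, history, K=4, alpha=1.0):
    X = [0.0] * K          # X_k(t): sum of rewards from arm k
    N = [0] * K            # N_k(t): number of pulls of arm k
    for arm, reward in history:
        X[arm] += reward
        N[arm] += 1
    for k in range(K):     # each arm must be sampled once before its index is defined
        if N[k] == 0:
            return k
    # U_k(t) = X_k(t)/N_k(t) + sqrt(alpha * log(t) / N_k(t))
    return max(range(K), key=lambda k: X[k] / N[k] + sqrt(alpha * log(t) / N[k]))
```

For instance, `play_bandit([0.9, 0.7, 0.7, 0.7], ucb_policy, 10_000)` should quickly concentrate its draws on the first (best) arm.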
---
# *Upper Confidence Bounds* algorithm (UCB)
![90%](plots/Algorithm1_UCB.png)
---
# 4. We Study $5$ Different Heuristics
- They all use one UCB algorithm to decide the channel to use for first transmissions of any message
- They use different approaches for retransmissions:
- "Only UCB": use same $\mathrm{UCB}$ for retransmissions,
- "Random": uniformly random retransmissions,
- "UCB": use another $\mathrm{UCB}^r$ for retransmissions
(no matter the channel for first transmission),
- "K-UCB": use $K$ different $\mathrm{UCB}^j$ for retransmission after a first transmission on channel $j\in\{1,\cdots,K\}$,
- "Delayed UCB": use another $\mathrm{UCB}^d$ for retransmissions, but launched after a delay $\Delta$.
---
# 4.0. Only UCB
Use the same $\mathrm{UCB}$ to decide the channel to use for any transmission, regardless of whether it is a first transmission or a retransmission of a message.
![80%](plots/Algorithm1_UCB.png)
---
# 4.1. UCB + Random Retransmissions
![90%](plots/Algorithm2_UCB_RandomRetransmission.png)
---
# 4.2. UCB + a single UCB for Retransmissions
![90%](plots/Algorithm3_UCB_UCBRetransmission.png)
---
# 4.3. UCB + $K$ UCB for Retransmissions
![85%](plots/Algorithm4_UCB_KUCBRetransmission.png)
---
# 4.4. UCB + Delayed UCB for Retransmissions
![85%](plots/Algorithm5_UCB_DelayedUCBRetransmission.png)
---
# 5. Numerical simulations and results
### What
- We simulate a network,
- With many dynamic IoT devices.
### Why
- They implement the UCB learning algorithm to learn to optimize their *first* transmission of any uplink packets,
- And the different heuristics to (try to) learn to optimize their *retransmissions* of packets after any collision.
---
# 5.1. First experiment
We consider an example network with...
- $K=4$ channels (e.g., like in LoRa),
- $M=5$ maximum number of retransmissions,
- $m=5$ maximum back-off interval,
- $p=10^{-3}$ transmission probability,
- $T=20 \times 10^4$ time slots,
- $N=1000$ IoT devices.
Non-uniform occupancy of the $4$ channels:
they are occupied $10\%$, $30\%$, $30\%$ and $30\%$ of the time (by other IoT networks).
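The same parameters, written as a hypothetical configuration dictionary for the kind of simulator sketched earlier:

```python
experiment_1 = {
    "K": 4,                                   # number of channels
    "M": 5,                                   # maximum number of retransmissions
    "m": 5,                                   # maximum back-off interval
    "p": 1e-3,                                # per-slot transmission probability
    "T": 20 * 10**4,                          # horizon, in time slots
    "N": 1000,                                # number of IoT devices
    "occupancy": [0.10, 0.30, 0.30, 0.30],    # channel occupancy by other IoT networks
}
```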
---
![80%](plots/ResultsUCB.png)
---
# 5.2. Second experiment
Non-uniform occupancy of the $4$ channels:
they are occupied $40\%$, $30\%$, $20\%$ and $30\%$ of the time (by other IoT networks).
---
![80%](plots/ResultsUCB2.png)
---
# 6. Summary (1/3)
## Settings
1. For **LPWA networks** based on an **ALOHA protocol**
(slotted both in time and frequency),
2. We presented a **retransmission model**
3. Dynamic **IoT devices** can use **simple machine learning algorithms**, to improve their successful communication rate,
4. We focus on the packet retransmissions upon radio collision, by using low-cost **Multi-Armed Bandit** algorithms, like **UCB**.
---
# 6. Summary (2/3)
## We presented
Several **learning heuristics**
- that try to learn how to transmit and retransmit in a smarter way
- by using the classical UCB algorithm for **channel selection for first transmissions**: it has a **low memory and computation cost**, and is easy to add on the embedded CPU of an IoT device,
- and different ideas based on UCB for the retransmissions upon collisions, which add no extra cost/memory overhead.
---
# 6. Summary (3/3)
## We showed
- Using machine learning for the *transmission* is **needed** to achieve optimal performance, and can lead to significant gains in terms of successful transmission rates (up to $30\%$ in the example network).
- Using machine learning for the *retransmission* is also useful, and improves over previous approaches that are unaware of retransmissions.
- The proposed heuristics outperform a naive random access scheme.
- Surprisingly, the main take-away message is that a simple UCB learning approach, which retransmits in the same channel, turns out to perform as well as the more complicated heuristics.
---
# 6. Future works
- Implement our proposed approach in a real-world demo
For instance using USRP boards.
- Study a real IoT LPWAN protocol (e.g., LoRa)
- Explore in LoRa how to use machine learning (e.g., Multi-Armed Bandit algorithms) to let IoT devices learn on their own the best retransmission pattern to follow in a given scenario.
---
# More ?
### → See our paper: [`HAL.Inria.fr/hal-02049824`](https://hal.inria.fr/hal-02049824)
### :pray: Please ask questions !
<span class="fontify">Thanks for listening !</span>