### *IEEE WCNC 2019*: "**GNU Radio Implementation of Multi-Armed bandits Learning for Internet-of-things Networks**"
- *Date* :date: : $17$th of April $2019$
- *By* :wave: : [Lilian Besson](https://GitHub.com/Naereen/slides/), PhD Student in France, co-advised by
![9%](../common/LogoCS.png) ![14%](../common/LogoIETR.png) ![12%](../common/LogoInria.jpg)
| *Christophe Moy*
@ Univ Rennes 1 & IETR, Rennes | *Emilie Kaufmann*
@ CNRS & Inria, Lille |
|:---:|:---:|
> See our paper at [`HAL.Inria.fr/hal-02006825`](https://hal.inria.fr/hal-02006825)
---
# Introduction
- We implemented a demonstration of a simple IoT network
- Using open-source software (GNU Radio) and USRP boards from Ettus Research / National Instrument
- In a wireless ALOHA-based protocol, IoT devices are able to improve their network access efficiency by using *embedded* *decentralized* *low-cost* machine learning algorithms
(so simple implementation that it can be run on IoT device side)
- The Multi-Armed Bandit model fits well for this problem
- Our demonstration shows that using the simple UCB algorithm can lead to great empirical improvement in terms of successful transmission rate for the IoT devices
> Joint work by R. Bonnefoi, ==L. Besson== and C. Moy.
---
# :timer_clock: Outline
## 1. Motivations
## 2. System Model
## 3. Multi-Armed Bandit (MAB) Model and Algorithms
## 4. GNU Radio Implementation
## 5. Results
---
# 1. Motivations
- :chart_with_upwards_trend: IoT (the Internet of Things) is the most promizing new paradigm and business opportunity of modern wireless telecommunications,
- :chart_with_upwards_trend: More and more IoT devices are using unlicensed bands
- $\Longrightarrow$ networks will be more and more occupied :boom:
But...
---
# 1. Motivations
- $\Longrightarrow$ networks will be more and more occupied :boom:
But...
- Heterogeneous spectrum occupancy in most IoT networks standards
- Simple but efficient learning algorithm can give great improvements in terms of successful communication rates
- IoT can improve their battery lifetime and mitigate spectrum overload thanks to learning!
- $\Longrightarrow$ more devices can cohabit in IoT networks in unlicensed bands !
---
# 2. System Model
### Wireless network
- In unlicensed bands (e.g. ISM bands: 433 or 868 MHz, 2.4 or 5 GHz)
- $K=4$ (or more) orthogonal channels
### One gateway, many IoT devices
- One gateway, handling different devices
- Using a ALOHA protocol (without retransmission)
- Devices send data for $1$s in one channel, wait for an *acknowledgement* for $1$s in same channel, use Ack as feedback: success / failure
- Each device: communicate from time to time (e.g., every $10$ s)
- Goal: max successful communications $\Longleftrightarrow$ max nb of received Ack
---
# 2. System Model
![bg original 80%](plots/system_model.png)
---
# Hypotheses
1. We focus on **one gateway**, $K \geq 2$ channels
2. Different IoT devices using the same standard are able to run a low-cost learning algorithm on their embedded CPU
3. The spectrum occupancy generated by the rest of
the environment is **assumed to be stationary**
4. And **non uniform traffic**:
some channels are more occupied than others.
---
# 3. Multi-Armed Bandits (MAB)
## 3.1. Model
## 3.2. Algorithms
---
# 3.1. Multi-Armed Bandits Model
- $K \geq 2$ resources (*e.g.*, channels), called **arms**
- Each time slot $t=1,\ldots,T$, you must choose one arm, denoted $A(t)\in\{1,\ldots,K\}$
- You receive some reward $r(t) \sim \nu_k$ when playing $k = A(t)$
- **Goal:** maximize your sum reward $\sum\limits_{t=1}^{T} r(t)$, or expected $\sum\limits_{t=1}^{T} \mathbb{E}[r(t)]$
- Hypothesis: rewards are stochastic, of mean $\mu_k$.
Example: Bernoulli distributions.
### Why is it famous?
Simple but good model for **exploration/exploitation** dilemma.
---
# 3.2. Multi-Armed Bandits Algorithms
### Often "*index* based"
- Keep *index* $I_k(t) \in \mathbb{R}$ for each arm $k=1,\ldots,K$
- Always play $A(t) = \arg\max I_k(t)$
- $I_k(t)$ should represent our belief of the *quality* of arm $k$ at time $t$
### (:boom: unefficient) Example: "Follow the Leader"
- $X_k(t) := \sum\limits_{s < t} r(s) \bold{1}(A(s)=k)$ sum reward from arm $k$
- $N_k(t) := \sum\limits_{s < t} \bold{1}(A(s)=k)$ number of samples of arm $k$
- And use $I_k(t) = \hat{\mu}_k(t) := \frac{X_k(t)}{N_k(t)}$.
---
## *Upper Confidence Bounds* algorithm (UCB)
- Instead of $I_k(t) = \hat{\mu}_k(t) = \frac{X_k(t)}{N_k(t)}$, :ok_hand: add an *exploration term*
$$ I_k(t) = \mathrm{UCB}_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{\alpha \log(t)}{2 N_k(t)}} $$
### Parameter $\alpha =$ trade-off exploration *vs* exploitation
- Small $\alpha \Longleftrightarrow$ focus more on **exploitation**,
- Large $\alpha \Longleftrightarrow$ focus more on **exploration**,
- Typically $\alpha=1$ works fine empirically and theoretically.
---
# 4. GNU Radio Implementation
## 4.1. Physical layer and protocol
## 4.2. Equipment
## 4.3. Implementation
## 4.4. User interface
---
# 4.1. Physical layer and protocol
> Very simple ALOHA-based protocol, $K=4$ channels
An uplink message :zap: $\,\nearrow\,$ is made of...
- a preamble (for phase synchronization)
- an ID of the IoT device, made of QPSK symbols $1\pm1j \in \mathbb{C}$
- then arbitrary data, made of QPSK symbols $1\pm1j \in \mathbb{C}$
A downlink (Ack) message :zap: $\,\swarrow\,$ is then...
- same preamble
- the same ID
(so a device knows if the Ack was sent for itself or not)
---
# 4.2. Equipment
$\geq3$ USRP
boards
1: gateway
2: traffic
generator
3: IoT dynamic
devices
(as much as we want)
![bg original 55%](plots/our-demo.png)
---
# 4.3. :hammer_and_wrench: Implementation
- Using GNU Radio and GNU Radio Companion
- Each USRP board is controlled by one *flowchart*
- Blocks are implemented in C++
- MAB algorithms are simple to code
(examples...)
---
# Flowchart of the random traffic generator
![bg original 95%](plots/USRP_TX_PU__v1__simple_grc.png)
---
# Flowchart of the IoT gateway
![bg original 95%](plots/USRP_RX_BTS__v1__simple_grc.png)
---
# Flowchart of the IoT dynamic device
![bg original 95%](plots/USRP_TX_SU__v1__simple_grc.png)
---
# 4.4. User interface of our demonstration
$\hookrightarrow$ See video of the demo: [`YouTu.be/HospLNQhcMk`](https://youtu.be/HospLNQhcMk)
![33%](plots/UI.png)
---
# 5. Example of simulation and results
On an example of a small IoT network:
- with $K=4$ channels,
- and *non uniform* "background" traffic (other networks),
with a repartition of $15\%$, $10\%$, $2\%$, $1\%$
1. $\Longrightarrow$ the uniform access strategy obtains
a successful communication rate of about $40\%$.
2. About $400$ communication slots are enough for the learning IoT devices to reach a successful communication rate close to $80\%$,
with UCB algorithm or another one (Thompson Sampling).
> Note: similar gains of performance were obtained in other scenarios.
---
# Illustration
![bg original 70%](plots/plot_datafile_append_Uniform_vs_UCB_vs_TS.png)
---
# 6. Conclusion
## :point_right: Take home message
## Dynamically reconfigurable IoT devices can learn on their own to favor certain channels, if the environment traffic is not uniform between the $K$ channels, and greatly improve their succesful communication rates!
## :pray: Please ask questions !
---
# 6. Conclusion
### $\hookrightarrow$ See our paper: [`HAL.Inria.fr/hal-02006825`](https://hal.inria.fr/hal-02006825) :point_left:
### $\hookrightarrow$ See video of the demo: [`YouTu.be/HospLNQhcMk`](https://youtu.be/HospLNQhcMk)
### $\hookrightarrow$ See the code of our demo: :point_left:
> Under GPL open-source license, for GNU Radio:
[bitbucket.org/scee_ietr/malin-multi-arm-bandit-learning-for-iot-networks-with-grc](https://bitbucket.org/scee_ietr/malin-multi-arm-bandit-learning-for-iot-networks-with-grc/) :point_left:
Thanks for listening :+1: !