### *1st MoTION Workshop - 2019*: "**Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions**"
- *Date* :date: : $15$th of April $2019$ - *By* :wave: : [Lilian Besson](https://GitHub.com/Naereen/slides/), PhD Student in France, co-advised by ![9%](../common/LogoCS.png) ![14%](../common/LogoIETR.png) ![12%](../common/LogoInria.jpg) | *Christophe Moy*
@ Univ Rennes 1 & IETR, Rennes | *Emilie Kaufmann*
@ CNRS & Inria, Lille | |:---:|:---:| > See our paper at [`HAL.Inria.fr/hal-02049824`](https://hal.inria.fr/hal-02049824) --- # :timer_clock: Outline ## 1. Motivations ## 2. System model ## 3. Multi-armed bandit (MAB) model and algorithms ## 4. Proposed heuristics ## 5. Numerical simulations and results #### Please :pray: ask questions *at the end* if you want! > By R. Bonnefoi, ==L. Besson==, J. Manco-Vasquez and C. Moy. --- # 1. Motivations - :chart_with_upwards_trend: IoT (the Internet of Things) is the most promizing new paradigm and business opportunity of modern wireless telecommunications, - :chart_with_upwards_trend: More and more IoT devices are using unlicensed bands - $\Longrightarrow$ networks will be more and more occupied :boom: But... --- # 1. Motivations - $\Longrightarrow$ networks will be more and more occupied :boom: But... - Heterogeneous spectrum occupancy in most IoT networks standards - Simple but efficient learning algorithm can give great improvements in terms of successful communication rates - IoT can improve their battery lifetime and mitigate spectrum overload thanks to learning! - $\Longrightarrow$ can fit more devices in the existing IoT networks :tada: ! --- # 2. System model ### Wireless network - In unlicensed bands, like the ISM bands - $K=4$ (or more) orthogonal channels
### One gateway, many IoT devices - One gateway, handling different devices - Using a slotted ALOHA protocol **with retransmissions** - Devices send data in one channel ($\nearrow$ uplink), wait for an *acknowledgement* ($\swarrow$ downlink) in same channel, use Ack as feedback : success / failure --- ### Transmission and retransmission model - Each device communicates from time to time (e.g., every hour) $\Longleftrightarrow$ probability $p$ of transmission at every time (Bernoulli process) - Retransmit at most $M$ times if first transmission failed (until Ack is received). $\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;$ (Ex. $M=10$) - Retransmissions can use a different channel that the one used for first transmission - Retransmissions happen after a random back-off time back-off time $\sim\mathcal{U}(0,\cdots,m-1)$ $\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;$ (Ex. $m=10$) ### The goal of each device Is to *max*imize its successful communication rates $\Longleftrightarrow$ *max*imize its number of received Ack. --- # Do we need learning for transmission? Yes! #### First hypothesis The surrounding traffic is not uniformly occupying the $K$ channels. #### Consequence - Then it is always sub-optimal to use a (naive) uniformly random channel access - $\Longrightarrow$ we can use online machine learning to let each IoT device learn, on its own and in an automatic and decentralized way, which channel is the best one (= less occupied) in its current environment. - ==Learning is actually *needed* to achieve (close to) optimal performance.== --- # Do we need learning for ==*re*transmission==? #### Second hypothesis Imagine a set of IoT devices learned to transmit efficiently (in the most free channels), in one IoT network. #### Question - Then if two devices collide, do they have a higher probability of ==colliding again== *if retransmissions happen in the same channel* ? --- ## Mathematical intuition and illustration Consider one IoT device and one channel, we consider two probabilities: - $p_c$ : suffering a collision at first transmission, - $p_{c1}$ : collision at the first retransmission (if it uses the same channel).
In an example network with... - a small transmission probability $p=10^{-3}$, - from $N=50$ to $N=400$ IoT devices,
- $\Longrightarrow$ we ran simulations showing that ==$p_{c1}$ can be more than twice of $p_c$ (from $5\%$ to $15\%$!)== --- ![bg original 75%](plots/Approximation_m10.png) --- # Do we need learning for *re*transmission? ### :point_right: Maybe we do! ### Consequence - Then if two devices collide, they have a higher probability of colliding again *if retransmissions happen in the same channel* - $\Longrightarrow$ we can also use online machine learning to let each IoT device learn, on its own and in an automatic and decentralized way, which channel is the best one (= less occupied) to retransmit a packet which failed due to a collision. - ==Learning is *maybe needed* to achieve (close to) optimal performance!== --- # 3. Multi-Armed Bandits (MAB)
## 3.1. Model ## 3.2. Algorithms --- # 3.1. Multi-Armed Bandits Model - $K \geq 2$ resources (*e.g.*, channels), called **arms** - Each time slot $t=1,\ldots,T$, you must choose one arm, denoted $C(t)\in\{1,\ldots,K\}$ - You receive some reward $r(t) \sim \nu_k$ when playing $k = C(t)$ - **Goal:** maximize your sum reward $\sum\limits_{t=1}^{T} r(t)$ - Hypothesis: rewards are stochastic, of mean $\mu_k$. Example: Bernoulli distributions. ## Why is it famous? Simple but good model for **exploration/exploitation** dilemma. --- # 3.2. Multi-Armed Bandits Algorithms ### Often "*index* based" - Keep *index* $U_k(t) \in \mathbb{R}$ for each arm $k=1,\ldots,K$ - Always use channel $C(t) = \arg\max U_k(t)$ - $U_k(t)$ should represent our belief of the *quality* of arm $k$ at time $t$ ### (:boom: unefficient) Example: "Follow the Leader" - $X_k(t) := \sum\limits_{s < t} r(s) \bold{1}(C(s)=k)$ sum reward from arm $k$ - $N_k(t) := \sum\limits_{s < t} \bold{1}(C(s)=k)$ number of samples of arm $k$ - And use $U_k(t) = \hat{\mu}_k(t) := \frac{X_k(t)}{N_k(t)}$. --- # *Upper Confidence Bounds* algorithm (UCB) - Instead of $U_k(t) = \hat{\mu}_k(t) = \frac{X_k(t)}{N_k(t)}$, :ok_hand: add an *exploration term* $$ U_k(t) = \mathrm{UCB}_k(t) = \hat{\mu}_k(t) + \sqrt{\alpha \frac{\log(t)}{N_k(t)}} $$ ### Parameter $\alpha =$ trade-off exploration *vs* exploitation - Small $\alpha \Longleftrightarrow$ focus more on **exploitation**, - Large $\alpha \Longleftrightarrow$ focus more on **exploration**, - Typically $\alpha=1$ works fine empirically and theoretically. --- # *Upper Confidence Bounds* algorithm (UCB) ![bg original 90%](plots/Algorithm1_UCB.png) --- # 4. We Study Different Heuristics (5) - They all use one UCB algorithm to decide the channel to use for first transmissions of any message - They use different approaches for retransmissions: - "==Only UCB==": use same $\mathrm{UCB}$ for retransmissions, - "==Random==": uniformly random retransmissions, - "==UCB==": use another $\mathrm{UCB}^r$ for retransmissions (no matter the channel for first transmission), - "==K-UCB==": use $K$ different $\mathrm{UCB}^j$ for retransmission after a first transmission on channel $j\in\{1,\cdots,K\}$, - "==Delayed UCB==": use another $\mathrm{UCB}^d$ for retransmissions, but launched after a delay $\Delta$. --- ## 4.1. Only UCB Use the same $\mathrm{UCB}$ to decide the channel to use for any transmissions, regardless if it's a first transmission or a retransmission of a message. ![70%](plots/Algorithm1_UCB.png) --- ## 4.2. UCB + ==random retransmissions== ![bg original 90%](plots/Algorithm2_UCB_RandomRetransmission.png) --- ## 4.3. UCB + ==one $\mathrm{UCB}^r$ for retransmissions== ![bg original 90%](plots/Algorithm3_UCB_UCBRetransmission.png) --- ## 4.4. UCB + ==$K$ $\neq$ $\mathrm{UCB}^j$ for retransmissions== ![bg original 85%](plots/Algorithm4_UCB_KUCBRetransmission.png) --- ## 4.5. UCB + ==Delayed $\mathrm{UCB}^d$ for retransmissions== ![bg original 85%](plots/Algorithm5_UCB_DelayedUCBRetransmission.png) --- # 5. Numerical simulations and results ### What - We simulate a network, with $K=4$ orthogonal channels, - With many IoT dynamic devices.
### Why ? - IoT devices implement the UCB learning algorithm to learn to optimize their *first* transmission of any uplink packets, - And the different heuristic to (try to) learn ==to optimize their *retransmissions*== of the packets after any collision. --- # 5.1. First experiment We consider an example network with... - $K=4$ channels (e.g., like in LoRa), - $M=5$ maximum number of retransmission, - $m=5$ maximum back-off interval, - $p=10^{-3}$ transmission probability, - $5=20 \times 10^4$ time slots, - for$N=1000$ IoT devices. ### Hypothesis :point_right: ==Non uniform occupancy of the $4$ channels:== they are occupied $10$, $30$, $30$ and $30\%$ of times (by other IoT networks). --- ![bg original 80%](plots/ResultsUCB.png) --- # 5.2. Second experiment - Same parameters
### Hypothesis :point_right: ==Non uniform occupancy of the $4$ channels:== they are occupied $40$, $30$, $20$ and $30\%$ of times (by other IoT networks). --- ![bg original 80%](plots/ResultsUCB2.png) --- # 6. Summary (1/3) ## Settings 1. For **IoT networks** based on a simple **ALOHA protocol** (slotted both in time and frequency), 2. We presented a **retransmission model**,
3. Dynamic **IoT devices** can use **simple machine learning algorithms**, to improve their successful communication rate, 4. We focus on the packet retransmissions upon radio collision, by using low-cost **Multi-Armed Bandit** algorithms, like **UCB**. --- # 6. Summary (2/3) ## We presented Several **learning heuristics** - that try to learn how to transmit and retransmit in a smarter way,
- by using the classical UCB algorithm for **channel selection for first transmission**: it has a **low memory and computation cost**, easy to add on an embedded CPU of an IoT device,
- and different ideas based on UCB for the retransmissions upon collisions, that add no cost/memory overhead. --- # 6. Summary (3/3) ## We showed - Using machine learning for the *transmission* is **needed** to achieve optimal performance, and can lead to significant gain in terms of successful transmission rates (up-to 30% in the example network). - Using machine learning for the *retransmission* is also useful, and improves over previous approach unaware of retransmission. - The proposed heuristics outperform a naive random access scheme. - :point_right: Surprisingly, the main take-away message is that ==a simple UCB learning approach, that retransmit in the same channel, turns out to perform as well as more complicated heuristics==. --- # More ?
### $\hookrightarrow$ See our paper: [`HAL.Inria.fr/hal-02049824`](https://hal.inria.fr/hal-02049824) :point_left:

### :pray: Please ask questions ! ### Or by email ==`Lilian.Besson @ CentraleSupelec.fr`== ?

Thanks for listening :+1: !