We control some communicating devices that want to access a single base station.
This is based on our latest article:
With sensing:
Devices first sense for the presence of Primary Users (background traffic), then use the Ack to detect collisions.
Models the "classical" Opportunistic Spectrum Access problem.
Not exactly suited for the Internet of Things, but can model ZigBee, and can be analyzed mathematically.
Without sensing: same background traffic, but devices cannot sense, so only the Ack is used.
More suited for "IoT" networks like LoRa or SigFox.
(Harder to analyze mathematically.)
Reward of device $j$ at time $t$: $r^j(t) := Y_{A^j(t),t} \times \mathbb{1}(\text{not } C^j(t)) = \mathbb{1}(\text{uplink \& Ack})$.
Three feedback levels:
"Full feedback": observe both $Y_{A^j(t),t}$ and $C^j(t)$ separately. → Not realistic enough; we don't focus on it.
"Sensing": first observe $Y_{A^j(t),t}$, then $C^j(t)$ only if $Y_{A^j(t),t} \neq 0$. → Models licensed protocols (e.g. ZigBee); our main focus.
"No sensing": observe only the product $Y_{A^j(t),t} \times \mathbb{1}(\text{not } C^j(t))$. → Unlicensed protocols (e.g. LoRaWAN); harder to analyze!
But all three levels consider the same instantaneous reward $r^j(t)$.
The goal is to maximize the number of received Acks (the sum of the rewards $r^j(t)$), in a finite-space discrete-time Decision Making Problem. We study the centralized (expected) regret:
$$R_T(\mu, M, \rho) := \mathbb{E}_\mu\Big[\sum_{t=1}^T \sum_{j=1}^M \big(\mu_j^* - r^j(t)\big)\Big] = \Big(\sum_{k=1}^{M} \mu_k^*\Big) T - \mathbb{E}_\mu\Big[\sum_{t=1}^T \sum_{j=1}^M r^j(t)\Big]$$
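A short sketch of how the regret of one simulated run can be computed from its rewards; averaging it over many independent runs estimates $R_T(\mu, M, \rho)$. The function name `centralized_regret` and the `(T, M)` array layout are our own assumptions.

```python
import numpy as np


def centralized_regret(rewards: np.ndarray, mu: np.ndarray) -> float:
    """Regret of one run: (mu_1^* + ... + mu_M^*) * T - total collected reward.

    rewards: shape (T, M), rewards[t, j] = r^j(t + 1) in {0, 1}.
    mu:      shape (K,), the channel means, with K >= M.
    """
    T, M = rewards.shape
    best_sum = np.sort(mu)[::-1][:M].sum()  # sum of the M best means
    return float(best_sum * T - rewards.sum())
```

A single run can give a negative value after lucky draws; only the expectation is guaranteed to be nonnegative.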
For any algorithm, decentralized or not, we have
$$R_T(\mu, M, \rho) = \sum_{k \in M\text{-worst}} (\mu_M^* - \mu_k)\, \mathbb{E}_\mu[T_k(T)] + \sum_{k \in M\text{-best}} (\mu_k - \mu_M^*)\, \big(T - \mathbb{E}_\mu[T_k(T)]\big) + \sum_{k=1}^{K} \mu_k\, \mathbb{E}_\mu[C_k(T)].$$
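If each reward is replaced by the mean of the chosen arm (the pseudo-regret), the same three-term identity holds for every trajectory, not only in expectation, which makes it easy to check numerically. In this hypothetical sketch we assume that $T_k(T)$ counts all selections of arm $k$ by all players and that $C_k(T)$ counts the selections of arm $k$ that ended in a collision; the means are assumed distinct so that the M-best set is unambiguous.

```python
import numpy as np


def regret_decomposition(choices: np.ndarray, mu: np.ndarray):
    """Pseudo-regret of a trajectory, computed directly and via the three-term
    decomposition. choices has shape (T, M), choices[t, j] = A^j(t + 1);
    mu has shape (K,) with distinct means."""
    T, M = choices.shape
    K = len(mu)
    order = np.argsort(mu)[::-1]          # arms sorted by decreasing mean
    best, worst = order[:M], order[M:]    # M-best and M-worst arms
    mu_M = mu[order[M - 1]]               # mu_M^*, the M-th best mean

    T_k = np.zeros(K)  # selections of arm k, all players together
    C_k = np.zeros(K)  # assumption: every colliding selection of arm k counts
    for row in choices:
        counts = np.bincount(row, minlength=K)
        T_k += counts
        C_k += np.where(counts > 1, counts, 0)

    # direct: each non-colliding selection of arm k earns mu_k on average
    direct = mu[best].sum() * T - np.sum(mu * (T_k - C_k))
    decomposed = (np.sum((mu_M - mu[worst]) * T_k[worst])
                  + np.sum((mu[best] - mu_M) * (T - T_k[best]))
                  + np.sum(mu * C_k))
    return float(direct), float(decomposed)
```

Both values agree up to rounding on any trajectory; taking expectations recovers the identity stated for $R_T(\mu, M, \rho)$.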
For any uniformly efficient decentralized policy, and any non-degenerated problem µ,
$$\liminf_{T \to \infty} \frac{R_T(\mu, M, \rho)}{\log T} \geq M \times \sum_{k \in M\text{-worst}} \frac{\mu_M^* - \mu_k}{\mathrm{kl}(\mu_k, \mu_M^*)}.$$
where $\mathrm{kl}(x, y) := x \log\frac{x}{y} + (1 - x) \log\frac{1 - x}{1 - y}$ is the binary Kullback-Leibler divergence.
→ See our paper for details!
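The kl divergence and the constant in this lower bound are straightforward to compute. A minimal sketch, assuming Bernoulli means given as a plain list; the clipping by `eps` is our own choice to avoid `log(0)`, and the bound needs $\mu_M^* > \mu_{M+1}^*$ (non-degenerate problem), otherwise one term divides by zero. Function names are hypothetical.

```python
import math
from typing import Sequence


def kl_bernoulli(x: float, y: float, eps: float = 1e-15) -> float:
    """Binary Kullback-Leibler divergence kl(x, y); inputs clipped to [eps, 1 - eps]."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def lower_bound_constant(mu: Sequence[float], M: int) -> float:
    """M * sum over the M-worst arms k of (mu_M^* - mu_k) / kl(mu_k, mu_M^*)."""
    sorted_mu = sorted(mu, reverse=True)
    mu_M = sorted_mu[M - 1]  # assumes sorted_mu[M] < mu_M (non-degenerate)
    return M * sum((mu_M - mu_k) / kl_bernoulli(mu_k, mu_M) for mu_k in sorted_mu[M:])
```

When $M = K$ there are no M-worst arms, the sum is empty and the constant is 0.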
Each device keeps track of $t$, the number of sent packets; $T_k(t)$, the number of selections of channel $k$; and $X_k(t)$, the number of successful transmissions in channel $k$.
References: [Lai & Robbins, 1985], [Auer et al, 2002], [Bubeck & Cesa-Bianchi, 2012]
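For reference, here is the classical UCB1 index of [Auer et al, 2002], written with the per-device statistics $t$, $T_k(t)$ and $X_k(t)$. This is a sketch: the exploration constant 2 is the textbook UCB1 choice and may differ from the variant used in our experiments, and the function names are hypothetical.

```python
import math
from typing import List


def ucb1_indexes(t: int, selections: List[int], successes: List[int]) -> List[float]:
    """UCB1 index X_k(t)/T_k(t) + sqrt(2 log(t) / T_k(t)) for each channel k.
    Channels never tried get +inf, so each one is tried once first."""
    indexes = []
    for T_k, X_k in zip(selections, successes):
        if T_k == 0:
            indexes.append(math.inf)
        else:
            indexes.append(X_k / T_k + math.sqrt(2 * math.log(t) / T_k))
    return indexes


def ucb1_choice(t: int, selections: List[int], successes: List[int]) -> int:
    """Channel with the highest UCB1 index (lowest channel number on ties)."""
    indexes = ucb1_indexes(t, selections, successes)
    return max(range(len(indexes)), key=indexes.__getitem__)
```

The first term favors channels with a high empirical success rate; the second shrinks as a channel is tried more, which keeps exploring rarely used channels.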
Why bother? klUCB is proved to be more efficient than UCB, and asymptotically optimal for single-player stochastic bandits.
References: [Garivier & Cappé, 2011], [Cappé & Garivier & Maillard & Munos & Stoltz, 2013]
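klUCB replaces the square-root bonus by a kl-based confidence bound, which can be computed by bisection because $q \mapsto \mathrm{kl}(\hat\mu, q)$ is increasing on $[\hat\mu, 1]$. A minimal sketch, assuming the exploration level $\log(t)$ (some analyses add a $c \log\log t$ term); the names `klucb_index` and `kl_bernoulli` are hypothetical.

```python
import math


def kl_bernoulli(x: float, y: float, eps: float = 1e-15) -> float:
    """Binary Kullback-Leibler divergence, inputs clipped to [eps, 1 - eps]."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def klucb_index(t: int, T_k: int, X_k: int, precision: float = 1e-6) -> float:
    """Largest q in [X_k/T_k, 1] such that T_k * kl(X_k/T_k, q) <= log(t),
    found by bisection (kl(mean, q) is increasing in q on [mean, 1]).
    Assumption: exploration level log(t), without a log(log(t)) term."""
    if T_k == 0:
        return math.inf  # untried channel: explore it first
    mean = X_k / T_k
    threshold = math.log(t) / T_k
    low, high = mean, 1.0
    while high - low > precision:
        mid = (low + high) / 2
        if kl_bernoulli(mean, mid) > threshold:
            high = mid  # q = mid lies outside the confidence region
        else:
            low = mid
    return (low + high) / 2
```

By Pinsker's inequality, $\mathrm{kl}(p, q) \geq 2(p - q)^2$, so this index never exceeds $\hat\mu + \sqrt{\log(t) / (2 T_k(t))}$: the klUCB confidence bound is never looser than the corresponding Hoeffding-based one.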
If all $M$ players use MCTopM with klUCB, then for any non-degenerated problem $\mu$, there exists a problem-dependent constant $G_{M,\mu}$ such that the regret satisfies:
$$R_T(\mu, M, \rho) \leq G_{M,\mu} \log(T) + o(\log T).$$
→ See our paper for details!
Experiments on Bernoulli problems µ in [0,1]^K.
Regret, M=9 players, K=9 arms, horizon T=10000, 200 repetitions. Only RandTopM and MCTopM achieve constant regret in this saturated case (proved).
Regret, M=6 players, K=9 arms, horizon T=5000, against 500 problems µ uniformly sampled in [0,1]^K. Conclusion: rhoRand < RandTopM < Selfish < MCTopM in most cases.
Cumulative number of collisions. Also rhoRand < RandTopM < Selfish < MCTopM in most cases.
Cumulative number of arm switches. Again rhoRand < RandTopM < Selfish < MCTopM, but no guarantee for rhoRand.
Measure of fairness among players. All 4 algorithms seem fair on average, but none is fair on a single run. It is quite hard to achieve both efficiency and single-run fairness!
For the harder feedback model, without sensing.
The Selfish decentralized approach: devices don't use sensing and just learn from the reward (acknowledgement or not, $r^j(t)$).
Reference: [Bonnefoi & Besson et al, 2017]
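A rough simulation of the Selfish approach in the no-sensing model: each device runs its own index policy on $r^j(t)$ only, with no coordination. This is a sketch under our own assumptions: UCB1 is used as the per-device index (the cited work may use other indexes), ties are broken uniformly at random so that identical devices do not stay in lockstep, and `selfish_ucb` is a hypothetical name.

```python
import math
import numpy as np


def selfish_ucb(mu, M: int, T: int, seed: int = 0) -> np.ndarray:
    """Simulate M devices, each running UCB1 (our choice) on its own reward
    r^j(t) only: no sensing, no coordination. Returns the (T, M) rewards."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    K = len(mu)
    selections = np.zeros((M, K))  # T_k^j(t): selections of channel k by device j
    successes = np.zeros((M, K))   # sum of rewards r^j(t) obtained on channel k
    rewards = np.zeros((T, M), dtype=int)
    for t in range(1, T + 1):
        # UCB1 index per device and channel; unexplored channels get +inf
        with np.errstate(divide="ignore", invalid="ignore"):
            idx = successes / selections + np.sqrt(2 * math.log(t) / selections)
        idx[selections == 0] = np.inf
        # each device picks a maximizing channel, ties broken at random (assumption)
        choices = np.array([rng.choice(np.flatnonzero(row == row.max())) for row in idx])
        # Y_{k,t} = 1 if channel k is free of background traffic
        y = rng.random(K) < mu
        counts = np.bincount(choices, minlength=K)
        # r^j(t) = Y_{A^j(t),t} * 1(no collision): all a device sees is the Ack
        r = y[choices] & (counts[choices] == 1)
        rewards[t - 1] = r
        selections[np.arange(M), choices] += 1
        successes[np.arange(M), choices] += r
    return rewards
```

Because a device only sees whether an Ack came back, a collision and a busy channel look identical to it, which is what makes this model harder to analyze.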
Non-stationarity of background traffic, etc.
More realistic emission model: maybe driven by the number of packets sent over a whole day, instead of an emission probability.
Any questions or ideas?