Considering the list of players:
[rhoRand(UCB($\alpha=1$)), rhoRand(UCB($\alpha=1$))]
Number of players in the multi-players game: 2
Time horizon: 10000
Number of repetitions: 100
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Using collision model onlyUniqUserGetsReward.
More details:
Simple collision model in which only a player alone on an arm samples it and receives the reward.
- This is the default collision model, cf. [[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]](https://hal.inria.fr/hal-01629733).
- The numpy array 'choices' is increased according to the number of users who collided (it is NOT binary).
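The rule described above can be sketched as follows. This is an illustrative standalone implementation, not SMPyBandits' internal API; the function and variable names are assumptions. Each player alone on its chosen arm draws a Bernoulli reward; colliding players get nothing:

```python
import numpy as np

def only_uniq_user_gets_reward(choices, arm_means, rng):
    """Sketch of the 'onlyUniqUserGetsReward' rule: a player earns a
    Bernoulli reward for its chosen arm only if no other player chose
    the same arm at this time step; colliders receive 0."""
    choices = np.asarray(choices)
    # Number of players on each arm (this is the non-binary counter
    # mentioned above: it grows with the number of colliding users).
    counts = np.bincount(choices, minlength=len(arm_means))
    rewards = np.zeros(len(choices))
    for j, arm in enumerate(choices):
        if counts[arm] == 1:  # player j is alone on its arm
            rewards[j] = float(rng.random() < arm_means[arm])
    return rewards
```

For instance, with three players choosing arms `[0, 0, 2]`, the two players colliding on arm 0 get zero reward and only the third player samples its arm.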
Using accurate regrets and last regrets? True
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>, 'params': [0.3, 0.4, 0.5, 0.6, 0.7]} ...
- with 'arm_type' = <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>
- with 'params' = [0.3, 0.4, 0.5, 0.6, 0.7]
- with 'arms' = [B(0.3), B(0.4), B(0.5), B(0.6), B(0.7)]
- with 'means' = [0.3 0.4 0.5 0.6 0.7]
- with 'nbArms' = 5
- with 'maxArm' = 0.7
- with 'minArm' = 0.3
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 9.46 ...
- an Optimal Arm Identification factor H_OI(mu) = 60.00% ...
- with 'arms' represented as: $[B(0.3), B(0.4), B(0.5), B(0.6), B(0.7)^*]$
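The [Lai & Robbins] constant printed above can be recomputed directly from the arm means: for Bernoulli arms, C(mu) is the sum over suboptimal arms of (mu* - mu_k) / kl(mu_k, mu*), where kl is the binary relative entropy. A standalone check (not SMPyBandits' own code) reproduces the values printed for all three problems in this log:

```python
import numpy as np

def kl_bernoulli(p, q):
    """Binary relative entropy kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """C(mu) = sum over suboptimal arms of (mu* - mu_k) / kl(mu_k, mu*)."""
    mu_star = max(means)
    return sum((mu_star - mu) / kl_bernoulli(mu, mu_star)
               for mu in means if mu < mu_star)

for means in ([0.3, 0.4, 0.5, 0.6, 0.7],
              [0.1, 0.3, 0.5, 0.7, 0.9],
              [0.005, 0.01, 0.015, 0.84, 0.85]):
    print(f"C(mu) = {lai_robbins_constant(means):.3g}")  # 9.46, 3.12, 27.3
```

Note how the third problem, with its two nearly identical best arms (0.84 vs 0.85), dominates the sum: the kl term between close means is tiny, so that single gap contributes roughly 26 of the 27.3 total.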
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>, 'params': [0.1, 0.3, 0.5, 0.7, 0.9]} ...
- with 'arm_type' = <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>
- with 'params' = [0.1, 0.3, 0.5, 0.7, 0.9]
- with 'arms' = [B(0.1), B(0.3), B(0.5), B(0.7), B(0.9)]
- with 'means' = [0.1 0.3 0.5 0.7 0.9]
- with 'nbArms' = 5
- with 'maxArm' = 0.9
- with 'minArm' = 0.1
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 3.12 ...
- an Optimal Arm Identification factor H_OI(mu) = 40.00% ...
- with 'arms' represented as: $[B(0.1), B(0.3), B(0.5), B(0.7), B(0.9)^*]$
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>, 'params': [0.005, 0.01, 0.015, 0.84, 0.85]} ...
- with 'arm_type' = <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>
- with 'params' = [0.005, 0.01, 0.015, 0.84, 0.85]
- with 'arms' = [B(0.005), B(0.01), B(0.015), B(0.84), B(0.85)]
- with 'means' = [0.005 0.01 0.015 0.84 0.85]
- with 'nbArms' = 5
- with 'maxArm' = 0.85
- with 'minArm' = 0.005
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 27.3 ...
- an Optimal Arm Identification factor H_OI(mu) = 29.40% ...
- with 'arms' represented as: $[B(0.005), B(0.01), B(0.015), B(0.84), B(0.85)^*]$
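The H_OI percentages can be reproduced the same way. One formula consistent with all three values printed in this log (60.00%, 40.00%, 29.40%) is: each suboptimal arm contributes 1 - (mu* - mu_k), and the sum is averaged over the total number of arms K. This is a standalone re-derivation checked against the printed output, not SMPyBandits' own code:

```python
def hoi_factor(means):
    """H_OI(mu): average over all K arms of 1 - (mu* - mu_k) for the
    suboptimal arms; the optimal arm contributes 0 to the sum."""
    mu_star = max(means)
    return sum(1 - (mu_star - mu) for mu in means if mu < mu_star) / len(means)

for means in ([0.3, 0.4, 0.5, 0.6, 0.7],
              [0.1, 0.3, 0.5, 0.7, 0.9],
              [0.005, 0.01, 0.015, 0.84, 0.85]):
    print(f"H_OI = {100 * hoi_factor(means):.2f}%")  # 60.00%, 40.00%, 29.40%
```

Larger gaps to the best arm drive H_OI down, so a smaller percentage means the optimal arm is easier to identify, which is why the third problem (three arms far below the optimum) has the lowest factor.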
Number of environments to try: 3
Considering the list of players:
[Selfish(UCB($\alpha=1$)), Selfish(UCB($\alpha=1$))]
Number of players in the multi-players game: 2
Time horizon: 10000
Number of repetitions: 100
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Using the same collision model (onlyUniqUserGetsReward) and the same three MAB problems as above.
Number of environments to try: 3
Considering the list of players:
[rhoRand(Thompson), rhoRand(Thompson)]
Number of players in the multi-players game: 2
Time horizon: 10000
Number of repetitions: 100
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Using the same collision model (onlyUniqUserGetsReward) and the same three MAB problems as above.
Number of environments to try: 3
Considering the list of players:
[Selfish(Thompson), Selfish(Thompson)]
Number of players in the multi-players game: 2
Time horizon: 10000
Number of repetitions: 100
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Using the same collision model (onlyUniqUserGetsReward) and the same three MAB problems as above.
Number of environments to try: 3
Considering the list of players:
[rhoRand(kl-UCB), rhoRand(kl-UCB)]
Number of players in the multi-players game: 2
Time horizon: 10000
Number of repetitions: 100
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Using the same collision model (onlyUniqUserGetsReward) and the same three MAB problems as above.
Number of environments to try: 3
Considering the list of players:
[Selfish(kl-UCB), Selfish(kl-UCB)]
Number of players in the multi-players game: 2
Time horizon: 10000
Number of repetitions: 100
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Using the same collision model (onlyUniqUserGetsReward) and the same three MAB problems as above.
Number of environments to try: 3
CPU times: user 44.1 ms, sys: 8.09 ms, total: 52.2 ms
Wall time: 51.1 ms