PoliciesMultiPlayers.rhoEst module¶
rhoEst: implementation of the 2nd multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).
Each child player is selfish and plays according to an index policy (any index policy, e.g., UCB, Thompson sampling, KL-UCB, BayesUCB, etc.).
But instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm.
At first, every player has a random rank_i from 1 to M, and whenever a collision occurs, rank_i is resampled uniformly from [1, …, \hat{M}_i(t)], where \hat{M}_i(t) is the number of players currently estimated by player i.
The procedure to estimate \hat{M}_i(t) is not straightforward: every player starts with \hat{M}_i(0) = 1, and increases its estimate, \hat{M}_i(t+1) = \hat{M}_i(t) + 1, once enough collisions have been observed, as decided by a threshold function. A toy sketch of this update is given just below.
My choice of threshold function, threshold_on_t(), does not need the horizon either, and uses the current time t instead.
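To make this update concrete, here is a minimal, self-contained sketch of the collision step described above, assuming the threshold is given as any function (t, current estimate) -> value. The names and bookkeeping are illustrative only, not the module's actual implementation (see oneRhoEst and the EstimateM module for the real code):

import random

def on_collision_sketch(t, nb_players_estimate, collision_count, threshold):
    """One player's update after observing a collision at time t (illustrative only)."""
    collision_count += 1
    # If enough collisions were seen since the last increase of the estimate,
    # the player concludes there are more players than it thought.
    if collision_count > threshold(t, nb_players_estimate):
        nb_players_estimate += 1
        collision_count = 0
    # In every case, a new rank is drawn uniformly in {1, ..., \hat{M}_i(t)}.
    new_rank = random.randint(1, nb_players_estimate)
    return new_rank, nb_players_estimate, collision_count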
Note
This is fully decentralized: each child player does NOT need to know the number of players and does NOT require the horizon T.
Note
For a more generic approach, see the wrapper defined in EstimateM.EstimateM.
class PoliciesMultiPlayers.rhoEst.oneRhoEst(threshold, *args, **kwargs)[source]¶
Bases: PoliciesMultiPlayers.rhoRand.oneRhoRand
Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which forwards them to its i-th player.
Except for the handleCollision() method: a new random rank is sampled after observing a collision.
And the player does not aim at the best arm, but at the rank-th best arm, based on her index policy (see the toy sketch below).
The rhoEst policy is used to keep an estimate of the total number of players, \hat{M}_i(t).
The procedure to estimate \hat{M}_i(t) is not straightforward: every player starts with \hat{M}_i(0) = 1, and increases its estimate, \hat{M}_i(t+1) = \hat{M}_i(t) + 1, once enough collisions have been observed, as decided by a threshold function.
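As a small illustration of aiming at the rank-th best arm, here is a toy selection rule (the function name is hypothetical, ties are ignored, and in the module the actual choice is delegated to the child's index policy):

import numpy as np

def choice_with_rank_sketch(indexes, rank):
    """Pick the arm of the given rank among the current indexes (rank=1 is the best arm)."""
    order = np.argsort(indexes)   # arms sorted by increasing index
    return order[-rank]           # rank-th largest index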
__init__(threshold, *args, **kwargs)[source]¶ Initialize self. See help(type(self)) for accurate signature.
threshold = None¶ Threshold function.
nbPlayersEstimate = None¶ Estimated number of players. Optimistic: the player starts by assuming it is alone!
rank = None¶ Current rank, starting at 1.
collisionCount = None¶ Count of collisions on each arm, since the last increase of nbPlayersEstimate.
timeSinceLastCollision = None¶ Time since the last collision. (I don't remember why I thought this could be useful… but it is not!)
t = None¶ Internal time.
__module__ = 'PoliciesMultiPlayers.rhoEst'¶
class PoliciesMultiPlayers.rhoEst.rhoEst(nbPlayers, nbArms, playerAlgo, threshold=<function threshold_on_t_doubling_trick>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]¶
Bases: PoliciesMultiPlayers.rhoRand.rhoRand
rhoEst: implementation of the 2nd multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).
__init__(nbPlayers, nbArms, playerAlgo, threshold=<function threshold_on_t_doubling_trick>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]¶
nbPlayers: number of players to create (in self._players).
playerAlgo: class to use for every player.
nbArms: number of arms, given as first argument to playerAlgo.
threshold: the threshold function to use, see EstimateM.threshold_on_t_with_horizon(), EstimateM.threshold_on_t_doubling_trick() or EstimateM.threshold_on_t() above.
*args, **kwargs: arguments, named arguments, given to playerAlgo.
Example:
>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> s = rhoEst(nbPlayers, nbArms, UCB, threshold=threshold_on_t)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
To get a list of usable players, use s.children.
Warning: s._players is for internal use ONLY!
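For completeness, here is a hypothetical driving loop showing how an environment could interact with s.children, assuming each child exposes choice(), getReward() and handleCollision() as described above; this is a simplified sketch of what a multi-player evaluator does, not its actual code:

import numpy as np

def play_one_round_sketch(s, arms_means, rng=np.random):
    """One round of a toy Bernoulli simulation: colliding players get no reward."""
    choices = [child.choice() for child in s.children]
    for player, arm in enumerate(choices):
        if choices.count(arm) > 1:
            # Collision: the child is warned, and (for rhoEst) resamples its rank.
            s.children[player].handleCollision(arm)
        else:
            reward = rng.binomial(1, arms_means[arm])  # toy Bernoulli reward
            s.children[player].getReward(arm, reward)
    return choices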
nbPlayers = None¶ Number of players.
children = None¶ List of children, fake algorithms.
nbArms = None¶ Number of arms.
__module__ = 'PoliciesMultiPlayers.rhoEst'¶
class PoliciesMultiPlayers.rhoEst.rhoEstPlus(nbPlayers, nbArms, playerAlgo, horizon, lower=0.0, amplitude=1.0, *args, **kwargs)[source]¶
Bases: PoliciesMultiPlayers.rhoRand.rhoRand
rhoEstPlus: implementation of the 2nd multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using the knowledge of the horizon T.
__init__(nbPlayers, nbArms, playerAlgo, horizon, lower=0.0, amplitude=1.0, *args, **kwargs)[source]¶
nbPlayers: number of players to create (in self._players).
playerAlgo: class to use for every player.
nbArms: number of arms, given as first argument to playerAlgo.
horizon: the horizon T, which needs to be known for this variant.
*args, **kwargs: arguments, named arguments, given to playerAlgo.
Example:
>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> horizon = 1000
>>> s = rhoEstPlus(nbPlayers, nbArms, UCB, horizon=horizon)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
To get a list of usable players, use s.children.
Warning: s._players is for internal use ONLY!
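To illustrate the structural difference with rhoEst, here are two toy threshold shapes: one that only uses the current time t (the horizon-free flavour used by rhoEst by default), and one built from a known horizon T (the flavour rhoEstPlus relies on). These are placeholders for the idea only, not the actual expressions or signatures of EstimateM.threshold_on_t() or EstimateM.threshold_on_t_with_horizon():

import math

def toy_threshold_on_t(t, nb_players_estimate):
    """Horizon-free placeholder: grows slowly with the current time t."""
    return math.log(1 + t)

def make_toy_threshold_with_horizon(horizon):
    """Placeholder built once from the known horizon T."""
    def threshold(t, nb_players_estimate):
        return math.log(1 + horizon)
    return threshold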
nbPlayers = None¶ Number of players.
children = None¶ List of children, fake algorithms.
nbArms = None¶ Number of arms.
__module__ = 'PoliciesMultiPlayers.rhoEst'¶