configuration module
Configuration for the simulations, for the single-player case.
- 
configuration.CPU_COUNT = 4
- Number of CPUs on the local machine.
- 
configuration.HORIZON = 10000
- HORIZON: number of time steps of the experiments. Warning: should be >= 10000 to be interesting “asymptotically”.
- 
configuration.DO_PARALLEL = True
- Whether to use parallel computing. To profile the code, turn parallel computing off.
- 
configuration.N_JOBS = -1
- Number of jobs to use for the parallel computations. -1 means all the CPU cores, 1 means no parallelization. 
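As an illustration of how DO_PARALLEL and N_JOBS interact, here is a minimal sketch. The helper `resolve_n_jobs` is hypothetical (it is not part of this module), and it only covers the two cases described above: -1 for all cores, and a positive count.

```python
import os


def resolve_n_jobs(n_jobs=-1, do_parallel=True, cpu_count=None):
    """Hypothetical helper: number of worker processes implied by the settings."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    if not do_parallel:
        return 1  # sequential run, the easiest setup to profile
    if n_jobs == -1:
        return cpu_count  # -1 means all the CPU cores
    if n_jobs < 1:
        raise ValueError("n_jobs should be -1 or a positive integer in this sketch")
    return n_jobs  # 1 means no parallelization


print(resolve_n_jobs(n_jobs=-1, cpu_count=4))
```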
- 
configuration.REPETITIONS = 4
- REPETITIONS: number of repetitions of the experiments. Warning: should be >= 10 to be statistically trustworthy.
- 
configuration.RANDOM_SHUFFLE = False
- The arms won’t be shuffled (shuffle(arms)).
- 
configuration.RANDOM_INVERT = False
- The arms won’t be inverted (arms = arms[::-1]).
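The two flags RANDOM_SHUFFLE and RANDOM_INVERT can be pictured with the following sketch. The function `prepare_arms` and its `rng` argument are assumptions made for illustration; only the two operations, shuffle(arms) and arms[::-1], come from the descriptions above.

```python
import random


def prepare_arms(arms, random_shuffle=False, random_invert=False, rng=None):
    """Hypothetical helper: apply the optional shuffle and inversion to a copy of arms."""
    rng = rng or random.Random()
    arms = list(arms)  # work on a copy, leave the caller's list untouched
    if random_shuffle:
        rng.shuffle(arms)  # random permutation of the arms
    if random_invert:
        arms = arms[::-1]  # reverse the order of the arms
    return arms


means = [0.1, 0.5, 0.9]
print(prepare_arms(means))                      # both flags False: order unchanged
print(prepare_arms(means, random_invert=True))  # reversed order
```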
- 
configuration.NB_BREAK_POINTS = 0
- Number of true breakpoints. They are uniformly spaced in time steps (and the first one at t=0 does not count). 
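One possible reading of "uniformly spaced" is sketched below. The function `breakpoint_times` and the exact placement rule (splitting the horizon into NB_BREAK_POINTS + 1 equal segments) are assumptions, not taken from the module; the sketch only respects the statement that t=0 is not counted.

```python
def breakpoint_times(horizon=10000, nb_break_points=0):
    """Hypothetical helper: uniformly spaced breakpoints, excluding t=0."""
    if nb_break_points < 0:
        raise ValueError("nb_break_points should be non-negative")
    # Assumed rule: cut the horizon into (nb_break_points + 1) equal segments.
    return [(k * horizon) // (nb_break_points + 1)
            for k in range(1, nb_break_points + 1)]


print(breakpoint_times(10000, 0))  # no breakpoint: stationary problem
print(breakpoint_times(10000, 3))
```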
- 
configuration.EPSILON = 0.1
- Parameters for the epsilon-greedy and epsilon-… policies. 
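To make the role of EPSILON concrete, here is a minimal sketch of the textbook epsilon-greedy rule: explore uniformly with probability epsilon, otherwise play the arm with the best empirical mean. The function name and arguments are hypothetical and do not reproduce the policy classes of this project.

```python
import random


def epsilon_greedy_choice(empirical_means, epsilon=0.1, rng=None):
    """Hypothetical helper: pick an arm index with the epsilon-greedy rule."""
    rng = rng or random.Random()
    nb_arms = len(empirical_means)
    if rng.random() < epsilon:
        return rng.randrange(nb_arms)  # explore: uniform random arm
    # exploit: arm with the highest empirical mean (first one on ties)
    return max(range(nb_arms), key=lambda arm: empirical_means[arm])


print(epsilon_greedy_choice([0.2, 0.7, 0.4], epsilon=0.1, rng=random.Random(42)))
```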
- 
configuration.TEMPERATURE = 0.05
- Temperature for the Softmax policies. 
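The effect of the temperature can be sketched with the standard softmax (Boltzmann) distribution over empirical means. This is a generic illustration under that assumption, with a hypothetical function name; the Softmax policies of this project may differ in details.

```python
import math


def softmax_probabilities(empirical_means, temperature=0.05):
    """Hypothetical helper: softmax distribution over arms for a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature should be positive")
    best = max(empirical_means)
    # Subtract the maximum before exponentiating, for numerical stability.
    weights = [math.exp((mean - best) / temperature) for mean in empirical_means]
    total = sum(weights)
    return [weight / total for weight in weights]


# A small temperature concentrates the probability on the best arm,
# a large one gets close to the uniform distribution.
print(softmax_probabilities([0.2, 0.7, 0.4], temperature=0.05))
print(softmax_probabilities([0.2, 0.7, 0.4], temperature=5.0))
```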
- 
configuration.LEARNING_RATE = 0.01
- Learning rate for my aggregated bandit (it can be autotuned) 
- 
configuration.TEST_WrapRange = False
- Whether my WrapRange policy is tested.
- 
configuration.CACHE_REWARDS = True
- Should we cache rewards? The random rewards will be the same for all the REPETITIONS simulations for each algorithm.
- 
configuration.UPDATE_ALL_CHILDREN = False
- Should the Aggregator policy update the trusts in each child, or just the one trusted for the last decision?
- 
configuration.UNBIASED = False
- Should the rewards for the Aggregator policy use a biased estimator, i.e. just r_t, or an unbiased estimator, r_t / p_t?
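The difference between the two estimators can be sketched as follows, assuming p_t denotes the probability with which the chosen arm (or child) was selected at time t. The function `reward_estimate` is hypothetical; it only illustrates the usual importance-weighting argument.

```python
def reward_estimate(reward, probability, unbiased=False):
    """Hypothetical helper: biased (r_t) or importance-weighted (r_t / p_t) estimate."""
    if unbiased:
        return reward / probability  # r_t / p_t
    return reward  # just r_t


# An arm selected with probability p_t contributes p_t * estimate in expectation:
# the importance-weighted version recovers r_t, the plain one shrinks it to p_t * r_t.
r_t, p_t = 0.5, 0.25
print(p_t * reward_estimate(r_t, p_t, unbiased=True))
print(p_t * reward_estimate(r_t, p_t, unbiased=False))
```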
- 
configuration.UPDATE_LIKE_EXP4 = False
- Should we update the trust probabilities like in Exp4, or like in my initial Aggregator proposal?
- 
configuration.UNBOUNDED_VARIANCE = 1
- Variance of unbounded Gaussian arms 
- 
configuration.NB_ARMS = 9
- Number of arms for non-hard-coded problems (Bayesian problems) 
- 
configuration.LOWER = 0.0
- Default value for the lower value of means 
- 
configuration.AMPLITUDE = 1.0
- Default value for the amplitude value of means 
- 
configuration.VARIANCE = 0.05
- Variance of Gaussian arms 
- 
configuration.ARM_TYPE
- alias of Arms.Bernoulli.Bernoulli
- 
configuration.ENVIRONMENT_BAYESIAN = False
- True to use a Bayesian problem.
- 
configuration.MEANS = [0.05, 0.16249999999999998, 0.27499999999999997, 0.38749999999999996, 0.49999999999999994, 0.6125, 0.725, 0.8374999999999999, 0.95]
- Means of arms for non-hard-coded problems (non Bayesian) 
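The listed values look like NB_ARMS evenly spaced points between 0.05 and 0.95. A minimal sketch that produces such a grid from NB_ARMS, LOWER and AMPLITUDE is given below; the function `uniform_means` and the margin parameter `delta=0.05` are assumptions inferred from the values, not a copy of the code that generated them, and the last floating-point digits may differ.

```python
def uniform_means(nb_arms=9, lower=0.0, amplitude=1.0, delta=0.05):
    """Hypothetical helper: evenly spaced means, kept away from both ends by a margin."""
    if nb_arms < 2:
        raise ValueError("this sketch needs at least two arms")
    start = lower + amplitude * delta        # smallest mean
    stop = lower + amplitude * (1 - delta)   # largest mean
    step = (stop - start) / (nb_arms - 1)
    return [start + i * step for i in range(nb_arms)]


print(uniform_means(nb_arms=9, lower=0.0, amplitude=1.0))
```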
- 
configuration.USE_FULL_RESTART = True
- True to use full-restart Doubling Trick 
- 
configuration.configuration = {'append_labels': {}, 'cache_rewards': True, 'change_labels': {0: 'Pure exploration', 1: 'Pure exploitation', 2: '$\\varepsilon$-greedy', 3: 'Explore-then-Exploit', 5: 'Bernoulli kl-UCB', 6: 'Thompson sampling'}, 'environment': [{'arm_type': <class 'Arms.Bernoulli.Bernoulli'>, 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]}], 'environment_bayesian': False, 'horizon': 10000, 'n_jobs': -1, 'nb_break_points': 0, 'plot_lowerbound': True, 'policies': [{'archtype': <class 'Policies.Uniform.Uniform'>, 'params': {}, 'change_label': 'Pure exploration'}, {'archtype': <class 'Policies.EmpiricalMeans.EmpiricalMeans'>, 'params': {}, 'change_label': 'Pure exploitation'}, {'archtype': <class 'Policies.EpsilonGreedy.EpsilonDecreasing'>, 'params': {'epsilon': 479.99999999999983}, 'change_label': '$\\varepsilon$-greedy'}, {'archtype': <class 'Policies.ExploreThenCommit.ETC_KnownGap'>, 'params': {'horizon': 10000, 'gap': 0.11250000000000004}, 'change_label': 'Explore-then-Exploit'}, {'archtype': <class 'Policies.UCBalpha.UCBalpha'>, 'params': {'alpha': 1}}, {'archtype': <class 'Policies.klUCB.klUCB'>, 'params': {'klucb': CPUDispatcher(<function klucbBern>)}, 'change_label': 'Bernoulli kl-UCB'}, {'archtype': <class 'Policies.Thompson.Thompson'>, 'params': {'posterior': <class 'Policies.Posterior.Beta.Beta'>}, 'change_label': 'Thompson sampling'}], 'random_invert': False, 'random_shuffle': False, 'repetitions': 4, 'verbosity': 6}
- This dictionary configures the experiments 
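The overall shape of this dictionary can be sketched from the module-level constants. In the sketch below the keys are those visible in the value above, but the arm and policy classes are replaced by plain strings so that the snippet runs on its own; those strings are placeholders, not real identifiers to pass to the simulator.

```python
HORIZON = 10000
REPETITIONS = 4
N_JOBS = -1
NB_BREAK_POINTS = 0
CACHE_REWARDS = True
RANDOM_SHUFFLE = False
RANDOM_INVERT = False
ENVIRONMENT_BAYESIAN = False

configuration = {
    "horizon": HORIZON,
    "repetitions": REPETITIONS,
    "n_jobs": N_JOBS,
    "nb_break_points": NB_BREAK_POINTS,
    "cache_rewards": CACHE_REWARDS,
    "random_shuffle": RANDOM_SHUFFLE,
    "random_invert": RANDOM_INVERT,
    "environment_bayesian": ENVIRONMENT_BAYESIAN,
    # One entry per problem: an arm type and the parameters of its arms.
    "environment": [
        {"arm_type": "Bernoulli", "params": [0.1, 0.5, 0.9]},  # placeholder string
    ],
    # One entry per policy: a class ("archtype") and its keyword parameters.
    "policies": [
        {"archtype": "Uniform", "params": {}, "change_label": "Pure exploration"},
        {"archtype": "UCBalpha", "params": {"alpha": 1}},
    ],
}

# Number of arms in the first environment.
nbArms = len(configuration["environment"][0]["params"])
print(nbArms, len(configuration["policies"]))
```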
- 
configuration.nbArms = 9
- Number of arms in the first environment 
- 
configuration.klucb
- Warning: if using Exponential or Gaussian arms, give klExp or klGauss to KL-UCB-like policies!
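Why the divergence has to match the arm distribution can be seen from the closed-form Kullback-Leibler divergences of the three families. The sketch below uses hypothetical helper names (`kl_bernoulli`, `kl_gaussian`, `kl_exponential`, `pick_kl`); they are stand-ins written for illustration, not the klExp or klGauss functions of this project.

```python
import math


def kl_bernoulli(p, q, eps=1e-15):
    """KL divergence between Bernoulli distributions of means p and q."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_gaussian(x, y, variance=0.05):
    """KL divergence between Gaussians of means x and y with the same variance."""
    return (x - y) ** 2 / (2 * variance)


def kl_exponential(x, y):
    """KL divergence between exponential distributions of means x and y (both > 0)."""
    ratio = x / y
    return ratio - 1 - math.log(ratio)


# Hypothetical lookup from an arm family name to the matching divergence.
KL_BY_ARM_TYPE = {
    "Bernoulli": kl_bernoulli,
    "Gaussian": kl_gaussian,
    "Exponential": kl_exponential,
}


def pick_kl(arm_type_name):
    """Hypothetical helper: divergence to hand to a KL-UCB-like policy."""
    return KL_BY_ARM_TYPE[arm_type_name]


print(pick_kl("Bernoulli")(0.3, 0.6))
print(pick_kl("Gaussian")(0.3, 0.6))
```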