First, make sure you are in the main folder, or have SMPyBandits installed, then import Evaluator from the Environment package:
!pip install SMPyBandits watermark
%load_ext watermark
%watermark -v -m -p SMPyBandits -a "Lilian Besson"
# Local imports
from SMPyBandits.Environment import Evaluator, tqdm
We also need arms, for instance Bernoulli-distributed arms:
# Import arms
from SMPyBandits.Arms import Bernoulli
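To check that everything works, we can create one arm by hand and sample a few rewards from it. This is a minimal sketch, not part of the original demo, assuming the standard SMPyBandits arm interface with a mean attribute and a draw() method:
# Quick illustrative check (not part of the original demo)
arm = Bernoulli(0.5)
print(arm)        # should display something like B(0.5)
print(arm.mean)   # its true mean, here 0.5
print([arm.draw() for _ in range(10)])  # ten random rewards in {0, 1}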
And finally we need some single-player Reinforcement Learning algorithms:
# Import algorithms
from SMPyBandits.Policies import *
For instance, this imported the UCB algorithm, implemented by the UCBalpha class:
# Just improving the ?? in Jupyter. Thanks to https://nbviewer.jupyter.org/gist/minrk/7715212
from __future__ import print_function
from IPython.core import page
def myprint(s):
    try:
        print(s['text/plain'])
    except (KeyError, TypeError):
        print(s)
page.page = myprint
UCBalpha?
For more details, here is the code:
UCBalpha??
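As a reminder, the index used by UCBalpha for arm $k$ at time $t$, after $N_k(t)$ pulls with empirical mean $\widehat{\mu}_k(t)$, is
$I_k(t) = \widehat{\mu}_k(t) + \sqrt{\frac{\alpha \log(t)}{2 N_k(t)}},$
so a smaller $\alpha$ means less exploration, and $\alpha = 4$ recovers the exploration bonus $\sqrt{2\log(t)/N_k(t)}$ of the original $\mathrm{UCB}_1$.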
HORIZON is the length of one simulation, REPETITIONS is the number of repetitions (used to average the results), and N_JOBS is the number of CPU cores used to parallelize the code.
HORIZON = 10000
REPETITIONS = 10
N_JOBS = 1
In this example, we consider $3$ problems, with Bernoulli arms of different means.
ENVIRONMENTS = [  # 1)  Bernoulli arms
        {   # A very easy problem, but it is used in a lot of articles
            "arm_type": Bernoulli,
            "params": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
        },
        {   # Another problem, best arm = last, with three groups: very bad arms (0.01, 0.02), middle arms (0.3 to 0.6) and very good arms (0.795, 0.8, 0.805)
            "arm_type": Bernoulli,
            "params": [0.01, 0.02, 0.3, 0.4, 0.5, 0.6, 0.795, 0.8, 0.805]
        },
        {   # A very hard problem, as used in [Cappé et al, 2012]
            "arm_type": Bernoulli,
            "params": [0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.05, 0.05, 0.1]
        },
    ]
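One quick way to gauge the difficulty of each problem is the gap between the best and second-best means. This small check is only an illustration, not part of the original demo:
# Print the best mean and the optimality gap of each problem
import numpy as np
for i, env in enumerate(ENVIRONMENTS):
    means = np.sort(env["params"])
    print("Problem", i, ": best mean =", means[-1], ", gap to second best =", round(means[-1] - means[-2], 3))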
We compare Thompson Sampling against $\mathrm{UCB}_1$ (with two values of $\alpha$), $\mathrm{kl}$-$\mathrm{UCB}$ and $\mathrm{BayesUCB}$.
POLICIES = [
        # --- UCB1 algorithm
        {
            "archtype": UCBalpha,
            "params": {
                "alpha": 1
            }
        },
        {
            "archtype": UCBalpha,
            "params": {
                "alpha": 0.5  # Smallest theoretically acceptable value
            }
        },
        # --- Thompson algorithm
        {
            "archtype": Thompson,
            "params": {}
        },
        # --- KL algorithms, here only klUCB
        {
            "archtype": klUCB,
            "params": {}
        },
        # --- BayesUCB algorithm
        {
            "archtype": BayesUCB,
            "params": {}
        },
    ]
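The Evaluator will instantiate each "archtype" with the number of arms of the environment and the given "params". To build some intuition, here is a minimal sketch of using one such policy by hand, with the standard SMPyBandits policy interface (startGame(), choice(), getReward()); it is not part of the original demo:
# Hypothetical manual run of one policy on the first problem
policy = UCBalpha(nbArms=9, alpha=0.5)
policy.startGame()
arms = [Bernoulli(mu) for mu in ENVIRONMENTS[0]["params"]]
for t in range(100):
    k = policy.choice()          # index of the arm chosen at time t
    reward = arms[k].draw()      # sample a Bernoulli reward from that arm
    policy.getReward(k, reward)  # update the policy's internal statistics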
Complete configuration for the problem:
configuration = {
    # --- Duration of the experiment
    "horizon": HORIZON,
    # --- Number of repetition of the experiment (to have an average)
    "repetitions": REPETITIONS,
    # --- Parameters for the use of joblib.Parallel
    "n_jobs": N_JOBS,    # = nb of CPU cores
    "verbosity": 6,      # Max joblib verbosity
    # --- Arms
    "environment": ENVIRONMENTS,
    # --- Algorithms
    "policies": POLICIES,
}
configuration
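With these settings, the Evaluator will run $3 \times 5 \times 10 = 150$ simulations (one per environment, policy and repetition) of $10000$ steps each, i.e., $1.5$ million bandit steps in total.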
We can now create the Evaluator object:
evaluation = Evaluator(configuration)
Now we can simulate all $3$ environments. This part can take some time.
for envId, env in tqdm(enumerate(evaluation.envs), desc="Problems"):
    # Evaluate just that env
    evaluation.startOneEnv(envId, env)
And finally, visualize the results, with the plotting methods of an Evaluator object:
def plotAll(evaluation, envId):
    evaluation.printFinalRanking(envId)
    evaluation.plotRegrets(envId)
    evaluation.plotRegrets(envId, semilogx=True)
    evaluation.plotRegrets(envId, meanReward=True)
    evaluation.plotBestArmPulls(envId)
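These plots show the cumulative regret, which for a problem with best mean $\mu^* = \max_k \mu_k$ is defined as $R_T = T \mu^* - \sum_{t=1}^{T} \mathbb{E}[r_t]$, estimated by averaging over the REPETITIONS independent runs.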
The nb_break_points attribute counts the break-points of the problem (it is only nonzero for piecewise stationary problems):
evaluation.nb_break_points
# Use larger figures for the plots
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12.4, 7)
First problem, with means $[B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)]$:
_ = plotAll(evaluation, 0)
Second problem, with means $[B(0.01), B(0.02), B(0.3), B(0.4), B(0.5), B(0.6), B(0.795), B(0.8), B(0.805)]$:
_ = plotAll(evaluation, 1)
Third problem, with means $[B(0.01), B(0.01), B(0.01), B(0.02), B(0.02), B(0.02), B(0.05), B(0.05), B(0.1)]$:
_ = plotAll(evaluation, 2)
That's it for this demo!