Bandit Problems with Infinitely Many Arms
Nov 30 1994
We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n -> ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0,1) and consider various extension. In the uniform case we show that the best lower bound for the expected failure proportion is between √(2)/√(n) and 2/√(n) and we exhibit classes of strategies that achieve the latter.
Keywords:Bandit problems, sequential experimentation, dynamic allocation of Bernoulli, staying with a winner, switching with a loser