Bandit Problems with Infinitely Many Arms

Authors: 
Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, Larry A. Shepp
Duke University, University of Miami, Cornell University, AT&T Bell Laboratories

November 30, 1994

We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernoulli arms, with n → ∞. The objective is to minimize the long-run failure rate. The Bernoulli parameters are independent observations from a distribution F. We first assume F to be the uniform distribution on (0, 1) and consider various extensions. In the uniform case we show that the best lower bound for the expected failure proportion is between √2/√n and 2/√n, and we exhibit classes of strategies that achieve the latter.
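Illustrative sketch (not taken from the paper): the strategies attaining the 2/√n rate are of the "stay with a winner, switch with a loser" type. The Python simulation below plays one simple rule of this flavor, a √n-run-style rule: keep an arm while it succeeds, switch to a fresh arm without recall at a failure, and commit permanently to an arm once it has produced about √n consecutive successes. The function name failure_proportion and the parameter run_target are ours, and the rule is only a plausible stand-in for the strategies studied in the paper; the arm parameters are i.i.d. Uniform(0, 1) as in the abstract.

    import math
    import random

    def failure_proportion(n, run_target=None, seed=None):
        """Simulate n pulls on a bandit with infinitely many Bernoulli arms
        whose parameters are i.i.d. Uniform(0, 1), using a sqrt(n)-run-style
        rule, and return the realized failure proportion."""
        rng = random.Random(seed)
        if run_target is None:
            run_target = math.ceil(math.sqrt(n))

        failures = 0
        p = rng.random()          # parameter of the arm currently in use
        run = 0                   # current run of consecutive successes
        committed = False         # once True, never switch again

        for _ in range(n):
            if rng.random() < p:  # success
                run += 1
                if run >= run_target:
                    committed = True
            else:                 # failure
                failures += 1
                if not committed:
                    p = rng.random()   # switch to a fresh arm, no recall
                    run = 0
        return failures / n

    if __name__ == "__main__":
        n = 50_000
        trials = 100
        avg = sum(failure_proportion(n, seed=s) for s in range(trials)) / trials
        print(f"n = {n}: average failure proportion ≈ {avg:.4f}")
        print(f"reference scale 2/sqrt(n)         = {2 / math.sqrt(n):.4f}")

For moderate n the averaged failure proportion comes out on the order of 2/√n, the scale referred to in the abstract.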

Keywords: 

Bandit problems, sequential experimentation, dynamic allocation of Bernoulli processes, staying with a winner, switching with a loser

Manuscript: 

1995-28.pdf