Information Complexity in Bandit Subset Selection
This video was recorded at the 26th Annual Conference on Learning Theory (COLT), Princeton, 2013.

We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds in the pure-exploration setting involve the Chernoff information between the arms. In addition to introducing this novel quantity to the bandits literature, we contribute a comparison between the "racing" and "smart sampling" strategies for pure-exploration problems, finding strong evidence in favor of the latter.
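The Chernoff information mentioned in the abstract is a symmetric measure of distinguishability between two distributions. For Bernoulli arms it has a simple closed form as an optimization over a single parameter, so it is easy to compute numerically. The sketch below is an illustration, not code from the talk or paper: it computes the Chernoff information between two Bernoulli arms by a grid search over the exponent, alongside the (asymmetric) KL divergence for comparison. Function names and the grid-search approach are our own choices.

```python
import math

def kl_bern(a, b):
    """KL divergence KL(Bern(a) || Bern(b)); asymmetric in its arguments."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def chernoff_information(p, q, grid=10_000):
    """Chernoff information between Bernoulli(p) and Bernoulli(q):

        C(p, q) = -min_{0 < lam < 1} log( p^lam * q^(1-lam)
                                          + (1-p)^lam * (1-q)^(1-lam) )

    Unlike KL divergence, C(p, q) = C(q, p). Here the inner minimum is
    approximated by a simple grid search over lam.
    """
    best = float("inf")
    for i in range(1, grid):
        lam = i / grid
        val = math.log(p ** lam * q ** (1 - lam)
                       + (1 - p) ** lam * (1 - q) ** (1 - lam))
        best = min(best, val)
    return -best

# Chernoff information is zero iff the arms are identical, is symmetric,
# and never exceeds the KL divergence in either direction.
```

For example, two arms with means 0.2 and 0.8 have Chernoff information about 0.223 nats, noticeably smaller than either one-directional KL divergence (about 0.832 nats), which is consistent with it being the tighter quantity governing the error exponent when the two hypotheses play symmetric roles.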