Thompson Sampling for the Duelling Bandits Problem

This video was recorded at the Large-scale Online Learning and Decision Making (LSOLDM) Workshop, Cumberland Lodge 2012. In surprisingly many situations, absolute rewards are not available (or are nonstationary), while relative preferences are easy to collect (or stable). This variation of the bandit problem is known as the duelling bandits problem (or dueling bandits in the US). My talk will cover our preliminary work developing a Thompson sampling algorithm for the duelling (or dueling) bandit problem.
