Probability Seminar
I will present the compensated coupling: a simple paradigm for designing sequential decision-making policies based on sample-pathwise comparisons against a hindsight benchmark. This approach generalizes many standard results used in studying Markov decision processes and reinforcement learning, but also gives us new policies which are much simpler and more effective than existing heuristics. For a large class of widely-studied sequential decision-making problems -- including network revenue management, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks -- I will illustrate how under a wide range of conditions, our approach achieves additive loss compared to the hindsight optimal which is independent of the horizon and state-space. Time permitting, I will try and describe how we can use this technique to incorporate side information and historical data, and achieve constant regret with as little as a single data trace.