Yunzhe (Jeff) Zhou, Aimee Harrison, Andy Wilson
An introduction to the principle from offline reinforcement learning and how it can be applied to dynamic treatment regime estimation to produce treatment recommendations that account for model uncertainty using a Bayesian learning approach.