Tao of RWD Blog

Dynamic Treatment Regimes with RL, Part I

A Unified Framework for Sequential Clinical Decision-Making Using Causal Inference and Reinforcement Learning

Based on the book Optimal Control Using Causal Agents

Abstract

Dynamic treatment regimes (DTRs) describe sequenced treatment strategies that depend on individual features and adapt to patients' evolving conditions. In the first of two articles exploring DTRs, we bridge causal inference and reinforcement learning (RL), showing that a DTR is an RL policy and that finding the optimal DTR is equivalent to solving for the optimal policy. We introduce core RL concepts, including policies, value functions, and Q-learning, and map them to familiar causal and clinical language. Using a grid-based robot demonstration, we build intuition for backward induction and Q-function estimation from observational data. We conclude by identifying a critical limitation: model unreliability under distribution shift, motivating the pessimism principle developed in Part II.
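
To make the backward-induction idea concrete, here is a minimal sketch on a toy grid. It is not the demonstration from the article: the five-cell, one-dimensional layout, the reward of 1 for ending a move on the goal cell, the horizon of four decision points, and every name in the code are assumptions chosen for illustration. It also assumes the robot's dynamics are known, so it shows exact dynamic programming rather than Q-function estimation from observational data.

```python
import numpy as np

N_STATES = 5        # grid cells 0..4 (hypothetical toy layout)
GOAL = 4            # ending a move on this cell pays reward 1
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right
HORIZON = 4         # number of decision points

def step(state, action):
    """Deterministic move, clipped to the grid; reward 1 if the robot ends on GOAL."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, float(next_state == GOAL)

# Q[t, s, a]: best achievable total reward from decision point t onward,
# starting in state s and taking action a first. Q[HORIZON] stays at zero.
Q = np.zeros((HORIZON + 1, N_STATES, len(ACTIONS)))
for t in reversed(range(HORIZON)):          # last decision point first
    for s in range(N_STATES):
        for i, a in enumerate(ACTIONS):
            s_next, reward = step(s, a)
            Q[t, s, i] = reward + Q[t + 1, s_next].max()

V = Q.max(axis=2)          # value of acting optimally: V[t, s]
policy = Q.argmax(axis=2)  # greedy action index per (t, s), i.e. the regime
```

In clinical terms, each row of `policy` plays the role of a decision rule for one treatment stage, and the full table is the regime. Solving the last stage first and carrying its best value backward is what lets earlier decisions account for what can still be achieved later.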

About the Authors

MaryLena Bleile|Researcher specializing in causal inference and reinforcement learning. Her work on bridging these fields inspired our reformulation of the Causal Navigator to emphasize the foundational causal inference engine developed by Bareinboim and collaborators: a unified framework for understanding when and how causal effects can be identified from observational data.

Aimee Harrison|Aimee Harrison (BS, MFA) co-maintains Tao of RWD and works as a product manager supporting real-world evidence study design tools at Navidence.

Andy Wilson|Andy Wilson (PhD, MStat) is Founder and Principal of The Tao of RWD and Adjunct Professor at the University of Utah School of Medicine. Andy bridges cutting-edge causal inference methodology with practical application in regulatory and healthcare settings. With over 100 peer-reviewed publications and a decade of experience in pharmacoepidemiology and real-world evidence, he focuses on helping organizations move beyond correlation to understand true cause-and-effect relationships. He currently teaches PBHLT 7115: Causal Methods in Public Health at the University of Utah and has presented alongside leaders at the FDA, EMA, and the American Causal Inference Conference.
