Gated course
Reinforcement Learning
Build a foundation in RL from dynamic programming through deep policy gradient methods and RLHF.
Audience: ML engineers, agent builders, data scientists, and product teams designing adaptive policies or logged-decision systems.
Prerequisites: Basic probability, supervised learning concepts, and comfort with state/action/reward notation.
Final artifact: An offline policy improvement plan with logging requirements, OPE strategy, support checks, and staged deployment criteria.
Time: About 5-6 hours