# Slide Outline: Reinforcement Learning Workshop

## 1. Workshop Goal

- Turn one repeated decision system into an offline policy improvement plan.
- End with state, actions, reward, guardrails, support checks, an off-policy evaluation (OPE) plan, and a staged rollout.

## 2. Policy Learning Is Not Just Optimization

- Bad rewards create bad behavior.
- Actions with little or no support in the logs force unsafe extrapolation.
- Offline estimates do not replace staged deployment.
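
Speaker note: one way to show what an offline estimate is, and why it is only an estimate, is a clipped inverse propensity scoring (IPS) sketch. The row fields (`state`, `action`, `propensity`, `reward`), the function name, and the `max_weight` cap are assumptions for illustration and are not part of the workshop material. The sketch assumes a deterministic target policy.

```python
def ips_estimate(logs, target_policy, max_weight=10.0):
    """Clipped IPS estimate of the average reward under a deterministic target policy.

    Each log row is assumed to be a dict with keys
    'state', 'action', 'propensity' (behavior policy probability), 'reward'.
    """
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for row in logs:
        # Indicator: would the target policy have taken the logged action?
        match = 1.0 if target_policy(row["state"]) == row["action"] else 0.0
        # Importance weight, capped to limit variance (the cap introduces bias).
        weight = min(match / row["propensity"], max_weight)
        total += weight * row["reward"]
    return total / len(logs)
```

Rows where the target policy disagrees with the logged action contribute nothing, so an estimate built on few matching rows says little about real performance, which is why staged deployment still matters.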

## 3. Frame The Decision System

- State representation
- Action set
- Reward
- Guardrails
- Human fallback

## 4. Audit The Logs

- Behavior policy probabilities
- State-action support
- Missing outcomes
- Delayed rewards
- High-risk states
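
Speaker note: the first two audit items can be sketched in a few lines. This is a minimal example, assuming each log row is a dict with hypothetical keys `state`, `action`, and `propensity`; the `min_count` threshold is an arbitrary placeholder that each team would set for its own system.

```python
from collections import Counter


def support_audit(logs, actions, min_count=5):
    """Return (state, action) pairs seen fewer than min_count times in the logs."""
    counts = Counter((row["state"], row["action"]) for row in logs)
    states = sorted({row["state"] for row in logs})
    return [(s, a) for s in states for a in actions if counts[(s, a)] < min_count]


def missing_propensities(logs):
    """Return indices of rows whose behavior probability is absent or outside (0, 1]."""
    bad = []
    for i, row in enumerate(logs):
        p = row.get("propensity")
        if p is None or not (0.0 < p <= 1.0):
            bad.append(i)
    return bad
```

Pairs returned by `support_audit` are candidates for the "unsupported actions" list; rows returned by `missing_propensities` cannot be used in importance-weighted evaluation without further assumptions.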

## 5. Team Exercise

- Choose a logged decision system.
- Identify unsupported actions.
- Write one guardrail that can block deployment.
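
Speaker note: a guardrail that can block deployment might look like the sketch below. It assumes the candidate policy is a plain mapping from state to recommended action, that `supported` is a set of (state, action) pairs that passed the log audit, and that `high_risk_states` is a team-defined set. All names are hypothetical.

```python
def check_deployment(policy, supported, high_risk_states):
    """Block deployment if the policy recommends any unsupported (state, action) pair.

    Also lists the high-risk states whose recommendations should go to human review.
    """
    unsupported = [(s, a) for s, a in sorted(policy.items()) if (s, a) not in supported]
    human_review = sorted(s for s in policy if s in high_risk_states)
    return {
        "deploy": not unsupported,  # any unsupported pair blocks the rollout
        "unsupported": unsupported,
        "human_review": human_review,
    }
```

The check is deliberately binary: a single unsupported recommendation blocks the rollout, while high-risk states are flagged for human fallback rather than blocked.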

## 6. Peer Critique

- Where could reward hacking appear?
- Which state variables leak future information?
- Which recommendations require human review?

## 7. Close

- Share the safest rollout step that still creates new evidence.
