# Worked Example: Support Triage Policy

## Decision System

Route each incoming support ticket to one of four destinations: a self-serve article, a junior agent, a senior agent, or an account manager.

## State

Available at decision time:

- urgency
- customer tier
- ticket topic
- queue depth
- customer tenure
- recent incident status

Excluded because they arrive later:

- final resolution time
- CSAT
- escalation outcome
- refund request

Eligibility:

- security, data-loss, and active-outage tickets cannot be routed to self-serve
- enterprise billing disputes are eligible for account manager routing

## Actions

- self_serve_article
- junior_agent
- senior_agent
- account_manager

Actions requiring human review:

- account_manager for non-enterprise users
- self_serve_article for medium urgency

Actions never automated:

- self_serve_article for high-urgency security or data-loss tickets
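
The eligibility and review rules above can be read as a per-ticket action mask. The following is a minimal sketch, not a definitive implementation: the `Ticket` fields, the topic labels, and the `"auto"` / `"review"` / `"blocked"` statuses are all assumed names. The text does not say how high-urgency self-serve should be handled outside the listed topics, so the sketch conservatively sends it to review.

```python
from dataclasses import dataclass

ACTIONS = ("self_serve_article", "junior_agent", "senior_agent", "account_manager")

# Assumed topic labels for the tickets that may never go to self-serve.
NO_SELF_SERVE_TOPICS = {"security", "data_loss", "active_outage"}


@dataclass(frozen=True)
class Ticket:
    urgency: str  # assumed values: "low", "medium", "high"
    tier: str     # assumed values: "enterprise", "standard"
    topic: str    # assumed values: "security", "billing_dispute", ...


def classify_actions(ticket: Ticket) -> dict:
    """Map each action to "auto", "review", or "blocked" for one ticket."""
    status = {action: "auto" for action in ACTIONS}

    if ticket.topic in NO_SELF_SERVE_TOPICS:
        # Ineligible regardless of urgency.
        status["self_serve_article"] = "blocked"
    elif ticket.urgency in ("medium", "high"):
        # Medium urgency is from the text; treating "high" the same way
        # is an assumption of this sketch.
        status["self_serve_article"] = "review"

    if ticket.tier != "enterprise":
        status["account_manager"] = "review"

    return status
```

A policy would then sample or rank only among the `"auto"` actions and queue `"review"` actions for a human decision.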

## Reward and Guardrails

Primary reward:

- resolved within 24 hours

Delayed outcomes:

- CSAT
- repeat contact within 7 days
- refund request

Guardrails:

- escalation rate must not exceed baseline by more than 2 percentage points
- CSAT must not decline for enterprise customers
- high-urgency tickets must not wait longer than baseline p95
- protected customer segments must not receive systematically lower-touch routing
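
The first three guardrails reduce to comparisons between baseline and candidate metrics. A minimal sketch follows, assuming hypothetical metric names and point estimates only; a real check would also account for sampling noise. It reads the wait-time guardrail as a comparison of p95 values, which is one possible interpretation. The segment-level fairness guardrail needs per-segment routing data and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailMetrics:
    # All field names are assumptions made for this sketch.
    escalation_rate: float        # fraction of tickets escalated, 0..1
    enterprise_csat: float        # mean CSAT for enterprise customers
    high_urgency_wait_p95: float  # p95 wait for high-urgency tickets, minutes


def guardrail_violations(baseline, candidate, max_escalation_increase=0.02):
    """Return the names of guardrails the candidate breaks."""
    violations = []
    # 2 percentage points is expressed as 0.02 on a 0..1 rate.
    if candidate.escalation_rate - baseline.escalation_rate > max_escalation_increase:
        violations.append("escalation_rate")
    if candidate.enterprise_csat < baseline.enterprise_csat:
        violations.append("enterprise_csat")
    if candidate.high_urgency_wait_p95 > baseline.high_urgency_wait_p95:
        violations.append("high_urgency_wait_p95")
    return violations
```

An empty list means none of the three checked guardrails is violated at the point-estimate level.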

## Logged Data Requirements

Required columns:

- state at routing time
- full available action set
- chosen action
- behavior policy probability
- resolution outcome
- delayed CSAT
- escalation
- human override
- queue constraints

If behavior probabilities were not logged, they have to be estimated from the data, so any offline estimate inherits that model's errors and must be treated as weaker, more model-dependent evidence.
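
One way to make the required columns concrete is a typed record with a small validity check. This is a sketch under assumed field names (`LoggedDecision`, `behavior_prob`, and so on are hypothetical), not a schema taken from any existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoggedDecision:
    # Field names are assumptions; each maps to one required column.
    state: dict                     # features as seen at routing time
    available_actions: tuple        # full action set offered for this ticket
    chosen_action: str
    behavior_prob: Optional[float]  # None if the routing system did not log it
    resolved_24h: bool
    csat: Optional[float]           # delayed, so it may still be missing
    escalated: bool
    human_override: bool
    queue_constraints: dict


def validate(record: LoggedDecision) -> list:
    """Return a list of problems that would weaken offline evaluation."""
    problems = []
    if record.chosen_action not in record.available_actions:
        problems.append("chosen action not in available set")
    if record.behavior_prob is None:
        problems.append("missing behavior probability")
    elif not 0.0 < record.behavior_prob <= 1.0:
        problems.append("behavior probability outside (0, 1]")
    return problems
```

Records that fail validation can still be kept, but they should be flagged so that estimates relying on them are read with the caveat above.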

## Evaluation

Support checks:

- candidate policy action distribution by urgency and tier
- minimum behavior-policy probability of the actions the candidate policy chooses
- unsupported state-action pairs routed to fallback policy
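
The three support checks can be sketched as small helpers. The sketch assumes the behavior policy's full action distribution is available for each ticket (as a dict from action to probability), and the `min_prob=0.05` threshold is an illustrative value, not one taken from the text.

```python
from collections import Counter, defaultdict


def action_distribution(rows):
    """rows: iterable of (urgency, tier, candidate_action).

    Returns {(urgency, tier): {action: share}} for the candidate policy.
    """
    counts = defaultdict(Counter)
    for urgency, tier, action in rows:
        counts[(urgency, tier)][action] += 1
    return {
        segment: {a: n / sum(c.values()) for a, n in c.items()}
        for segment, c in counts.items()
    }


def min_support(decisions):
    """decisions: iterable of (candidate_action, behavior_probs).

    Smallest behavior probability among actions the candidate chose.
    """
    return min(probs.get(action, 0.0) for action, probs in decisions)


def route_with_support(candidate_action, behavior_probs, fallback_action, min_prob=0.05):
    """Use the candidate's action only where the logs support it."""
    if behavior_probs.get(candidate_action, 0.0) >= min_prob:
        return candidate_action
    return fallback_action
```

A very small `min_support` value is a warning that importance weights will be large and the offline estimates unstable.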

Primary OPE:

- doubly robust estimate of 24-hour resolution
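
For reference, a minimal sketch of the standard doubly robust estimator. It combines a reward model's prediction under the candidate policy with an importance-weighted correction on the logged action. The function and argument names (`target_prob`, `q_hat`) are assumptions, and the log tuple layout is illustrative.

```python
def doubly_robust(logs, target_prob, q_hat, actions):
    """Doubly robust estimate of the candidate policy's mean reward.

    logs:        list of (state, logged_action, reward, behavior_prob)
    target_prob: function (state, action) -> candidate policy probability
    q_hat:       function (state, action) -> predicted reward
    actions:     iterable of all actions
    """
    total = 0.0
    for state, action, reward, behavior_prob in logs:
        # Direct-method term: model's expected reward under the candidate.
        direct = sum(target_prob(state, b) * q_hat(state, b) for b in actions)
        # Correction term: reweighted residual on the action actually logged.
        weight = target_prob(state, action) / behavior_prob
        total += direct + weight * (reward - q_hat(state, action))
    return total / len(logs)
```

With `reward` set to 1 when the ticket was resolved within 24 hours and 0 otherwise, the result estimates the 24-hour resolution rate under the candidate policy. The estimate stays consistent if either the reward model or the behavior probabilities are correct, which is why it is a reasonable primary choice.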

Secondary checks:

- IPS estimate with clipped weights
- direct reward model calibration
- guardrail outcomes by tier and urgency
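- negative-control outcome: ticket topic is fixed before routing, so its estimated distribution under the candidate policy should match the logged distribution; an apparent change signals a biased estimator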
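
The clipped IPS check can be sketched in a few lines. The cap `max_weight=10.0` is an assumed illustrative value; clipping trades some bias for lower variance, so the cap should be reported alongside the estimate.

```python
def clipped_ips(logs, target_prob, max_weight=10.0):
    """Inverse propensity scoring with importance weights capped at max_weight.

    logs:        list of (state, logged_action, outcome, behavior_prob)
    target_prob: function (state, action) -> candidate policy probability
    """
    total = 0.0
    for state, action, outcome, behavior_prob in logs:
        weight = min(target_prob(state, action) / behavior_prob, max_weight)
        total += weight * outcome
    return total / len(logs)
```

The same function can serve the negative-control check: pass an indicator of a pre-routing attribute (for example, whether the ticket topic is billing) as `outcome` and compare the result with the plain logged mean. A large gap points to a problem with the weights rather than a real effect.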

## Deployment

1. Shadow mode for two weeks.
2. Human review of policy recommendations for high-urgency tickets.
3. Canary on 5% of low- and medium-risk tickets.
4. Randomized comparison against current routing policy.
5. Ramp only if resolution improves and guardrails remain stable.
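
Canary assignment in step 3 is often done with a deterministic hash so that a ticket always lands in the same arm. A minimal sketch, assuming a hypothetical `ticket_id` string, a precomputed `risk` label, and an arbitrary salt:

```python
import hashlib


def in_canary(ticket_id: str, risk: str, fraction: float = 0.05,
              salt: str = "triage-canary-v1") -> bool:
    """Deterministically place about `fraction` of low/medium-risk tickets in the canary."""
    if risk not in {"low", "medium"}:
        # High-risk tickets stay on the current routing policy.
        return False
    digest = hashlib.sha256(f"{salt}:{ticket_id}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction
```

Changing the salt reshuffles which tickets fall in the canary, which is useful when starting a fresh randomized comparison in step 4.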

## Strongest Critique

The policy may learn historical staffing biases. If senior agents historically handled most enterprise tickets, the model may conclude that senior routing causes better outcomes when customer tier itself explains much of the difference. Conservative constraints and randomized validation are required before broad rollout.
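
The confounding pattern described above can be illustrated with a toy table. The counts below are invented purely for illustration and are not taken from any real data: within each tier the two agent types resolve tickets at the same rate, yet the pooled comparison makes senior routing look much better.

```python
# (tier, action, tickets, resolved_within_24h) -- invented counts for illustration
ROWS = [
    ("enterprise", "senior_agent", 80, 72),
    ("enterprise", "junior_agent", 20, 18),
    ("standard", "senior_agent", 20, 10),
    ("standard", "junior_agent", 80, 40),
]


def resolution_rate(rows, action, tier=None):
    """Resolution rate for an action, pooled or within a single tier."""
    picked = [(n, k) for t, a, n, k in rows
              if a == action and (tier is None or t == tier)]
    return sum(k for _, k in picked) / sum(n for n, _ in picked)


# Pooled: senior 82/100 = 0.82 vs junior 58/100 = 0.58.
naive_gap = resolution_rate(ROWS, "senior_agent") - resolution_rate(ROWS, "junior_agent")

# Within tier: 0.9 vs 0.9 for enterprise, 0.5 vs 0.5 for standard.
enterprise_gap = (resolution_rate(ROWS, "senior_agent", "enterprise")
                  - resolution_rate(ROWS, "junior_agent", "enterprise"))
```

Conditioning on tier removes the apparent effect in this toy case, but in real logs not every confounder is observed, which is why randomized validation remains necessary.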
