# Worked Example: Experiment Review Board

## Charter

Increase learning velocity while protecting validity, users, and business guardrails.

The board does not exist to approve every experiment or block every risky one. It exists to make experiment decisions reusable, auditable, and proportional to risk.

## Risk Tiers

Low:

- copy
- layout
- onboarding tips
- low-risk email subject lines

Review: automated checklist.

Medium:

- onboarding flow changes
- notifications
- recommendations
- search ranking changes

Review: analyst or experimentation-platform review.

High:

- pricing
- credit, employment, health, safety, or medical decisions
- materially personalized rankings
- policies affecting access or eligibility

Review: cross-functional review with product, analytics, legal/ethics where relevant, and the operational owner.
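The tier lists above can be turned into an intake-time classifier. This is a minimal sketch that assumes each proposal declares hypothetical category labels (`copy`, `pricing`, `access_or_eligibility`, and so on). Two rules here are assumptions, not part of the charter. When a change touches several categories, the highest tier wins. An unlisted category defaults to High so that it cannot slip through the automated path.

```python
from enum import IntEnum
from typing import Iterable

class RiskTier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Hypothetical category labels mirroring the tier lists above.
TIER_BY_CATEGORY = {
    "copy": RiskTier.LOW,
    "layout": RiskTier.LOW,
    "onboarding_tips": RiskTier.LOW,
    "low_risk_email_subject": RiskTier.LOW,
    "onboarding_flow": RiskTier.MEDIUM,
    "notifications": RiskTier.MEDIUM,
    "recommendations": RiskTier.MEDIUM,
    "search_ranking": RiskTier.MEDIUM,
    "pricing": RiskTier.HIGH,
    "consequential_decision": RiskTier.HIGH,  # credit, employment, health, safety, medical
    "personalized_ranking": RiskTier.HIGH,
    "access_or_eligibility": RiskTier.HIGH,
}

REVIEW_ROUTE = {
    RiskTier.LOW: "automated checklist",
    RiskTier.MEDIUM: "analyst or experimentation-platform review",
    RiskTier.HIGH: "cross-functional review",
}

def classify(categories: Iterable[str]) -> RiskTier:
    """Return the highest tier among the declared categories.

    Assumption: unknown categories map to HIGH so nothing unclassified
    is routed to the automated checklist.
    """
    cats = list(categories)
    if not cats:
        raise ValueError("an experiment must declare at least one change category")
    return max(TIER_BY_CATEGORY.get(c, RiskTier.HIGH) for c in cats)
```

Under these assumptions, a proposal that changes both copy and pricing goes to cross-functional review. The low-risk parts do not lower its tier.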

## Intake Requirements

Required for every experiment:

- owner
- decision
- hypothesis
- population
- treatment
- control
- randomization unit
- primary metric
- guardrails
- expected runtime
- risk tier
- rollback owner

Reject any experiment whose intake does not name the decision its result will inform.
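The intake rules can be enforced mechanically before anyone reads the proposal. This sketch assumes a hypothetical `Intake` record whose fields mirror the required list. It treats `None`, whitespace-only strings, and empty lists as missing. A missing `decision` causes an outright rejection, while other gaps are returned to the owner. That two-outcome split is an illustrative choice.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple

@dataclass
class Intake:
    # Hypothetical intake record; field names mirror the required list above.
    owner: Optional[str] = None
    decision: Optional[str] = None
    hypothesis: Optional[str] = None
    population: Optional[str] = None
    treatment: Optional[str] = None
    control: Optional[str] = None
    randomization_unit: Optional[str] = None
    primary_metric: Optional[str] = None
    guardrails: Optional[List[str]] = None
    expected_runtime_days: Optional[int] = None
    risk_tier: Optional[str] = None
    rollback_owner: Optional[str] = None

def _is_blank(value) -> bool:
    """None, whitespace-only strings, and empty lists count as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, list):
        return not value
    return False

def missing_fields(intake: Intake) -> List[str]:
    return [f.name for f in fields(intake) if _is_blank(getattr(intake, f.name))]

def triage(intake: Intake) -> Tuple[str, List[str]]:
    missing = missing_fields(intake)
    if "decision" in missing:
        return "reject: no explicit decision", missing
    if missing:
        return "return to owner: incomplete", missing
    return "accept", missing
```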

## Automated Checks

Before human review:

- all required fields complete
- metric contract exists for primary metric
- guardrails selected
- randomization unit matches outcome unit
- MDE or power rationale present
- active experiment collision checked
- SRM monitor configured
- exposure logging plan complete
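The automated checks can run as a list of named predicates. A failing proposal then gets back exactly which items to fix. This is a minimal sketch over a hypothetical `Submission` summary and a hypothetical `METRIC_CONTRACTS` registry. The unit check is a deliberate simplification that requires an exact match. In practice, a ratio metric such as per-session conversion under user randomization can be acceptable if the analysis accounts for the clustering, for example with the delta method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class Submission:
    # Hypothetical, simplified summary of what the platform knows about a proposal.
    required_fields_complete: bool
    primary_metric: str
    metric_unit: str          # unit the primary metric is aggregated over, e.g. "user"
    randomization_unit: str
    guardrails: List[str]
    mde_or_power_note: str
    collision_checked: bool   # overlap with active experiments was reviewed
    srm_monitor_configured: bool
    exposure_logging_plan_complete: bool

# Hypothetical registry of metrics that have a published metric contract.
METRIC_CONTRACTS: Set[str] = {"activation_7d", "revenue_per_user", "latency_p95"}

# Each check maps a readable name to a predicate; order follows the checklist.
CHECKS: Dict[str, Callable[[Submission], bool]] = {
    "all required fields complete": lambda s: s.required_fields_complete,
    "metric contract exists for primary metric": lambda s: s.primary_metric in METRIC_CONTRACTS,
    "guardrails selected": lambda s: len(s.guardrails) > 0,
    # Simplification: exact match only (see the note on ratio metrics).
    "randomization unit matches outcome unit": lambda s: s.randomization_unit == s.metric_unit,
    "MDE or power rationale present": lambda s: bool(s.mde_or_power_note.strip()),
    "active experiment collision checked": lambda s: s.collision_checked,
    "SRM monitor configured": lambda s: s.srm_monitor_configured,
    "exposure logging plan complete": lambda s: s.exposure_logging_plan_complete,
}

def run_checks(sub: Submission) -> List[str]:
    """Return the names of failed checks; an empty list means ready for human review."""
    return [name for name, check in CHECKS.items() if not check(sub)]
```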

## Launch Readiness

Launch only after:

- assignment events arrive
- exposure events arrive
- metric dashboard populates
- SRM alerting works
- rollback path is tested
- owner is available during first exposure window
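Confirming that SRM alerting works requires an SRM test the alert can fire on. A common choice is a chi-square goodness-of-fit test of observed assignment counts against the intended split. This sketch handles two arms only. It uses the 1-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)), so it needs only the standard library. The `threshold=0.001` default is an assumption based on a commonly used strict cutoff, not a board rule.

```python
import math
from typing import Sequence

def srm_p_value(observed: Sequence[int], expected_ratios: Sequence[float]) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm split (1 degree of freedom)."""
    if len(observed) != 2 or len(expected_ratios) != 2:
        raise ValueError("this sketch handles exactly two arms")
    total = sum(observed)
    if total == 0:
        raise ValueError("no assignments observed")
    ratio_total = sum(expected_ratios)
    chi2 = 0.0
    for count, ratio in zip(observed, expected_ratios):
        expected = total * ratio / ratio_total
        chi2 += (count - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def srm_alert(observed: Sequence[int], expected_ratios: Sequence[float],
              threshold: float = 0.001) -> bool:
    """True when the observed split is too unlikely under the intended ratios."""
    return srm_p_value(observed, expected_ratios) < threshold
```

One way to test the alert path before launch is to feed it deliberately skewed synthetic counts and confirm that it fires. For example, 10,000 versus 10,600 on an intended 50/50 split should trigger it.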

## Readout Order

1. Validity checks.
2. Primary metric.
3. Guardrails.
4. Secondary metrics.
5. Segment analysis.
6. Decision.
7. Archive.

If validity checks fail, the readout becomes an incident review rather than a shipping debate.
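The readout order, including the switch to incident review, can be encoded so that no one reaches the primary metric on invalid data. This minimal sketch assumes validity results arrive as a hypothetical mapping of check name to pass/fail. Treating an empty mapping as a failure is an added assumption: a readout with no validity evidence has not passed step 1.

```python
from typing import Dict, List, Tuple

READOUT_STEPS: List[str] = [
    "validity checks", "primary metric", "guardrails",
    "secondary metrics", "segment analysis", "decision", "archive",
]

def plan_readout(validity: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return ("readout", ordered steps) or ("incident review", failed checks)."""
    if not validity:
        # Assumption: no recorded validity evidence counts as a failed step 1.
        return "incident review", ["no validity checks recorded"]
    failed = sorted(name for name, passed in validity.items() if not passed)
    if failed:
        return "incident review", failed
    return "readout", list(READOUT_STEPS)
```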

## Experiment Card

Archive every completed experiment with:

- hypothesis
- decision the experiment was meant to inform
- design
- population
- metrics
- diagnostics
- effect estimate
- decision taken
- follow-up
- reusable learning
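An archived card is easier to reuse when it is structured rather than free text. This sketch assumes JSON as the archive format and uses hypothetical field names. The two decision entries are modeled as `decision_question` (what the experiment was meant to inform) and `decision_taken` (what was actually decided). The effect estimate is assumed to carry a confidence interval.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List

@dataclass
class EffectEstimate:
    metric: str
    point: float
    ci_low: float
    ci_high: float

    def __post_init__(self):
        # Guard against transposed or malformed intervals at archive time.
        if not self.ci_low <= self.point <= self.ci_high:
            raise ValueError("point estimate must lie inside its confidence interval")

@dataclass
class ExperimentCard:
    hypothesis: str
    decision_question: str        # decision the experiment was meant to inform
    design: str
    population: str
    metrics: List[str]
    diagnostics: Dict[str, bool]  # e.g. {"srm_passed": True}
    effect_estimate: EffectEstimate
    decision_taken: str
    follow_up: str
    reusable_learning: str

    def to_json(self) -> str:
        # asdict() recurses into the nested EffectEstimate dataclass.
        return json.dumps(asdict(self), sort_keys=True)
```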

## Keeping the Process Fast

Low-risk experiments should move through automated review. Medium-risk experiments should use a one-business-day service-level target. High-risk experiments should be reviewed in batches twice per week, with clear pre-read requirements.
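The service-level targets translate into a due date per tier. In this minimal sketch, Low resolves on submission, Medium is due the next business day, and High goes to the next batch meeting strictly after the submission day. The Tuesday/Thursday batch days are a hypothetical reading of "twice per week", and holidays are ignored.

```python
from datetime import date, timedelta

# Hypothetical batch days for "twice per week": Tuesday and Thursday (Monday == 0).
BATCH_WEEKDAYS = {1, 3}

def next_business_day(d: date) -> date:
    nxt = d + timedelta(days=1)
    while nxt.weekday() >= 5:  # skip Saturday (5) and Sunday (6); holidays ignored
        nxt += timedelta(days=1)
    return nxt

def review_due(tier: str, submitted: date) -> date:
    if tier == "low":
        return submitted  # automated checklist runs on submission
    if tier == "medium":
        return next_business_day(submitted)
    if tier == "high":
        # Next batch meeting strictly after the submission day.
        nxt = submitted + timedelta(days=1)
        while nxt.weekday() not in BATCH_WEEKDAYS:
            nxt += timedelta(days=1)
        return nxt
    raise ValueError(f"unknown risk tier: {tier!r}")
```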

The board should publish examples of approved, rejected, and revised experiments so teams learn the standard.

## Strongest Critique

The review board can become performative if it measures approvals instead of downstream learning quality. Track invalid designs caught before launch, repeated metric-definition issues, and how often archived results are reused.
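The three health signals can be computed from the archive. This sketch runs over a hypothetical `ExperimentRecord`, and both definitions below are assumptions. A "repeated" metric-definition issue is a metric whose definition caused problems in more than one experiment. The reuse rate is the share of archived experiments cited by at least one later proposal.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ExperimentRecord:
    # Hypothetical archive record for board-health tracking.
    experiment_id: str
    invalid_design_caught_pre_launch: bool
    metric_definition_issues: List[str] = field(default_factory=list)  # metric names
    later_citations: int = 0  # times later proposals cited this archived result

def board_health(records: List[ExperimentRecord]) -> Dict[str, object]:
    # Count each metric at most once per experiment, so "repeated" means across experiments.
    issue_counts = Counter(m for r in records for m in set(r.metric_definition_issues))
    n = len(records)
    return {
        "invalid_designs_caught_pre_launch": sum(r.invalid_design_caught_pre_launch for r in records),
        "repeated_metric_definition_issues": sorted(m for m, c in issue_counts.items() if c > 1),
        "archive_reuse_rate": (sum(1 for r in records if r.later_citations > 0) / n) if n else 0.0,
    }
```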