# Workshop Guide: A/B Testing at Scale

## Audience

Experiment platform teams, product managers, analysts, growth leaders, and any organization running many tests at once.

## 60-Minute Agenda

1. 0-10 min: Choose an organization or team that runs multiple experiments concurrently to use as the working case.
2. 10-20 min: Define risk tiers.
3. 20-35 min: Draft intake fields, metric contracts, and automated checks.
4. 35-50 min: Design readout order and archive schema.
5. 50-60 min: Each team shares one low-risk fast-lane case and one high-risk review case.

## 90-Minute Agenda

1. 0-10 min: Review the worked review-board example.
2. 10-25 min: Teams define charter and risk tiers.
3. 25-40 min: Draft intake fields, metric contracts, and launch-readiness criteria.
4. 40-55 min: Decide which checks are automated and which need human review.
5. 55-75 min: Design the readout order, archive schema, and decision memory.
6. 75-90 min: Critique for process drag and rubber-stamp failure.

## Team Exercise

Each team produces an experiment review-board operating model with:

- charter
- risk tiers
- intake fields
- metric contracts
- automated checks
- human review criteria
- launch readiness
- readout order
- archive schema
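As a starting point for the exercise, the intake fields and metric contracts above could be sketched as plain data structures. This is a minimal illustration, not a prescribed schema: every name here (`RiskTier`, `MetricContract`, `IntakeForm`, `missing_intake_fields`) and the rule that non-low-risk experiments need at least one guardrail are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskTier(Enum):
    # Hypothetical three-tier scheme; teams may define more or fewer tiers.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class MetricContract:
    # A metric contract pins down what a metric means and who owns it.
    name: str
    definition: str
    owner: str
    direction: str  # "increase" or "decrease" counts as an improvement
    is_guardrail: bool = False


@dataclass
class IntakeForm:
    experiment_id: str
    hypothesis: str
    owner: str
    risk_tier: RiskTier
    primary_metric: MetricContract
    guardrails: list[MetricContract] = field(default_factory=list)


def missing_intake_fields(form: IntakeForm) -> list[str]:
    """Return the names of intake fields that are empty or missing."""
    missing = []
    if not form.hypothesis.strip():
        missing.append("hypothesis")
    if not form.owner.strip():
        missing.append("owner")
    # Assumed rule: anything above low risk must declare a guardrail metric.
    if form.risk_tier is not RiskTier.LOW and not form.guardrails:
        missing.append("guardrails")
    return missing
```

An empty result from `missing_intake_fields` could serve as one input to launch readiness; the remaining items in the list (charter, readout order, archive schema) would need their own definitions.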

## Discussion Prompts

- Which experiments should bypass human review?
- Which experiments require cross-functional review?
- What should block launch automatically?
- What should be preserved so future teams learn from the result?
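One way to make the first three prompts concrete is to write the routing rule down as code and argue about it. The sketch below is only an illustration under assumed rules: the `LaunchRequest` fields, the lane names, and the thresholds are hypothetical, and each team should replace them with its own answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchRequest:
    # All fields are hypothetical examples of what an intake form might capture.
    risk_tier: str  # "low", "medium", or "high"
    has_metric_contract: bool
    has_rollback_plan: bool
    teams_affected: int


def route(request: LaunchRequest) -> str:
    """Return the review lane for a launch request under the assumed rules."""
    # Automatic blocks run first, before any human sees the request.
    if not request.has_metric_contract or not request.has_rollback_plan:
        return "blocked"
    # Low-risk experiments confined to one team bypass human review.
    if request.risk_tier == "low" and request.teams_affected <= 1:
        return "fast_lane"
    # High-risk or multi-team experiments get cross-functional review.
    if request.risk_tier == "high" or request.teams_affected > 1:
        return "cross_functional_review"
    # Everything else gets a single reviewer.
    return "single_reviewer"
```

Writing the rule this way makes disagreements visible: a team that wants low-risk multi-team experiments in the fast lane has to change a specific line rather than debate in the abstract.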

## Facilitator Notes

The goal is proportional governance. Low-risk learning should stay fast; high-risk experiments should be slower, more explicit, and easier to audit.

Common failure modes:

- the same review burden applied to every experiment, regardless of risk
- p-values read before validity checks have passed
- metrics used without metric contracts
- an archive that stores winners but loses reusable learning
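To illustrate the readout-order failure mode, a readout can be structured so that the effect p-value is withheld until a validity check passes. The sketch below uses a sample ratio mismatch (SRM) check as the example validity check, which is an assumption; the guide does not name specific checks. The function names, the two-arm design, the 50/50 expected split, and the `srm_alpha=0.001` threshold are all hypothetical choices for the example.

```python
import math


def srm_p_value(control_n: int, treatment_n: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm traffic split."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # With one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def readout(control_n: int, treatment_n: int, effect_p_value: float,
            srm_alpha: float = 0.001) -> dict:
    """Report the effect p-value only if the validity check passes."""
    srm_p = srm_p_value(control_n, treatment_n)
    if srm_p < srm_alpha:
        # Validity failed: the effect p-value is deliberately withheld.
        return {"status": "invalid",
                "reason": "sample ratio mismatch",
                "srm_p_value": srm_p}
    return {"status": "valid",
            "srm_p_value": srm_p,
            "effect_p_value": effect_p_value}
```

The design choice worth discussing is that an invalid readout omits the effect p-value entirely, so nobody can anchor on a result the platform has already flagged as untrustworthy.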

## Review Standard

Use `final-assessment.md` as the rubric. A strong operating model improves experiment quality without burying low-risk learning in needless process.