# Exercise: Feature Adoption and Retention

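Use `feature_adoption_retention.csv`. The SQL starter queries below assume it has been loaded as a table named `feature_adoption_retention`.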

## Goal

Practice distinguishing a naive association from a causal identification plan.

## Questions

1. What is the naive difference in 30-day retention between workspaces that used collaboration in week 1 and those that did not?
2. Which pre-treatment variables look imbalanced across adoption groups?
3. Why might baseline sessions and invites sent confound the relationship?
4. Which variables should not be adjusted for if measured after feature use?
5. What would make this estimate too fragile for a causal claim?

## SQL Starter

Naive association:

```sql
SELECT
  used_collaboration_week1,
  COUNT(*) AS workspaces,
  AVG(retained_30d * 1.0) AS retention_rate  -- * 1.0 avoids integer averaging in some dialects
FROM feature_adoption_retention
GROUP BY used_collaboration_week1;
```
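
The two rows from the query above can be collapsed into a single naive difference. This is a minimal sketch, assuming `used_collaboration_week1` and `retained_30d` are both coded as 0/1; check the file before relying on it. The output aliases are illustrative names, not columns in the dataset.

```sql
-- Naive difference in 30-day retention: adopters minus non-adopters.
-- Assumes used_collaboration_week1 and retained_30d are coded 0/1.
SELECT
  AVG(CASE WHEN used_collaboration_week1 = 1 THEN retained_30d * 1.0 END)
    AS retention_adopters,
  AVG(CASE WHEN used_collaboration_week1 = 0 THEN retained_30d * 1.0 END)
    AS retention_non_adopters,
  AVG(CASE WHEN used_collaboration_week1 = 1 THEN retained_30d * 1.0 END)
    - AVG(CASE WHEN used_collaboration_week1 = 0 THEN retained_30d * 1.0 END)
    AS naive_difference
FROM feature_adoption_retention;
```

`AVG` ignores the `NULL` values produced by each `CASE`, so each average covers only its own group. The result describes an association only; it carries no adjustment for how adopters differ from non-adopters.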

Covariate balance:

```sql
SELECT
  used_collaboration_week1,
  AVG(team_size) AS avg_team_size,
  AVG(baseline_sessions) AS avg_baseline_sessions,
  AVG(invites_sent) AS avg_invites_sent
FROM feature_adoption_retention
GROUP BY used_collaboration_week1;
```
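
Comparing group means does not show whether adopters and non-adopters actually overlap on a covariate. One way to check is to bucket workspaces by baseline activity and count each group within each bucket. The sketch below assumes a dialect with window functions (for example PostgreSQL, or SQLite 3.25 and later) and 0/1 coding of `used_collaboration_week1`. The bucket column `sessions_quartile` and the choice of four buckets are illustrative, not part of the dataset.

```sql
-- Overlap check: how many adopters and non-adopters fall in each
-- quartile of baseline_sessions? Thin or empty cells signal poor overlap.
WITH bucketed AS (
  SELECT
    used_collaboration_week1,
    NTILE(4) OVER (ORDER BY baseline_sessions) AS sessions_quartile  -- hypothetical bucket
  FROM feature_adoption_retention
)
SELECT
  sessions_quartile,
  SUM(CASE WHEN used_collaboration_week1 = 1 THEN 1 ELSE 0 END) AS adopters,
  SUM(CASE WHEN used_collaboration_week1 = 0 THEN 1 ELSE 0 END) AS non_adopters
FROM bucketed
GROUP BY sessions_quartile
ORDER BY sessions_quartile;
```

If a quartile contains almost no adopters or almost no non-adopters, any adjusted comparison in that range rests on extrapolation rather than on comparable workspaces.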

## Interpretation Prompt

Write two paragraphs:

1. The naive association.
2. Why the association is not yet a credible causal estimate.

Then name the identification strategy you would use next.

## Worked-Solution Standard

A strong answer should not say "collaboration increases retention" without stating the assumptions that claim depends on. It should name likely confounders, discuss whether adopters and non-adopters overlap on those confounders, and recommend either covariate adjustment with a sensitivity analysis or a randomized prompt experiment.
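
To illustrate what "adjustment" could look like at its simplest, the sketch below stratifies on a single covariate, `baseline_sessions`, and averages the within-stratum retention differences weighted by stratum size. It is not a complete identification strategy: it assumes 0/1 coding of `used_collaboration_week1` and `retained_30d`, a dialect with window functions, and it ignores every other confounder. The names `sessions_quartile`, `per_stratum`, and `stratified_difference` are illustrative.

```sql
-- Stratified difference in retention, adjusting for baseline_sessions only.
WITH bucketed AS (
  SELECT
    used_collaboration_week1,
    retained_30d,
    NTILE(4) OVER (ORDER BY baseline_sessions) AS sessions_quartile  -- hypothetical strata
  FROM feature_adoption_retention
),
per_stratum AS (
  SELECT
    sessions_quartile,
    COUNT(*) AS workspaces,
    -- Within-stratum difference; NULL if either group is missing.
    AVG(CASE WHEN used_collaboration_week1 = 1 THEN retained_30d * 1.0 END)
      - AVG(CASE WHEN used_collaboration_week1 = 0 THEN retained_30d * 1.0 END)
      AS retention_diff
  FROM bucketed
  GROUP BY sessions_quartile
)
SELECT
  -- Weight each stratum's difference by its share of workspaces.
  SUM(retention_diff * workspaces) / SUM(workspaces) AS stratified_difference
FROM per_stratum
WHERE retention_diff IS NOT NULL;  -- drop strata with no overlap
```

Comparing this figure with the naive difference gives a rough sense of how much of the association baseline activity alone could account for. A gap between the two is a reason to distrust the naive number; it does not make the stratified number causal.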
