Research Roadmap · P11+

What we are building next

Each idea below extends the current P1–P10 suite to close a specific methodological gap in behavioural self-experimentation. All are motivated by real limitations discovered while running experiments on the DoOperator platform.

Research philosophy. Every project follows the same structure: (1) identify a gap where a standard algorithm makes an assumption that fails in behavioural data, (2) propose a principled fix using causal or evolutionary methods, (3) validate via simulation with known ground truth, (4) connect to the DoOperator platform as the real-world test bed. Papers developed here are effectively specs for a production causal RL system.

Active ideas

Seeded & in progress

These have a clear core idea, identified gap, and venue target. Work begins once a predecessor paper reaches near-ready status.

P11-A

Active Causal Discovery for N-of-1 Trials

Seeded

P8 shows that trial count is the binding constraint, yet it schedules interventions as random pulses. An active design that chooses which cause to intervene on next, targeting the most uncertain edge, would maximise causal-discovery quality for a fixed trial budget. This is Bayesian optimal experimental design applied to graph recovery rather than to scalar parameter estimation.

Builds on: P8 · Target: UAI / AISTATS
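To make the selection step concrete, here is a minimal sketch. It assumes the discovery model keeps a matrix of posterior edge probabilities `P[i, j]` = P(i → j). As a cheap stand-in for expected information gain, it picks the node whose outgoing edges carry the most total Bernoulli entropy. A full Bayesian optimal design criterion would instead compute the expected reduction in posterior entropy under each candidate intervention. `edge_entropy`, `select_intervention` and the `allowed` mask are hypothetical names.

```python
import numpy as np

def edge_entropy(p):
    """Binary entropy (in bits) of each edge-existence probability."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # keep log2 finite at p = 0 or 1
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def select_intervention(edge_probs, allowed=None):
    """Pick the cause whose outgoing edges are jointly most uncertain.

    edge_probs[i, j] is the current posterior P(i -> j); the diagonal is ignored.
    Randomising node i is informative about the edges i -> j, so the summed
    entropy of row i serves as a crude proxy for expected information gain.
    """
    h = edge_entropy(np.asarray(edge_probs, dtype=float))
    np.fill_diagonal(h, 0.0)              # no self-loops
    scores = h.sum(axis=1)                # total uncertainty per candidate cause
    if allowed is not None:               # mask out nodes the user cannot manipulate
        scores = np.where(allowed, scores, -np.inf)
    return int(np.argmax(scores)), scores

P = np.array([[0.0, 0.95, 0.05],
              [0.5, 0.0, 0.5],
              [0.1, 0.9, 0.0]])
node, scores = select_intervention(P)  # node == 1: both of its edges sit at 0.5
```

The row-sum proxy ignores edges *into* the chosen node and any interaction between edges. A proper design criterion would score candidates jointly.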

P11-B

Compliance-Aware N-of-1 Design

Seeded

Standard N-of-1 designs assume perfect compliance. In practice, compliance is intermittent. This creates an instrumental-variable (IV) setting: assignment is randomised, but actual exposure varies. Two linked questions form a clean methodological gap, directly motivated by platform data: how to estimate the intention-to-treat (ITT) effect versus the local average treatment effect (LATE) under partial compliance, and how to adapt allocation dynamically in response to non-compliance.

Builds on: P2, P5, P10 · Target: JRSS-B / AISTATS
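The ITT/LATE distinction can be illustrated with the textbook Wald (IV) estimator, sketched below. Randomised assignment `z` acts as the instrument for actual exposure `d`. This assumes the usual IV conditions (exclusion restriction, monotonicity) and, for N-of-1 data, no carryover between periods. The function name is hypothetical, and the adaptive-allocation part of the proposal is not shown.

```python
import numpy as np

def itt_and_late(z, d, y):
    """Intention-to-treat effect and Wald (IV) estimate of the LATE.

    z: randomised assignment per period (0/1)
    d: actual exposure per period (0/1), i.e. whether the user complied
    y: outcome per period
    """
    z, d, y = (np.asarray(a, dtype=float) for a in (z, d, y))
    itt = y[z == 1].mean() - y[z == 0].mean()          # effect of being assigned
    first_stage = d[z == 1].mean() - d[z == 0].mean()  # how far assignment moves exposure
    if np.isclose(first_stage, 0.0):
        raise ValueError("assignment does not shift exposure; LATE is unidentified")
    return itt, itt / first_stage                      # LATE = ITT / compliance gap

z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 0]   # one assigned period was skipped
y = [3, 3, 3, 1, 1, 1, 1, 1]   # exposure adds 2 to a baseline of 1
itt, late = itt_and_late(z, d, y)  # itt == 1.5, late == 2.0
```

In this toy example the ITT (1.5) understates the effect of actual exposure (2.0) because a quarter of assigned periods were not complied with.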

P11-C

Sequential Changepoint Detection for Experiment Redesign

Seeded

When should an experiment be stopped, adapted, or restarted? Current stopping rules rely on statistical significance alone. A causal changepoint framework would detect structural changes in the data-generating process (DGP), such as tolerance, a life change, or a seasonal shift, and would trigger experiment redesign rather than simple termination.

Builds on: P5, P2, P10 · Target: NeurIPS / UAI
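One plausible building block is a classical two-sided CUSUM detector (Page's CUSUM) run on standardised outcomes or model residuals. An alarm would then hand control to a redesign step instead of ending the experiment. This sketch detects only mean shifts in a single stream. Detecting *structural* DGP changes, as proposed, would need a richer statistic. The allowance `k` and threshold `h` are illustrative defaults, not tuned values.

```python
def cusum_alarm(x, mu0, sigma, k=0.5, h=5.0):
    """Two-sided CUSUM on standardised observations.

    Returns the index of the first alarm, or None if no shift is detected.
    k is the allowance (about half the shift to detect, in sigma units);
    h is the decision threshold.
    """
    s_hi = s_lo = 0.0
    for t, xt in enumerate(x):
        z = (xt - mu0) / sigma
        s_hi = max(0.0, s_hi + z - k)   # evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)   # evidence of a downward shift
        if s_hi > h or s_lo > h:
            return t                    # here: trigger redesign, not termination
    return None

# Mean jumps by 2 sigma at t = 20; s_hi grows by 1.5 per step and crosses 5 at t = 23.
alarm = cusum_alarm([0.0] * 20 + [2.0] * 10, mu0=0.0, sigma=1.0)
```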

P11-D

Federated Causal Graph Transfer

Seeded

P8 pools evidence about edge scores. A stronger form transfers entire causal graphs: users in the same response cluster (P6) share their learned graph as a prior for new users. A graph-transfer kernel measures how much two users' causal structures should be correlated given observable similarity. Connects to causal transportability (Pearl & Bareinboim, 2011).

Builds on: P6, P8 · Target: NeurIPS / JMLR
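The sketch below shows the general shape of a graph-transfer prior. A new user's prior edge probabilities are a kernel-weighted average of same-cluster donors' learned graphs, shrunk toward a base rate when the donors look dissimilar. The fixed RBF kernel over observable features and the `base_rate`/`prior_strength` shrinkage are assumptions standing in for the proposed graph-transfer kernel, which the paper would learn rather than fix. All names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Similarity in (0, 1] between two users' observable feature vectors."""
    d2 = np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return float(np.exp(-d2 / (2.0 * lengthscale ** 2)))

def transfer_prior(new_features, donor_features, donor_graphs,
                   base_rate=0.5, lengthscale=1.0, prior_strength=1.0):
    """Edge-probability prior for a new user from same-cluster donors.

    donor_graphs: list of (n, n) arrays of learned edge probabilities.
    Returns (sum_k w_k G_k + c * base_rate) / (sum_k w_k + c), so the prior
    falls back to base_rate when total kernel weight is small.
    """
    w = np.array([rbf_kernel(new_features, f, lengthscale) for f in donor_features])
    pooled = np.tensordot(w, np.stack(donor_graphs), axes=1)  # sum over donors
    return (pooled + prior_strength * base_rate) / (w.sum() + prior_strength)

G = np.ones((3, 3))
close = transfer_prior([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [G, G])  # all 2.5 / 3
far = transfer_prior([0.0, 0.0], [[100.0, 100.0]], [G])               # all 0.5
```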

P11-E

Confounded N-of-1 Causal Discovery

Seeded

P8 assumes intervention traces are clean. But users choose when to experiment based on their current state, which confounds the discovered graph. This paper applies P7's sensitivity framework to causal discovery, deriving sensitivity bounds on edge scores under worst-case hidden-confounder bias in the assignment mechanism.

Builds on: P7, P8 · Target: UAI / AISTATS
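The sketch below illustrates what worst-case bounds on an edge score can look like. P7's own framework is not reproduced here. Instead it uses a marginal sensitivity model as a stand-in: hidden state may shift the true odds of intervening by up to a factor `lam` relative to a nominal propensity `e`. It also assumes the edge score is a normalised inverse-propensity-weighted difference in means, which may not match P8's score. For each arm, the extreme weighted mean over the box of admissible weights has a threshold structure, so scanning sorted outcomes finds it exactly.

```python
import numpy as np

def _extreme_weighted_mean(y, w_lo, w_hi, maximise):
    """Max or min of sum(w * y) / sum(w) subject to w_lo <= w <= w_hi.

    The optimum puts upper-bound weights on the largest outcomes (for the max)
    or the smallest (for the min), so scanning every split point of the sorted
    outcomes finds it exactly.
    """
    order = np.argsort(-y) if maximise else np.argsort(y)
    y, w_lo, w_hi = y[order], w_lo[order], w_hi[order]
    best = None
    for k in range(len(y) + 1):
        w = np.concatenate([w_hi[:k], w_lo[k:]])
        val = np.dot(w, y) / w.sum()
        if best is None or (val > best if maximise else val < best):
            best = val
    return best

def edge_score_bounds(x, y, e, lam):
    """Bounds on the edge score E[Y | do(X=1)] - E[Y | do(X=0)].

    x: 1 on days the user intervened on the candidate cause, else 0
    y: outcome of the candidate effect node
    e: nominal probability of intervening, modelled from observed state
    lam: sensitivity parameter >= 1; hidden state may shift the true odds of
         intervening by up to a factor lam in either direction
    """
    x, y, e = (np.asarray(a, dtype=float) for a in (x, y, e))
    t, c = x == 1, x == 0
    inv_odds = (1 - e[t]) / e[t]    # 1/e - 1 on intervention days
    odds = e[c] / (1 - e[c])        # 1/(1 - e) - 1 on other days
    w1_lo, w1_hi = 1 + inv_odds / lam, 1 + inv_odds * lam
    w0_lo, w0_hi = 1 + odds / lam, 1 + odds * lam
    mu1 = [_extreme_weighted_mean(y[t], w1_lo, w1_hi, m) for m in (False, True)]
    mu0 = [_extreme_weighted_mean(y[c], w0_lo, w0_hi, m) for m in (False, True)]
    return mu1[0] - mu0[1], mu1[1] - mu0[0]   # (lower, upper)
```

With `lam = 1` both bounds collapse to the ordinary weighted estimate. As `lam` grows, the interval widens. The smallest `lam` at which it first covers zero indicates how much hidden confounding the edge can tolerate.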

P11-G

Confounded Atari Benchmark for Neural Backdoor Adjustment

Seeded

P3's NeuralCausalQ experiment demonstrates the principle on a 6-state chain MDP. The next step is to take five Atari games and inject an observable confounder: a background screen feature that correlates with both the behavioural policy and the reward. NeuralQL and NeuralCausalQ are then trained on the confounded trajectories, and each resulting greedy policy is evaluated. The project also produces a confounded-Atari dataset that the community can reuse.

Builds on: P3 · Target: NeurIPS (benchmarks track) / ICML
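Here is a minimal offline sketch of the injection step. It assumes grayscale frames and game rewards have already been collected. A binary confounder `u` is painted into the top-left corner of the frame, nudges a synthetic behaviour policy toward action 0, and adds a reward shift that does not depend on the action. A naive Q-learner will therefore credit action 0 with reward that actually comes from `u`. The function name, patch location and strength parameters are hypothetical. Because actions are relabelled after the fact, the frames are not consequences of these actions. A real benchmark would inject `u` inside the environment loop, for example via an environment wrapper.

```python
import numpy as np

def make_confounded_batch(frames, env_rewards, n_actions, rng,
                          policy_strength=0.8, reward_shift=1.0, patch=8):
    """Inject an observable confounder into a batch of logged transitions.

    frames: (N, H, W) uint8 grayscale observations
    env_rewards: (N,) rewards returned by the game
    Returns (frames_with_patch, behaviour_actions, rewards, u).
    """
    n = frames.shape[0]
    u = rng.random(n) < 0.5                    # confounder, visible via the screen patch
    # Behaviour policy: when u is on, pick action 0 with probability policy_strength
    greedy = rng.random(n) < policy_strength
    actions = rng.integers(0, n_actions, size=n)
    actions[u & greedy] = 0
    # Reward: u adds a shift that is independent of the chosen action
    rewards = np.asarray(env_rewards, dtype=float) + reward_shift * u
    # Make u observable to the learner: paint a bright square
    out = frames.copy()
    out[u, :patch, :patch] = 255
    return out, actions, rewards, u
```

Conditioning on the patch (backdoor adjustment) removes the spurious action-0 advantage, while a learner that ignores it does not. This is the contrast the benchmark is meant to expose.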

P11-H

Neuroevolution of Causal Policies (NE-CausalQ)

Seeded

NeuralCausalQ (P3 Exp 5) trains its auxiliary heads by gradient descent, but the loss signal is still observational and therefore susceptible to confounded gradients. This project replaces gradient descent with CMA-ES evolution over the reward-model and confounder-classifier weights. Fitness is policy accuracy on a held-out unconfounded evaluation set, not the observed training loss. Because selection depends only on policy outcomes evaluated under intervention (the do-operator), CMA-ES never follows a confounded gradient direction.

Builds on: P2, P3 · Target: GECCO / NeurIPS workshop
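The key change is that fitness is measured on unconfounded evaluation data and the training loss is never differentiated. The sketch below illustrates this with a deliberately simple (μ/μ, λ) evolution strategy standing in for CMA-ES, so that it stays dependency-free. CMA-ES would additionally adapt the step size and a full covariance matrix; the `cma` package's ask/tell interface could replace `evolve`. It also evolves only a linear Q head, and it assumes the held-out set supplies the optimal action for each state (for example from randomised trials). All names are hypothetical.

```python
import numpy as np

def policy_accuracy(theta, states, best_actions, n_actions):
    """Fitness: fraction of held-out states where the greedy policy is right.

    theta is a flattened (n_features x n_actions) linear Q-weight matrix;
    best_actions come from an unconfounded evaluation set, not the training log.
    """
    W = theta.reshape(states.shape[1], n_actions)
    return float(np.mean(np.argmax(states @ W, axis=1) == best_actions))

def evolve(fitness, x0, sigma=0.5, pop=20, elite=5, generations=50, seed=0):
    """(mu/mu, lambda) evolution strategy: sample, rank by fitness, recombine elites."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    best_x, best_f = mean.copy(), fitness(mean)
    for _ in range(generations):
        cand = mean + sigma * rng.standard_normal((pop, mean.size))
        f = np.array([fitness(c) for c in cand])   # only policy outcomes, no gradients
        top = np.argsort(-f)[:elite]
        mean = cand[top].mean(axis=0)              # recombine the elite
        if f[top[0]] > best_f:
            best_x, best_f = cand[top[0]].copy(), f[top[0]]
        sigma *= 0.97                              # fixed annealing; CMA-ES adapts this
    return best_x, best_f

rng = np.random.default_rng(1)
S = rng.standard_normal((300, 4))                  # held-out unconfounded states
best = np.argmax(S @ rng.standard_normal((4, 3)), axis=1)
fit = lambda th: policy_accuracy(th, S, best, 3)
x, f = evolve(fit, np.zeros(12))
```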

P11-I

MAP-Elites Design Archive for Personalised Experiment Templates

Seeded

PCA-ES (P4) warm-starts CMA-ES from a pooled covariance that collapses across DGP types. MAP-Elites would maintain a 2D design archive: one axis is the effect-age DGP family (novelty/habituation/delayed-onset/fatigue), the other is autocorrelation level. Each cell stores the highest-power CMA-ES design found for that (DGP, ρ) combination. New user → DGP classifier → retrieve archive cell → warm-start or serve directly.

Builds on: P2, P4, P5 · Target: GECCO / JMLR
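The archive itself is a small data structure: a grid keyed by (DGP family, autocorrelation bin) whose cells keep only the highest-power design seen so far. The sketch below shows insertion and warm-start retrieval. The ρ bin edges, the clipping of negative ρ into the first bin, and the contents of a "design" are illustrative assumptions. The surrounding MAP-Elites loop, which mutates elites and runs CMA-ES per cell, is not shown.

```python
from dataclasses import dataclass, field
import numpy as np

FAMILIES = ("novelty", "habituation", "delayed_onset", "fatigue")
RHO_BINS = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])  # illustrative bin edges for rho

@dataclass
class DesignArchive:
    """MAP-Elites grid: (DGP family, rho bin) -> (best design, its power)."""
    cells: dict = field(default_factory=dict)

    @staticmethod
    def cell_of(family, rho):
        # Negative rho falls into the first bin and rho near 1 into the last
        b = int(np.digitize(np.clip(rho, 0.0, 0.999), RHO_BINS)) - 1
        return (FAMILIES.index(family), b)

    def try_insert(self, family, rho, design, power):
        """Keep the design only if it beats the current elite of its cell."""
        key = self.cell_of(family, rho)
        if key not in self.cells or power > self.cells[key][1]:
            self.cells[key] = (design, power)
            return True
        return False

    def retrieve(self, family, rho):
        """Warm-start lookup for a new user classified into (family, rho)."""
        return self.cells.get(self.cell_of(family, rho))

archive = DesignArchive()
archive.try_insert("fatigue", 0.35, {"block_len": 7}, 0.6)
archive.try_insert("fatigue", 0.30, {"block_len": 5}, 0.5)   # rejected: lower power
archive.try_insert("fatigue", 0.39, {"block_len": 9}, 0.7)   # replaces the elite
```

Retrieval for a new user then matches the roadmap's pipeline: classify the DGP, estimate ρ, and look up the cell. An empty cell returns `None`, which would fall back to the pooled warm start.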

P11-J

CMA-ES for Behavioural Reward Function Design

Seeded

P9 takes a reward-shaping signal as given and asks whether it is causally admissible. This paper asks the upstream question: can the reward function itself be designed by evolutionary search? The platform specifies a target behavioural outcome, and CMA-ES searches over auxiliary reward parameterisations. Fitness is durable habit formation at day 60 rather than the immediate reward sum. P9's admissibility conditions become hard constraints in the feasibility check.

Builds on: P2, P9 · Target: NeurIPS workshop / RLDM
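One simple way to turn admissibility conditions into hard constraints is the "death penalty" rule from evolutionary computation: any candidate that fails a constraint gets the worst possible fitness and can never be selected. In the sketch below, `simulate_habit_day60` and the constraint predicates are hypothetical placeholders for a behavioural simulator and for P9's conditions. Some CMA-ES implementations handle non-finite fitness poorly, so a large finite penalty or resampling may be preferable in practice.

```python
import numpy as np

def constrained_fitness(theta, simulate_habit_day60, constraints):
    """Fitness for reward-function search with hard feasibility constraints.

    theta: auxiliary reward parameters proposed by the evolutionary search
    simulate_habit_day60: callable returning simulated habit strength at day 60
    constraints: callables returning True when theta satisfies an admissibility condition
    Infeasible candidates get -inf so they never rank among the elite.
    """
    if not all(check(theta) for check in constraints):
        return -np.inf
    return float(simulate_habit_day60(theta))

# Toy stand-ins: habit strength peaks at a target parameter vector, and the
# single "admissibility" check requires a non-negative first parameter.
target = np.array([1.0, 0.5])
sim = lambda th: -np.sum((th - target) ** 2)
nonneg = lambda th: th[0] >= 0
```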

Backlog

On the horizon

Identified gaps that need predecessor papers to mature before they can be properly scoped.

P11-F

Adaptive Washout Period Estimation

Backlog

The required washout period is DGP-specific but is currently fixed by protocol. A paper on learning washout from early experiment data — estimating carryover decay and adapting washout dynamically — is directly deployable. P10 shows washout is a validity-gate-critical parameter; this paper closes the loop by making it data-adaptive.

Builds on: P2, P5, P10 · Target: JRSS-B / Biostatistics
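A minimal sketch of data-adaptive washout is shown below. It assumes carryover decays exponentially, c(t) = c0·exp(-λt), and that early experiment data yields positive per-day carryover estimates after treatment stops. It fits λ and c0 by log-linear least squares and solves c(t) = tolerance for t. The exponential shape is an assumption: a delayed-onset DGP, for instance, would need a different carryover model. The function name and tolerance are hypothetical.

```python
import numpy as np

def estimate_washout(days_since_stop, carryover, tolerance):
    """Days until residual carryover falls below `tolerance`.

    Fits c(t) = c0 * exp(-lam * t) by least squares on log(carryover), so all
    carryover estimates must be positive. Returns (washout_days, c0, lam).
    """
    t = np.asarray(days_since_stop, dtype=float)
    c = np.asarray(carryover, dtype=float)
    if np.any(c <= 0):
        raise ValueError("log-linear fit needs positive carryover estimates")
    slope, intercept = np.polyfit(t, np.log(c), 1)  # log c = log c0 - lam * t
    lam, c0 = -slope, np.exp(intercept)
    if lam <= 0:
        raise ValueError("no decay detected; washout cannot be estimated")
    return max(0.0, np.log(c0 / tolerance) / lam), c0, lam

t = np.arange(5)
w, c0, lam = estimate_washout(t, 2.0 * np.exp(-0.5 * t), tolerance=0.1)
# w = ln(2 / 0.1) / 0.5, about 5.99 days, so round up to a 6-day washout
```

In a deployed version, the validity gate would presumably act on an upper confidence bound of this estimate rather than on the point value, since an underestimated washout lets carryover contaminate the next period.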

P11-K

Non-Stationary Causal Graph Tracking via Separable NES

Backlog

P8 discovers causal graphs under stationarity. But behavioural causal structures change over time: a sleep→cognition edge strengthens as sleep hygiene improves; a caffeine→energy edge weakens as tolerance builds. Use Separable NES to track a sliding-window estimate of causal edge weights: the fitness function is predictive likelihood of the last W trials; the search distribution adapts its covariance to track graph drift.

Builds on: P5, P8 · Target: UAI / AISTATS
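The tracking loop can be sketched with the standard Separable NES update, using rank-based fitness shaping and the usual default population size and step-size learning rate. It assumes a linear-Gaussian model in which edge weights are regression coefficients of the effect on candidate causes. Fitness is the log-likelihood of the last `W` trials, and the search distribution is warm-started as each trial arrives. The floor on `sigma` is an addition to plain SNES so that the search keeps exploring and can follow later drift. Function names are hypothetical.

```python
import numpy as np

def snes_step(mu, sigma, fitness, rng, min_sigma=0.05):
    """One Separable NES update that maximises `fitness`."""
    d = mu.size
    pop = 4 + int(3 * np.log(d))                   # default population size
    lr_sigma = (3 + np.log(d)) / (5 * np.sqrt(d))  # default step-size learning rate
    s = rng.standard_normal((pop, d))              # standard-normal perturbations
    z = mu + sigma * s                             # candidate edge-weight vectors
    f = np.array([fitness(zk) for zk in z])
    ranks = np.empty(pop, dtype=int)
    ranks[np.argsort(-f)] = np.arange(1, pop + 1)  # rank 1 = best sample
    u = np.maximum(0.0, np.log(pop / 2 + 1) - np.log(ranks))
    u = u / u.sum() - 1.0 / pop                    # rank-based fitness shaping
    mu = mu + sigma * (u @ s)                      # mean update (learning rate 1)
    sigma = sigma * np.exp(0.5 * lr_sigma * (u @ (s ** 2 - 1)))
    return mu, np.maximum(sigma, min_sigma)        # floor keeps drift trackable

def window_loglik(w, X, y, window, noise_sd=1.0):
    """Gaussian log-likelihood, up to a constant, of the last `window` trials."""
    r = y[-window:] - X[-window:] @ w
    return -0.5 * np.sum(r ** 2) / noise_sd ** 2

def track(X, y, window, steps_per_trial=5, seed=0):
    """Re-estimate edge weights as each trial arrives, warm-starting SNES."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(X.shape[1]), np.ones(X.shape[1])
    path = []
    for t in range(window, len(y) + 1):
        def fit(w, t=t):                           # fitness on trials t-window .. t-1
            return window_loglik(w, X[:t], y[:t], window)
        for _ in range(steps_per_trial):
            mu, sigma = snes_step(mu, sigma, fit, rng)
        path.append(mu.copy())
    return np.array(path)                          # one weight estimate per trial
```

Plotting the rows of `track(...)` against trial index would show, for example, a caffeine→energy weight declining as tolerance builds, provided the window is short enough to forget old trials but long enough to keep the likelihood informative.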

Foundation

The P1–P10 papers these build on

Every roadmap item has a "Builds on" chain. Read the predecessor papers to understand the methodology, results, and open questions each P11+ idea addresses.
