P11-A
Active Causal Discovery for N-of-1 Trials
Seeded
P8 shows that trial count is the binding constraint, yet it uses random intervention pulses. An active design that selects which cause to intervene on next, so as to resolve the most uncertain edge, would maximise causal discovery quality given a fixed budget. This is Bayesian optimal experiment design applied to graph recovery rather than scalar parameter estimation.
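The selection step can be illustrated with a minimal sketch. It assumes each candidate edge carries a posterior probability of existing and uses summed binary entropy as a simple proxy for "most uncertain"; a full Bayesian optimal design would score expected information gain instead. The function names and the example beliefs are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) belief that an edge exists."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def pick_intervention(edge_probs):
    """Return the cause whose outgoing edges carry the most posterior entropy.

    edge_probs maps (cause, effect) -> posterior probability that the edge exists.
    """
    uncertainty = {}
    for (cause, _effect), p in edge_probs.items():
        uncertainty[cause] = uncertainty.get(cause, 0.0) + binary_entropy(p)
    return max(uncertainty, key=uncertainty.get)

# Hypothetical posterior beliefs after a few trials (illustrative values only).
edge_probs = {
    ("caffeine", "sleep"): 0.95,
    ("caffeine", "mood"): 0.90,
    ("exercise", "sleep"): 0.50,
    ("exercise", "mood"): 0.60,
}
next_cause = pick_intervention(edge_probs)
```

Here the next trial would intervene on `exercise`, because its edges are close to a coin flip while the `caffeine` edges are nearly settled.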
P11-B
Compliance-Aware N-of-1 Design
Seeded
N-of-1 designs assume perfect compliance. In practice compliance is intermittent, which creates an instrumental-variable (IV) setting: assignment is random but actual exposure varies. How to estimate the intention-to-treat (ITT) effect versus the local average treatment effect (LATE) under partial compliance, and how to adapt allocation dynamically to non-compliance, is a clean methodological gap directly motivated by platform data.
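The difference between the two estimands can be shown with a small sketch using the standard Wald (IV) estimator, where LATE is the ITT effect divided by the difference in exposure rates between assignment arms. It assumes the usual exclusion restriction and monotonicity, treats each period as an independent unit, and ignores carryover, which a real N-of-1 analysis would need to address. The records are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def itt_and_late(records):
    """records: iterable of (assigned, exposed, outcome), with 0/1 assignment and exposure."""
    y1 = [y for z, d, y in records if z == 1]
    y0 = [y for z, d, y in records if z == 0]
    d1 = [d for z, d, y in records if z == 1]
    d0 = [d for z, d, y in records if z == 0]
    itt = mean(y1) - mean(y0)                # effect of being assigned
    compliance_gap = mean(d1) - mean(d0)     # how much assignment moves exposure
    if compliance_gap == 0:
        raise ValueError("assignment does not move exposure; LATE is not identified")
    return itt, itt / compliance_gap         # Wald estimator for LATE

# Hypothetical periods: only half of the assigned periods were actually complied with.
records = [
    (1, 1, 8.0), (1, 1, 8.0), (1, 0, 4.0), (1, 0, 4.0),
    (0, 0, 4.0), (0, 0, 4.0), (0, 0, 4.0), (0, 0, 4.0),
]
itt, late = itt_and_late(records)
```

With 50% compliance the ITT effect (2.0) is half the effect among complied periods (4.0), which is the dilution that a compliance-aware design has to account for.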
·Target: JRSS-B / AISTATS
P11-C
Sequential Changepoint Detection for Experiment Redesign
Seeded
When should an experiment be stopped, adapted, or restarted? Current stopping rules rely on statistical significance. A causal changepoint framework would detect structural changes in the data-generating process (DGP), such as tolerance, a life change, or a seasonal shift, and trigger experiment redesign rather than just termination.
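As a rough illustration of a sequential trigger, the sketch below runs a classical two-sided CUSUM over per-block effect estimates and reports the first block at which the cumulative drift exceeds a threshold. CUSUM is a stand-in chosen for brevity, not the causal changepoint framework the proposal describes, and the `slack` and `threshold` values are arbitrary assumptions.

```python
def cusum_alarm(values, target, slack, threshold):
    """Return the index where a two-sided CUSUM first exceeds threshold, else None."""
    hi = lo = 0.0
    for i, x in enumerate(values):
        hi = max(0.0, hi + (x - target - slack))   # accumulates upward drift
        lo = max(0.0, lo + (target - x - slack))   # accumulates downward drift
        if hi > threshold or lo > threshold:
            return i
    return None

# Hypothetical per-block effect estimates: the effect halves after block 3,
# as it might if the user develops tolerance.
effects = [2.0, 2.1, 1.9, 2.0, 1.0, 0.9, 1.1, 0.8]
alarm_at = cusum_alarm(effects, target=2.0, slack=0.25, threshold=1.5)
```

An alarm would be the cue to redesign (for example, re-estimate the effect-age model) rather than simply to stop.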
P11-D
Federated Causal Graph Transfer
Seeded
P8 pools evidence about edge scores. A stronger form transfers entire causal graphs: users in the same response cluster (P6) share their learned graph as a prior for new users. A graph-transfer kernel measures how strongly two users' causal structures should be correlated given their observable similarity. This connects to causal transportability (Pearl & Bareinboim, 2011).
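One way to picture graph transfer is as a kernel-weighted average of donors' edge beliefs. The sketch below assumes an RBF kernel over observable user features and per-edge existence probabilities; both the kernel form and the data layout are assumptions made for illustration, not the kernel the proposal would derive.

```python
import math

def rbf_kernel(u, v, length_scale=1.0):
    """Similarity in (0, 1] between two observable feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * length_scale ** 2))

def transferred_prior(new_features, donors, length_scale=1.0):
    """Kernel-weighted average of donor edge probabilities.

    donors: list of (features, edge_probs); edge_probs maps edge -> probability.
    An edge missing from a donor's graph counts as probability 0.0.
    """
    weights = [rbf_kernel(new_features, f, length_scale) for f, _ in donors]
    total = sum(weights)
    edges = set().union(*(probs.keys() for _, probs in donors))
    return {
        edge: sum(w * probs.get(edge, 0.0) for w, (_, probs) in zip(weights, donors)) / total
        for edge in edges
    }

# Hypothetical donors: two existing users with opposite beliefs about one edge.
donors = [
    ((0.0, 0.0), {("caffeine", "sleep"): 0.9}),
    ((4.0, 4.0), {("caffeine", "sleep"): 0.1}),
]
prior_near = transferred_prior((0.0, 0.0), donors)   # dominated by the first donor
prior_mid = transferred_prior((2.0, 2.0), donors)    # equidistant, so an even blend
```

The `length_scale` controls how quickly the borrowed prior fades as users become less similar; in practice it would have to be learned or validated rather than fixed.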
P11-E
Confounded N-of-1 Causal Discovery
Seeded
P8 assumes intervention traces are clean. But users choose when to experiment based on their current state, which confounds the discovered graph. This paper applies P7's sensitivity framework to the causal discovery problem, deriving sensitivity bounds on edge scores under worst-case hidden-confounder bias in the assignment mechanism.
P11-G
Confounded Atari Benchmark for Neural Backdoor Adjustment
Seeded
P3's NeuralCausalQ experiment demonstrates the principle on a 6-state chain MDP. The next step is to take 5 Atari games and inject an observable confounder (a background screen feature that correlates with both the behavioural policy and the reward), train NeuralQL and NeuralCausalQ on the confounded trajectories, and evaluate greedy performance. The project also produces a community-reusable confounded-Atari dataset.
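Why an observable confounder matters can be seen in a tabular toy, far smaller than Atari. The sketch below compares a naive action-effect estimate with the backdoor-adjusted one, which averages the within-stratum contrast over the confounder's marginal distribution. The records are invented so that the action has no true effect; the adjustment also assumes every confounder stratum contains both actions.

```python
def mean(values):
    return sum(values) / len(values)

def naive_effect(records):
    """Difference in mean reward between actions, ignoring the confounder."""
    r1 = [r for c, a, r in records if a == 1]
    r0 = [r for c, a, r in records if a == 0]
    return mean(r1) - mean(r0)

def backdoor_effect(records):
    """Sum over c of P(c) * (E[r | a=1, c] - E[r | a=0, c])."""
    total = len(records)
    effect = 0.0
    for c in sorted({c for c, _, _ in records}):
        stratum = [(a, r) for ci, a, r in records if ci == c]
        p_c = len(stratum) / total
        r1 = [r for a, r in stratum if a == 1]
        r0 = [r for a, r in stratum if a == 0]
        effect += p_c * (mean(r1) - mean(r0))
    return effect

# Hypothetical (confounder, action, reward) logs: the confounder drives both the
# behaviour policy's action choice and the reward; the action itself does nothing.
records = (
    [(1, 1, 1.0)] * 3 + [(1, 0, 1.0)] * 1 +
    [(0, 0, 0.0)] * 3 + [(0, 1, 0.0)] * 1
)
naive = naive_effect(records)        # spurious positive effect
adjusted = backdoor_effect(records)  # effect vanishes once c is adjusted for
```

A learner trained on the naive signal would prefer action 1 for no causal reason, which is the failure mode the confounded benchmark is meant to expose.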
·Target: NeurIPS (benchmarks track) / ICML
P11-H
Neuroevolution of Causal Policies (NE-CausalQ)
Seeded
NeuralCausalQ (P3 Exp 5) trains its auxiliary heads by gradient descent, but the loss signal is still observational and so susceptible to confounded gradients. The proposal replaces gradient descent with CMA-ES evolution over the reward-model and confounder-classifier weights: fitness is policy accuracy on a held-out unconfounded evaluation set, not the observed training loss. CMA-ES sidesteps confounded gradient directions by evaluating policy outcomes under the do-operator.
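The key design choice, fitness computed on unconfounded evaluation data rather than on a training loss, can be sketched without a full CMA-ES implementation. The example below substitutes a plain elitist (1+λ) evolution strategy with a fixed step size and a linear policy; it has no covariance adaptation, so it is a simplified stand-in and not CMA-ES. The evaluation set is synthetic and every name is hypothetical.

```python
import random

def policy_accuracy(weights, eval_set):
    """Fraction of held-out cases where the linear policy picks the labelled best action."""
    correct = 0
    for features, best_action in eval_set:
        score = sum(w * x for w, x in zip(weights, features))
        chosen = 1 if score > 0 else 0
        correct += int(chosen == best_action)
    return correct / len(eval_set)

def evolve(eval_set, dim, generations=30, offspring=20, sigma=0.5, seed=0):
    """Elitist (1+lambda) search; fitness never reads an observational training loss."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    parent_fit = policy_accuracy(parent, eval_set)
    for _ in range(generations):
        children = [[w + rng.gauss(0.0, sigma) for w in parent] for _ in range(offspring)]
        scored = [(policy_accuracy(child, eval_set), child) for child in children]
        best_fit, best_child = max(scored, key=lambda pair: pair[0])
        if best_fit >= parent_fit:  # the parent is kept unless a child is at least as good
            parent, parent_fit = best_child, best_fit
    return parent, parent_fit

# Synthetic stand-in for an unconfounded evaluation set: the best action is 1
# exactly when the first feature exceeds the second.
data_rng = random.Random(1)
eval_set = []
for _ in range(200):
    x = (data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
    eval_set.append((x, 1 if x[0] - x[1] > 0 else 0))

baseline = policy_accuracy([0.0, 0.0], eval_set)
weights, fitness = evolve(eval_set, dim=2)
```

Because selection is elitist, the returned fitness can never fall below the starting policy's accuracy; swapping in a real CMA-ES library would change the search distribution but not the fitness definition.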
·Target: GECCO / NeurIPS workshop
P11-I
MAP-Elites Design Archive for Personalised Experiment Templates
Seeded
PCA-ES (P4) warm-starts CMA-ES from a pooled covariance that collapses across DGP types. MAP-Elites would instead maintain a 2D design archive: one axis is the effect-age DGP family (novelty, habituation, delayed-onset, fatigue) and the other is the autocorrelation level ρ. Each cell stores the highest-power CMA-ES design found for that (DGP, ρ) combination. For a new user, a DGP classifier selects the archive cell, and the stored design is used either as a warm start or served directly.
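The archive itself is a small data structure, sketched below under assumed bin edges for ρ and with designs represented as opaque labels. It covers only the keep-the-best-per-cell insertion rule and retrieval; the mutation loop that proposes new designs in full MAP-Elites, and the DGP classifier, are outside this sketch.

```python
import bisect

DGP_FAMILIES = ("novelty", "habituation", "delayed-onset", "fatigue")
RHO_EDGES = (0.3, 0.6)  # hypothetical bin edges: low / medium / high autocorrelation

class DesignArchive:
    """2D archive keyed by (DGP family, autocorrelation bin)."""

    def __init__(self):
        self.cells = {}  # (family, rho_bin) -> (design, power)

    @staticmethod
    def cell_key(family, rho):
        if family not in DGP_FAMILIES:
            raise ValueError(f"unknown DGP family: {family}")
        return family, bisect.bisect_right(RHO_EDGES, rho)

    def add(self, family, rho, design, power):
        """Store the design only if it beats the cell's incumbent on power."""
        key = self.cell_key(family, rho)
        incumbent = self.cells.get(key)
        if incumbent is None or power > incumbent[1]:
            self.cells[key] = (design, power)
            return True
        return False

    def retrieve(self, family, rho):
        """Return the stored design for the matching cell, or None if the cell is empty."""
        entry = self.cells.get(self.cell_key(family, rho))
        return None if entry is None else entry[0]

archive = DesignArchive()
archive.add("habituation", 0.45, "design-A", 0.70)
archive.add("habituation", 0.50, "design-B", 0.65)  # rejected: lower power, same cell
archive.add("habituation", 0.55, "design-C", 0.80)  # replaces design-A
```

A new user classified as `habituation` with ρ around 0.4 would be served `design-C`; an empty cell returns `None`, which is where a fallback such as the pooled warm start would apply.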
P11-J
CMA-ES for Behavioural Reward Function Design
Seeded
P9 takes a reward shaping signal as given and asks whether it is causally admissible. This paper asks the upstream question of how to design the reward function itself using evolutionary search. The platform specifies a target behavioural outcome; CMA-ES searches over auxiliary reward parameterisations, with fitness defined as durable habit formation at day 60 rather than the immediate reward sum. P9's admissibility conditions become hard constraints in the feasibility check.
·Target: NeurIPS workshop / RLDM