Dependent-data settings, often modelled by networks, pose substantial challenges for experimentation and statistical inference. In this talk, I will explore both sides of this challenge: how to draw reliable inferences from experiments in the presence of network interference, and how to optimally design such experiments. 

We begin with the problem of post-experiment estimation in networks. When treatment spillovers occur, that is, when a unit’s outcome depends not only on its own treatment but also on its neighbors’ treatments, naive estimators of the global average treatment effect (GATE) can be severely biased. A common remedy is to combine Horvitz-Thompson estimators with an exposure mapping that identifies units deemed unaffected by interference. A key challenge, however, is choosing a suitable interference threshold (e.g., the fraction of treated neighbors below which a unit is considered “unexposed”). We propose a data-adaptive method that learns this threshold by minimizing the mean squared error (MSE) of the estimator, using a linear dose-response model to estimate the bias and variance at each candidate threshold. Simulations show that our method improves upon fixed, non-adaptive threshold choices.

We then briefly turn to the theoretical underpinnings of inference with dependent data. A typical assumption in network-interference settings is that node degrees are bounded. We relax this assumption and establish sufficient conditions for a central limit theorem for triangular arrays, covering network interference, mixing random fields, and non-mixing autoregressive settings.
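As a rough illustration of the bias-variance trade-off behind the threshold choice, the sketch below implements a thresholded Horvitz-Thompson estimator under Bernoulli(p) assignment and sweeps the threshold τ by Monte Carlo. Everything here is an assumption made for illustration: the ring-lattice graph (`ring_lattice`), the exposure rule (a treated unit needs at least a 1 - τ fraction of treated neighbors, a control unit at most a τ fraction), and the linear outcome model with known coefficients `a`, `b`, `c`. Because the true model is used to compute the MSE, this is an oracle simulation, not the data-adaptive procedure described above, which estimates bias and variance from a fitted dose-response model.

```python
import math
import random


def ring_lattice(n, k):
    """Hypothetical test graph: node i links to its k nearest neighbors on each side (needs n > 2k)."""
    return [[(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)]


def binom_tail_ge(m, p, kmin):
    """P(K >= kmin) for K ~ Binomial(m, p)."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(max(kmin, 0), m + 1))


def binom_cdf_le(m, p, kmax):
    """P(K <= kmax) for K ~ Binomial(m, p)."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(0, min(kmax, m) + 1))


def ht_estimate(graph, z, y, p, tau):
    """Thresholded Horvitz-Thompson estimate of the GATE under Bernoulli(p) assignment.

    Unit i is effectively treated if z_i = 1 and at least a (1 - tau) fraction of its
    neighbors are treated, and effectively control if z_i = 0 and at most a tau fraction
    are treated. Assumes every node has at least one neighbor.
    """
    total = 0.0
    for i, nbrs in enumerate(graph):
        d = len(nbrs)
        k = sum(z[j] for j in nbrs)  # number of treated neighbors
        kmin = math.ceil((1 - tau) * d - 1e-9)  # integer thresholds avoid float round-off
        kmax = math.floor(tau * d + 1e-9)
        if z[i] == 1 and k >= kmin:
            total += y[i] / (p * binom_tail_ge(d, p, kmin))
        elif z[i] == 0 and k <= kmax:
            total -= y[i] / ((1 - p) * binom_cdf_le(d, p, kmax))
    return total / len(graph)


def mc_mse(graph, p, tau, a, b, c, sigma, reps, seed):
    """Oracle Monte Carlo MSE under a hypothetical linear dose-response model
    y_i = a + b*z_i + c*rho_i + eps_i, where rho_i is the treated-neighbor fraction.
    The true GATE is b + c (everyone treated vs. no one treated)."""
    rng = random.Random(seed)
    n = len(graph)
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # fixed unit effects (finite population)
    gate = b + c
    sq_err = 0.0
    for _ in range(reps):
        z = [1 if rng.random() < p else 0 for _ in range(n)]
        y = [a + b * z[i] + c * sum(z[j] for j in graph[i]) / len(graph[i]) + eps[i]
             for i in range(n)]
        sq_err += (ht_estimate(graph, z, y, p, tau) - gate) ** 2
    return sq_err / reps


# Sweep a small grid of thresholds and keep the one with the smallest simulated MSE.
graph = ring_lattice(200, 3)
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
mse = {t: mc_mse(graph, 0.5, t, a=1.0, b=1.0, c=1.0, sigma=0.5, reps=200, seed=0) for t in grid}
best_tau = min(mse, key=mse.get)
print(best_tau, mse)
```

A small τ keeps only units whose neighborhoods closely match the all-treated or all-control conditions (low bias, few retained units, high variance); a large τ does the reverse, and τ = 1 reduces to the ordinary Horvitz-Thompson contrast that ignores interference.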

Finally, we address the design side of the problem: how should we assign treatments to minimize the worst-case MSE of the estimator under network interference and homophily? Cluster randomization is popular, but it may increase the MSE in the presence of homophily, i.e., when similar units are more likely to be connected to each other. We reduce the design problem to an optimization over the covariance matrix of the treatment assignment vector, trading off interference, homophily, and robustness. We solve this problem with two approaches that have complementary behaviors. The first formulates a semidefinite program (SDP) and converts its solution into an assignment via Gaussian rounding, similar to the Goemans-Williamson MAX-CUT approximation algorithm. The second adapts the Gram-Schmidt Walk, a vector-balancing algorithm that has recently received much attention. We evaluate our designs on synthetic networks and on a real-world village network dataset.
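A minimal sketch of the Gaussian-rounding step, assuming an SDP solver has already returned a positive-definite covariance matrix Σ with unit diagonal. The 4-unit matrix `SIGMA` is a hypothetical stand-in (two tightly correlated pairs), not the solution of the program described above, and a hand-written Cholesky factorization keeps the example dependency-free. Each draw samples g ~ N(0, Σ) and treats unit i when z_i = sign(g_i) = +1. For this kind of rounding, E[z_i z_j] = (2/π) arcsin(Σ_ij), the identity behind the Goemans-Williamson analysis, so the script compares empirical correlations with that prediction.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a; requires a positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def gaussian_round(L, rng):
    """Draw g = L w with w ~ N(0, I), so g ~ N(0, L L^T), and return z = sign(g) in {-1, +1}."""
    n = len(L)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    g = [sum(L[i][k] * w[k] for k in range(i + 1)) for i in range(n)]
    return [1 if gi >= 0 else -1 for gi in g]


# Hypothetical stand-in for an SDP solution: units {0, 1} and {2, 3} are strongly
# positively correlated within each pair and mildly negatively correlated across pairs.
SIGMA = [
    [1.0, 0.8, -0.2, -0.2],
    [0.8, 1.0, -0.2, -0.2],
    [-0.2, -0.2, 1.0, 0.8],
    [-0.2, -0.2, 0.8, 1.0],
]

L = cholesky(SIGMA)
rng = random.Random(0)
draws = 20000
n = len(SIGMA)
emp = [[0.0] * n for _ in range(n)]
for _ in range(draws):
    z = gaussian_round(L, rng)  # z_i = +1 means unit i is treated
    for i in range(n):
        for j in range(n):
            emp[i][j] += z[i] * z[j] / draws

# Predicted correlation of the rounded assignment: (2 / pi) * arcsin(Sigma_ij).
pred = [[2 / math.pi * math.asin(s) for s in row] for row in SIGMA]
for i in range(n):
    print([round(emp[i][j], 3) for j in range(n)], [round(pred[i][j], 3) for j in range(n)])
```

Because each g_i is symmetric around zero, every unit is treated with probability 1/2 under this rounding. An SDP solution is often only positive semidefinite, in which case this Cholesky routine can fail; an eigendecomposition-based square root would be one workaround, though the talk does not specify how this is handled.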