Towards a Unified Theory for Semiparametric Data Fusion Using Individual-Level Data (Joint work with Ellen Graham and Marco Carone)
Seminar presented by Andrea Rotnitzky
The work I will present in this talk contributes to the broader goal of developing a unified, automated methodology for debiased machine learning using individual-level data fused from multiple independent sources. The increasing availability of such data has spurred the development of new statistical theories for data integration, including a recent comprehensive framework by Li and Luedtke for cases where data sources align with different subsets of conditional distributions corresponding to a single factorization of the target distribution. However, many real-world data fusion problems violate this structure. Examples include integrating data from different epidemiological study designs, addressing measurement error with validation studies, and handling two-sample instrumental variable problems, all cases where existing theory falls short.
In this talk, I will introduce a new framework that significantly extends the reach of the current theory by enabling the integration of individual-level data when sources align with conditional distributions that do not conform to a single factorization of the target distribution. I will present universal results characterizing the class of influence functions of regular asymptotically linear estimators, together with the efficient influence function, for any pathwise differentiable parameter, regardless of the number of data sources, the parameter of interest, or the statistical model. This theory opens new avenues for machine-learning-assisted, semiparametric efficient estimation, pushing the boundaries of data integration in modern statistical science.
