Problems in high-dimensional data analysis often involve simultaneous tests of many hypotheses. To prevent inflated false positive rates, p-values from the individual tests are therefore often adjusted for multiple comparisons, e.g. by controlling the family-wise error rate (FWER) or the false discovery rate (FDR). However, available methods mostly focus on controlling false positives, and control of false negatives is often overlooked. In addition, complex correlation structures among hypotheses often complicate the analysis of multiple comparison adjustment procedures.
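For readers less familiar with these adjustments, the following is a minimal sketch of two textbook procedures: Bonferroni, which controls FWER, and Benjamini-Hochberg, which controls FDR under independence or positive dependence. It is not the methodology presented in this talk; the function names and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Bonferroni (FWER control): reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (FDR control).

    Sort the p-values, find the largest rank k (1-based) such that
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from eight individual tests (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

fwer_decisions = bonferroni_reject(pvals)
fdr_decisions = benjamini_hochberg_reject(pvals)
print("Bonferroni rejections:", sum(fwer_decisions))
print("Benjamini-Hochberg rejections:", sum(fdr_decisions))
```

Both procedures are calibrated to bound false positives at level alpha; neither offers a direct guarantee about how many active hypotheses are missed.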
In this talk, I present a new methodology for simultaneously selecting the subset of alternative (or active) hypotheses from a large set of hypotheses. The proposed methodology is based on a perturbation of the usual p-values that results in dichotomous behavior, and offers desirable asymptotic and small-sample properties. In particular, we show that the set of active hypotheses is consistently estimated even as the total number of hypotheses increases exponentially with the sample size, under arbitrary correlation structures and in vanishing signal-to-noise regimes. Numerical experiments verify these findings in small-sample settings, and indicate that the proposed methodology outperforms FDR- and FWER-controlling procedures in a number of simulated examples.