Adaptive testing of conditional association through Bayesian recursive mixture modeling

Li Ma
Duke University

Jan 22 2013

In many case-control settings, a central goal is to test for association or dependence between the predictors and the response. It is well known that in such studies relevant covariates need to be conditioned on: ignoring them may lead to false positives as well as a loss of power for detecting true associations. It is straightforward to condition on the covariates in a parametric framework such as logistic regression, by incorporating the covariates into the model as additional variables. In contrast, classical nonparametric methods such as the Cochran-Mantel-Haenszel (CMH) test accomplish conditioning by dividing the data into strata, one for each of the possible covariate values. In many modern applications, this gives rise to a huge number of strata, most of which are sparse due to the multi-dimensionality of the covariate and/or predictor spaces. Such a brute-force approach to conditioning is often extremely wasteful: in many studies we expect the covariate space to consist of a relatively small number of subsets with different response-predictor dependence, so the stratification that actually suffices to account for the conditioning often consists of just a small number of blocks. With this motivation, we introduce a Bayesian framework for nonparametric testing of predictor-response association that achieves adaptive conditioning on the covariates. Instead of forming a separate stratum for every possible covariate value, we infer appropriate ways of stratification from the data and test for association based on the inferred stratification. Inference under our framework proceeds entirely in a principled, probabilistic fashion that properly takes into account all sources of uncertainty, including that involved in inferring the stratification. The framework is constructed using a recursive mixture formulation on the retrospective distribution of the predictors, where the mixing distribution is a prior on random partitions of the covariate space.
The recursive mixture design allows inference under this framework to be carried out efficiently in closed form through a sequence of recursions, striking a balance between model flexibility and computational tractability. A power study shows that the additional adaptiveness so achieved does pay off: our method substantially outperforms classical tests based on brute-force conditioning.
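
For reference, the brute-force conditioning described above can be made concrete with the textbook CMH statistic for stratified 2x2 tables. This is a minimal sketch, assuming a binary predictor (exposed/unexposed) and a case/control response within each stratum, and using the standard hypergeometric mean and variance without continuity correction; the function name cmh_test and the tuple layout are illustrative choices, not taken from the paper. It also shows why sparse strata are wasteful: a stratum holding a single subject contributes nothing to the statistic.

```python
import math
from typing import Iterable, Tuple

# One stratum as a 2x2 table (illustrative layout):
# (cases exposed, cases unexposed, controls exposed, controls unexposed)
Table = Tuple[int, int, int, int]


def cmh_test(strata: Iterable[Table]) -> Tuple[float, float]:
    """Cochran-Mantel-Haenszel chi-square statistic (1 df, no continuity
    correction) and its upper-tail p-value."""
    dev = 0.0  # sum over strata of (observed a - expected a)
    var = 0.0  # sum over strata of the hypergeometric variance of a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # fewer than 2 subjects: the stratum carries no information
        cases, controls = a + b, c + d
        exposed, unexposed = a + c, b + d
        dev += a - cases * exposed / n
        var += cases * controls * exposed * unexposed / (n * n * (n - 1))
    if var == 0.0:
        return 0.0, 1.0  # no informative stratum at all
    stat = dev * dev / var
    # Chi-square with 1 df is Z^2, so P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Two well-populated strata: stat is about 6.44, p is about 0.011.
    print(cmh_test([(10, 5, 5, 10), (10, 5, 5, 10)]))
    # The same kind of data split into one-subject strata: (0.0, 1.0).
    print(cmh_test([(1, 0, 0, 0)] * 15 + [(0, 0, 0, 1)] * 15))
```

The second call is the extreme case of over-stratification: every subject sits alone in its stratum, so every table has a zero margin and the test has no information left to use.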

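The abstract does not spell out the model, so the following is only a hypothetical toy illustrating how a mixture over random partitions can be summed exactly by recursion. It is not the author's method. Assumptions: a binary predictor, a case/control response, a one-dimensional ordered covariate whose blocks are split only into halves, uniform Beta(1, 1) priors on P(x = 1) within each block, and a global choice between "no association in any block" and "association in every block". At each block A the recursion is Phi(A) = rho * M(A) + (1 - rho) * Phi(left half) * Phi(right half), where rho is the probability of stopping and M(A) is the within-block marginal likelihood of the predictors given the response (the retrospective direction). All names (log_phi, rho, prior_null, and so on) are invented for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical layout: one tuple per covariate value, in covariate order:
# (cases with x=1, cases with x=0, controls with x=1, controls with x=0)
Counts = Tuple[int, int, int, int]
LogMarginal = Callable[[Counts], float]


def log_beta_binom(k: int, m: int) -> float:
    """Log marginal likelihood of a given 0/1 sequence with k ones in m
    trials under a Bernoulli(theta) model with a uniform Beta(1, 1) prior."""
    return math.lgamma(k + 1) + math.lgamma(m - k + 1) - math.lgamma(m + 2)


def log_m0(c: Counts) -> float:
    """No association in the block: cases and controls share one P(x = 1)."""
    a, b, cc, d = c
    return log_beta_binom(a + cc, a + b + cc + d)


def log_m1(c: Counts) -> float:
    """Association in the block: separate P(x = 1) for cases and controls."""
    a, b, cc, d = c
    return log_beta_binom(a, a + b) + log_beta_binom(cc, cc + d)


def log_add(x: float, y: float) -> float:
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def log_phi(cells: List[Counts], lo: int, hi: int,
            log_m: LogMarginal, rho: float) -> float:
    """log Phi(A) for the block A = cells[lo:hi]: the marginal likelihood
    averaged over all halving partitions of A under the stop/split prior."""
    block = tuple(sum(col) for col in zip(*cells[lo:hi]))
    if hi - lo == 1:
        return log_m(block)  # a single covariate value cannot be split further
    mid = (lo + hi) // 2
    stop = math.log(rho) + log_m(block)
    split = (math.log(1.0 - rho)
             + log_phi(cells, lo, mid, log_m, rho)
             + log_phi(cells, mid, hi, log_m, rho))
    return log_add(stop, split)


def posterior_null(cells: List[Counts], rho: float = 0.5,
                   prior_null: float = 0.5) -> float:
    """Posterior probability that no block shows association, against the
    simplified alternative that every block does."""
    n = len(cells)
    l0 = math.log(prior_null) + log_phi(cells, 0, n, log_m0, rho)
    l1 = math.log(1.0 - prior_null) + log_phi(cells, 0, n, log_m1, rho)
    return math.exp(l0 - log_add(l0, l1))
```

For example, posterior_null([(5, 5, 5, 5)] * 4) comes out above 0.5, while posterior_null([(10, 0, 0, 10)] * 4) is essentially zero. The point of the design is that the number of halving partitions grows exponentially with the number of covariate values, yet the recursion visits each node of the tree of halves only once, and each block keeps its own predictor distribution, so covariate effects on the predictor are absorbed without a stratum per covariate value.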
