Melanie Wilson Quintana
Statistical ScientistBerry Consultants
Bayesian Model Uncertainty and Prior Choice with Applications to Genetic Association Studies
The Bayesian approach to model selection allows for uncertainty in both model specific parameters and in the models themselves. Much of the recent Bayesian model uncertainty literature has focused on defining these prior distributions in an objective manner, providing conditions under which Bayes factors lead to the correct model selection, particularly in the situation where the number of variables, p, increases with the sample size, n. This is certainly the case in our area of motivation; the biological application of genetic association studies involving single nucleotide polymorphisms. While the most common approach to this problem has been to apply a marginal test to all genetic markers, we employ analytical strategies that improve upon these marginal methods by modeling the outcome variable as a function of a multivariate genetic profile using Bayesian variable selection. In doing so, we perform variable selection on a large number of correlated covariates within studies involving modest sample sizes. In particular, we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally 'validated' in independent studies. In the context of Bayesian model uncertainty for problems involving a large number of correlated covariates we characterize commonly used prior distributions on the model space and investigate their implicit multiplicity correction properties first in the extreme case where the model includes an increasing number of redundant covariates and then under the case of full rank design matrices. We provide conditions on the asymptotic (in n and p) behavior of the model space prior required to achieve consistent selection of the global hypothesis of at least one associated variable in the analysis using global posterior probabilities (i.e. under 0-1 loss). In particular, under the assumption that the null model is true, we show that the commonly used uniform prior on the model space leads to inconsistent selection of the global hypothesis via global posterior probabilities (the posterior probability of at least one association goes to 1) when the rank of the design matrix is finite. In the full rank case, we also show inconsistency when p goes to infinity faster than the square root of n. Alternatively, we show that any model space prior such that the global prior odds of association increases at a rate slower than the square root of n results in consistent selection of the global hypothesis in terms of posterior probabilities.