Co-Director, Undergraduate Data Analytics Major, The Ohio State University, Feb 2014-Present
Associate Professor of Statistics, The Ohio State University, Oct 2015-Present
Regression Model Search and Uncertainty With Many Predictors
In problems of variable selection and model uncertainty, as well as in multivariate structure assessment, our ability to coherently model and analyze data when faced with increasing variable dimension is challenged by questions of model structuring, theoretical specification and computation. This dissertation addresses each of these issues, primarily in the contexts of regression and prediction, and demonstrates how coherent Bayesian models can be developed and applied to problems in high dimensions. Chapter 1 sets the context for the dissertation, describing the theoretical and computational issues that arise as a result of increased variable dimension. The idea of "sparsity" in high-dimensional multivariate models is introduced. Chapter 2 introduces a novel stochastic search algorithm for exploring large regression model spaces. Contrasts are made with existing Markov chain Monte Carlo methods. A simulation study is used to validate the method, and analytic evaluation of the method's properties is described. Chapter 3 gives an overview of regression model selection and averaging from a Bayesian perspective using the search methods described in Chapter 2. Particular prior distributions and their advantages for use in linear regression modeling with many variables are described, with emphasis on coherency and aspects of sparsity. Chapter 4 illustrates high-dimensional linear regression model search using gene expression data from a survival study in brain cancer. Chapter 5 introduces useful results regarding the marginal likelihood under a particular probability model. A lower bound on the marginal likelihood for models of a common dimension is established and related to sparsity and Bayesian regularization. Reasonable assumptions about the distribution of the predictor variables allow for Bayesian learning about the sparsity-inducing prior parameter.
Chapter 6 contains two examples, drawn from clinical genomics studies in breast and lung cancer, of regression modeling and prediction in high dimensions outside the context of the linear model. Finally, Chapter 7 concludes the dissertation by summarizing coherent Bayesian regression modeling in high dimensions. Generalizations of the stochastic search method are described, and future work in complex high-dimensional multivariate modeling is set forth.