Carlos M Carvalho
Professor of Statistics, University of Texas at Austin, McCombs School of Business, Jul 2010-Present
Structure and Sparsity in High-Dimensional Multivariate Analysis
As scientific problems grow in both parameter dimension and sample size, structure and sparsity become central concepts in practical data analysis and inference. By allowing complex high-dimensional problems to be modeled through low-dimensional underlying relationships, sparsity helps to simplify estimation, reduce computational burden and facilitate interpretation of large-scale datasets. This dissertation addresses the issue of sparsity modeling primarily in the context of Gaussian graphical models and sparse factor models. Chapter 1 contextualizes the dissertation by introducing the way sparsity models are discussed throughout this work. Chapter 2 introduces the basic theory of Gaussian graphical models and central elements of Bayesian analysis in this class of models. Chapter 3 is concerned with the problem of model determination in graphical model space. Existing methods are tested in high-dimensional setups and a novel parallel stochastic search method is described. Both decomposable and non-decomposable graphs are considered. The examples range from moderate (12-20) to large (150) size, combining simple synthetic examples with data analysis from gene expression studies. Chapter 4 develops an efficient method for direct simulation from the hyper-inverse Wishart (HIW) prior/posterior on any defined graphical model. This new sampling method completes the simulation toolbox for Bayesian exploration and analysis of Gaussian graphical models under HIW priors. Chapter 5 extends conditional independence ideas from Gaussian graphical models to multivariate dynamic linear models. After presenting the development of this new class of models, the chapter focuses on applications of such models in large financial time series portfolio allocation problems.
Chapter 6 deals with sparse factor models, where model search and fitting are addressed through stochastic simulation (MCMC) and a novel computational strategy involving an evolutionary search to address the issue of identifying variables for inclusion. This forms a first Bayesian "projection pursuit" method relevant in high-dimensional factor and structure analysis. Examples are drawn from genomic studies where factor models aim to identify multi-dimensional biological patterns related to oncogenic pathways. Finally, Chapter 7 summarizes the dissertation and discusses possible generalizations and future work.