Andrew Cron

Graduation Year: 2012

Employment Info

Senior Research Scientist
84.51

Dissertation

Mixture Modeling, Sparse Covariance Estimation and Parallel Computing in Bayesian Analysis

Mixture modeling of continuous data is an extremely effective and popular method for density estimation and clustering. However, as the size of the data grows, both in dimension and in number of observations, many modeling and computational problems arise. In the Bayesian setting, computational methods for posterior inference become intractable as the number of observations and/or possible clusters grows large. Furthermore, relabeling in sampling methods becomes increasingly difficult to address as the data set grows. This thesis addresses computational and methodological solutions to these problems by utilizing modern computational hardware and new methodology. Novel approaches for parsimonious covariance modeling and information sharing across multiple data sets are then built upon these computational improvements. Chapter 1 introduces the fundamental approaches in mixture modeling, including Dirichlet processes and posterior inference using Gibbs sampling. Chapter 2 describes the use of graphics processing units (GPUs) for massive gains in computational performance in both mixture models and general Bayesian modeling. Chapter 3 introduces a new relabeling approach in mixture modeling that can be scaled far beyond current methodology to massive data and high-dimensional settings. Chapter 4 generalizes Chapters 2 and 3 to the hierarchical Dirichlet process setting to “borrow strength” from multiple studies in classification problems, with a motivating application using flow cytometry studies in immunology. Chapter 5 develops novel theory and methods for sparse covariance estimation using new classes of probability distributions over sparse, full-rank, orthogonal matrices. In Chapter 6, these new methods are applied to mixture modeling with measurement error in classification problems. Finally, Chapter 7 summarizes the thesis and outlines important and exciting areas for future research.