P. Richard Hahn
Assistant Professor of Econometrics and Statistics, Chicago Booth Business School
Probability Models for Targeted Borrowing of Information
This dissertation is devoted to building Bayesian models for complex data that are geared toward specific inferential aspects of applied problems. This broad topic is explored via three methodological case studies, unified by the use of latent variables to build structured yet flexible models. Chapter one reviews previous work developing two classic Bayesian latent variable models: Gaussian factor models and latent mixture models. This background helps contextualize the contributions of later chapters. Chapter two (Hahn et al., 2011) considers the problem of analyzing patterns of covariation in dichotomous multivariate data. Sparse factor models are adapted for this purpose using a probit link function, extending the work of Carvalho et al. (2008) to the multivariate binary case. Simulation studies show that the regularization properties of the sparsity priors aid inference even when the data are generated according to a non-sparse, non-factor model. The model is then applied to congressional roll call voting data to conduct an exploratory study of voting behavior in the U.S. Senate. Unsurprisingly, the data are readily characterized in terms of only a few latent factors, the most dominant of which is recognized as party affiliation. Chapter three (Hahn et al., 2010a) turns to the use of factor models for the purpose of regularized linear prediction. First, it is demonstrated that likelihood-based factor model selection for the purpose of prediction is difficult, and the root causes of this difficulty are described. Then, it is explained how to avoid this difficulty by modeling the marginal predictor covariance with a factor model while letting the response variable deviate from the factor structure if necessary. This novel parameterization yields improved out-of-sample prediction compared to competing methods, including ridge regression and unmodified factor regression, on both real and synthetic data.
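As a rough illustration of the kind of model chapter two describes, the sketch below simulates binary vectors from a probit factor model: a latent Gaussian vector z = Bf + e is thresholded at zero to give each binary outcome. The function name, the loadings matrix, the dimensions, and the unit idiosyncratic variance are all assumptions chosen for illustration; the sketch shows only the generative structure and does not implement the sparsity priors or the posterior inference developed in the dissertation.

```python
import random


def simulate_probit_factor(n, loadings, seed=0):
    """Draw n binary vectors y, where y_j = 1 if (B f + e)_j > 0.

    `loadings` is a p-by-k list of lists (the matrix B); zero entries
    mean that a variable does not load on that factor.
    """
    rng = random.Random(seed)
    p = len(loadings)
    k = len(loadings[0])
    data = []
    for _ in range(n):
        # Latent factors f ~ N(0, I_k).
        f = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Latent utilities z = B f + e, with e ~ N(0, I_p) (assumed).
        z = [
            sum(loadings[j][h] * f[h] for h in range(k)) + rng.gauss(0.0, 1.0)
            for j in range(p)
        ]
        # Probit link: observe only the sign of each latent utility.
        data.append([1 if zj > 0 else 0 for zj in z])
    return data


# Hypothetical sparse loadings: variables 0-2 load on factor 0 only,
# variables 3-4 load on factor 1 only.
B = [
    [2.0, 0.0],
    [2.0, 0.0],
    [-2.0, 0.0],
    [0.0, 1.5],
    [0.0, 1.5],
]
votes = simulate_probit_factor(2000, B)
```

In this toy setup, variables that share a factor with the same sign tend to agree, while those with opposite signs tend to disagree, which is loosely the kind of pattern a dominant factor would induce in roll call votes.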
Chapter four (Hahn et al., 2010b) concerns mixtures of Beta distributions for modeling observations on a finite interval. Mixture models have long been used for the purpose of density estimation, with the added benefit that the inferred latent mixture components often have plausible subject-specific interpretations (Escobar and West, 1995a). This chapter develops a statistical approach, within the specific context of a behavioral game theory experiment (Nagel, 1995), that permits refined statistical assessment of these subject-specific interpretations. The new model is fit to data collected specifically for this purpose, allowing refined model testing using a posterior holdout log-likelihood score (similar to a Bayes factor). In addition to providing improved testing capability, this chapter serves as an introduction to the world of behavioral game theory for statisticians and as an explicitly statistical perspective on a well-known example for behavioral economists. Chapter five concludes with a summary of two works-in-progress based on latent Gaussian processes: a model for nonlinear conditional quantile regression and a model for Lie group-based Bayesian manifold learning.
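To make the Beta mixture idea of chapter four concrete, the following minimal sketch evaluates the log density of a finite mixture of Beta distributions and sums it over held-out observations. It is a simplified plug-in score at one fixed parameter setting, not the posterior holdout score used in the dissertation, which would involve posterior draws. The function names, weights, shape parameters, and data values are hypothetical, and observations are assumed to have been rescaled to lie strictly inside (0, 1).

```python
import math


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def mixture_logpdf(x, weights, shapes):
    """Log density of a finite Beta mixture, via a stable log-sum-exp."""
    terms = [
        math.log(w) + beta_logpdf(x, a, b)
        for w, (a, b) in zip(weights, shapes)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def holdout_loglik(holdout, weights, shapes):
    """Sum of mixture log densities over held-out observations."""
    return sum(mixture_logpdf(x, weights, shapes) for x in holdout)


# Hypothetical two-component mixture and held-out data on (0, 1).
weights = [0.6, 0.4]
shapes = [(2.0, 8.0), (8.0, 2.0)]
holdout = [0.15, 0.2, 0.3, 0.8, 0.85]

score_mixture = holdout_loglik(holdout, weights, shapes)
score_uniform = holdout_loglik(holdout, [1.0], [(1.0, 1.0)])
```

Comparing `score_mixture` with `score_uniform` mimics, in a very reduced form, how competing models can be ranked by their fit to data not used for estimation.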