hahn at stat dot duke dot edu


Contact Info
022 Old Chemistry Building
Box 90251
Duke University
Durham, NC 27708-0251 USA

Note: This semester I'm delighted to be supported through an NSF training grant with the Mathematical Biology group here at Duke. Check them out because they do really cool stuff!

Research (cv)

My dissertation is provisionally titled "Case Studies in Targeted Model Specification". A chapter summary is provided below, with links to the relevant working papers.
ABSTRACT
"This thesis presents modeling strategies that can be adopted by a full-blown Bayesian who is worried about model specification but who finds it difficult (as most of us do) to reason carefully about more than a handful of parameters. The first chapter reviews foundational results on model selection, presents motivating examples, and provides context by summarizing the relevant literature. The remainder of the manuscript turns to specific modeling scenarios.

Chapter two describes how to set up a model so that certain sub-models are insulated from misspecifications in other parts of the model, preventing "borrowing of misinformation". When a regression model is of primary interest, this problem becomes a matter of how best to incorporate the marginal distribution of the predictors into the overall model without accidentally fouling up the regression inferences. The key idea is just to buffer the two pieces with a prior distribution. This approach is studied in detail in the case of a linear factor model. (Predictor-dependent shrinkage for linear regression via partial factor modeling)

Chapter three tackles the problem of aligning multivariate time series (e.g., corporate account measures) and identifying common patterns of covariation among subsets of the observational units (e.g., companies). The usual vector autoregressive approach is rejected as too hard to interpret and unable to deal with highly noisy data. The proposed model reduces the main inference problem to just a few parameters by using a Gaussian process to capture the idiosyncratic behavior of the firms. The flexible mean structure permits the observed covariation to be intricate and distinct from company to company, while the shared patterns are describable in terms of a manageably small parameter vector. This division of labor increases both statistical power and model interpretability.

Chapter four considers a testing problem, which is to evaluate the evidence that a particular behavioral game theory model is in fact used by actual people. The innovation here is to collect additional data that should look a certain way if the model were true and use this fact to build a more informative likelihood function. This straightforward approach can be important in any model that uses latent variables that admit a natural interpretation. Because the variables are latent (unobservable, even in principle), their interpretation cannot be directly tested. However, if the desired interpretation implies that other data should look a certain way, we can put this additional data into the model to check that it looks like it is supposed to. This allows the interpretation to inform the statistical conclusions. (Testing Cognitive Hierarchy Theories of Beauty Contest Games)

Chapter five develops a model geared towards uncovering patterns of covariation in binary observation vectors. The proposed solution adapts sparse factor models for this purpose, employing a multivariate probit model with a sparse factor model "under the hood". However, both the factor component and the sparsity component must be modified to accommodate the fact that in the multivariate probit setting the factor model describes latent, rather than observed, quantities. This chapter outlines these modifications in detail and documents the gains in regularization and interpretability conferred by this approach. (Sparse Factor Models for Exploratory Analysis of Multivariate Binary Data)

Chapter six explores an alternative to the targeted inference approach, proposing a way to be a doctrinaire subjective Bayesian without specifying a sampling model at all. The general idea appears in the philosophy literature under the title "probability kinematics", and special cases have made their way into the statistics literature in different guises (e.g., marginal likelihood methods, Jeffreys substitution prior for quantiles). Here the method is described in measure-theoretic detail and computational implementations are developed. The approach is illustrated on several small examples and the resulting posterior inferences are compared both to a fully Bayesian approach and to other likelihood-free approaches. The motivating example is multiple quantile regression."
Also see Symmetric Bayesian Multinomial Probit Models for work I'm involved in (though mostly this is a very nice idea of Lane's).

My committee members are Carlos Carvalho, Sayan Mukherjee, David Dunson, and Mike West.

Other collaborators include James Scott, Lane Burgette, Mauro Maggioni, Kristian Lum, Andrew Cron, and Carl Mela.


About Me
Before coming to Duke I spent two years studying operations research and statistics at New Mexico Institute of Mining and Technology in Socorro, NM under Brian Borchers and Oleg Makhnin. As an undergraduate at Columbia University in New York, I took a BA in Economics-Philosophy, with a focus on rational choice theory and related topics.

Aside from my work, I like cycling and reading books on intellectual history, particularly biographies of scientists, philosophers and mathematicians. I also have a hobbyist's interest in green architecture and hope to someday build an off-the-grid cabin somewhere like this.

This is what I look like these days.



{Duke Department of Statistical Science}