Jake Coleman

External address: Los Angeles, California
Graduation Year: 2019

Employment Info

Senior Quantitative Analyst
Los Angeles Dodgers
August 2019-Present

Dissertation

Topics in Bayesian Computer Model Emulation & Calibration with Applications to High Energy Particle Collisions

Problems involving computer model emulation arise when scientists simulate expensive experiments with computationally expensive computer models. To more quickly probe the experimental design space, statisticians build emulators that act as fast surrogates for these computer models. The emulators are typically Gaussian processes, in order to induce spatial correlation in the input space. Often the main scientific interest lies in inference on one or more input parameters of the computer model which do not vary in nature. Inference on these input parameters is referred to as "calibration," and these inputs are referred to as "calibration parameters." We first detail our emulation and calibration model for an application in high-energy particle physics; this model brings together some existing ideas in the literature on handling multivariate output, and lays out a foundation for the remainder of the thesis. In the next two chapters, we introduce novel ideas in the field of computer model emulation and calibration. The first addresses the problem of model comparison in this context, and how to simultaneously compare competing computer models while performing calibration. Using a mixture model to facilitate the comparison, we demonstrate that by conditioning on the mixture parameter we can recover the calibration parameter posterior from an independent calibration model. This mixture is then extended to the case of correlated data, a crucial innovation for this comparison framework to be useful in the particle collision setting. Lastly, we explore two possible non-exchangeable mixture models, where model preference changes over the input space. The second novel idea addresses density estimation when only coarse bin counts are available. We develop an estimation method which avoids costly numerical integration and maintains plausible correlation for nearby bins.
Additionally, we extend the method to density regression so that a full density can be predicted from an input parameter, having only been trained on coarse histograms. This enables inference on the input parameter, and we develop an importance sampling method that compares favorably to the foundational calibration method detailed earlier.
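
For readers unfamiliar with the terms, the general idea of emulation followed by calibration can be illustrated with a toy sketch. This is not the dissertation's model: the one-dimensional `simulator`, the squared-exponential kernel, its fixed length scale, the noise level, and the grid-based posterior are all assumptions chosen for illustration. A Gaussian process is fit to a handful of simulator runs, and the resulting fast surrogate is used to infer the calibration parameter from a single observation.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def fit_emulator(x_train, y_train, jitter=1e-6):
    """Precompute the Cholesky factor and weights of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return x_train, L, alpha

def predict(emulator, x_new):
    """Return the GP predictive mean and variance at x_new."""
    x_train, L, alpha = emulator
    K_s = rbf_kernel(x_new, x_train)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

def simulator(theta):
    """Hypothetical stand-in for an expensive computer model."""
    return np.sin(2.0 * theta) + theta

# Emulation: run the "expensive" model at a small design, then fit the GP.
design = np.linspace(0.0, 1.0, 10)
emulator = fit_emulator(design, simulator(design))

# Calibration: infer theta from one observation, using only the emulator.
theta_true, noise_sd = 0.4, 0.05  # assumed values for this toy example
y_obs = simulator(np.array([theta_true]))[0]  # noise-free for reproducibility

grid = np.linspace(0.0, 1.0, 501)  # flat prior on [0, 1]
mean, var = predict(emulator, grid)
total_var = var + noise_sd ** 2    # emulator uncertainty plus observation noise
log_post = -0.5 * (y_obs - mean) ** 2 / total_var - 0.5 * np.log(total_var)
post = np.exp(log_post - log_post.max())
post /= post.sum()                 # normalized posterior over the grid

theta_map = grid[np.argmax(post)]
print(f"posterior mode for theta: {theta_map:.3f}")
```

The emulator's predictive variance is added to the observation noise so that uncertainty from replacing the simulator with a surrogate carries through to the posterior on the calibration parameter.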