Research Assistant, Ph.D. Student
Quantitative Analyst, Baseball Research & Development, Cleveland Indians, May 2019-Present
Advances in Survey Methodology and Sports Science
ABSTRACT
This thesis develops statistical methodology for efficient uncertainty quantification in the presence of small sample sizes, missing data, or both. These methods have a wide range of potential applications, but they are particularly relevant to the analysis of cross-sectional survey data.
In the analysis of survey data, it is frequently of interest to estimate, and quantify uncertainty about, the means or totals of each of several non-overlapping subpopulations, or areas. Under the survey design, some areas may have small sample sizes, which can result in wide confidence intervals for those areas. Some existing model-based methods reduce interval width by borrowing data from other areas, but these interval procedures do not attain the nominal frequentist coverage rate for all values of the target quantity. We develop an alternative model-based confidence interval procedure that leverages data from other areas to reduce expected interval width. Importantly, our procedure maintains the nominal frequentist coverage rate for all values of the target quantity and is coverage-robust to model misspecification.
Missing values are also pervasive in survey samples. Multiple imputation, which generates several completed datasets, is a principled way to avoid discarding observations with incomplete values while accounting for the uncertainty introduced by the imputation procedure. The quality of imputations can be improved when the support of the data is known a priori. We develop methodology for the multiple imputation of mixed data when the support is known a priori to be a subset of the product space of the variables' individual sample spaces, for example when certain combinations of values are impossible. Respecting this support can improve the quality of the imputed datasets and, in turn, yield more efficient statistical inference.
In addition to its contributions to survey methodology, this thesis contributes to the sports science literature by developing Bayesian latent variable models for the analysis of visual-motor expertise.
In particular, we consider a multivariate dataset of visual-motor assessments for 2,317 athletes, including 252 professional baseball players. We quantify how visual-motor skill varies across athletes by level of athletic expertise, gender, and sport type. We also examine the dependence among the assessments in the battery and their relationship to on-field performance in professional baseball. We find significant positive associations between performance on the assessment battery and measures of baseball performance, particularly those that involve plate discipline.
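The central claim about the small-area intervals, exact nominal coverage for every value of the target quantity even though information from other areas is used, can be illustrated with a classical construction. The Python sketch below is only an illustration under stated assumptions, not the procedure developed in the thesis. It assumes a single area's direct estimate y ~ N(θ, σ²) with σ known, and a normal working prior θ ~ N(μ, τ²) whose parameters would, in a small-area application, be estimated from the other areas. For each θ, the error rate α is split between the two tails with a weight w(θ): the lower tail gets αw and the upper tail α(1 - w). Any choice of w gives exact 1 - α coverage at that θ, so the prior affects only the width. Here w(θ) solves Φ⁻¹(αw) - Φ⁻¹(α(1 - w)) = 2σ(θ - μ)/τ², the first-order condition for minimizing prior-expected width under the working model. All function names and the clipping constant `eps` are hypothetical.

```python
# Illustrative sketch only: names and defaults are hypothetical, not from the thesis.
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def _bisect(f, lo, hi, iters=100):
    """Root of a nondecreasing f on [lo, hi], assuming f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_weight(theta, mu, tau, sigma, alpha, eps=1e-12):
    """Share w of alpha placed in the lower tail at theta.

    Solves Phi^{-1}(alpha*w) - Phi^{-1}(alpha*(1-w)) = 2*sigma*(theta-mu)/tau**2,
    clipping w to [eps, 1 - eps] so all normal quantiles stay finite.
    Clipping is harmless for coverage: every w in (0, 1) gives exact 1 - alpha.
    """
    def h(w):  # increasing in w
        return _STD.inv_cdf(alpha * w) - _STD.inv_cdf(alpha * (1 - w))

    target = 2 * sigma * (theta - mu) / tau**2
    if target <= h(eps):
        return eps
    if target >= h(1 - eps):
        return 1 - eps
    return _bisect(lambda w: h(w) - target, eps, 1 - eps)


def acceptance_region(theta, mu, tau, sigma, alpha=0.05):
    """Interval of y values accepted at theta; N(theta, sigma^2) gives it mass 1 - alpha."""
    w = tail_weight(theta, mu, tau, sigma, alpha)
    a = _STD.inv_cdf(alpha * w)          # lower tail has probability alpha*w
    b = -_STD.inv_cdf(alpha * (1 - w))   # upper tail has probability alpha*(1-w)
    return theta + sigma * a, theta + sigma * b


def adaptive_interval(y, mu, tau, sigma, alpha=0.05):
    """Confidence interval = all theta whose acceptance region contains y."""
    span = 12 * sigma  # brackets both endpoints for eps = 1e-12 and usual alpha
    # Both region endpoints increase in theta, so each interval endpoint is one root.
    lower = _bisect(lambda t: acceptance_region(t, mu, tau, sigma, alpha)[1] - y,
                    y - span, y + span)
    upper = _bisect(lambda t: acceptance_region(t, mu, tau, sigma, alpha)[0] - y,
                    y - span, y + span)
    return lower, upper


if __name__ == "__main__":
    lo, hi = adaptive_interval(0.0, mu=0.0, tau=1.0, sigma=1.0)
    print(f"adaptive: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
    print(f"standard width: {2 * _STD.inv_cdf(0.975):.3f}")
```

With μ = 0, τ = σ = 1 and y = 0, the adaptive interval is noticeably narrower (width roughly 3.3) than the standard y ± 1.96σ interval (width about 3.92). For y far from μ, say y = 10, the same function returns a much wider interval. That trade-off is expected: coverage stays exact at every θ, and the working prior only shifts width toward the values of θ it considers likely. Taking w ≡ 1/2 for all θ recovers the standard interval.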