Senior Data Scientist, YouTube Data Science TeamGoogle Inc - YouTubeNov 2017-Present
Bayesian Methods for Two-Sample Comparison
Two-sample comparison is a fundamental problem in statistics. Given two samples of data, the interest lies in understanding whether the two samples were generated by the same distribution or not. Traditional two-sample comparison methods are not suitable for modern data where the underlying distributions are multivariate and highly multi-modal, and the differences across the distributions are often locally concentrated. The focus of this thesis is to develop novel statistical methodology for two-sample comparison which is effective in such scenarios. Tools from the nonparametric Bayesian literature are used to flexibly describe the distributions. Additionally, the two-sample comparison problem is decomposed into a collection of local tests on individual parameters describing the distributions. This strategy not only yields high statistical power, but also allows one to identify the nature of the distributional difference. In many real-world applications, detecting the nature of the difference is as important as the existence of the difference itself. Generalizations to multi-sample comparison and more complex statistical problems, such as multi-way analysis of variance, are also discussed.