MAW Barycenter for Gaussian Mixture Models with Application to Multi-source Single-cell Data Integration

Speaker(s): Lynn Lin, Assistant Professor of Biostatistics & Bioinformatics, Duke University

The Minimized Aggregated Wasserstein (MAW) distance for Gaussian mixture models (GMMs) has been used as a computationally efficient approximation to the Wasserstein metric. Recent theoretical advances on MAW have provided deeper insight into its optimality. In this talk, we develop a new algorithm for computing the barycenter of GMMs under the MAW distance and prove that this barycenter has the same expectation as the Wasserstein barycenter. We also illustrate its practical use by integrating single-cell data from different sources, and demonstrate that the new method achieves better clustering results on several single-cell RNA-seq data sets than some other popular methods.


Statistical Science


