WASP: Scalable Bayes via barycenters of subset posteriors

Authors: 
Sanvesh Srivastava, Volkan Cevher, Quoc Tran-Dinh, David B. Dunson
Department of Statistical Science, Duke University; LIONS, École Polytechnique Fédérale de Lausanne; LIONS, École Polytechnique Fédérale de Lausanne; Department of Statistical Science, Duke University

Oct 27 2014

The promise of Bayesian methods for big data sets has not fully been realized due to the lack of scalable computational algorithms. For massive data, it is necessary to store and process subsets on different machines in a distributed manner. We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. To combine these subset posteriors, we calculate the Wasserstein barycenter via a highly efficient linear program. The resulting estimate for the Wasserstein posterior (WASP) has an atomic form, facilitating straightforward estimation of posterior summaries of functionals of interest. The WASP approach allows posterior sampling algorithms for smaller data sets to be trivially scaled to huge data. We provide theoretical justification in terms of posterior consistency and algorithm efficiency. Examples are provided in complex settings including Gaussian process regression and nonparametric Bayes mixture models.
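The divide-and-combine idea in the abstract can be illustrated with a minimal sketch for a one-dimensional parameter. This is not the linear program described above. It swaps in a simpler, well-known special case: for one-dimensional empirical measures with the same number of equally weighted draws, the 2-Wasserstein barycenter is obtained by sorting each set of draws and averaging them order statistic by order statistic. The function names (`barycenter_1d`, `toy_subset_posterior`) and the toy model (normal mean, known unit variance, flat prior, exact draws standing in for a posterior sampler on each machine) are assumptions made for illustration only.

```python
import random
import statistics


def barycenter_1d(subset_draws):
    """Atoms of the W2 barycenter of 1-D empirical measures.

    Assumes every subset supplies the same number of equally weighted
    draws; the barycenter then averages the sorted draws position by
    position (quantile averaging). Each returned atom has weight 1/n.
    """
    n = len(subset_draws[0])
    if any(len(d) != n for d in subset_draws):
        raise ValueError("all subsets must supply the same number of draws")
    sorted_draws = [sorted(d) for d in subset_draws]
    return [statistics.fmean(col) for col in zip(*sorted_draws)]


def toy_subset_posterior(data, n_draws, rng):
    """Hypothetical stand-in for a posterior sampler run on one machine.

    Toy model (an assumption): unknown mean, unit variance, flat prior,
    so the posterior of the mean is N(xbar, 1/len(data)).
    """
    xbar = statistics.fmean(data)
    sd = (1.0 / len(data)) ** 0.5
    return [rng.gauss(xbar, sd) for _ in range(n_draws)]


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(4000)]

# Split the data into 4 subsets, as if stored on 4 machines.
subsets = [data[k::4] for k in range(4)]

# Each "machine" samples its own subset posterior independently.
draws = [toy_subset_posterior(s, 500, rng) for s in subsets]

# Combine the subset posteriors into one atomic approximation.
atoms = barycenter_1d(draws)

# Posterior summaries are plain averages over the equally weighted atoms.
print("posterior mean estimate:", statistics.fmean(atoms))
```

Because the combined estimate is a set of equally weighted atoms, any posterior summary of a functional reduces to an average over `atoms`. In more than one dimension this sorting shortcut no longer applies, which is where an optimization-based barycenter computation becomes necessary.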

Keywords: 

Distributed, Parallel, and Cluster Computing; Stochastic Approximation; Linear Programming; Scalable Bayes; Wasserstein Distance; Wasserstein Barycenter

BibTeX Citation: 

@Article{Srietal14,
    title = {WASP: Scalable Bayes via barycenters of subset posteriors},
    author = {Sanvesh Srivastava and Volkan Cevher and Quoc Tran-Dinh
      and David B. Dunson},
    journal = {Duke Discussion Paper-2014-05},
    year = {2014},
    url = {https://stat.duke.edu/research/papers/2014-05}
  }