Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples

Andrew Cron, Jacob Frelinger, Lynn Lin, Cecile Gouttefangeas, Satwinder K. Singh, Cedrik M. Britten, Marij J.P. Welters, Sjoerd H. van der Burg, Mike West, Cliburn Chan
Duke University, Duke University, Fred Hutchinson Cancer Center, Eberhard Karls University, Leiden University Medical Center, Johannes Gutenberg-University Medical Center Mainz, Leiden University Medical Center, Leiden University Medical Center, Duke...

Nov 25 2012

Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found at extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest. Critical features currently deficient in such models are the ability to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model across multiple samples that respects both sample-specific and batch-specific characteristics. HDPGMM also increases the sensitivity to extremely low frequency events by "borrowing strength" across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We also provide highly optimized open source software that can take advantage of both multiple processors (using MPI) and massively parallel GPU cores (using CUDA) to accelerate the numerical computations.
By addressing the rare subset and cell subset alignment problems, HDPGMM greatly increases the usefulness of automated flow cytometry data analysis and its relevance for immune monitoring and the discovery of immune-based biomarkers.
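
To illustrate what an "aligned data model" means here, the following is a minimal generative sketch, not the authors' software or their exact model. It assumes a finite truncation of a hierarchical Dirichlet process mixture: all samples share one set of Gaussian component means, while each sample draws its own component weights around a common global weight vector. The function names, the parameters gamma and alpha, the identity covariance, and the truncation level are all hypothetical choices made for illustration.

```python
import numpy as np

def stick_breaking(v):
    """Turn stick-breaking fractions v (last entry must be 1.0) into weights summing to 1."""
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def simulate_shared_mixture(n_samples=3, n_events=5000, n_components=8,
                            dim=2, gamma=1.0, alpha=5.0, seed=0):
    """Simulate several samples from a truncated hierarchical mixture (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Global weights: truncated stick-breaking, so later components tend to be rare.
    v = rng.beta(1.0, gamma, size=n_components)
    v[-1] = 1.0
    beta = stick_breaking(v)

    # Component locations are shared by every sample; this is what aligns
    # component k in one sample with component k in another.
    means = rng.normal(0.0, 5.0, size=(n_components, dim))

    weights, labels, data = [], [], []
    for _ in range(n_samples):
        # Sample-specific weights scatter around the global weights.
        pi = rng.dirichlet(alpha * beta + 1e-8)
        pi = pi / pi.sum()
        z = rng.choice(n_components, size=n_events, p=pi)
        # Identity covariance is an assumption made to keep the sketch short.
        x = means[z] + rng.normal(size=(n_events, dim))
        weights.append(pi)
        labels.append(z)
        data.append(x)
    return beta, means, weights, labels, data

if __name__ == "__main__":
    beta, means, weights, labels, data = simulate_shared_mixture()
    for j, pi in enumerate(weights):
        print(f"sample {j}: rarest component frequency = {pi.min():.5f}")
```

Because component k refers to the same location in every sample, a subset that contributes only a handful of events to any single sample is still supported by the pooled events from all samples. That is the intuition behind "borrowing strength"; fitting such a model to real data requires posterior inference, which this sketch does not attempt.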

Research partially supported by grants from the Wallace Coulter Foundation, the U.S. National Science Foundation (DMS 1106516), and the National Institutes of Health (P50-GM081883, RC1 AI086032). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the Wallace Coulter Foundation, the NIH or the NSF.

The software used in this work is available. The Python open source code for fitting DPGMM and HDPGMM models will run on regular CPUs, but is optimized for massively parallel computing using the CUDA interface (a suitable Nvidia GPU is required for CUDA).


PDF: 2012-17.pdf

BibTeX Citation: 

  author = {A. J. Cron and C. Gouttefangeas and J. Frelinger and L. Lin and S.
            K. Singh and C. M. Britten and M. J. P. Welters and S. H. van der Burg
            and M. West and C. Chan},
  title = {Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples},
  journal = {PLoS Computational Biology},
  year = {2013},
  volume = {9},
  pages = {e1003130},
  doi = {10.1371/journal.pcbi.1003130},
  url = {http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003130}