Transfer Learning and Data Alignment in Single Cell Transcriptomics

Nancy Zhang, Wharton School, University of Pennsylvania

Friday, October 12, 2018 - 3:30pm

Cells are the basic biological units of multicellular organisms. The development of single-cell RNA sequencing (scRNA-seq) technologies has enabled us to study the diversity of cell types in tissues and to elucidate the roles of individual cell types in disease. Yet scRNA-seq data are noisy and sparse: only a small proportion of the transcripts present in each cell are represented in the final data matrix. We propose a transfer learning framework that borrows information across related single-cell data sets for de-noising and expression recovery. Our goal is to leverage the expanding resources of publicly available scRNA-seq data, for example the Human Cell Atlas, which aims to be a comprehensive map of cell types in the human body. Our method is based on a Bayesian hierarchical model coupled to a deep autoencoder. The autoencoder is trained to extract transferable gene expression features across studies that come from different labs, are generated by different technologies, and/or are obtained from different species. Through this framework, we explore the limits of data sharing: How much can be learned across cell types, tissues, and species? How useful are data from other technologies and labs in improving the estimates from your own study? We also explore the implications of technical batch artifacts in the joint analysis of multiple data sets, and propose strategies for aligning data across batches.

Seminars generally take place in 116 Old Chemistry Building on Fridays from 3:30 - 4:30 pm. For additional information contact: or phone 919-684-8029. Sorry, but we do not have reprints available. Please feel free to contact the authors by email for follow-up information, articles, etc. Reception following seminar in 203B Old Chemistry.

Old Chemistry 116
