Haige Shen

External address: 
Shanghai, China

Employment Info

Vice President & Head of Statistics
Panacro Medical Technology Co., Ltd.


Bayesian Analysis in Cancer Pathway Studies and Probabilistic Pathway Annotation

Improving our understanding of the complex molecular pathways underlying cancer phenotypes is essential to uncovering the dynamic processes of cancer development. A key challenge in this effort is linking quantified, experimentally defined gene expression signatures with known biological pathway gene sets. This dissertation presents a novel Bayesian statistical approach to this pathway annotation problem. In my approach, a formal probabilistic model delivers probabilities over pathways for an experimental signature, allowing quantitative assessment and ranking of pathways putatively linked to the experimental phenotype. The fundamental advantage of this approach is its formal modeling of uncertainty in the pathway analysis. Biological knowledge and understanding of the data are incorporated in the model. A further key benefit of this model-based approach is coherent inference on uncertainty about gene pathway membership. Technically, this research involves advanced statistical modeling and high-dimensional computation. Analysis of the models uses Markov chain Monte Carlo techniques and variational methods for statistical computation. To evaluate model evidence, a critical component of pathway analysis, I propose an innovative Monte Carlo variational method that provides optimal upper and lower bounds on model evidence. This method, motivated by and developed for genomic pathway analysis, is in fact general and represents an advance in statistical model-based computation of much broader utility. The effectiveness and robustness of my approach are tested through simulation studies as well as analyses of real data sets, including “proof-of-principle” pathway annotation for breast tumor estrogen-receptor and ErbB2 phenotypes.
A study of pathway activities underlying the cellular response to the lactic acidosis microenvironment in breast tumors involves analyses of both in vitro and in vivo data, and demonstrates the application of the method in decomposing the complexity of gene expression-based predictions about interacting pathway activation in this cancer context. In conclusion, this dissertation contributes innovations in statistical methodology as well as in cancer genomics applications. Current and future research directions include broad opportunities for application and evaluation in cancer genomics studies and in other areas of genomics, as well as efficient follow-on software implementations that make the method available to the research community.