Xi Kathy Zhou
Associate Professor, Division of Biostatistics and Epidemiology, Department of Healthcare Policy and Research, Weill Medical College of Cornell University
Classification of Missense Mutations of Disease Genes
Missense mutations of disease genes pose a challenging classification problem because their implications for disease risk are uncertain. Assessing these implications is often complicated by small sample sizes and the lack of an appropriate functional assay. For large genes such as BRCA1 and BRCA2, it is common to infer risk implications from pedigree data. Typically, the sample size for each mutation is relatively small, and the only pedigrees available are those of individuals selected because of a high disease rate in the family. This selection mechanism is likely to overstate the mutation's contribution to disease risk. In this study, we develop a Bayesian hierarchical methodology that classifies missense mutations as deleterious or non-deleterious based on mutation-specific penetrances estimated from pedigree data. We consider multiple competing genes and multiple phenotypes (e.g., cancer sites). The basis of our approach is to model the age-dependent, mutation-specific penetrance functions as a hazard mixture of the phenocopy rate and the penetrance of deleterious mutations. This permits us to take the age effect into account while accommodating the limited sample size. We assume that the penetrances of known deleterious mutations and the phenocopy rates have been estimated in previous studies. The Bayesian hierarchical classifier is built on this mixture model, whose mixture parameter is a composite of the deleteriousness of the mutation and the selection bias. We compare these parameters to similarly estimated penetrances of known deleterious mutations and common polymorphisms.
This comparison allows us to separate the deleteriousness component from the bias component, and thus to reduce the effect of the selection bias inherent in the data collection mechanism. The separation is possible because pedigrees identified through probands who are either negative (mostly with common polymorphisms) or positive (with known deleterious mutations or missense mutations) are collected under the same sampling scheme. The model also takes into account the imperfect sensitivity of genotyping. Model parameters are estimated using Markov chain Monte Carlo methods. We apply this approach to a sample of BRCA1 and BRCA2 missense mutations, using data collected at the Duke University Medical Center.
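
To make the hazard-mixture idea concrete, here is a minimal Python sketch of one plausible reading of it. The abstract does not give the exact functional form, so the following are assumptions for illustration only: the mixture is taken to act on cumulative hazards with a single weight w in [0, 1], where w = 0 reproduces the phenocopy (non-carrier) risk and w = 1 reproduces the penetrance of a known deleterious mutation. The function names and the age-specific risk values are hypothetical and are not taken from the study.

```python
import math


def cumulative_hazard(penetrance):
    """Convert a cumulative risk F(t) into a cumulative hazard H(t) = -log(1 - F(t))."""
    if not 0.0 <= penetrance < 1.0:
        raise ValueError("penetrance must lie in [0, 1)")
    return -math.log(1.0 - penetrance)


def mixture_penetrance(w, pen_phenocopy, pen_deleterious):
    """Penetrance under an assumed hazard mixture.

    H_mix(t) = (1 - w) * H_phenocopy(t) + w * H_deleterious(t)
    F_mix(t) = 1 - exp(-H_mix(t))

    The weight w is the (assumed) mixture parameter: 0 behaves like a
    non-carrier, 1 behaves like a carrier of a known deleterious mutation.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    h_mix = (1.0 - w) * cumulative_hazard(pen_phenocopy) \
        + w * cumulative_hazard(pen_deleterious)
    return 1.0 - math.exp(-h_mix)


# Hypothetical cumulative risks by age, made up purely for illustration.
ages = [40, 50, 60, 70]
phenocopy = [0.005, 0.02, 0.04, 0.07]
deleterious = [0.15, 0.35, 0.50, 0.60]

for w in (0.0, 0.5, 1.0):
    curve = [mixture_penetrance(w, p0, p1)
             for p0, p1 in zip(phenocopy, deleterious)]
    print(f"w = {w:.1f}: " + ", ".join(
        f"age {a}: {f:.3f}" for a, f in zip(ages, curve)))
```

Under this assumed form, a single weight per mutation carries the whole age-dependent curve, which is why the approach can accommodate a small number of pedigrees per mutation. In the full method described above, the weight would be given a hierarchical prior and estimated by Markov chain Monte Carlo, and it would still include the selection-bias component until compared against the reference mutation classes.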