Leanna L House
Associate Professor, Virginia Tech, Department of Statistics
Nonparametric Bayesian Models in Expression Proteomic Applications
Bayesian nonparametric analyses develop probability models on very high-dimensional, possibly infinite-dimensional, function spaces. However, with the benefits of exploring large parameter spaces comes the responsibility of controlling potentially over-parameterized models. With thoughtful prior elicitation, Bayesian methods may naturally impose model complexity restrictions depending upon whether a function is defined by a collection of random components or as one random variable. This dissertation, via the progression of three separate works, takes advantage of two ways prior distributions may penalize complex functions in nonparametric analyses of expression proteomic data. Since all cellular functions are carried out by proteins, the primary purpose of expression proteomics is to assess, from differences in protein production, how an organism responds under various conditions. One common way to assess the differences is to analyze the protein content of varying biological samples using Matrix Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) mass spectrometry (MS). Although MALDI-TOF MS has many analytical benefits, inherent in the technology are sources of measurement error that make distinguishing true signal from noise difficult. Thus, all expression proteomic studies that use MALDI-TOF MS data must first extract data of interest from mass spectra before making inference. Chapters 2 and 3 are devoted solely to developing nonparametric Bayesian models to identify significant features from individual spectra. Both models estimate an unknown function Y_s that represents true protein signal as a weighted sum of J kernel functions with either prespecified or data-determined location parameters. In the prespecified case, a truncated exponential prior on the coefficients regularizes the proposed over-parameterized model. In the data-determined case, the function Y_s itself is assumed to be a random variable for which a Lévy random field prior is elicited.
The process prior of Y_s penalizes complex models and is comparable to specifying a joint prior distribution on the J kernel function parameters and the basis coefficients. Chapter 4 expands the model presented in Chapter 3 to include multiple spectra from two sub-populations. Underlying every observed spectrum, regardless of sub-population, is one mean spectrum that is modeled similarly to Y_s as described in Chapter 3, with a Lévy random field prior. The difference is an added dimension to the random field that represents sub-population association. With the new dimension, the proposed model extracts and aligns population-significant features and compares them to make treatment group classifications.
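As a rough illustration of the representation described above, the following sketch evaluates a signal as a weighted sum of J kernel functions on a grid. It is not the dissertation's model: the Gaussian kernel shape, the function names `gaussian_kernel` and `kernel_sum`, and all numeric values are assumptions chosen for the example, and no prior or posterior inference is performed.

```python
import numpy as np

def gaussian_kernel(x, center, width):
    """Unnormalized Gaussian bump; the kernel shape is an assumption."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def kernel_sum(x, centers, widths, weights):
    """Evaluate sum_j weights[j] * k(x; centers[j], widths[j]) on the grid x."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    basis = gaussian_kernel(x, np.asarray(centers, dtype=float),
                            np.asarray(widths, dtype=float))  # shape (n, J)
    return basis @ np.asarray(weights, dtype=float)    # shape (n,)

# Hypothetical toy values: J = 3 kernels on an arbitrary grid.
mz = np.linspace(0.0, 10.0, 201)
centers = [2.0, 5.0, 8.0]    # location parameters
widths = [0.3, 0.5, 0.4]     # kernel widths
weights = [1.5, 0.0, 0.7]    # nonnegative coefficients; a zero removes a kernel

signal = kernel_sum(mz, centers, widths, weights)
```

In this toy setup, a coefficient of exactly zero removes its kernel from the sum, which is one way to see how shrinking coefficients reduces the effective complexity of an over-parameterized kernel expansion.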