Heather D. Sasinowska
Vice President and COO, INCOGEN, Inc.
Prediction Using Orthogonalized Model Mixing
This dissertation investigates modeling strategies and numerical methods for prediction under model averaging in normal linear models and in Poisson regression, with extensions to other generalized linear models. The focus is on mixing over possible subsets of candidate predictors. For linear regression models, a sampling approach based on importance sampling is developed. The technique rests on an approximation of the posterior model probabilities obtained from an orthogonalizing transformation of the variables. The posterior distribution over models is approximated by a product of independent Bernoulli distributions, each indicating whether or not an element of the orthogonal basis is included. This leads to an efficient importance sampling algorithm. In extending this to Poisson regression, one difficulty is that we cannot analytically integrate out model-specific parameters to obtain posterior model probabilities, a key step in obtaining the probabilities for sampling and model mixing in normal linear regression. Under regularity conditions, applying a variance stabilizing transformation to the response results in an approximately normal distribution with a known constant variance. A Taylor series expansion of the mean function then yields a linear model, so in the approximate problem the previous linear model results can be used to approximate the posterior model probabilities for the Poisson problem. This allows for sampling directly from an approximation to the joint distribution over the model space. To evaluate orthogonalized model mixing for normal linear models, it is applied to a set of crime data. The model space is small enough to allow enumeration of all models for comparison and convergence checks. Furthermore, we demonstrate the feasibility of orthogonalized model mixing in a large problem (88 variables) that is very difficult to attack by other methods.
The large data set originates from an experiment designed to predict protein activity under various storage conditions. To examine the approach for Poisson regression, orthogonalized model mixing is again applied first to a small data set for which enumeration of all models is feasible. Through comparison with a Gibbs sampler and a deterministic approach, we find that our method is fast in sampling models and that it supplies good approximations to the posterior model probabilities and predictive distributions. Our method for Poisson regression is then applied to a data set of 126 variables. This large data set was designed to examine the effect of particulate pollution on daily death counts and is difficult to analyze in terms of the original variables.
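As a rough illustration of why an orthogonal basis leads to a product of independent Bernoulli terms, the following Python sketch computes inclusion probabilities for each orthogonalized component and draws models from their product. It is not the algorithm developed in the dissertation. It assumes a known error variance `sigma2`, independent N(0, `tau2`) priors on the coefficients in the orthogonal basis, and a common prior inclusion probability `prior_incl`; all function and parameter names are hypothetical. Under these assumptions the factorization is exact, and the log proposal probabilities returned by `log_proposal` are the quantities an importance sampler would use to form its weights.

```python
import numpy as np


def inclusion_probabilities(X, y, sigma2=1.0, tau2=10.0, prior_incl=0.5):
    """Posterior inclusion probability of each orthogonalized component.

    Assumes (for illustration only) known error variance sigma2 and
    independent N(0, tau2) priors on the orthogonal-basis coefficients.
    """
    Xc = X - X.mean(axis=0)            # center the predictors
    yc = y - y.mean()                  # center the response
    Q, _ = np.linalg.qr(Xc)            # reduced QR: Q has orthonormal columns
    z = Q.T @ yc                       # independent N(gamma_j, sigma2) scores
    v1 = sigma2 + tau2                 # marginal variance of z_j if included
    # log Bayes factor of "component j included" versus "excluded"
    log_bf = 0.5 * np.log(sigma2 / v1) + 0.5 * z**2 * (1.0 / sigma2 - 1.0 / v1)
    log_odds = np.log(prior_incl / (1.0 - prior_incl)) + log_bf
    # numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * log_odds))


def sample_models(probs, n_draws, rng):
    """Draw inclusion vectors from the product of independent Bernoullis."""
    return rng.random((n_draws, probs.size)) < probs


def log_proposal(draws, probs, eps=1e-12):
    """Log probability of each drawn model under the Bernoulli product."""
    p = np.clip(probs, eps, 1.0 - eps)  # guard against log(0)
    return (draws * np.log(p) + (~draws) * np.log1p(-p)).sum(axis=1)


# Simulated example: only the first predictor carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(size=200)

probs = inclusion_probabilities(X, y)
draws = sample_models(probs, n_draws=1000, rng=rng)
log_q = log_proposal(draws, probs)
```

Because the QR factorization orthogonalizes the columns in order, the first orthogonal component here coincides with the first predictor up to scale, so its inclusion probability is close to one. In general the components are linear combinations of the original variables, and a model in the orthogonal basis has to be mapped back before it is interpreted in terms of the original predictors.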