Joseph E Lucas
Sr. Director, Analytics Business Consultant, Lumeris, Jul 2018-Present
Associate Research Professor in the Social Science Research Institute, Duke University & Duke University Medical Center, Aug 2016-Jun 2018
Sparsity Modeling for High Dimensional Systems: Applications in Genomics and Structural Biology
The availability of very high dimensional data has brought sparsity modeling to the forefront of statistical research in recent years. From complex physical models with hundreds of parameters to DNA microarrays, which offer observations in tens to hundreds of thousands of dimensions, separating relevant from irrelevant parameters is becoming increasingly important. This dissertation will focus on innovations in the area of variable and model selection as they pertain to these high dimensional systems. Chapter 1 will discuss work from the literature in the areas of variable and model selection. Chapter 2 will describe an innovation to hierarchical variable selection modeling that corrects errors that stem from incorrectly assuming that many thousands of observations are all informative about the same distribution. In Chapter 3, we introduce a novel technique for applying variable selection priors to induce sparsity in variance modeling. One of the weaknesses of DNA microarrays is their sensitivity to the conditions under which they were prepared. Chapter 4 describes a technique for correcting the systematic bias that is introduced by these extreme sensitivities. Chapters 5 and 6 are both case studies. They focus on implementing the techniques described in Chapters 2-4 in real-world situations in order to ferret out pathway signatures and to apply those to clinical situations. Chapter 7 will introduce a new technique for sampling from a point mass mixture prior when calculation of the conditional probability is impossible. In Chapter 8, we apply this technique to a challenging problem in structural biology. For Chapter 9, we switch gears somewhat and apply some of the techniques of decision theory to the protein folding problem introduced in Chapter 8. We are able to use the results of our model fitting to inform future decisions for studying polypeptide helicity.
Finally, we close, in Chapter 10, with some areas for future work that have opened up as a result of studying these variable selection techniques.