Hanyu Song


Master's Thesis

Wavelet Regression Using MapReduce & Analysis of Multiple Sclerosis Clinical Data

Abstract: This thesis addresses two problems: one concerns scalable methods, and the other the application of statistical methods to clinical data. In the first chapter, motivated by the growing number of ``large p'' datasets, we present a novel MapReduce framework for multivariate wavelet regression. We compare the time complexity of the proposed and conventional methods and show that the novel framework scales linearly in the dimension $p$ of the response matrix. Empirical results are consistent with our complexity analysis. This work has potential applications in analysing image or genomic data, where the dimensions are huge. In the second chapter, we explore a clinical dataset on Multiple Sclerosis (MS) provided by Biogen, which comprises 579 actively managed MS patients enrolled at a single center for up to 5 years. Since no curative therapy for MS is known, we are working with Biogen to develop statistical models that predict the progression of disability level as a therapeutic guide. Such disability can be roughly quantified by the EDSS (Expanded Disability Status Scale), and as such we conduct predictive modelling of EDSS. Before we arrive at these models, we perform exploratory data analysis and conduct predictive modelling of current EDSS based on measurements in the same year.