Vice President, PIMCO, NYC
Bayesian Nonparametric Modeling Using Levy Process Priors with Applications for Function Estimation, Time Series Modeling and Spatio-Temporal Modeling
In this dissertation, we propose a new class of Bayesian methods for nonparametric function estimation. We call the new model the Levy adaptive regression kernel, or "LARK", model. The LARK model is based on a stochastic expansion of functions in an overcomplete dictionary, which can be formulated as a stochastic integration problem with a random measure. The unknown function is represented as a weighted sum of kernel or generator functions with arbitrary location parameters. The scaling parameters of the kernels are also location specific and thus adaptive, as with wavelet bases and dictionaries. Levy random fields are introduced to construct prior distributions on the unknown functions, which lead to the specification of a joint prior distribution for the number of kernels, the kernel regression coefficients, and the kernel-associated parameters. Under Gaussian errors, the problem may be formulated as a sparse regression problem, with regularization induced through the Levy random field prior. To make posterior inference on the unknown functions, a reversible jump MCMC (RJ-MCMC) algorithm is developed. The LARK framework developed in this dissertation can be used to model both Gaussian and nonstationary non-Gaussian data. The adaptability of the kernels is especially useful for modeling spatially inhomogeneous functions. Unlike many wavelet-based methods, LARK does not require the data to be equally spaced. The RJ-MCMC algorithm developed for fitting the LARK model provides an automatic search mechanism for finding sparse representations of a function. Fitting a LARK model does not involve matrix calculations, so the model is amenable to large data sets. We start in Chapter 1 with a review of some basic properties and theory of Levy processes, which serve as the theoretical foundation for this dissertation. Chapter 2 develops the LARK model in the context of nonparametric regression problems. Both simulated and real examples are used to illustrate the method.
Chapter 3 applies the LARK model to multivariate air pollutant time series modeling. Based on the LARK framework, we develop a new class of spatio-temporal models in Chapter 4. A simulated data set and SO2 monitoring data from the Environmental Protection Agency are used to demonstrate the model. We conclude the dissertation in Chapter 5 by summarizing the LARK framework and pointing out directions for future research.
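
As an illustration of the representation described in the abstract, the following minimal Python sketch draws one random function as a weighted sum of kernels, each with its own location and scale, and evaluates it at unequally spaced points. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the Gaussian kernel, the Poisson number of kernels (as would arise in a compound Poisson setting), the normal, uniform, and gamma distributions for the coefficients, locations, and scales, and all function names. It shows only a draw from such a prior, not the RJ-MCMC posterior sampler.

```python
import math
import random


def gaussian_kernel(x, loc, scale):
    """Kernel centered at `loc` with width `scale` (Gaussian shape is an assumption)."""
    return math.exp(-0.5 * ((x - loc) / scale) ** 2)


def sample_poisson(rng, mean):
    """Count arrivals of a unit-rate Poisson process in [0, mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def draw_expansion(rng, lo, hi, mean_count=5.0):
    """Draw a random number of (coefficient, location, scale) triples.

    All three distributions are illustrative placeholders.
    """
    n_kernels = sample_poisson(rng, mean_count)
    return [
        (rng.gauss(0.0, 1.0),          # kernel regression coefficient
         rng.uniform(lo, hi),          # kernel location
         rng.gammavariate(2.0, 0.1))   # location-specific scale, always > 0
        for _ in range(n_kernels)
    ]


def evaluate(terms, x):
    """Evaluate f(x) as the sum over kernels of coefficient * kernel(x)."""
    return sum(b * gaussian_kernel(x, c, s) for b, c, s in terms)


rng = random.Random(0)
xs = sorted(rng.uniform(0.0, 1.0) for _ in range(50))  # unequally spaced inputs
terms = draw_expansion(rng, 0.0, 1.0)
fs = [evaluate(terms, x) for x in xs]
```

The evaluation is a plain sum over kernels with no matrix operations, and the inputs need not lie on a regular grid, which mirrors two of the properties claimed for LARK above.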