Gangqiang Xia

External address: 1585 Broadway, New York, NY 10036
Phone: 212-761-2030
Graduation Year: 2006

Employment Info

Head, Mortgage Modeling Group
Morgan Stanley, NYC

Dissertation

On Large Sample Size Issues in Spatial Statistics

The focus of this thesis is on large sample size issues in spatial data analysis. Specifically, it consists of three major parts: spatial asymptotics; model fitting for large spatial datasets; and spatial design for one-time sampling. Our major contributions are (i) various new results for spatial asymptotics; (ii) the development of three new spatial process approximation methods useful for handling large spatial datasets; and (iii) the development of approximately optimal sampling approaches for extensive spatial sampling.
Performing large sample analysis for spatially dependent data is challenging. Based on different spatial sampling schemes, we consider three types of asymptotics: infill asymptotics, expansion asymptotics, and so-called "middle-ground" asymptotics. The first two are well known but not fully studied; middle-ground asymptotics is new territory. We study the limiting behavior of the Fisher information matrix, the asymptotic properties of various estimators, and the weak identifiability of the parameters in spatial models under these three asymptotic scenarios.
Historically, it has been difficult to apply spatial modeling techniques to large spatial datasets. The problem is that we have to invert, and compute the determinant of, a covariance matrix whose dimension equals the sample size. Consider fitting a Gaussian spatial model to a spatial dataset with a large sample size n. Likelihood-based and Bayesian modeling both suffer from severe computational difficulties, since each evaluation of the exact likelihood requires O(n^3) operations. We refer to this computational challenge as the "large n problem." We develop a new finite sum process approximation model which is both theoretically attractive and computationally efficient. The model is implemented in a Bayesian framework and applied to analyze several large spatial datasets.
Finally, we consider the problem of approximately optimal design in the special case of one-time sampling at a large number of spatial locations. Our goal is to develop a good design strategy to help practitioners select sampling locations.
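To illustrate the "large n problem" described in the abstract, the following is a minimal sketch of one exact Gaussian log-likelihood evaluation. It is not taken from the thesis: the exponential covariance function, the nugget term, and the names exponential_cov, gaussian_loglik, sigma2, phi, and tau2 are assumptions chosen for illustration. The point is only that the Cholesky factorization of the n x n covariance matrix, which yields both the determinant and the quadratic form, costs on the order of n^3 operations per evaluation.

```python
import numpy as np


def exponential_cov(coords, sigma2, phi, tau2):
    """Illustrative covariance: sigma2 * exp(-d / phi) plus a nugget tau2.

    coords has shape (n, d); the result is a dense n x n matrix.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma2 * np.exp(-dist / phi) + tau2 * np.eye(len(coords))


def gaussian_loglik(y, coords, sigma2, phi, tau2):
    """Exact log-likelihood of a zero-mean Gaussian spatial model."""
    n = len(y)
    K = exponential_cov(coords, sigma2, phi, tau2)
    # Cholesky factorization K = L L^T: the O(n^3) step.
    L = np.linalg.cholesky(K)
    # Solve L z = y so that z @ z equals y^T K^{-1} y.
    z = np.linalg.solve(L, y)
    # log det K = 2 * sum(log(diag(L))).
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(size=(200, 2))  # 200 random sites in the unit square
    y = rng.normal(size=200)             # placeholder observations
    print(gaussian_loglik(y, coords, sigma2=1.0, phi=0.3, tau2=0.1))
```

Because a likelihood-based or Bayesian fit calls such a function many times, doubling n multiplies the cost of each call by roughly eight, which is what motivates approximation methods for large datasets.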