Senior Data Scientist, Adobe Systems Inc, May 2016-Present
Space and Space-Time Modeling of Directional Data
Directional data, i.e., data collected in the form of angles or natural directions, arise in many scientific fields, such as oceanography, climatology, geology, meteorology and biology. The non-Euclidean nature of such data makes it difficult to apply ordinary statistical methods developed for linear data, motivating the need for specialized modeling frameworks for directional data. Motivated in particular by a marine application (modeling the spatial association of wave directions and, additionally, the association between spatial wave directions and spatial wave heights), this dissertation provides general frameworks for modeling spatial and spatio-temporal directional data and studies the theoretical properties of the proposed methods. In particular, the projected normal family of circular distributions is proposed as a default parametric family for directional data. Operating in a Bayesian framework and exploiting standard data augmentation techniques, we show that the projected normal family extends straightforwardly to the regression and process settings. A fully model-based approach is developed to capture structured spatial dependence among directional data observed at different spatial locations. We introduce a stochastic process taking values on the circle, the projected Gaussian spatial process, which is induced from a linear bivariate Gaussian process. The properties of the projected Gaussian process are discussed, with special emphasis on its "covariance" structure. We show how to fit this process as a model for data using suitable latent variables with Markov chain Monte Carlo (MCMC) methods, and how to implement spatial interpolation and conduct model comparison in this setting. Simulated examples are provided as proof of concept. A real data application models the aforementioned wave direction data in the Adriatic Sea, off the coast of Italy.
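As a minimal sketch of how a projected Gaussian spatial process is induced, the Python code below draws a bivariate Gaussian process at a set of locations and projects each vector onto the unit circle, theta(s) = atan2(Y_2(s), Y_1(s)). Purely for illustration, it assumes two independent components with constant means and a shared exponential covariance; the function names and parameters (`mu`, `sigma2`, `phi`) are hypothetical, and the dissertation's model may allow cross-correlation between the components. The lengths r(s) = ||Y(s)|| are discarded by the projection, which is why they are the natural latent variables to treat as unknowns during MCMC fitting.

```python
import numpy as np


def exponential_cov(coords, sigma2=1.0, phi=1.0):
    """Exponential covariance sigma2 * exp(-phi * distance) between all locations."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def simulate_projected_gp(coords, mu=(1.0, 0.5), sigma2=1.0, phi=1.0, seed=0):
    """Draw one realization of a projected Gaussian process at the given locations.

    Hypothetical simplification: the two components Y1, Y2 are independent GPs
    with constant means mu[0], mu[1] and the same exponential covariance.
    """
    rng = np.random.default_rng(seed)
    n = coords.shape[0]
    # a small jitter keeps the Cholesky factorization numerically stable
    C = exponential_cov(coords, sigma2, phi) + 1e-10 * np.eye(n)
    L = np.linalg.cholesky(C)
    y1 = mu[0] + L @ rng.standard_normal(n)
    y2 = mu[1] + L @ rng.standard_normal(n)
    # project the bivariate vector onto the unit circle, angle mapped to [0, 2*pi)
    theta = np.mod(np.arctan2(y2, y1), 2 * np.pi)
    # latent lengths: lost by the projection, so unobserved in practice
    r = np.hypot(y1, y2)
    return theta, r, y1, y2
```

Because Y(s) = r(s)(cos theta(s), sin theta(s)), conditioning on the latent lengths turns the angular data back into ordinary bivariate Gaussian data, which is what makes the augmentation convenient.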
Because these directional data are available dynamically, an extension to a space-time setting is natural. As the basis of the projected Gaussian process, the properties of the general projected normal distribution are first clarified. The general projected normal distribution on the circle is defined as the distribution of a bivariate normal random vector with arbitrary mean and covariance, projected onto the unit circle. The projected normal distribution is an under-utilized model for directional data. In particular, the general version with non-identity covariance provides flexibility, e.g., bimodality, asymmetry, and convenient regression specification. For analyzing non-spatial circular data, fully Bayesian hierarchical models using the general projected normal distribution are developed, and model fitting using Markov chain Monte Carlo methods with suitable latent variables is illustrated. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented, and how prediction within the regression setting can be handled. For analyzing spatial directional data, latent variables are again introduced to facilitate model fitting with MCMC methods, and spatial interpolation and model comparison are demonstrated. For model comparison, an out-of-sample approach is used, based on both a predictive likelihood scoring loss criterion and a continuous ranked probability score (CRPS) criterion. The dissertation then builds model extensions within the framework of the projected Gaussian process. The wave direction data studied in the earlier chapters also include wave height information at the same spatial and temporal resolution. Motivated by the joint modeling of these two important wave attributes, direction and height, we develop a hierarchical framework for jointly modeling spatial directional and ordinary linear observations.
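The distributional summaries and the scoring criterion mentioned above can be illustrated with a short Monte Carlo sketch in Python, assuming NumPy. It samples from a general projected normal distribution, computes the mean direction and the mean resultant length (one common measure of concentration, assumed here), and evaluates a sample-based CRPS for circular variables that replaces the absolute difference with arc-length distance. The function names are hypothetical, and these estimators are illustrative rather than the dissertation's exact definitions.

```python
import numpy as np


def sample_general_pn(mu, Sigma, size, seed=0):
    """Draw angles theta = atan2(Y2, Y1) with Y ~ N(mu, Sigma), mapped to [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    y = rng.multivariate_normal(mu, Sigma, size=size)
    return np.mod(np.arctan2(y[:, 1], y[:, 0]), 2 * np.pi)


def circular_summaries(theta):
    """Mean direction and mean resultant length (in [0, 1]) of a sample of angles."""
    c, s = np.mean(np.cos(theta)), np.mean(np.sin(theta))
    return np.mod(np.arctan2(s, c), 2 * np.pi), np.hypot(c, s)


def arc_distance(a, b):
    """Shortest angular distance between directions a and b, in [0, pi]."""
    d = np.mod(a - b, 2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)


def circular_crps(samples, obs):
    """Sample-based circular CRPS: E d(Theta, obs) - 0.5 * E d(Theta, Theta').

    `samples` are predictive draws, `obs` the held-out angle; lower is better.
    """
    term1 = np.mean(arc_distance(samples, obs))
    term2 = np.mean(arc_distance(samples[:, None], samples[None, :]))
    return term1 - 0.5 * term2
```

For instance, `sample_general_pn([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2000)` concentrates its draws near two opposite directions, pi/4 and 5*pi/4, illustrating the bimodality that a non-identity covariance allows.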
We show that Bayesian model fitting under our model specification is straightforward using suitable latent variable augmentation via Markov chain Monte Carlo (MCMC). This joint modeling framework can easily incorporate space-time covariate information, enabling both spatial interpolation and temporal forecasting. The spatial projected Gaussian process also finds a natural application in the geosciences, as a model for aspect processes over elevation maps. In contrast to conventional calculations, a full process model for aspect is provided, allowing full inference and interpolation at arbitrary locations. The aspect process can be inferred directly from a sample of the elevation surface, providing an estimate of the aspect, with its uncertainty, at any new location in the region.
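One way to see why aspect fits the projected Gaussian framework: if elevation is modeled as a Gaussian process, its gradient at any location is bivariate Gaussian, and aspect, the direction of steepest descent, is the projection of the negative gradient onto the unit circle. The Python sketch below assumes a zero-mean GP with a squared-exponential kernel and known hyperparameters, computes the posterior of the gradient at a new location from observed elevations, and draws aspect samples from the resulting projected normal. The kernel, the hyperparameter values, the function names, and the counterclockwise-from-x-axis angle convention are all assumptions for illustration, not the dissertation's specification.

```python
import numpy as np


def se_kernel(A, B, sigma2, ell):
    """Squared-exponential covariance between rows of A (n x 2) and B (m x 2)."""
    diff = A[:, None, :] - B[None, :, :]
    return sigma2 * np.exp(-np.sum(diff**2, axis=-1) / (2 * ell**2))


def gradient_posterior(s0, S, z, sigma2=4.0, ell=1.0, tau2=1e-4):
    """Posterior mean and covariance of grad z(s0) given elevations z at sites S.

    Assumes a zero-mean GP, SE kernel, nugget tau2, all hyperparameters known.
    """
    Kzz = se_kernel(S, S, sigma2, ell) + tau2 * np.eye(len(S))
    k0 = se_kernel(s0[None, :], S, sigma2, ell)[0]  # Cov(z(s0), z(s_i)), shape (n,)
    # Cov(grad z(s0), z(s_i)) = -(s0 - s_i) / ell^2 * C(s0, s_i), shape (2, n)
    Kgz = -((s0[None, :] - S).T / ell**2) * k0[None, :]
    mean = Kgz @ np.linalg.solve(Kzz, z)
    # prior gradient covariance for the SE kernel is (sigma2 / ell^2) * I
    cov = (sigma2 / ell**2) * np.eye(2) - Kgz @ np.linalg.solve(Kzz, Kgz.T)
    return mean, 0.5 * (cov + cov.T)  # symmetrize against round-off


def aspect_draws(mean, cov, size=2000, seed=0):
    """Aspect draws: angle of -grad z, counterclockwise from the x-axis, in [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    g = rng.multivariate_normal(mean, cov, size=size)
    return np.mod(np.arctan2(-g[:, 1], -g[:, 0]), 2 * np.pi)
```

GIS software commonly reports aspect as a compass bearing (clockwise from north) instead; converting between the two conventions is a fixed rotation and reflection of the angle.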