Surya T. Tokdar


Associate Professor of Statistical Science

My core statistics research revolves around nonparametric Bayesian analysis of regression and density estimation problems. My time is split equally among developing new models and methods, investigating their asymptotic properties, and creating software packages. I have one long-standing interdisciplinary collaboration with Professor Jennifer Groh of Duke Psychology and Neuroscience, and have recently started a new one in ecology with Professor Jim Clark of the Duke Nicholas School of the Environment. Both projects connect deeply with my core statistics research. Below are brief summaries of some of my ongoing projects.

Regression Smoothing in High Dimension. This project addresses how to carry out nonparametric regression smoothing effectively in very high dimensional applications, where the number of predictors can be far greater than the sample size. While regression smoothing is typically employed to uncover nonlinear relationships between predictors and response, our primary motivation is to harness its ability to offer an honest and localized quantification of predictive uncertainty, one that takes into account how much training data is locally available for making a prediction at a given point of interest. Such honest uncertainty quantification has far-reaching implications for causal analysis of observational studies and for the design of complex computer experiments.

Quantile Regression. Linear quantile regression is a simple generalization of the ordinary least squares method, and is recognized as a powerful tool to analyze extremes, capture dependency beyond the average, and adjust for non-Gaussianity and heavy tails. I have established a new inference framework for linear quantile regression that allows the quantile regression lines at all quantile levels to be estimated simultaneously by employing a lossless, generative, probabilistic model for the data. My ongoing work focuses on extending the methods and theory, motivated by serious scientific applications of quantile regression where one needs to address dependency between observation units, as often seen with spatiotemporal data, longitudinal studies, and network-indexed data.
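As a small illustration of how quantile regression generalizes least squares, the sketch below fits only a constant. Squared-error loss is minimized by the sample mean, while the standard check (pinball) loss at level tau is minimized by a sample quantile. The function names and toy data are hypothetical, chosen for illustration; this is not the joint, model-based framework described above, which estimates all quantile levels simultaneously.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def total_check_loss(ys, c, tau):
    """Sum of check losses when every observation is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in ys)


def best_constant(ys, tau):
    """Constant minimizing the total check loss.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer can always be found among the observed values.
    """
    return min(sorted(set(ys)), key=lambda c: total_check_loss(ys, c, tau))


# Toy data (hypothetical) with one extreme value.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_fit = sum(ys) / len(ys)        # least squares answer, pulled up by the outlier
median_fit = best_constant(ys, 0.5)  # tau = 0.5 targets the median
upper_fit = best_constant(ys, 0.9)   # tau = 0.9 targets the upper tail
```

In this toy example the least squares fit is dragged toward the extreme value, while the tau = 0.5 fit stays at the middle observation and the tau = 0.9 fit moves to the upper tail. In linear quantile regression the constant is replaced by a linear function of the predictors, with the same loss.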

Neuroscience: How the brain preserves information about multiple simultaneous items is poorly understood. In collaboration with Professor Groh, I am investigating whether neurons accomplish this using time-division multiplexing, a telecommunications strategy for combining signals in a single channel. We have proposed a novel theory of “second-order stochasticity” for analyzing neural data recorded under exposure to multiple stimuli. Unlike standard first-order stochasticity models, which assume that neural data consist of uninformative noise around a fixed information-encoding signal, our new theory postulates spontaneous, information-encoding, stochastic dynamical variations in a neuron’s response pattern that help it preserve information about each signal in the stimulus set.

Ecology: With Professor Clark, I have initiated a project applying quantile regression to ecology, in which we study species abundance as a response to climate, geography, and geology, while accounting for possible excess zero truncation due to lack of observation and for possible heavy tails.