Associate Professor (Retired), Queens College, CUNY
Bayesian Functional Data Analysis for Computer Model Validation
Functional data analysis (FDA), the statistical analysis of curves or functions, has wide application in statistics. An example of considerable recent interest arises in the study of computer models of physical processes: the output of such a model is a function over the space of its inputs, and in many contexts (for example, when the output is a function of time, or a surface) the output is itself functional data. In this research, we develop or extend four Bayesian FDA approaches to computer model validation, tailored to interdisciplinary problems in engineering and the environment.

The first approach is nonparametric Bayesian, utilizing a separable Gaussian stochastic process as the prior distribution for functions; this is a natural choice for smooth functions. The methodology is applied to a thermal computer model challenge problem proposed by Sandia National Laboratories.

Direct use of separable Gaussian stochastic processes is inadequate for irregular functions, and can be computationally infeasible for high-dimensional functions. The second approach, developed for such functions, consists of representing each function in the wavelet domain; reducing the number of nonzero coefficients by thresholding; modeling the nonzero coefficients as functions of the associated inputs, using the nonparametric Bayesian method; and reconstructing the functions (with confidence bands) in the original (time) domain.

The third approach extends the second in terms of function representation: the functions are represented in an eigen-space whose basis elements are linear combinations of the wavelet basis elements. The number of nonzero coefficients is greatly reduced in this eigen-space, and consequently so is the computational expense of the statistical inverse problem. This method is applied to computer modeling of vehicle suspension systems.

The fourth approach models functions as multivariate dynamic linear models (DLMs).
This approach is useful when the functions are highly variable and one seeks primarily to capture the relevant stochastic structure of the functions rather than to represent them exactly. The method has been tested on a simulated data set.

In addition to the basic issue of functional data, all of the above approaches must contend with three further issues associated with computer model validation. First, emulators must typically be constructed for expensive-to-run computer models, by treating them as spatial processes defined on the input space. Second, computer model bias (the discrepancy between the computer model output and reality) must be taken into account. Third, the computer models typically have unknown parameters, whose estimation requires the solution of an inverse problem. Because these issues must all be addressed simultaneously, and with limited data, extensive use is made of Markov chain Monte Carlo (MCMC) algorithms. Modular versions of MCMC are also introduced to reduce confounding among elements of the corresponding statistical models.
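To illustrate the Gaussian-process machinery underlying the first approach and the emulation of expensive computer models, the following is a minimal one-dimensional sketch. The squared-exponential kernel, the length-scale, the toy `model` function, and the design size are all illustrative assumptions, not the dissertation's actual implementation (which uses a separable covariance over a multidimensional input space):

```python
import numpy as np

def sq_exp_kernel(x1, x2, length=0.2, scale=1.0):
    """Squared-exponential covariance: a standard prior for smooth functions."""
    d = x1[:, None] - x2[None, :]
    return scale * np.exp(-0.5 * (d / length) ** 2)

# Stand-in for an expensive computer model, run at only a few design inputs
def model(x):
    return np.sin(2 * np.pi * x)

x_design = np.linspace(0.0, 1.0, 8)
y_design = model(x_design)

# GP conditioning: posterior mean and pointwise uncertainty at new inputs,
# serving as a cheap emulator of the model between the design runs
x_new = np.linspace(0.0, 1.0, 50)
K = sq_exp_kernel(x_design, x_design) + 1e-6 * np.eye(x_design.size)  # jitter
K_new = sq_exp_kernel(x_new, x_design)
mean = K_new @ np.linalg.solve(K, y_design)
cov = sq_exp_kernel(x_new, x_new) - K_new @ np.linalg.solve(K, K_new.T)
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The posterior mean interpolates the design runs, and the posterior standard deviation collapses toward zero there, which is why a GP is a natural emulator for a deterministic code.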
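The wavelet pipeline of the second approach (transform, threshold, reconstruct) can be sketched with an orthonormal Haar transform; the signal, noise level, threshold value, and choice of Haar basis below are illustrative assumptions. The Bayesian modeling of the retained coefficients is only indicated by a comment:

```python
import numpy as np

def haar_decompose(y, levels):
    """Multi-level orthonormal Haar wavelet transform."""
    details = []
    approx = y.astype(float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, coarsest level first."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256)
# Hypothetical irregular functional output: a square wave plus noise
y = np.sign(np.sin(8 * np.pi * t)) + 0.1 * rng.standard_normal(t.size)

# Step 1: represent the function in the wavelet domain
approx, details = haar_decompose(y, levels=5)

# Step 2: reduce the number of nonzero coefficients by hard thresholding
thresh = 0.3
sparse = [np.where(np.abs(d) > thresh, d, 0.0) for d in details]
n_before = sum(np.count_nonzero(d) for d in details)
n_after = sum(np.count_nonzero(d) for d in sparse)

# Step 3 (not shown): model the retained coefficients as functions of the
# computer-model inputs, using the nonparametric Bayesian method.

# Step 4: reconstruct the function in the original (time) domain
y_hat = haar_reconstruct(approx, sparse)
```

Thresholding zeroes most noise-dominated detail coefficients while keeping the large coefficients that encode the jumps, which is what makes the downstream inference tractable for irregular functions.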
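The dimension reduction of the third approach can be sketched via an eigen-decomposition of an ensemble of coefficient vectors: the leading right singular vectors of the (centered) ensemble are basis elements formed as linear combinations of the wavelet basis elements. The ensemble below is synthetic stand-in data with three true degrees of freedom, and the 99% variance cutoff is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble: 40 computer-model runs, each a vector of 128
# wavelet coefficients, driven by 3 latent degrees of freedom plus noise
n_runs, n_coeff = 40, 128
latent = rng.standard_normal((n_runs, 3))
basis = rng.standard_normal((3, n_coeff))
W = latent @ basis + 0.01 * rng.standard_normal((n_runs, n_coeff))

# Eigen-decomposition of the empirical covariance via SVD: rows of Vt are
# the eigen-basis elements (linear combinations of wavelet basis elements)
W_centered = W - W.mean(axis=0)
U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)

# Keep the fewest components explaining 99% of the variance
ratio = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(ratio, 0.99) + 1)

# Each run is now summarized by k scores instead of 128 coefficients
scores = W_centered @ Vt[:k].T
W_hat = scores @ Vt[:k] + W.mean(axis=0)
```

Modeling k scores per run rather than the full coefficient vector is what reduces the computational expense of the statistical inverse problem.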
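The fourth approach uses multivariate DLMs; the core filtering recursion can be shown in the simplest univariate case, a local-level DLM on simulated data. The variances `V` and `W`, the series length, and the vague initial prior are illustrative assumptions, and the sketch is scalar rather than multivariate:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a local-level DLM: random-walk state observed with noise
T, V, W = 200, 1.0, 0.01                 # observation / evolution variances
mu = np.cumsum(np.sqrt(W) * rng.standard_normal(T))   # latent state path
y = mu + np.sqrt(V) * rng.standard_normal(T)          # noisy observations

# Kalman filter for the local-level model
m, C = 0.0, 1e6                          # vague prior on the initial state
filtered = np.empty(T)
for t in range(T):
    R = C + W                            # prior variance of state at time t
    Q = R + V                            # one-step forecast variance
    A = R / Q                            # adaptive gain
    m = m + A * (y[t] - m)               # posterior mean given y[0..t]
    C = R - A * R                        # posterior variance
    filtered[t] = m
```

Rather than reproducing each observed curve exactly, the filter tracks the underlying stochastic structure, which is the point of the DLM formulation for highly variable functions.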