Statistical modeling and machine learning involving large data sets and challenging computation. Data pipelines and databases, big data tools, sequential algorithms and subsampling methods for massive data sets, efficient programming for multi-core and cluster machines, including topics drawn from GPU programming, cloud computing, MapReduce, and general tools of distributed computing environments. Intensive use of statistical and data manipulation software will be required. Data from areas such as astronomy, genomics, finance, social media, networks, and neuroscience.
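As an illustrative sketch of the map/reduce pattern named above (a hypothetical example, not course material), the following Python snippet counts words across chunks of a corpus: a map step counts within each chunk in parallel worker processes, and a reduce step merges the partial counts.

```python
from multiprocessing import Pool
from collections import Counter
from functools import reduce

def map_count(chunk):
    # Map step: count words within one chunk of the corpus.
    return Counter(chunk.split())

def reduce_counts(a, b):
    # Reduce step: merge two partial word counts.
    return a + b

def word_count(chunks, workers=2):
    # Distribute the map step over a pool of worker processes,
    # then fold the partial counts together.
    with Pool(workers) as pool:
        partials = pool.map(map_count, chunks)
    return reduce(reduce_counts, partials, Counter())

if __name__ == "__main__":
    chunks = ["the cat sat", "the dog sat", "the cat ran"]
    print(word_count(chunks))
```

The same map/reduce decomposition scales from a local process pool to cluster frameworks, since the reduce step only requires that partial results merge associatively.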
Introduction to Bayesian modeling for data with spatial and/or time dependence. Exploratory analysis of spatial (point-referenced and areal) and time series data. Gaussian processes and generalizations. Extending hierarchical Bayesian linear models and generalized linear models. Spatial models: CAR, SAR, kriging; time series models: AR, ARMA, dynamic linear models. Computational methods for model fitting and diagnostics. Prerequisite: STA360, STA601, or equivalent. One course.
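As an illustrative sketch of the simplest time series model listed above (a hypothetical example, not course material), the following Python snippet simulates an AR(1) process and recovers its coefficient by conditional least squares, i.e. regressing each observation on its predecessor.

```python
import numpy as np

def simulate_ar1(phi, sigma, n, seed=0):
    # AR(1): x_t = phi * x_{t-1} + eps_t, with eps_t ~ N(0, sigma^2).
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma)
    return x

def fit_ar1(x):
    # Conditional least-squares estimate of phi:
    # regress x_t on x_{t-1} without an intercept.
    num = np.dot(x[:-1], x[1:])
    den = np.dot(x[:-1], x[:-1])
    return num / den

x = simulate_ar1(phi=0.7, sigma=1.0, n=5000)
phi_hat = fit_ar1(x)
```

A fully Bayesian treatment would place a prior on phi and sigma and sample the posterior; the conditional least-squares estimate above coincides with the posterior mode under a flat prior.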
Classical and Bayesian design notions and techniques: experimental units, randomization, treatments, blocking and restrictions to randomization, and utility of designs. Optimal sample size determination for estimation and testing. Factorial and fractional factorial designs, response surface methods, conjoint designs, sequential designs, and bandit problems used in online advertising. Design and modeling of complex computer experiments. Designs for multiple objectives. Computational algorithms for finding optimal designs. Prerequisites: STA531, STA532, STA523L. One course / 3 units.
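As a small worked example of sample size determination for estimation (a hypothetical sketch, not course material): to estimate a mean with known standard deviation sigma to within a margin of error E at a given confidence level, one solves z * sigma / sqrt(n) <= E for n.

```python
import math

def sample_size_for_mean(sigma, margin, conf_z=1.96):
    # Smallest n such that conf_z * sigma / sqrt(n) <= margin,
    # i.e. n >= (conf_z * sigma / margin)^2, rounded up.
    return math.ceil((conf_z * sigma / margin) ** 2)

# E.g., sigma = 15, desired margin of error 2 at 95% confidence:
n = sample_size_for_mean(15, 2)   # 217
```

Testing and Bayesian design problems replace the margin-of-error criterion with power or expected-utility criteria, but the same structure (invert a precision requirement for n) carries over.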
Statistical models and methods for monitoring, assessing, and forecasting time series. Univariate and multivariate dynamic models; state space modeling approaches; Bayesian inference and prediction; computational methods for fast data analysis, learning, and prediction; time series decomposition; dynamic model and time series structure assessment. Routine use of statistical software for time series applications. Applied studies motivated by problems and time series data from a range of applied fields including economics, finance, neuroscience, climatology, social networks, and others.
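As an illustrative sketch of state space modeling (a hypothetical example, not course material), the following Python snippet runs the Kalman filter for the simplest dynamic linear model, the local level model: a latent level follows a random walk and is observed with noise.

```python
import numpy as np

def kalman_filter_local_level(y, v, w, m0=0.0, c0=1e6):
    # Local level DLM: y_t = theta_t + nu_t,       nu_t ~ N(0, v)
    #                  theta_t = theta_{t-1} + omega_t, omega_t ~ N(0, w)
    # Returns the sequence of filtered means E[theta_t | y_1..t].
    m, c = m0, c0
    means = []
    for yt in y:
        # Predict step: evolve the state one period ahead.
        a, r = m, c + w
        # Update step: correct with the new observation.
        k = r / (r + v)            # Kalman gain
        m = a + k * (yt - a)
        c = (1 - k) * r
        means.append(m)
    return np.array(means)
```

For a constant observed series the filtered mean converges to that constant; richer DLMs add regression, seasonal, and autoregressive components to the same predict/update recursion.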
Nonparametric Bayesian models and methods for complex data analyses with non-linearity adjustment, flexible borrowing of information, local uncertainty quantification, and interaction discovery. Focuses on computationally and theoretically efficient nonparametric regression techniques based on advanced Gaussian process models, with motivating applications in causal inference and big data genomics. Includes several illustrative examples with R code. Basic coverage of asymptotic theory, MCMC, and greedy algorithms. Prerequisites: STA531, STA532, STA523L. One course / 3 units.
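As an illustrative sketch of Gaussian process regression (a hypothetical example in Python rather than the R used in the course), the following snippet computes the posterior mean of a zero-mean GP with a squared-exponential kernel given noisy observations.

```python
import numpy as np

def rbf_kernel(a, b, length=1.0, var=1.0):
    # Squared-exponential kernel: k(x, x') = var * exp(-(x - x')^2 / (2 l^2)).
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4):
    # Posterior mean of a zero-mean GP at x_test, given noisy
    # observations (x_train, y_train): Ks @ (K + noise*I)^{-1} y.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    return Ks @ alpha

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
mean_at_train = gp_posterior_mean(x, y, x)   # close to y for small noise
```

Full uncertainty quantification would also return the posterior covariance; scalable variants (inducing points, local approximations) reduce the cubic cost of the linear solve.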
Statistical issues in causality and methods for estimating causal effects. Randomized designs, and alternative designs and methods for when randomization is infeasible: matching methods, propensity scores, longitudinal treatments, regression discontinuity, instrumental variables, and principal stratification. Methods are motivated by examples from the social sciences, policy, and the health sciences. Prerequisites: STA531, STA532, STA523L. One course / 3 units.
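As an illustrative sketch of propensity-score methods (a hypothetical example, not course material), the following Python snippet computes the inverse-probability-weighted estimate of an average treatment effect, assuming the propensity scores are known rather than estimated.

```python
import numpy as np

def ipw_ate(y, t, e):
    # Inverse-probability-weighted ATE estimate:
    #   mean(t * y / e) - mean((1 - t) * y / (1 - e)),
    # where y = outcomes, t = treatment indicators (0/1),
    # e = propensity scores P(T = 1 | X), assumed known here.
    y, t, e = map(np.asarray, (y, t, e))
    return float(np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e)))

# Toy data: constant propensity 0.5, true treatment effect 2.
ate = ipw_ate(y=[3.0, 1.0, 5.0, 3.0], t=[1, 0, 1, 0], e=[0.5] * 4)
```

In practice the propensity scores are estimated (e.g., by logistic regression), and stabilized or trimmed weights guard against near-zero denominators.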
Formulation of decision problems; criteria for optimality: maximum expected utility and minimax. Axiomatic foundations of expected utility; coherence and the axioms of probability (the Dutch Book theorem). Elicitation of probabilities and utilities. The value of information. Estimation and hypothesis testing as decision problems: risk, sufficiency, completeness and admissibility. Stein estimation. Bayes decision functions and their properties. Minimax analysis and improper priors. Decision theoretic Bayesian experimental design. Combining evidence and group decisions.
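As an illustrative sketch of the maximum-expected-utility criterion (a hypothetical example, not course material), the following Python snippet selects the Bayes action: the action minimizing posterior expected loss over a discrete set of states.

```python
def bayes_action(posterior, loss, actions):
    # posterior: dict mapping state -> posterior probability.
    # loss(action, state) -> nonnegative real.
    # Returns the action minimizing posterior expected loss
    # (equivalently, maximizing expected utility = -loss).
    def expected_loss(a):
        return sum(p * loss(a, s) for s, p in posterior.items())
    return min(actions, key=expected_loss)

# Toy example with 0-1 loss: predict the most probable state.
act = bayes_action({"rain": 0.7, "sun": 0.3},
                   lambda a, s: 0.0 if a == s else 1.0,
                   ["rain", "sun"])   # "rain"
```

The minimax criterion replaces the posterior average with a worst case over states: min over actions of max over states of loss(a, s).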
Introduction to data mining, including multivariate nonparametric regression, classification, and cluster analysis. Topics include the curse of dimensionality, the bootstrap, cross-validation, search (especially model selection), smoothing, the backfitting algorithm, and boosting. Emphasis on regression methods (e.g., neural networks, wavelets, the LASSO, and LARS), classification methods (e.g., CART, support vector machines, and nearest-neighbor methods), and cluster analysis (e.g., self-organizing maps, k-means clustering, and minimum spanning trees).
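As an illustrative sketch of cross-validation (a hypothetical example, not course material), the following Python snippet estimates out-of-sample mean squared error by k-fold cross-validation for any model supplied as a fit/predict pair, shown here with a simple linear fit.

```python
import numpy as np

def kfold_mse(x, y, fit, predict, k=5):
    # Estimate out-of-sample MSE by k-fold cross-validation:
    # hold out each fold in turn, fit on the rest, score on the fold.
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        model = fit(x[train], y[train])
        pred = predict(model, x[f])
        errs.append(np.mean((y[f] - pred) ** 2))
    return float(np.mean(errs))

# Example model: degree-1 polynomial (linear) fit.
def fit_linear(x, y):
    return np.polyfit(x, y, 1)        # returns (slope, intercept)

def predict_linear(coef, x):
    return np.polyval(coef, x)
```

Comparing cross-validated errors across candidate models (different degrees, penalties, or neighborhood sizes) is the standard model-selection use of this estimate.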
One course / 3 units.