Variance component models with fixed and random effects. Multilevel and hierarchical models for longitudinal and/or clustered data. Focus on model fitting and interpretation. Maximum likelihood and Bayesian inference and computation. Prerequisite: STA 360 and R programming skills. QS.
Probability models, random variables with discrete and continuous distributions. Marginal, joint, and conditional distributions. Expectations, functions of random variables, central limit theorem. Estimators and sampling distributions, method of moments, and maximum likelihood estimation. Prerequisite: Mathematics 22, 112L, 122, 122L, 202, 212, 222, or graduate student. Not open to students who have taken Statistical Science 230/Mathematics 230 or Mathematics 340.
Introduction to basic principles of analyzing relational data. Consider deterministic and probabilistic specifications of networks and graphs, studying structural blockmodels, the Erdős-Rényi model, the exponential random graph model, the stochastic blockmodel, generalizations to latent space models and to more complex relational data. Development of these models and practical understanding of how to fit them. There is no textbook; lectures will be supplemented with discussions of relevant papers. Prerequisites: STA 601 or 602. Co-registration in STA 532 or 732.
Introduction to basic principles of analyzing relational data. Consider deterministic and probabilistic specifications of networks and graphs, studying structural blockmodels, the Erdős-Rényi model, the exponential random graph model, the stochastic blockmodel, generalizations to latent space models and to more complex relational data. Development of these models and practical understanding of how to fit them. There is no textbook; lectures will be supplemented with discussions of relevant papers. Prerequisite: Statistical Science 360. Instructor: Volfovsky.
The rapid growth of digitized data and of the computing power available to analyze it has created immense opportunities for both machine learning and data mining. This course introduces machine learning and data mining methods. Topics covered include information retrieval, clustering, classification, modern regression, cross-validation, boosting, and bagging. Course emphasizes selection of appropriate methods and justification of choice, use of programming for implementation of the method, and evaluation and effective communication of results in data analysis reports.
Investigation of study designs for collecting data and their implications for statistical inference. Design and analysis of surveys of populations, including stratification, clustering, multi-stage sampling, design-based inference, and considerations when analyzing convenience samples and big data. Design and analysis of causal studies, including randomized experiments, blocking, fractional factorial designs, non-randomized studies, and propensity score analysis. Applications involving big data, health, policy, and the natural and social sciences.
Introduction to data science and statistical thinking. Learn to explore, visualize, and analyze data to understand natural phenomena, investigate patterns, model outcomes, and make predictions, and do so in a reproducible and shareable manner. Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Work on problems and case studies inspired by and based on real-world questions and data.
Statistical modeling and machine learning involving large data sets and challenging computation. Data pipelines and databases, big data tools, sequential algorithms and subsampling methods for massive data sets, efficient programming for multi-core and cluster machines, including topics drawn from GPU programming, cloud computing, MapReduce, and general tools of distributed computing environments. Intensive use of statistical and data manipulation software will be required. Data from areas such as astronomy, genomics, finance, social media, networks, and neuroscience.
Introduction to Bayesian modeling for data with spatial and/or time dependence. Exploratory analysis of spatial (point referenced and areal) and time series data. Gaussian processes and generalizations. Extending hierarchical Bayesian linear models and generalized linear models. Spatial models (CAR, SAR, kriging) and time series models (AR, ARMA, dynamic linear models). Computational methods for model fitting and diagnostics. Prerequisite: STA 360, STA 601, or equivalent. One course.
Classical and Bayesian design notions and techniques: experimental units, randomization, treatments, blocking and restrictions on randomization, and utility of designs. Optimal sample size determination for estimation and testing. Factorial and fractional factorial designs, response surface methods, conjoint designs, sequential designs, and bandit problems used in online advertising. Design and modeling of complex computer experiments. Designs for multiple objectives. Computational algorithms for finding optimal designs. Prerequisites: STA 531, STA 532, STA 523L. One course / 3 units.