Students apply statistical analysis skills to in-depth data analysis projects in a variety of areas of application. Students design and implement a data analysis plan based on substantive questions or hypotheses and communicate their results both technically and non-technically in oral presentations and written reports. Prerequisite: Statistical Science 360, 601, or 602. Not open to students who have taken Statistical Science 440 or Statistical Science 723.
Rigorous introduction to health data science using current applications in biomedical research, epidemiology, and health policy. Use modern statistical software to conduct reproducible data exploration, visualization, and analysis. Interpret and translate results for interdisciplinary researchers. Critically evaluate data-based claims, decisions, and policies. Includes exploratory data analysis, visualization, basics of probability and inference, predictive modeling and classification. This course focuses on the R computing language. No statistical or computing background is necessary.
Estimators and properties (efficiency, consistency, sufficiency); loss functions. Fisher information, asymptotic properties and distributions of estimators. Exponential families. Point and interval estimation, delta method. Neyman-Pearson lemma; likelihood ratio tests; multiple testing; design and the analysis of variance (ANOVA). High-dimensional data; statistical regularization and sparsity; penalty and prior formulations; model selection. Resampling methods; principal component analysis, mixture models.
Introduction to the mathematics and algorithms that are central to a variety of data science applications. Basic mathematical concepts underlying popular data science algorithms will be introduced and students will write code implementing these algorithms. We will discuss the impact of these algorithms on society and their ethical implications. Algorithms examined include: Google's PageRank, principal component analysis for visualizing high-dimensional data, hidden Markov models for speech recognition, and classifiers for detecting spam emails.
(No longer offered but listed for historical reasons.)
Statistical concepts involved in making inferences, decisions, and predictions from data. Emphasis on applications, not formal technique. Prerequisite: Must have taken placement test and placed in Statistics 30. See website for placement info. Director of undergraduate studies consent required. Not open to students with Statistics AP credit, Math AP credit, or credit for Math 105L or higher.
This is an introductory overview course at an advanced level. Covers standard techniques, such as the perceptron algorithm, decision trees, random forests, boosting, support vector machines and reproducing kernel Hilbert spaces, regression, K-means, Gaussian mixture models and EM, neural networks, and multi-armed bandits. Covers introductory statistical learning theory. Recommended prerequisite: linear algebra, probability, analysis or equivalent.
Geometry of high-dimensional data sets. Linear dimension reduction, principal component analysis, kernel methods. Nonlinear dimension reduction, manifold models. Graphs. Random walks on graphs, diffusions, PageRank. Clustering, classification, and regression in high dimensions. Sparsity. Computational aspects, randomized algorithms. Prerequisite: Mathematics 218 or 221.
Variance component models with fixed and random effects. Multilevel and hierarchical models for longitudinal and/or clustered data. Focus on model fitting and interpretation. Maximum likelihood and Bayesian inference and computation. Prerequisite: STA 360, 601, or 602 and R programming skills. Not open to students with credit for STA 410.