Introduction to the mathematics and algorithms that are central to a variety of data science applications. Basic mathematical concepts underlying popular data science algorithms will be introduced, and students will write code implementing these algorithms. We will discuss these algorithms' impact on society and their ethical implications. Algorithms examined include Google's PageRank, principal component analysis for visualizing high-dimensional data, hidden Markov models for speech recognition, and classifiers for detecting spam email.
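As an illustration of the kind of algorithm implemented in a course like this, here is a minimal PageRank sketch via power iteration on a small, hypothetical four-page link graph (illustrative only; not course material, and the graph and parameter choices are assumptions):

```python
import numpy as np

# Hypothetical 4-page link graph: adjacency[i][j] = 1 if page i links to page j.
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def pagerank(adj, damping=0.85, tol=1e-10):
    """Power iteration for PageRank on a graph with no dangling nodes."""
    n = adj.shape[0]
    # Each page distributes its rank equally over its out-links.
    transition = adj / adj.sum(axis=1, keepdims=True)
    rank = np.full(n, 1.0 / n)
    while True:
        new_rank = (1 - damping) / n + damping * (rank @ transition)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank

scores = pagerank(adjacency)
print(scores)  # scores sum to 1; page 2, with the most in-links, ranks highest
```

The damping factor 0.85 is the value commonly cited for PageRank; the iteration stops when successive rank vectors agree to within `tol`.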
(No longer offered but listed for historical reasons.)
Statistical concepts involved in making inferences, decisions, and predictions from data. Emphasis on applications, not formal technique. Prerequisite: must have taken the placement test and placed into Statistics 30; see the website for placement information. Consent of the director of undergraduate studies required. Not open to students with Statistics AP credit, Math AP credit, or credit for Math 105L or higher.
An overview course taught at an advanced level. Covers standard techniques such as the perceptron algorithm, decision trees, random forests, boosting, support vector machines and reproducing kernel Hilbert spaces, regression, K-means, Gaussian mixture models and EM, neural networks, and multi-armed bandits. Also covers introductory statistical learning theory. Recommended prerequisite: linear algebra, probability, and analysis, or equivalent.
Geometry of high-dimensional data sets. Linear dimension reduction: principal component analysis, kernel methods. Nonlinear dimension reduction: manifold models. Graphs: random walks on graphs, diffusions, PageRank. Clustering, classification, and regression in high dimensions. Sparsity. Computational aspects and randomized algorithms. Prerequisite: Mathematics 218 or 221.
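A minimal sketch of linear dimension reduction by principal component analysis, one of the topics listed, on synthetic data that lies near a low-dimensional subspace (illustrative only; the data-generation setup is an assumption, not course material):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "high-dimensional" data with low intrinsic dimension:
# 200 points near a 2-dimensional subspace of R^10, plus small noise.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
data = latent @ basis + 0.01 * rng.normal(size=(200, 10))

# PCA via SVD of the centered data matrix.
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ Vt[:2].T  # coordinates in the top-2 principal subspace

# Fraction of variance captured by each principal component.
explained = S**2 / (S**2).sum()
print(explained[:3])  # the first two components dominate
```

Because the data were built to be nearly two-dimensional, the top two components capture almost all of the variance, which is the regime where linear dimension reduction is most effective.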
Variance component models with fixed and random effects. Multilevel and hierarchical models for longitudinal and/or clustered data. Focus on model fitting and interpretation. Maximum likelihood and Bayesian inference and computation. Prerequisite: Statistical Science 360, 601, or 602, and R programming skills. Not open to students with credit for Statistical Science 410.
Variance component models with fixed and random effects. Multilevel and hierarchical models for longitudinal and/or clustered data. Focus on model fitting and interpretation. Maximum likelihood and Bayesian inference and computation. Prerequisite: Statistical Science 360. Recommended prerequisite: R programming skills.
Probability models, random variables with discrete and continuous distributions. Marginal, joint, and conditional distributions. Expectations, functions of random variables, central limit theorem. Estimators and sampling distributions, method of moments, and maximum likelihood estimation. Prerequisite: Mathematics 22, 112L, 122, 122L, 202D, 212, 222, or graduate-student standing. Not open to students who have taken Statistical Science 230/Mathematics 230 or Mathematics 340/Statistical Science 231.
Introduction to basic principles of analyzing relational data. Considers deterministic and probabilistic specifications of networks and graphs, studying structural blockmodels, the Erdős-Rényi model, the exponential random graph model, the stochastic blockmodel, and generalizations to latent space models and to more complex relational data. Development of these models and practical understanding of how to fit them. There is no textbook; lectures will be supplemented with discussions of relevant papers. Prerequisite: Statistical Science 601 or 602L. Corequisite: Statistical Science 532 or 732.
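As a hedged sketch of the simplest model listed, here is one way to sample an undirected Erdős-Rényi G(n, p) graph and check its empirical edge density (illustrative only; parameter choices are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def erdos_renyi(n, p, rng):
    """Sample an undirected Erdos-Renyi G(n, p) adjacency matrix."""
    coins = rng.random((n, n)) < p     # one Bernoulli(p) coin per ordered pair
    upper = np.triu(coins, k=1)        # keep each unordered pair once, no self-loops
    return (upper | upper.T).astype(int)  # symmetrize

adj = erdos_renyi(500, 0.1, rng)
n = adj.shape[0]
density = adj.sum() / (n * (n - 1))    # empirical edge probability
print(round(density, 3))               # close to p = 0.1 for n this large
```

With 500 nodes there are 124,750 unordered pairs, so the empirical density concentrates tightly around p; in a course like this, the fitting exercise runs the other direction, estimating p (or blockmodel parameters) from an observed network.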