Statistical Data Mining
Introduction to data mining, including multivariate nonparametric regression, classification, and cluster analysis. Topics include the curse of dimensionality, the bootstrap, cross-validation, search (especially model selection), smoothing, the backfitting algorithm, and boosting. Emphasis on regression methods (e.g., neural networks, wavelets, the LASSO, and LARS), classification methods (e.g., CART, support vector machines, and nearest-neighbor methods), and cluster analysis (e.g., self-organizing maps, k-means clustering, and minimum spanning trees). Theory illustrated through analysis of classical data sets. Prerequisite: Statistical Science 250.