Statistical Data Mining


Introduction to data mining, including multivariate nonparametric regression, classification, and cluster analysis. Topics include the curse of dimensionality, the bootstrap, cross-validation, search (especially model selection), smoothing, the backfitting algorithm, and boosting. Emphasis on regression methods (e.g., neural networks, wavelets, the LASSO, and LARS), classification methods (e.g., CART, support vector machines, and nearest-neighbor methods), and cluster analysis (e.g., self-organizing maps, K-means clustering, and minimum spanning trees). Theory illustrated through analysis of classical data sets. Prerequisite: Statistical Science 250.
