Introduction to basic principles of analyzing relational data. Consider deterministic and probabilistic specifications of networks and graphs, studying structural blockmodels, the Erdos-Renyi model, the exponential random graph model, the stochastic blockmodel, generalizations to latent space models and to more complex relational data. Development of these models and practical understanding of how to fit them. There is no book; lectures will be supplemented with discussions of relevant papers. Pre-requisites: STA 601 or 602. Co-registration in STA 532 or 732.
Introduction to basic principles of analyzing relational data. Consider deterministic and probabilistic specifications of networks and graphs, studying structural blockmodels, the Erdos-Renyi model, the exponential random graph model, the stochastic blockmodel, generalizations to latent space models and to more complex relational data. Development of these models and practical understanding of how to fit them. There is no book; lectures will be supplemented with discussions of relevant papers. Prerequisite: Statistical Science 360. Instructor: Volfovsky.
The rapid growth of digitalized data and the computer power available to analyze it has created immense opportunities for both machine learning and data mining. This course introduces machine learning and data mining methods. Topics covered include information retrieval, clustering, classification, modern regression, cross validation, boosting and bagging. Course emphasizes selection of appropriate methods and justification of choice, use of programming for implementation of the method, and evaluation and effective communication of results in data analysis reports.
Investigation of study designs for collecting data and their implications for statistical inference. Design and analysis of surveys of populations, including stratification, clustering, multi-stage sampling, design-based inference, considerations when analyzing convenience samples and big data. Design and analysis of causal studies including randomized experiments, blocking, fractional factorial designs, non-randomized studies, propensity score analysis. Applications involving big data, health, policy, natural and social sciences.
Intro to data science and statistical thinking. Learn to explore, visualize, and analyze data to understand natural phenomena, investigate patterns, model outcomes, and make predictions, and do so in a reproducible and shareable manner. Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, and data visualization, and effective communication of results. Work on problems and case studies inspired by and based on real-world questions and data.
Statistical modeling and machine learning involving large data sets and challenging computation. Data pipelines and databases, big data tools, sequential algorithms and subsampling methods for massive data sets, efficient programming for multi-core and cluster machines, including topics drawn from GPU programming, cloud computing, MapReduce and general tools of distributed computing environments. Intense use of statistical and data manipulation software will be required. Data from areas such as astronomy, genomics, finance, social media, networks, neuroscience.
Introduction to Bayesian modeling for data with spatial and/or time dependence. Exploratory analysis of spatial (point referenced and areal) and time series data. Gaussian processes and generalizations. Extending hierarchical Bayesian linear models and generalized linear models. Spatial models: CAR, SAR, kriging. Time series models: ARM, ARMA, dynamic linear models. Computational methods for model fitting and diagnostics. Prerequisite: STA360, STA601 or equivalent. One course.
Classical and Bayesian design notions and techniques—experimental units, randomization, treatments, blocking and restrictions to randomization, and utility of designs. Optimal sample size determination for estimation and testing. Factorial and fractional factorial designs, response surface methods, conjoint designs, sequential designs and bandit problems used in on-line advertising. Design and modeling of complex computer experiments. Designs for multiple objectives. Computational algorithms for finding optimal designs. Prerequisites: STA531, STA532, STA523L. One course / 3 units.
Statistical models for modeling, monitoring, assessing and forecasting time series. Univariate and multivariate dynamic models; state space modeling approaches; Bayesian inference and prediction; computational methods for fast data analysis, learning and prediction; time series decomposition; dynamic model and time series structure assessment. Routine use of statistical software for time series applications. Applied studies motivated by problems and time series data from a range of applied fields including economics, finance, neuroscience, climatology, social networks, and others.
Nonparametric Bayesian models and methods for complex data analyses with non-linearity adjustment, flexible borrowing of information, local uncertainty quantification and interaction discovery. Focuses on computationally and theoretically efficient nonparametric regression techniques based on advanced Gaussian process models, with motivating applications in causal inference and big data genomics. Includes several illustrative examples with R code. Basic coverage of asymptotic theory and MCMC and greedy algorithms. Prerequisites: STA531, STA532, STA523L. One course / 3 units.