Missing Data

Missing data plague many applied data analyses. Many analysts simply drop cases with missing values, but unless the data are missing completely at random, this can bias inferences, and it discards observed information, which reduces precision. In our department, several faculty members are developing new theory and methods for handling missing data. A major thrust is to extend the theory and applications of multiple imputation, a technique originally conceived as a tool that statistical agencies could use to handle nonresponse in large datasets released to the public. The basic idea is for the statistical agency to simulate values for the missing data repeatedly by sampling from the predictive distributions of the missing values. This creates multiple completed datasets, which are disseminated to the public. Faculty are developing better ways to impute missing data in high-dimensional datasets, to handle data that are systematically missing (so-called not-missing-at-random data), and to improve the theory underlying inferences from multiply imputed data. Faculty also work on principled, model-based methods for handling data reported with errors, such as a respondent who reports being a pregnant male. Application areas include government statistics, health data, and large-scale surveys and censuses in political science, economics, psychology, and sociology.
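The basic idea of multiple imputation can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the faculty's methods: it assumes one fully observed covariate x, a normal linear regression of y on x (with a flat prior) as the imputation model, values of y missing at random given x, and the mean of y as the quantity of interest. Each imputation first draws the model parameters from their posterior and then samples every missing y from its predictive distribution. The completed datasets are analyzed separately and pooled with Rubin's combining rules. The function names impute_once, rubin_combine, and multiply_impute_mean are hypothetical.

```python
import math
import random
import statistics


def impute_once(x, y, rng):
    """Hypothetical helper: return one completed copy of y, replacing None
    entries with draws from the posterior predictive distribution of a
    normal linear regression of y on x (flat prior)."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n_obs = len(obs)
    x_bar = statistics.fmean(xi for xi, _ in obs)
    y_bar = statistics.fmean(yi for _, yi in obs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in obs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs)
    slope = sxy / sxx
    rss = sum((yi - y_bar - slope * (xi - x_bar)) ** 2 for xi, yi in obs)

    # Draw parameters from their posterior so the imputations reflect
    # uncertainty about the model, not only residual noise.
    # gammavariate(k/2, 2) is a chi-square draw with k degrees of freedom.
    sigma2 = rss / rng.gammavariate((n_obs - 2) / 2, 2)
    # With x centered, the level and slope are independent a posteriori.
    level = rng.gauss(y_bar, math.sqrt(sigma2 / n_obs))
    slope_draw = rng.gauss(slope, math.sqrt(sigma2 / sxx))

    # Sample each missing value from its predictive distribution given x.
    return [
        yi if yi is not None
        else rng.gauss(level + slope_draw * (xi - x_bar), math.sqrt(sigma2))
        for xi, yi in zip(x, y)
    ]


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    u_bar = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b    # estimate, total variance


def multiply_impute_mean(x, y, m=20, seed=0):
    """Hypothetical helper: estimate the mean of y from m completed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(x, y, rng)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    # Synthetic data: y is more often missing when x > 0 (missing at random
    # given x), so dropping incomplete cases pulls the mean of y downward.
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(500)]
    y_full = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
    y = [None if xi > 0 and rng.random() < 0.6 else yi
         for xi, yi in zip(x, y_full)]
    cc_mean = statistics.fmean(v for v in y if v is not None)
    mi_mean, mi_var = multiply_impute_mean(x, y)
    print(f"full-data mean:     {statistics.fmean(y_full):.3f}")
    print(f"complete-case mean: {cc_mean:.3f}")
    print(f"MI mean:            {mi_mean:.3f} (SE {math.sqrt(mi_var):.3f})")
```

Drawing the regression parameters before each imputation, rather than plugging in point estimates, is what lets the between-imputation variance b carry the extra uncertainty due to missingness into the pooled standard error. On data like the synthetic example, the complete-case mean is pulled toward the cases that happened to respond, while the imputation model uses x to fill in the gap.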

Faculty in this Research Area