Statistical models are constructed for a variety of purposes, but typically involve an effort to explain observables (existing or future data) in terms of some underlying structure. Such models are rarely (never?) a perfect explanation of the observables, so that consideration of model uncertainty is a crucial part of statistics.
Model uncertainty is also one of the most challenging areas of statistics, for several reasons. First, the number of possible models under consideration may be overwhelming. Take ordinary linear regression, for instance, where we seek to explain observed data by linearly relating the data to possible covariates. Suppose we have 60 possible explanatory covariates, but we are unsure which should be included in the regression model and which should be omitted. Then the number of possible regression models under consideration is 2^60, far too large to enumerate.
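The combinatorial explosion above is easy to verify directly: each of the 60 covariates is either in or out of the model, so the model space doubles with every covariate. A minimal sketch (the covariate count and enumeration are purely illustrative):

```python
from itertools import combinations

def count_submodels(p):
    """Count every subset of p candidate covariates by explicit enumeration."""
    return sum(1 for k in range(p + 1) for _ in combinations(range(p), k))

# For small p we can enumerate the subsets directly; the count is 2**p.
assert count_submodels(10) == 2**10  # 1024 models from just 10 covariates

# With 60 candidate covariates, explicit enumeration is hopeless:
print(2**60)  # 1152921504606846976 -- over 10^18 candidate regression models
```

Even at a billion model evaluations per second, visiting all 2^60 models would take more than thirty years.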
A second major challenge is that, in today’s “big data” world, the number of possible explanatory covariates can greatly exceed the amount of available data; this occurs, for instance, when the explanatory covariates are genes.
Other challenges arise in trying to implement strategies for dealing with model uncertainty. For instance, one of the most promising strategies is called ‘Bayesian model averaging,’ but this requires specification of prior distributions for, say, 2^60 models.
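To make Bayesian model averaging concrete on a problem small enough to enumerate, the following sketch places a uniform prior over all 2^p submodels of a tiny synthetic regression and approximates each model's marginal likelihood via BIC (a common approximation, not the only choice); the data-generating coefficients and dimensions are hypothetical illustration only:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Hypothetical tiny example: 4 candidate covariates, only the first two matter.
n, p = 100, 4
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def bic(y, X_sub):
    """BIC for a least-squares fit with Gaussian errors (MLE error variance)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X_sub])  # intercept plus chosen covariates
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + Xd.shape[1] * np.log(n)

# Enumerate all 2**p submodels; under a uniform model prior, exp(-BIC/2)
# gives (approximate) unnormalized posterior model probabilities.
models, bics = [], []
for size in range(p + 1):
    for subset in combinations(range(p), size):
        models.append(subset)
        bics.append(bic(y, X[:, list(subset)]))
bics = np.array(bics)
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

best = models[int(np.argmax(weights))]
print("highest-probability model:", best)
```

With only 4 covariates there are 16 models and the uniform prior is trivial to write down; the point of the passage is that at 60 covariates even this simplest prior choice must be specified implicitly, since the 2^60 models can never be listed.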
These challenges (and others) make the handling of model uncertainty one of the most interesting research problems in statistics.