Conducting more than one hypothesis test simultaneously is called multiple testing. An example is a study done to find whether certain genes are associated with certain diseases. Suppose 1,000,000 genes and 100 diseases are considered, with a test conducted for each gene-disease pair to check for association. Then one is conducting 100,000,000 tests.
The issue with multiple testing is that one must adjust ordinary statistical practice to account for the multiple tests. For instance, it is common to simply reject a null hypothesis if the p-value is less than 0.05. But if one were to test 100,000,000 null hypotheses that were all true, then the expected number of rejected null hypotheses would be 0.05 x 100,000,000 = 5,000,000, i.e., the scientist would be making 5,000,000 incorrect claims. Unfortunately, adjusting for multiple testing is often not done; many consider this omission to be the leading cause of the current lack of reproducibility in much of science.
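The arithmetic above can be checked by simulation: under a true null hypothesis, the p-value is uniformly distributed on (0, 1), so each test has a 5% chance of a "significant" result purely by chance. The following sketch scales the number of tests down from the text's 100,000,000 to keep it fast; the test count and seed here are illustrative choices, not from the text.

```python
import random

random.seed(1)

n_tests = 100_000   # all null hypotheses are true (scaled down for speed)
alpha = 0.05

# Under a true null, the p-value is Uniform(0, 1)
p_values = [random.random() for _ in range(n_tests)]

# Count how many true nulls are (incorrectly) rejected at level alpha
false_rejections = sum(p < alpha for p in p_values)

print(false_rejections)            # roughly alpha * n_tests = 5,000
print(false_rejections / n_tests)  # roughly 0.05
```

Scaled up to 100,000,000 tests, the same 5% rate yields the 5,000,000 incorrect claims described above.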
While some simple methods of adjusting for multiple testing are known, they are effective only in relatively simple cases, such as when all the tests being conducted are based on independent test statistics. Thus developing powerful multiple testing adjustments for the complex "big data" world we face is currently one of the most interesting areas of statistics.
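One classical example of a simple adjustment is the Bonferroni correction: to keep the probability of making any false rejection across m tests at or below alpha, each p-value is compared against alpha/m rather than alpha. The sketch below uses made-up p-values for illustration; note that Bonferroni is valid under arbitrary dependence between tests, but it can be very conservative when m is large, which is part of why more powerful methods are sought.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject the i-th null hypothesis iff p_i < alpha / m,
    where m is the total number of tests (Bonferroni correction)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical p-values from 4 tests; the per-test threshold
# becomes 0.05 / 4 = 0.0125, so the third test (p = 0.049) no
# longer counts as significant.
p_values = [0.001, 0.012, 0.049, 0.20]
print(bonferroni_reject(p_values))   # [True, True, False, False]
```

With 100,000,000 tests the Bonferroni threshold would shrink to 0.05 / 100,000,000, illustrating how severe such corrections become at "big data" scale.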