This course will extend the foundation laid in software tools for data science to allow for efficient computing with very large data sets. It will explore the use of appropriate algorithms and data structures for intensive computations, the improvement of computational performance through native code compilation, the use of parallel computing to accelerate intensive computations, the use of appropriate algorithms and data structures for massive data sets, and the use of distributed computing to process massive data sets. Prerequisite: Biostatistics 821 or permission of the director of graduate studies.