Gaussian Process-Based Models for Clinical Time Series in Healthcare
Clinical prediction models can help physicians make better data-driven decisions that improve patient outcomes. The widespread adoption of electronic health records has made a wealth of data available, and more flexible statistical models are required to account for the messiness and complexity of this data. In this dissertation we focus on developing models for clinical time series, since most healthcare data is collected longitudinally and it is important to take this structure into account. Models built on Gaussian processes are a natural fit for this setting of irregularly sampled, noisy time series with many missing values. They also account for and quantify uncertainty, which can be extremely useful in medical decision making. We develop new Gaussian process-based models for medical time series, along with associated algorithms for efficient inference on large-scale electronic health records data. We apply these models to several real healthcare applications, using local data obtained from the Duke University healthcare system. In Chapter 1 we give a brief overview of clinical prediction models, electronic health records, and Gaussian processes. In Chapter 2 we develop several Gaussian process models for clinical time series in the context of chronic kidney disease management. We show how our proposed joint model for longitudinal and time-to-event data, and our model for multivariate time series, can make accurate predictions about a patient's future disease trajectory. In Chapter 3 we combine multi-output Gaussian processes with a downstream black-box deep recurrent neural network.
We apply this modeling framework to clinical time series to improve early detection of sepsis among hospitalized patients, and show that the Gaussian process preprocessing layer both allows for uncertainty quantification and acts as a form of data augmentation that reduces overfitting. In Chapter 4 we again use multi-output Gaussian processes as a preprocessing layer, this time in model-free deep reinforcement learning. Here the goal is to learn optimal treatments for sepsis given clinical time series and historical treatment decisions made by clinicians, and we show that the Gaussian process preprocessing layer and the use of a recurrent architecture offer improvements over standard deep reinforcement learning methods. We conclude in Chapter 5 with a summary of areas for future work and a discussion of the practical considerations and challenges involved in deploying machine learning models in actual clinical practice.
I'm a fifth-year Ph.D. student in Statistical Science at Duke, advised by Katherine Heller and funded by an NDSEG fellowship. I work on developing new (mostly Bayesian) machine learning methods to solve real clinical problems in the Duke healthcare system. In the past I've worked on models to predict surgical complications, and on time series and event-time models to predict disease progression and adverse events in patients with chronic kidney disease. Recently, I've been working on a deep learning model that we're deploying in Duke Hospital to improve early detection of sepsis. Before Duke, I obtained my AB in mathematics at Dartmouth College, working with Dan Rockmore on time-varying topic models and on scalable inference in Bayesian network models.
Futoma, J, Sendak, M, Cameron, CB, and Heller, K. "Scalable joint modeling of longitudinal and point process data for disease trajectory prediction and improving management of chronic kidney disease." 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016 (January 1, 2016): 222-231.