Software Engineer, LinkedIn, Jun 2017-Present
Bayesian Dynamic Modeling for Streaming Network Data
Abstract: Streaming network data of various forms arise in many applications, motivating research to model and quantify the stochasticity and structure in the dynamics underlying such data. One example context is that of traffic flow count data in networks, such as in automobile or aviation transportation, certain directed social network contexts, and Internet studies. Using an example of Internet browser traffic flows through site-segments of an international news website, I present Bayesian analyses of two new, linked classes of models which, in tandem, allow fast, scalable and interpretable Bayesian inference on the dynamic patterns underlying flows over time. I develop two kinds of flexible state-space models for streaming count data that adaptively characterize and quantify network dynamics efficiently in real time. These models are then used as emulators of more structured, time-varying gravity models that allow formal dissection of network dynamics. This yields interpretable inferences on traffic flow characteristics and on the dynamics of interactions among network nodes. Bayesian monitoring theory defines a strategy for sequential model assessment and adaptation when network flow data deviate from model-based predictions. Exploratory and sequential monitoring analyses of evolving traffic on a network of web site-segments in e-commerce demonstrate the utility of this coupled Bayesian emulation approach to the analysis of streaming network count data.
A second, different dynamic network context involves relational data. Examples include binary network data indicating communications or relationships between pairs of network nodes over time, such as friendships in social networks and communications between different functional zones in the brain. Using an example of co-movements of company stock indices, I develop and compare two different approaches. The first involves latent threshold models that map latent processes to binary entries via a probabilistic link function; the second involves dynamic generalized linear models for binary outcomes. Analyses using Markov chain Monte Carlo methods are available for the latent threshold models, but they are computationally demanding and do not scale to the network dimensions relevant in many contexts. In contrast, dynamic generalized linear models can be implemented using fast, effective approximate Bayesian computations for both sequential and retrospective analyses, enabling linear-time computations. I also demonstrate the use of a model decoupling/recoupling strategy to enable scaling in network size.
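The abstract does not give the form of the state-space models for streaming counts. As an illustration of the general idea only, the following is a minimal sketch of a discount-factor gamma-Poisson filter for a single flow, assuming counts y_t ~ Poisson(phi_t) with a gamma distribution on the rate phi_t. The class name `GammaPoissonFilter`, the prior values, and the discount factor are hypothetical choices, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class GammaPoissonFilter:
    """Sequential learning of a Poisson rate with a discount factor.

    Assumed (illustrative) model: y_t ~ Poisson(phi_t), with
    phi_t ~ Gamma(shape, rate) summarizing what is known so far.
    """
    shape: float = 1.0      # hypothetical prior shape
    rate: float = 1.0       # hypothetical prior rate
    discount: float = 0.95  # values below 1 down-weight older data

    def forecast_mean(self) -> float:
        # Discounting scales shape and rate equally, so the
        # one-step-ahead forecast mean equals the current mean.
        return self.shape / self.rate

    def update(self, count: int) -> float:
        # Evolve: discounting keeps the mean but inflates the variance.
        # Update: conjugate gamma-Poisson step with the new count.
        self.shape = self.discount * self.shape + count
        self.rate = self.discount * self.rate + 1.0
        return self.shape / self.rate


def filter_counts(counts, discount=0.95):
    """Return the filtered posterior mean rate after each observation."""
    flt = GammaPoissonFilter(discount=discount)
    return [flt.update(y) for y in counts]
```

Each update costs constant time per observation, so a filter of this kind processes a stream in time linear in its length, and flows can be filtered independently of one another. With `discount=1.0` the filter reduces to standard conjugate updating with no forgetting.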