Francesca Petralia

missing portrait
External address: 
New York, NY
Graduation Year: 

Employment Info

Postdoctoral Fellow in Genetics & Genomics
Mount Sinai Hospital


Structured Bayesian Learning Through Mixture Models

In this thesis, we develop some Bayesian mixture density estimation for univariate and multivariate data. We start proposing a repulsive process favoring mixture components further apart. While conducting inferences on the cluster-specific parameters, current frequentist and Bayesian methods often encounter problems when clusters are placed too close together to be scientifically meaningful. Current Bayesian practice generates component-specific parameters independently from a common prior, which tends to favor similar components and often leads to substantial probability assigned to redundant components that are not needed to fit the data. As an alternative, we propose to generate components from a repulsive process, which leads to fewer, better separated and more interpretable clusters. In the second part of the thesis, we face the problem of modeling the conditional distribution of a response variable given a high dimensional vector of predictors potentially concentrated near a lower dimensional subspace or manifold. In many settings it is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiresolution model that scales efficiently to massive numbers of features, and can be implemented efficiently with slice sampling. In the third part of the thesis, we deal with the problem of characterizing the conditional density of a multivariate vector of response given a potentially high dimensional vector of predictors. The proposed model flexibly characterizes the density of the response variable by hierarchically coupling a collection of factor models, each one defined on a different scale of resolution. As it is illustrated in Chapter 4, our proposed method achieves good predictive performance compared to competitive models while efficiently scaling to high dimensional predictors.