STA 701 Talk

Monday, November 25

Speakers: Sam Rosen and Riccardo Rossetti
    
Moderator: Shuo Wang

Speaker: Sam Rosen

Title: New Finite-Sample Bounds in the Convex Clustering Regime

Abstract: Convex Clustering is an unsupervised learning method that groups data by optimizing a specific convex objective function. Optimizing this function requires an affinity matrix (equivalently, a graph) over the data points. In practice, this affinity matrix is sparse and dependent on the data; however, all existing theoretical error bounds, across a variety of convex clustering settings, assume the affinity matrix is the all-ones matrix. We prove novel bounds that allow not only arbitrary connected affinity graphs but also affinities that depend on the input data. These bounds justify the affinity weights often used in practice and reveal a clear tradeoff between the sparsity and the accuracy of the affinities. We show that these bounds apply not only in the simple Convex Clustering setting but also in more exotic settings such as Biconvex Biclustering.
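
For context, the convex clustering objective referenced above penalizes pairwise differences between cluster centers along the edges of an affinity graph. The following is a minimal Python sketch of that objective with a sparse, data-dependent affinity graph; the function names and the Gaussian-kernel weights are illustrative assumptions, not details taken from the talk.

    import numpy as np

    def convex_clustering_objective(X, U, edges, weights, lam):
        # Fidelity term: keep each center u_i close to its data point x_i.
        fit = 0.5 * np.sum((X - U) ** 2)
        # Fusion penalty: pull centers together along the affinity graph;
        # a sparse edge list replaces the all-ones affinity matrix assumed
        # by earlier theory.
        fusion = sum(w * np.linalg.norm(U[i] - U[j])
                     for (i, j), w in zip(edges, weights))
        return fit + lam * fusion

    def gaussian_affinities(X, edges, gamma=0.5):
        # Data-dependent weights w_ij = exp(-gamma * ||x_i - x_j||^2), a
        # common practical choice of the kind such bounds aim to justify.
        return [np.exp(-gamma * np.sum((X[i] - X[j]) ** 2)) for i, j in edges]

    # Example usage on a small edge list (hypothetical values):
    # X = np.random.randn(50, 2); U = X.copy(); edges = [(0, 1), (1, 2)]
    # w = gaussian_affinities(X, edges)
    # value = convex_clustering_objective(X, U, edges, w, lam=1.0)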

Speaker: Riccardo Rossetti

Title: High-dimensional sparse linear regression under data incompleteness via approximate message passing

Abstract: Solving underdetermined systems of linear equations is a ubiquitous problem in many applications. To find solutions with desirable properties, a variety of regularization approaches have been proposed. In particular, we focus on the classical L1-regularized least squares problem, widely known in the statistics literature as the Lasso. Among the many procedures for solving the Lasso, a recursive algorithm known as approximate message passing (AMP) has recently received considerable attention, as it provides improved convergence speed compared to classical iterative algorithms. Furthermore, its behavior can be accurately characterized in the high-dimensional limit via a simple recursion known as state evolution. Motivated by large-scale applications in which the entire design matrix and the observed data may not be available for part or all of the algorithm's runtime, we introduce an extension of AMP, called Linear Operator AMP (OpAMP), which allows for the analysis of such partial-computation settings. We prove rigorous theoretical guarantees for the dynamics of OpAMP on Gaussian designs and provide numerical evidence that, even when computations are performed with incomplete data, OpAMP converges within a number of iterations of the same order of magnitude as the standard AMP recursion.
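
For reference, the standard AMP recursion for the Lasso, the baseline against which OpAMP is compared, alternates soft thresholding with a residual update that carries an Onsager correction term. The Python sketch below assumes a Gaussian design with entries of variance 1/n and a fixed threshold (in practice the threshold is tuned per iteration via state evolution); it is illustrative only and does not implement the OpAMP extension.

    import numpy as np

    def soft_threshold(v, theta):
        # Componentwise soft thresholding: the proximal map of the L1 penalty.
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

    def amp_lasso(A, y, theta, n_iters=30):
        # Standard AMP for the Lasso; assumes A has i.i.d. N(0, 1/n) entries.
        n, p = A.shape
        x = np.zeros(p)
        z = y.copy()
        for _ in range(n_iters):
            # Denoise the effective observation x + A^T z.
            x_new = soft_threshold(x + A.T @ z, theta)
            # Residual update with the Onsager correction term, the memory
            # term that distinguishes AMP from plain iterative thresholding
            # and makes its state-evolution characterization exact.
            z = y - A @ x_new + (np.count_nonzero(x_new) / n) * z
            x = x_new
        return x

    # Example usage on synthetic data (hypothetical parameters):
    # n, p = 250, 500
    # A = np.random.randn(n, p) / np.sqrt(n)
    # x0 = np.zeros(p); x0[:25] = 1.0
    # y = A @ x0
    # x_hat = amp_lasso(A, y, theta=0.1)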

Contact

Lori Rauch