Support points - a new way to reduce big and high-dimensional data

Simon Tsz Fung Mak, Georgia Tech

Friday, January 18, 2019 - 3:30pm

This talk presents a new method for reducing big and high-dimensional data to a smaller dataset, called support points (SPs). In an era where data is plentiful but downstream analysis is often expensive, SPs can be used to tackle many big data challenges in statistics, engineering and machine learning. SPs have two key advantages over existing methods. First, SPs provide optimal and model-free reduction of big data for a broad range of downstream analyses. Second, SPs can be computed efficiently via parallelized difference-of-convex optimization, which allows millions of data points to be reduced to a representative dataset in mere seconds. SPs also enjoy appealing theoretical guarantees, including distributional convergence and improved reduction over random sampling and clustering-based methods. The effectiveness of SPs is demonstrated in two real-world applications: the first reduces long Markov chain Monte Carlo (MCMC) chains for rocket engine design, and the second performs data reduction in computationally intensive predictive modeling.
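The abstract does not spell out the algorithm, so the following is only a rough illustration of the idea. It assumes that SPs are the points minimizing the energy distance to the full dataset (the definition used in Mak and Joseph's published work on support points) and applies a simple convex-concave update to that criterion. It is a serial NumPy sketch, not the parallelized implementation referred to in the talk, and the function names (`pairwise_dist`, `energy_criterion`, `support_points_sketch`) and parameter defaults are hypothetical.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def energy_criterion(x, y):
    """Energy distance between point set x and data y, up to a constant.

    The omitted term depends only on y, so it does not affect the minimizer.
    """
    n = len(x)
    return 2.0 * pairwise_dist(x, y).mean() - pairwise_dist(x, x).sum() / n ** 2


def support_points_sketch(y, n, iters=200, seed=0, eps=1e-10):
    """Reduce data y (N, d) to n representative points (assumed formulation)."""
    rng = np.random.default_rng(seed)
    N, d = y.shape
    # Start from a jittered random subsample so no point sits exactly on a datum.
    x = y[rng.choice(N, size=n, replace=False)]
    x = x + 0.1 * y.std(axis=0) * rng.standard_normal((n, d))
    for _ in range(iters):
        # Inverse distances to the data act as weights pulling x toward y.
        w = 1.0 / np.maximum(pairwise_dist(x, y), eps)      # (n, N)
        attract = w @ y                                     # (n, d)
        # Unit vectors between reduced points push them apart.
        d_xx = np.maximum(pairwise_dist(x, x), eps)
        np.fill_diagonal(d_xx, np.inf)                      # drop j == i terms
        diff_xx = x[:, None, :] - x[None, :, :]             # (n, n, d)
        repulse = (diff_xx / d_xx[:, :, None]).sum(axis=1)  # (n, d)
        # Closed-form minimizer of the convex surrogate for each point.
        x = ((N / n) * repulse + attract) / w.sum(axis=1, keepdims=True)
    return x


# Usage: reduce 2,000 two-dimensional Gaussian draws to 20 points.
data = np.random.default_rng(1).standard_normal((2000, 2))
reduced = support_points_sketch(data, n=20)
```

Each update minimizes a convex surrogate that upper-bounds the criterion, so the criterion should not increase from one iteration to the next. Comparing `energy_criterion(reduced, data)` with the value for a random subsample of the same size gives a quick check that the reduced set represents the data better.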

Seminars generally take place in 116 Old Chemistry Building on Fridays from 3:30 to 4:30 pm. For additional information, phone 919-684-8029. Reprints are not available; please feel free to contact the authors by email for follow-up information, articles, etc. A reception follows the seminar in 203B Old Chemistry.

Old Chemistry 116
