Portfolio of Work

As part of the Completion Exercise, M.S. students who present and defend a Portfolio of Work must demonstrate mastery of statistical methods, application, and computation.

Portfolio topics can come from a mentored industrial internship, an industry-sponsored capstone project, an applied course, or a research project supervised by Duke faculty.

The portfolio presentations are scheduled for early March from 2:00 to 4:00 pm. The date of the presentations will be announced by the beginning of the Spring semester. The presentations will be followed by a reception for all MSS graduate students (first- and second-year). 

Portfolio Contents

  • A portfolio title and an abstract must be submitted to the student’s portfolio advisor by December 15.
  • Each student will create a poster that they will present to a committee of three faculty members assigned by the department. The poster must include only one project. The topic of the poster can come from an internship, capstone, research, or course project.
  • A Portfolio Report (2-3 pages) must be prepared that describes the project, including a review of the problem and statistical methodology, a discussion of results and conclusions, a summary of what was learned, and potential paths forward.
  • Students need to create a GitHub repository and include all portfolio material (poster, portfolio report, code [if available], and any non-proprietary documents or presentations), along with their curriculum vitae/resume. The link to the GitHub repository with all final portfolio material must be shared with the MS Director and committee members at least one week before the presentation.

All students doing a Portfolio of Work must follow the steps outlined in the Portfolio of Work Process, including meeting the deadlines in the Portfolio Checklist and completing regular check-ins with their portfolio advisor.

Portfolio Presentations

During the presentation, students will be evaluated by the faculty committee on: 

  • Achievement in core areas of statistical modeling, applied statistics, and statistical computing;
  • Achievement in addressing and solving real-world problems with relevant statistical and computational methods; and
  • Achievement in communicating in oral and written form with a professional audience.

Students completing the Portfolio of Work presentation must satisfy all of the above criteria at a Satisfactory or Excellent level. Otherwise, the student will receive written feedback on any aspects marked Unsatisfactory, including comments on recommended remedial paths.

MSS Portfolio Award

Each second-year MSS student completing an MSS portfolio will be eligible for the Master’s Portfolio Award. The purpose of the Portfolio Award is to encourage the development of data analysis skills, to enhance presentation skills, and to recognize outstanding work by Master’s students. The Award recipient is selected by the Statistical Science Portfolio Committees on the basis of the submitted portfolios and presentations.

Selected Posters and Presentations:

See below for examples of posters and a presentation from previous years. Note that prior to 2022, student portfolios included two projects instead of one.

Cole Juracek, MSS 2021 Graduate:

Download Poster (pdf - 698.29 KB)


Download Example Poster 1 (pdf - 857.01 KB)
Download Example Poster 2 (pdf - 508.15 KB)
  • A Comparison of Record Linkage Methods applied to Real and Synthetic Data
  • Classification Models applied to Airbnb Listings in Asheville, North Carolina
  • Hidden Markov Models for Part of Speech Tagging
  • GDP Forecasting Using MIDAS and LSTM models with Macroeconomic Indicators
  • Modern Classification Approaches applied to In-vehicle Coupon Recommendations
  • Airbnb Availability Prediction with Machine Learning
  • Estimating North Pacific Right Whale Population Density using Machine Learning Methods
  • Modern Statistical Machine Learning Methods Applied to Airbnb
  • Survival Dynamic Generalized Linear Model in Private Market Funding
  • Predicting NBA Draft Prospect Value Using LTR
  • Predicting Fantasy Football with Bayesian Hierarchical Model
  • Predicting Auto Loan Refinances using Machine Learning
  • Assessing Firm Success Based on Board Member Composition Via Hierarchical Modeling
  • Causal Analysis on the Right Heart Catheterization Data
  • Dynamic interventions for COVID-19
  • Classification on Coupon Recommendation Data
  • Estimation of Dynamic Treatment Regimes using Contextual Bandits with Hierarchical Surrogate Outcomes
  • Predicting Coupon Acceptance Using Machine Learning Algorithms
  • Classifying Email Text Data Using Natural Language Processing and Machine Learning Techniques
  • Analysis of street price data on diverted pharmaceutical substances provided by StreetRX
  • Part-of-speech Tagging
  • Stack height estimation from satellite imagery with statistical models
  • Sentiment Analysis with Naive Bayes and Neural Network Classifiers
  • Digitization of healthcare diagnosis: a validation tool for practitioners to assess heart disease diagnoses


  • Using Gradient Boosting Machines to Build an Unconstrained Pure Premium Model
  • Text Classification for Conduct Surveillance and Price Prediction with Gradient Boosting Machines
  • Machine Learning in Pharmacodynamic Modeling of Anti-HIV Microbicide
  • Bayesian Hierarchical Approaches to Topic Modeling and Text Classification
  • Mixed Models to Investigate Sex Difference in Effects of Environmental Interaction on Cognitive Resilience
  • Cost Reduction Analysis with Pharmaceutical Insurance Claim Data and Prediction of Annual Influenza Vaccination Status
  • Text Classification of Active Directory Data with Long Short-term Memory Networks
  • Detecting Medical Insurance Fraud with Ensemble Clustering
  • Hierarchical Dirichlet Processes for Topic Modeling
  • Forecasting Models in Business Field - Applications in Real Estate and Ecommerce Short Text Classification and Financial Machine Learning
  • Models in Adult Income Prediction and Futures Hedging Strategy
  • Hyperparameter Tuning and Model Selection for Classification Problem
  • Applications of Machine Learning Methods for Classification
  • Highly Multiclass Text Classification in a Business Setting and Airbnb Listing Price Prediction
  • Application of Time-Varying Multivariate Models on Energy Consumption and Economic Data
  • Applied Signal Processing in Medical Device Development
  • Drivers of Course Rating and Models to Predict Ecommerce Sales
  • Multilabel Text Classification and Image Steganalysis
  • Multilevel Models Analysis and Optimization on Product Financial Data
  • Co-occurrence Analysis on MIMIC Dataset
  • Clustering-Based Movie Recommendation System
  • Traffic Index Prediction and Word Embedding
  • Auto-Encoding Graph-Valued Data with Applications to Brain Connectomes and Recommender Systems
  • Applied Forecasting Models in Government Revenue Data
  • Identifying Significant Variables through Random Forest and Ridge Regression
  • Lorenz Interpolation: A Method for Estimating Income Statistics from Tabular Income Data
  • Identifying Musical Similarities Across Geographical Regions
  • Integrating Record Linkage and Propensity Score Matching
  • Spatio-Temporal Analysis of Gun Violence Victims and its Relation with Unemployment Rate in the USA
  • Nonlinear Regression and Network Inference for Neural Spike Count Data
  • Bayesian Item Response Modeling for Assessing State Interventions
  • Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism
  • Developing a Clinical Decision Support Tool for Talaromycosis: A Case Study in Model Selection with Missing Data
  • Density Estimation with Mixture of Spherelets
  • Modified Leave-One-Out Cross-Validation for Linear Model Selection
  • Hierarchical Mixed Model for Influenza Outbreak Detection
  • Bayesian Hierarchical Model Evaluating Heart Surgical Program
  • Email Classification with Machine Learning
  • Hierarchical Modeling for Ranking Pediatric Heart Surgery Mortality
  • A Machine Learning Case Study from an Insurance Data Set
  • A Note of Hierarchical Incremental Gradient Descent on Riemannian Manifold
  • Web Attack Detection using Deep Learning
  • Generating Cartoon Characters with Style Generative Adversarial Network
  • A Statistical Model to Assess Hospitals Net Income and Rankings
  • Study of Hierarchical Model Applications on Amphetamines
  • Multivariate Linear Regression with Sparsity Estimators
  • Quantification of Cross-Shopping in E-commerce
  • Bayesian Diagnosis Model on Fever in Moshi, Tanzania
  • Analysis and Implementation of K-Means++ with Parallel Initialization
  • Exploring Bayesian Time-Series Models with Financial Data
  • Effect of Democratic Campaign Spending on 2018 House Midterms
  • A Two-stage Labeling Framework for Effective Text Classification
  • Extensions of Predictive Models
  • Bayesian Applications in Time Series 
  • Applied Machine Learning: Classification and Regression Examples
  • Comparing the Performance of DID and LDV in Different Scenarios
  • An R-based Prediction Tool for Optimizing Forecast
  • Applications of Sampling and Clustering Methods
  • Phase Transitions in Linear Models and DID Causal Inference Analysis
  • Community Detection Thresholds in Heterogeneous Graphs
  • Using Biclustering Methods to Classify High Dimensional Data
  • The Application of TVAR Method on Financial Data
  • Approaches to Data Visualization and Prediction: Healthcare to Art
  • Application of Statistical Methods on Financial and Medical Data
  • Machine Learning Models in Health Care
  • Time Series Model in Inventory Optimization Management
  • Unsupervised Exploratory Analysis Tool for Biclustering
  • The Yelp Restaurant Recommendation System
  • Prediction of Default Risks with Statistical Models
  • Machine Learning Application in Video Game Outcome Prediction
  • Statistical Modeling and Insights in Financial Industry
  • Trends in Balloon Catheter Dilation of Paranasal Sinuses
  • Inferring Drug Innovation with Adverse Events 
  • Machine Learning Methods for Spatial and Financial Applications
  • Applied Bayesian Methods for Text Mining
  • Dynamic Factor Analysis in Internet Search Volume and Stock Volatility 
  • Comparing Model-based Ranking Methods to Evaluate Physicians and Hospitals
  • Prediction of Medication Non-adherence with Clinical Notes
  • Evaluating Performance of Hospitals and Physicians using a Binomial Generalized Linear Mixed Model 
  • Text Analysis and Other Exploration
  • Deep Learning for the Automatic Grading of Diabetic Retinopathy 
  • Modeling Economic and Political Dynamics in the Middle East
  • Python Implementation of Bayesian Hierarchical Clustering
  • Implementation and Applications of Bayesian Hierarchical Clustering
  • Multi-Scale Topological Data Analysis to Identify Brain Fiber Connectivity for Biological Systems Applications
  • Bayesian Approach on Correcting Model Performance given Biased Estimates of Feature Values
  • Predicting Patient Admissions in the Medicare Shared Savings Program
  • Comparison of Machine Learning Methods in the Estimation of Housing Prices
  • Evaluating the Performance of a Generalized Recommendation Engine for the Financial Services Industry
  • Predictive Analytics in Healthcare and Medical Data Exploration
  • Establishing a Realistic Prior Model for Complex Geometrical Objects
  • Graph-Coupled HMMs and Deep Neural Network for Modeling Infection and Medical Diagnosis
  • Empirical Study of Topic Modeling in Movie Recommendation
  • Statistical Modeling and Traffic Violation Analysis
  • News' Predictive Power on St. Louis Fed Financial Stress Index
  • Application of Neural Networks with Joint Embedding for Medical Document Classification
  • Analysis and Implementation of Classification Algorithms (Kmeans++, CONCOR)