DataFest 2021 at Duke

DataFest 2021

DataFest 2021 at Duke was one of over thirty virtual ASA DataFest events that took place in Spring 2021. DataFest is a data analysis competition sponsored by the American Statistical Association; students work in teams to analyze a complex surprise data set over a weekend. This year, Duke’s participants included over 250 students representing 12 colleges and universities.

The surprise data set from Rocky Mountain Poison & Drug Safety included responses from the United States, Canada, and Germany about drug use and misuse. DataFest participants were challenged to identify patterns of drug use and misuse in the data and to suggest questions doctors could use to identify patients at high risk for illegal drug use and addiction.

Participants had the opportunity to be mentored by consultants, who were graduate students and professionals from academia and industry in statistics, data science, and related fields. Emily Hadley, a data scientist at RTI International and Duke Statistical Science alum, was the featured speaker at a lunch event on Saturday. Events also included yoga led by David Roberts from Duke Recreation and Physical Education and meditation led by Lindsey Parker from DuWell.

At the end of the weekend, teams submitted video presentations that were judged by 20 academic and industry professionals in statistics, data science, health insurance, and related fields. The following projects were recognized as this year’s winners:

  • Best Visualizations: Be Smart, Don't Start: A Look at the Problems Associated with Drug Misuse and Relevant Factors by Brian Wang, Yidan Wang, Eddie Zhang, Keren Zhang, and Chloe Zhu
  • Best Insight: Survey of Non-Medical Use of Prescription Drugs Program: United States 19Q1 Launch by Siddharth Bowgal and Arjun Putcha
  • Judges’ Picks:
    • Drug Prediction Model Interpretability by Edwin Agnew, Bhrij Patel, Albert Sun, and Jayesh Gupta
    • Drug Misuse Analysis by Bodong Xu, Jay Lin, Jiajun Wang, Ningyi Xue, and Shan Xiang

You can view the winning presentations on the DataFest website.

A few of the winners shared their motivation to participate in DataFest and advice for future participants. For Jayesh Gupta, a rising junior at Duke majoring in Electrical & Computer Engineering and Computer Science, experience participating in previous competitions and encouragement from a friend motivated him to participate in this year’s DataFest. “I first heard of DataFest when my friend Bhrij approached me and asked if I was interested in competing with him... I enjoyed the experience of my first datathon as I love groupwork and loved the time crunch as it made me more efficient in my time management and my programming.” Albert Sun, a rising junior at Duke majoring in Math and Computer Science with a minor in Statistical Science, had a similar motivation for participating: “I participated in Duke DataFest last year, and I had a great time. I ended up participating again when Bhrij [a fellow teammate] reached out to create a team in the Duke Applied Machine Learning Slack.”

Others mentioned the opportunity to apply their data analysis skills as their primary motivation. “This competition allowed me to test my skills in data analysis in a fast-paced technical environment,” said Brian Wang, a recent graduate from Duke’s MS in Quantitative Management program. Similarly, Siddharth Bowgal, a rising senior at the University of North Carolina majoring in Statistics & Analytics and Economics, said, “I have participated in DataFest before during my freshman year when I didn’t have much experience with data analysis, so I saw this year’s competition as an opportunity to apply new programming skills to see what I could discover.”

The teams took a variety of approaches to analyzing the complex data for this year’s challenge, but one thing they had in common was narrowing the focus of their analyses. This general approach was nicely summed up by Jayesh Gupta, who said of his group’s analysis: “I think when our team first saw the dataset we were really excited about some of the analysis we could do. When deliberating what we could do, we knew that we needed to be focused as there is a lot of data and we needed to be very specific in what we wanted to do in order to achieve good analysis.”

With the next DataFest coming in Spring 2022, the winners offered advice to students participating in future DataFest events (and really any data analysis competition):

  • “Don’t be intimidated by the data and just try your best! Even if you don’t have a lot of experience with statistical software, participating in DataFest can be a valuable experience where you can practice coding and communicating results and recommendations.” - Siddharth Bowgal
  • “Being focused and delegating tasks are two very important things to do. I think what really helped us is the fact that we knew that we wanted to focus on the opioid epidemic in the US and we ended up focusing on factors that have contributed to this. Also looking at previous winners’ code and slides was a helpful inspiration as well.” - Jayesh Gupta
  • “Read, read, read! Our team spent almost the entire first day reading different papers and articles about the opioid epidemic, prescription drug abuse, and applications of interpretable machine learning algorithms in medical settings. It really paid off when we began creating a story out of our data and methodology. I’d also encourage meeting with the consultants and ask them what type of insights and tools they might find most useful – they’re the experts!” - Albert Sun
  • “Come in positive and prepared. Regardless of the outcome, yourself and your team will become better data analysts after this competition.” - Brian Wang