DataFest 2021 at Duke was one of over thirty virtual ASA DataFest events that took place in Spring 2021. DataFest is a data analysis competition sponsored by the American Statistical Association in which students work in teams over a weekend to analyze a complex surprise data set. This year, Duke’s participants included over 250 students representing 12 colleges and universities.
The surprise data set from Rocky Mountain Poison & Drug Safety included responses from the United States, Canada, and Germany about drug use and misuse. DataFest participants were challenged to identify patterns of drug use and misuse in the data and to suggest questions doctors could use to identify patients who are at high risk for illegal drug use and addiction.
Participants could seek mentoring from consultants, who were graduate students and professionals from academia and industry in statistics, data science, and related fields. Emily Hadley, a data scientist at RTI International and Duke Statistical Science alum, was the featured speaker at a lunch event on Saturday. Events also included yoga led by David Roberts from Duke Recreation and Physical Education and meditation led by Lindsey Parker from DuWell.
At the end of the weekend, teams submitted video presentations that were judged by 20 academic and industry professionals in statistics, data science, health insurance, and related fields. The following projects were recognized as this year’s winners:
You can view the winning presentations on the DataFest website.
A few of the winners shared their motivation to participate in DataFest and advice for future participants. For Jayesh Gupta, a rising junior at Duke majoring in Electrical & Computer Engineering and Computer Science, experience participating in previous competitions and encouragement from a friend motivated him to participate in this year’s DataFest. “I first heard of DataFest when my friend Bhrij approached me and asked if I was interested in competing with him...I enjoyed the experience of my first datathon as I love groupwork and loved the time crunch as it made me more efficient in my time management and my programming.” Albert Sun, a rising junior at Duke majoring in Math and Computer Science with a minor in Statistical Science, had a similar motivation for participating: “I participated in Duke DataFest last year, and I had a great time. I ended up participating again when Bhrij [a fellow teammate] reached out to create a team in the Duke Applied Machine Learning Slack.”
Others mentioned the opportunity to apply their data analysis skills as their primary motivation. “This competition allowed me to test my skills in data analysis in a fast-paced technical environment,” said Brian Wang, a recent graduate from Duke’s MS in Quantitative Management program. Similarly, Siddharth Bowgal, a rising senior at the University of North Carolina majoring in Statistics & Analytics and Economics, said, “I have participated in DataFest before during my freshman year when I didn’t have much experience with data analysis, so I saw this year’s competition as an opportunity to apply new programming skills to see what I could discover.”
The teams took a variety of approaches to analyzing the complex data for this year’s challenge, but one thing they had in common was narrowing the focus of their analyses. Jayesh Gupta summed up this general approach when describing his group’s analysis: “I think when our team first saw the dataset we were really excited about some of the analysis we could do. When deliberating what we could do, we knew that we needed to be focused as there is a lot of data and we needed to be very specific in what we wanted to do in order to achieve good analysis.”
With the next DataFest coming in Spring 2022, the winners offered advice to students participating in future DataFest events (and really any data analysis competition):