Mary-Russell Roberson for Trinity Communications
Lasse Vuursteen’s fascination with theory and statistics began with poker. “In my teens, I was playing poker semi-professionally and reading books about poker theory,” said Vuursteen, assistant professor in the Department of Statistical Science. Game theory led him to math, econometrics, and statistics.
As an adult, he still plays poker, but just for fun. Now his paycheck comes from developing statistical theories that help researchers and companies analyze data more efficiently and accurately.
“I use math to understand the laws of statistics,” he said. One of the ways he does that is by using math to prove that a particular statistical method is the best method — or not. “There are statistical problems for which we don’t have theory that tells us what works or what is optimal,” he said. “We don’t know if there is a better method out there.”
The answer might be yes, no, or it depends. “Sometimes you can find very interesting dynamics where for certain settings [a method] is optimal, but if you tweak the setting a little bit, it is no longer the optimal thing to do,” he said.
He also studies how to maximize statistical performance when a dataset is subject to constraints. Examples include privacy constraints, when some of the data has to be omitted or blurred (think health records or shopping habits), and bandwidth constraints, when there’s an overwhelming amount of data (think self-driving cars’ recorders or cellphone networks).
Privacy constraints are an increasingly pertinent issue. Regulations, such as the EU’s General Data Protection Regulation (GDPR), require more stringent data privacy protections. “A practitioner would like to know: what is the utility of this data if I impose privacy and how much privacy can I impose?” he said.
Applications could include targeted advertising or pooling patient data from different hospitals to create larger studies with more statistical power.
Vuursteen is looking forward to working on these issues with others at Duke in different disciplines. “There’s increasingly more fields that worry about privacy, so it offers a lot of opportunities for collaboration,” he said.
Some of the tools he uses in his work on privacy and bandwidth constraints can also be applied to transfer learning, which relates to whether results from one patient pool or one dataset can be generalized to another patient pool or dataset. “There is much overlap in terms of mathematical tools,” he said, “and advances in the study of privacy and communication constraints can carry over to answer questions about whether transfer learning is possible.”
Some transfer learning problems involve machine learning. “It depends on the setting and also who you talk to and how they define machine learning versus statistics,” he said. “I’m tempted to call things statistics that a computer scientist would call machine learning.”
Vuursteen, who hails from Groningen in the Netherlands, likes to run, cook and dance the tango. He is one of four faculty joining the Statistical Science department this fall, along with Anya Katsevich, Sifan Liu and Omar Melikechi.