Two StatSci Teams are Winners of the LinkedIn Economic Graph Challenge

June 15, 2015

Two of the 11 teams that won the innovative and highly competitive LinkedIn Economic Graph Challenge are led by Duke Statistics faculty David Banks, Katherine Heller, and David Dunson. In addition, incoming Assistant Professor Alex Volfovsky is part of the winning team from Harvard's Department of Statistics. Launched in October 2014, the challenge asked investigators, academics and “data-driven thinkers” to submit their ideas for creating greater economic opportunity for the approximately 3 billion people in the global workforce.

LinkedIn will work with the winning teams to implement their ideas to create the world's first “economic graph,” aimed at more effectively connecting job seekers and employers. The winning teams were announced on May 11, and they have already met with LinkedIn staff at the company’s Mountain View, Calif., headquarters.

Here are summaries of the proposals submitted by Duke’s two winning teams:

“Text mining on dynamic graphs” – Optimizing employer-job seeker connections

David L. Banks, PhD, Professor of the Practice, Department of Statistical Science
Katherine A. Heller, PhD, Assistant Professor, Department of Statistical Science
Sayan Patra, PhD student, Department of Statistical Science

Our goal is to invent new information technology that improves how LinkedIn members are matched with job openings and to advise companies on which skill sets best match their needs. We propose developing new text models that analyze member profiles and job listings, utilizing network structure to discover relevant content. The new models use cutting-edge machine-learning methods to predict changes to both text content and the network dynamics.

“In dynamic text networks like LinkedIn, every article is text and there are hyperlinks between articles that carry users from one topic to a related topic,” explains team leader David Banks. Other examples of dynamic text networks include Wikipedia, citation networks such as PubMed, political-blog networks, and the Internet itself.

“The network structure should improve the topic discovery and similarly, if two documents are talking about the same topic, then there’s a higher probability that there will eventually be an ‘edge’ between those two documents,” Banks says. “Our team wants to build a model for LinkedIn that includes networks and text mining, so that topic discovery is improved by the network and network prediction is improved by the topic discovery.”
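To make one half of that feedback loop concrete, here is a minimal Python sketch in which topic similarity raises the probability of a link between two documents. It assumes each profile or job listing has already been summarized as a vector of topic proportions and that the link probability is a logistic function of their cosine similarity; the vectors, the `intercept` and `weight` parameters, and the function names are hypothetical and are not the team's actual model.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def edge_probability(theta_i, theta_j, intercept=-3.0, weight=5.0):
    """Hypothetical logistic link: more shared topics -> higher chance of an edge."""
    score = intercept + weight * cosine_similarity(theta_i, theta_j)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical topic mixtures over three topics (e.g. software, finance, healthcare).
profile = [0.7, 0.2, 0.1]
job_a = [0.6, 0.3, 0.1]   # similar topic mix to the profile
job_b = [0.1, 0.1, 0.8]   # mostly a different topic

print(edge_probability(profile, job_a))  # about 0.87
print(edge_probability(profile, job_b))  # about 0.17
```

In a joint model the dependence would also run the other way, with observed links feeding back into topic estimation; this sketch shows only the topic-to-network direction.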

There are a number of ways in which this dynamic network could evolve and use information, he says. A job seeker may, for example, move to a city where an employer needs someone with his or her skill set – or earn a degree that makes him or her employable in a new way.

New and expanding companies also require an evolving network, says Banks. For instance, “if a computer-science company is opening a new branch in Boston, the first set of hires will have to be computer scientists, the next set will be administrative staff, and so on – so there’s the concept of staged growth. And in principle, LinkedIn can help that company by providing it with stage-appropriate candidate pools.”

“The idea is for us to help LinkedIn more effectively use text mining and dynamic-network modeling to facilitate connections between job seekers and companies,” he says.

“Find and change your position in a virtual professional world” – Creating user road maps

David B. Dunson, PhD, Distinguished Professor, Department of Statistical Science
Joseph D. Futoma, PhD student, Department of Statistical Science
Yan Shang, PhD student (Operations Management), Fuqua School of Business

Our goal is to use relational information from the LinkedIn network to increase the transparency and efficiency of both job searching and recruiting. We propose determining the relative positions of LinkedIn members in a virtual professional world. Each LinkedIn member is represented by a point in space, and closeness between members measures professional similarity. An institution, company, or job can then be represented by a cluster of individual members, capturing its complexity and heterogeneity.
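The geometry described above can be illustrated with a generic latent distance model, in which the probability of a link between two members falls as the distance between their points grows. This Python sketch is an illustration under stated assumptions, not the team's method: the member names, 2-D positions, and `alpha` parameter are hypothetical, and in practice the positions would be estimated from the network rather than fixed by hand.

```python
import math

# Hypothetical 2-D latent positions (in practice, estimated from network data).
positions = {
    "alice": (0.0, 0.0),
    "bob": (0.5, 0.2),
    "carol": (3.0, 2.5),
    "dave": (3.2, 2.9),
}

def distance(p, q):
    """Euclidean distance between two latent positions."""
    return math.dist(p, q)

def link_probability(p, q, alpha=1.0):
    """Latent distance model: P(link) = logistic(alpha - ||z_i - z_j||)."""
    return 1.0 / (1.0 + math.exp(-(alpha - distance(p, q))))

def centroid(members):
    """Summarize a company or job by the mean position of its members."""
    xs = [positions[m][0] for m in members]
    ys = [positions[m][1] for m in members]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def spread(members):
    """Average distance of members from their centroid (a crude heterogeneity measure)."""
    c = centroid(members)
    return sum(distance(positions[m], c) for m in members) / len(members)

def nearest(member, k=1):
    """The k members closest to `member`, i.e. the most professionally similar."""
    others = [m for m in positions if m != member]
    return sorted(others, key=lambda m: distance(positions[member], positions[m]))[:k]

print(nearest("alice"))  # ['bob']
print(link_probability(positions["alice"], positions["bob"])
      > link_probability(positions["alice"], positions["carol"]))  # True
```

Summarizing a company by a centroid and a spread is a simplification; keeping the full set of member points preserves more of the heterogeneity the proposal mentions.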

“The focus of our team is to use the massive amount of LinkedIn data to help job seekers determine how to get where they want to be professionally, as well as to help employers recruit the best talent for them,” says team leader David Dunson. “We’d like to create a road map that helps LinkedIn users travel optimally through the LinkedIn network toward a particular destination.”

The company currently doesn’t have a tool like the one Dunson’s team hopes to develop, which would be built on latent space representation.

“LinkedIn is a massive database of tens of millions of people who have different trajectories and different data sets; they grew up in particular areas, went to particular schools, have particular experience, and are looking for particular things,” Dunson says. “Because latent space representation methods can reduce the enormous network dimension, they’re among the most successful in modeling network data. The goal is to create an economic graph that tells LinkedIn users what their possible paths are to achieve a particular outcome – whether that’s finding a job or recruiting the right person for a job.”

For example, he says, suppose a young person dreams of being an engineer at Tesla. “There’s going to be a lot of very complicated and high-dimensional information in the data set about that, but by using one of these latent space methods, we can simplify things in terms of how that person can potentially move from where he or she is now to getting that job at Tesla,” says Dunson. “Latent space representation can allow LinkedIn users to explore different trajectories of people in similar circumstances, which can help inform the decisions users make in terms of things like choosing colleges, majors, internships, and jobs.”
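One way to picture exploring “trajectories of people in similar circumstances” is to look for members whose paths through the latent space started near a user's current position and ended near the user's goal. The Python sketch below assumes such trajectories have already been embedded; the members, career stages, coordinates, and `radius` threshold are all hypothetical.

```python
import math

# Hypothetical embedded career trajectories: (stage, latent position) pairs.
trajectories = {
    "member_1": [("high school", (0.0, 0.0)), ("BS mechanical eng.", (1.0, 0.5)),
                 ("automotive internship", (2.0, 1.5)), ("engineer, target company", (3.0, 3.0))],
    "member_2": [("high school", (0.2, 0.1)), ("BA economics", (0.5, -1.0)),
                 ("analyst", (1.0, -2.0))],
    "member_3": [("community college", (4.0, 0.0)), ("BS electrical eng.", (3.5, 1.5)),
                 ("engineer, target company", (3.1, 2.9))],
}

def similar_paths(start, goal, radius=1.0):
    """Return (member, stages) for trajectories that begin near `start` and end near `goal`."""
    matches = []
    for member, path in trajectories.items():
        first, last = path[0][1], path[-1][1]
        if math.dist(first, start) <= radius and math.dist(last, goal) <= radius:
            matches.append((member, [stage for stage, _ in path]))
    return matches

# A user starting near the origin who wants to reach the target-company region.
for member, stages in similar_paths(start=(0.1, 0.0), goal=(3.0, 3.0)):
    print(member, " -> ".join(stages))
```

Here only `member_1` both starts near the user and ends near the goal, so its sequence of stages is the suggested route; a real system would need to weigh many such paths rather than list them.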

The LinkedIn Economic Graph Challenge project is slated to wrap up in late 2015.
