Sunday, October 28, 2012

My thesis: linking social network profiles

Most of us have accounts on several social networks, such as Facebook, Twitter and LinkedIn. But there is currently no reliable automated way to find the LinkedIn profile of a given Facebook user: you have to google the person's full name and verify the search results by hand.
So in my thesis I set out to build a solution to this problem based on machine learning (in particular decision trees and support vector machines).

Here's the abstract:

Record linkage is a well-known task that attempts to link different representations of the same entity that happens to be duplicated inside a database; in particular, identity reconciliation is a subfield of record linkage that attempts to connect multiple records belonging to the same person. This work addresses the problem in the context of online social networks, with the goal of linking profiles across different online platforms.
This work evaluates several machine learning techniques (e.g. decision trees and support vector machines) that employ domain-specific distances. In addition, we evaluate the influence of several post-processing techniques, such as the breakup of large connected components and of users containing conflicting profiles.
The evaluation has been performed on two datasets gathered from Facebook, Twitter and LinkedIn, for a total of 34,000 profiles and 2,200 real users having more than one profile in the dataset. Cross-validated precision and recall are in the range of 90%, depending on the model used, and decision trees prove to be the most accurate classifier.
The full thesis can be downloaded if you're interested in this sort of thing (namely, applying machine learning to data coming from social network APIs).
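
To give a rough feel for the approach described in the abstract, here is a minimal Python sketch of the general idea: compute similarity features for each pair of profiles, decide whether the pair belongs to the same person, and group linked profiles into connected components. It is not the code from the thesis. The toy profiles, the field names, the thresholds and the helpers `features`, `same_person` and `link_profiles` are all assumptions made up for illustration, and the hand-written rule in `same_person` only stands in for a trained decision tree.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy profiles; the field names are illustrative only.
PROFILES = [
    {"id": "fb:1", "network": "facebook", "name": "Maria Rossi", "city": "Milan"},
    {"id": "tw:7", "network": "twitter", "name": "maria rossi", "city": "Milan"},
    {"id": "li:3", "network": "linkedin", "name": "Maria Rossi", "city": "Milano"},
    {"id": "tw:9", "network": "twitter", "name": "John Smith", "city": "Leeds"},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(p, q):
    """Per-field similarities for a pair of profiles."""
    return {
        "name": similarity(p["name"], q["name"]),
        "city": similarity(p["city"], q["city"]),
    }


def same_person(f):
    """Hand-written rule standing in for a trained decision tree.

    The thresholds are arbitrary assumptions, not learned values.
    """
    if f["name"] >= 0.9:
        return f["city"] >= 0.6
    return False


def link_profiles(profiles):
    """Group profiles into connected components of pairwise links."""
    parent = {p["id"]: p["id"] for p in profiles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(profiles, 2):
        # Only compare profiles that live on different networks.
        if p["network"] != q["network"] and same_person(features(p, q)):
            parent[find(p["id"])] = find(q["id"])

    groups = {}
    for p in profiles:
        groups.setdefault(find(p["id"]), set()).add(p["id"])
    return list(groups.values())


if __name__ == "__main__":
    for group in link_profiles(PROFILES):
        print(sorted(group))
```

In a real pipeline the rule would be replaced by a classifier trained on labelled pairs, and each component could be checked afterwards for conflicts (for example two profiles from the same network ending up in one group), which is the kind of post-processing the abstract mentions.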
