So in my thesis I set out to build a solution to this problem based on machine learning (in particular decision trees and support vector machines).
Here's the abstract:
Record linkage is a well-known task that attempts to link different representations
of the same entity that happen to be duplicated inside a database; in particular,
identity reconciliation is a subfield of record linkage that attempts to connect
multiple records belonging to the same person. This work addresses the problem in
the context of online social networks, with the goal of linking profiles of different [...]
This work evaluates several machine learning techniques (e.g. decision trees and
support vector machines) that employ domain-specific distances. In addition, we
evaluate the influence of several post-processing techniques, such as the breakup
of large connected components and of users containing conflicting profiles [...]
The evaluation has been performed on two datasets gathered from Facebook, Twitter
and LinkedIn, for a total of 34,000 profiles and 2,200 real users having more than
one profile in the dataset. Cross-validated precision and recall are in the range
of 90%, depending on the model used, and decision trees turn out to be the most [...]
The full thesis can be downloaded if you're interested in this sort of thing (namely, applying machine learning to data coming from social network APIs).
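To give a rough idea of the kind of pipeline the abstract describes, here's a minimal sketch. It is not the thesis code, and everything in it is an assumption made for illustration: the toy profiles, the field names, the use of `difflib` as a similarity measure, and the hand-written threshold rule in `same_person`, which stands in for a trained classifier such as a decision tree. Pairs judged to be the same person are merged into connected components, and each component is treated as one user.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy profiles; ids and field names are invented for this sketch.
PROFILES = [
    {"id": "fb:1", "name": "Maria Rossi", "city": "Milan"},
    {"id": "tw:7", "name": "maria rossi", "city": "Milan"},
    {"id": "li:3", "name": "Mario Rossini", "city": "Rome"},
    {"id": "tw:9", "name": "John Smith", "city": "London"},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(p, q):
    """One similarity score per field for a pair of profiles."""
    return (similarity(p["name"], q["name"]), similarity(p["city"], q["city"]))


def same_person(features):
    """Stand-in for a trained classifier: fixed thresholds, not learned ones."""
    name_sim, city_sim = features
    return name_sim > 0.9 and city_sim > 0.8


def link_profiles(profiles):
    """Group profile ids into connected components of matching pairs."""
    parent = {p["id"]: p["id"] for p in profiles}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(profiles, 2):
        if same_person(pair_features(p, q)):
            parent[find(p["id"])] = find(q["id"])

    groups = {}
    for p in profiles:
        groups.setdefault(find(p["id"]), []).append(p["id"])
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    print(link_profiles(PROFILES))
```

Because matches are transitive in this scheme, a single wrong link can chain many profiles into one oversized component, which is presumably why a post-processing step that breaks up large components is worth evaluating.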