In the TED Talk below, computer scientist Jennifer Golbeck explains how one’s Facebook Likes correlate strongly with seemingly unrelated personal attributes. The often-cited examples are that Liking the page “Curly Fries” is a strong indicator of high IQ, and that Liking “That Spider is More Scared Than U Are” correlates with being a non-smoker.

Golbeck’s comments in the TED Talk were based on a study conducted by the University of Cambridge last year, in which researchers studied a dataset of 58,000 U.S. Facebook users and built statistical models able to predict personal details from Facebook Likes alone.

“Models proved 88% accurate for determining male sexuality, 95% accurate distinguishing African-American from Caucasian American and 85% accurate differentiating Republican from Democrat. Christians and Muslims were correctly classified in 82% of cases, and good prediction accuracy was achieved for relationship status and substance abuse – between 65 and 73%.”
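To make the idea of predicting a personal attribute from Likes alone more concrete, here is a minimal sketch. It is not the Cambridge researchers' method, which this post does not describe; it assumes a plain logistic regression trained by gradient descent. The page names, the Like matrix and the labels are all invented for illustration.

```python
import math

# Hypothetical toy data, invented for illustration (not from the study).
# Each row is one user; each column is one page (1 = the user Liked it).
PAGES = ["page_a", "page_b", "page_c"]
LIKES = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
LABELS = [1, 1, 1, 0, 0, 0]  # hypothetical yes/no attribute for each user


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Estimated probability that a user with these Likes has the attribute."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train(rows, labels, epochs=2000, lr=0.5):
    """Fit one weight per page with batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            error = predict_proba(weights, bias, row) - y
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


weights, bias = train(LIKES, LABELS)

# Classify each user as "has the attribute" when the probability is at least 0.5.
predictions = [int(predict_proba(weights, bias, row) >= 0.5) for row in LIKES]
accuracy = sum(p == y for p, y in zip(predictions, LABELS)) / len(LABELS)

for page, weight in zip(PAGES, weights):
    print(f"{page}: weight {weight:+.2f}")
print(f"training accuracy: {accuracy:.0%}")
```

In this toy data the attribute coincides exactly with Liking `page_a`, so that page ends up with the largest weight. The accuracy figures quoted above are of a different kind: they come from the study's far larger dataset, whereas this sketch only scores the six users it was trained on.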

Last week, Golbeck took part in a Reddit AMA discussion, where she advised Reddit users on data protection. Below are three comments we have selected:

1) How to increase privacy:

“The best thing you can do is crank up your privacy settings, be careful about what you share (don’t assume those privacy settings are iron clad), and delete old stuff that you’ve posted liberally and frequently. None of this is surefire protection – content is archived, people make copies, privacy settings aren’t perfect, etc – but these measures will make it a lot harder for people to track down potentially negative information to use against you.”

“I also keep my social media pretty carefully limited. My Facebook page only has my most recent 3 or 4 weeks’ worth of activity. I deleted everything older than that, and go through around once a week and delete all the things more than 3 weeks old (all my likes, comments, posts, etc). That limits what can be inferred about me from my profile.”

For those interested in deleting their tweets, Likes and posts every three to four weeks, Golbeck gave a link to a Slate article that outlines how to do this.

2) Is there a “creepy line” when it comes to using online data?

“I think each person’s creepy line is different. I consider a lot of the stuff we can do – guessing who you will vote for, identifying your personality traits, etc – as kind of creepy because it can discover information you very explicitly try to keep private… [I] tell the story in my TEDx talk linked above that liking the Facebook page for Curly Fries was shown to be one of the top predictors of high intelligence… That doesn’t make a lot of sense, which means it can be very hard for an individual to prevent these algorithms from learning things about them.”

3) What could be done with this data?

“Ads are the place where there seems to be money in this now. However, I often (half) joke that if I get bored with this job, I would start a company that aggregates a lot of information about people, makes inferences over it (inferring things like commitment to your job, how well you work with others, how much of a procrastinator you are, etc.) and sell that report to businesses like your credit report gets sold. I think there is a lot of opportunity to make money off this data, but we are just starting to see this happen.”

To read the full Reddit discussion, look here

(Image Credit: Sean MacEntee)

