Dataconomy

The Four Questions You Need To Ask To Get the Most Out of Your Data

by Dennis Clark
September 2, 2014
in Data Science

Facebook is apparently flagging articles for satirical content. Of course, it can’t really work: sarcasm depends on cultural context and shared assumptions, and a computer capable of understanding it would be very close to the almost-human AIs of science fiction.

But fielding requests for magic is an experience familiar to anybody who works with data on behalf of less technical managers. And while Facebook’s current model doesn’t seem to do much more than identify Onion articles (consider http://nightofthelivingdad.net/2014/08/13/why-we-didnt-vaccinate-our-child/, for instance), we can certainly come up with something a little better. By way of illustrating how a data project goes from idea to implementation, let’s walk through the questions we need to ask to figure out what we should be doing and why.

First, we can ask about the real need. Right now, what happens to the data? How could that be better? Answering these gets us to a solvable problem. For Facebook, the goal is presumably to reduce the number of times a user is confused by an article, thinking it’s real when it’s not. So we’d like to add a step to the data-processing pipeline that decides whether the content of a share is likely to be misleadingly satirical.

The next question is: what level of success solves the need? Usually a solution doesn’t need to be perfect to be worth implementing. In some workflows in some companies, getting five minutes ahead of a developing event or routing requests 10% better can be worth millions; in others, you might need a model that is 99.9% accurate to improve on having humans look at things. For Facebook, it’s hard to imagine a tenable human-based solution, but we should think about the two different kinds of errors. People presumably don’t want their posts labeled as satire when they’re not, whereas a satirical post that goes unlabeled is unlikely to drive people away in the short term, so it’s probably better not to label anything unless you have a fairly high degree of confidence. This kind of situation is pretty common: the exact kinds of errors that are acceptable constrain the available solutions.
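One way to make "label only with high confidence" concrete is to pick a score cutoff from a held-out set of scored examples. The sketch below is an illustration only: the function name `pick_threshold`, the `(score, is_satire)` input format, and the precision target are assumptions, not anything Facebook is known to use.

```python
def pick_threshold(scored, min_precision):
    """Return the lowest score cutoff whose flagged set is precise enough.

    scored: list of (score, is_satire) pairs from a held-out set (hypothetical input).
    An article is flagged when score >= cutoff. Returns None if no cutoff
    reaches min_precision.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    true_positives = 0
    for i, (score, is_satire) in enumerate(ranked, start=1):
        true_positives += bool(is_satire)
        # Items with equal scores are flagged together, so evaluate
        # precision only after the last item of each tie group.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if true_positives / i >= min_precision:
            best = score
    return best
```

Raising `min_precision` trades coverage for fewer false labels, which matches the asymmetry described above: a wrongly labeled post costs more than a missed one.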

What is the available data? One of the most common failure modes of proposed data projects is, remarkably, a lack of data. The optimal case for modern algorithms is a bunch of examples with reliable labels: spam or not spam, customer called back or did not, user gave a 5 star rating or not. As it happens, Facebook has plenty of articles that have been shared, but probably a much weaker idea of which ones were satirical and which weren’t. For Facebook, the obvious pieces of information about an article are its URL, which tells you where it came from; its text, which may be stored somewhere; and the text accompanying likes and shares, which is potentially useful but isn’t as good as a real label. Probably the best thing to do here is to treat some list of sources (the Onion, the Daily Show, etc.) as generators of known satire and hope to generalize from that base.
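Labeling by source can be sketched in a few lines. The domain set below is a hypothetical seed list (the article names the Onion and the Daily Show only as examples, and `satire.example` is a placeholder), and the helper names are made up for illustration. Note that an article from an unlisted source comes back as unknown rather than as "not satire".

```python
from urllib.parse import urlparse

# Hypothetical seed list of known-satire sources; not an authoritative set.
KNOWN_SATIRE_DOMAINS = {"theonion.com", "satire.example"}

def domain_of(url):
    """Lower-cased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def weak_label(url):
    """True if the URL is from a listed satire source, otherwise None (unknown)."""
    domain = domain_of(url)
    listed = any(domain == d or domain.endswith("." + d)
                 for d in KNOWN_SATIRE_DOMAINS)
    return True if listed else None
```

Returning `None` instead of `False` keeps the distinction between "known real" and "never checked", which matters because satire from as-yet-unidentified sources is exactly what the model is supposed to find.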


With the answers in hand, we can now outline a solution. The most basic approach is to use that list of known-satire sources as the only things worth labeling, which seems to be the current solution, but we can probably do better. The next step might be to take every article from that gold-standard list, treat every word in it as a potential signal, compute the features that distinguish satire from non-satire, and then ‘score’ future articles on whether or not they include these features. The features will likely be simple things like stilted vocabulary combined with cursing. Not very complex, but the technology here is completely off-the-shelf. Could it really do the job of detecting satire? Kind of! Such a classifier would likely be pretty good at detecting fake news, but might have a harder time with articles like the one linked above. Without a ton of human context and common sense we couldn’t generalize much, but we could probably get enough confidence to flag at least some satirical articles from as-yet-unidentified sources. Whether this is worth implementing will depend on the exact values of our tolerances, which of course we can’t know without actually trying the project, but you would likely get something you could put into production to improve your user experience.
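The word-scoring idea can be sketched with a naive-Bayes-style log-odds weight per word. This is one off-the-shelf technique among several, chosen here for brevity; the article does not say which method anyone actually uses. The function names and the tokenizer are assumptions, and the class prior is left out to keep the sketch short.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(satire_docs, other_docs):
    """Per-word log-odds (satire vs. other) with add-one smoothing."""
    satire = Counter(w for doc in satire_docs for w in tokenize(doc))
    other = Counter(w for doc in other_docs for w in tokenize(doc))
    vocab = set(satire) | set(other)
    satire_total = sum(satire.values()) + len(vocab)
    other_total = sum(other.values()) + len(vocab)
    return {
        w: math.log((satire[w] + 1) / satire_total)
           - math.log((other[w] + 1) / other_total)
        for w in vocab
    }

def score(weights, text):
    """Sum the weights of known words; positive means more satire-like."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))
```

In use, `train` would be fed the text of articles from the known-satire list and a sample of ordinary articles, and `score` would be applied to each newly shared article; only scores above a conservatively chosen cutoff would earn a label.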

It’s weirdly easy for executives not to explore the ways that data science can improve the success of their organizations, but it shouldn’t be hard to get the confidence to ask. If you know the number you care about, know how accurate the result needs to be, and have lots of well-labeled examples, you can almost certainly get real business value out of a big data project, even if what you want is impossible.


Dennis Clark studied algebraic geometry and theoretical computer science at Harvard University. After a few years’ service with distinction in the financial sector at QVT Financial LP, he’s brought his business savvy and linear algebra skills to Luminoso. Dennis primarily handles customer relations, product management, and strategic planning, but also provides insight into the mathematical computations and strategies of Luminoso product development.


(Image Credit: Thomas Angermann)

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
