Dataconomy

The Problem With (Statistical) False Friends

by Ian White
March 10, 2017
in Articles

I recently stumbled across a research paper, Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US, which speaks to an ongoing research interest of mine: derivative uses of data. The authors used a variety of deep learning techniques to draw conclusions about the relationships among car ownership, political affiliation and demographics. Headline skimmers may be led to believe that researchers have just uncovered a vastly cheaper and more timely way to perform the national census and make predictive claims about the population.

The researchers’ contention that official statistics are expensive and lagging is spot on. The principal US unemployment survey is performed in person or via telephone. Mystery shoppers still go into the field to purchase the underlying goods in the Consumer Price Index. Monthly government statistics are typically released several weeks after the close of the period and revised multiple times. The more infrequent the release, the longer the tabulation period. And for good reason.

These are national statistics, and by government mandate are required to have a transparent, consistent and well-understood methodology. When countries lie, they get found out. Ask Argentina about bogus inflation statistics. And that wasn’t even the dumb part–the difference between provincial government and national stats (black line) during the time in question is obvious to anybody who can read a chart:


Or analyze online prices in Argentina, compute a price index and see a similar conclusion. This initiative turned into the Billion Prices Project at MIT and is one of the innumerable research projects that use novel/alternative approaches to measure macro trends in a timely manner. Other highlights include Google’s use of flu-related search terms to indicate current influenza rates (which worked until it didn’t). Or near-time reporting of unemployment rates across EU member states. But I digress…
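To make "compute a price index" concrete, here is a minimal sketch of one common elementary index, the Jevons index (the geometric mean of price relatives for items observed in both periods). It is not the Billion Prices Project's methodology; the function name `jevons_index` and the sample prices are hypothetical and chosen only for illustration.

```python
import math


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives for matched items, with base = 100.

    Both arguments map an item name to its price. Items missing from
    either period are ignored (an assumption of this sketch).
    """
    matched = [item for item in base_prices if item in current_prices]
    if not matched:
        raise ValueError("no items are priced in both periods")
    # Average the log price relatives, then exponentiate.
    log_sum = sum(
        math.log(current_prices[item] / base_prices[item]) for item in matched
    )
    return 100.0 * math.exp(log_sum / len(matched))


# Hypothetical scraped prices: every matched item rose by 10%.
base = {"milk": 10.0, "bread": 20.0, "rice": 8.0}
now = {"milk": 11.0, "bread": 22.0, "rice": 8.8, "eggs": 30.0}
index = jevons_index(base, now)
print(f"price index: {index:.1f}")
```

A real index would also need weighting, product matching and quality adjustment, which is part of why official figures take time to tabulate.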

Relying on the Google Street View study cited above can lead to spurious claims when its findings are taken out of context. I’m sure the authors are rolling their eyes at the passage below, because nobody is suggesting that polling can be better performed by knowing automobile ownership (not to mention the bias).

For example, the vehicular feature that was most strongly associated with Democratic precincts was sedans, whereas Republican precincts were most strongly associated with extended-cab pickup trucks (a truck with rear-seat access). We found that by driving through a city for 15 minutes while counting sedans and pickup trucks, it is possible to reliably determine whether the city voted Democratic or Republican: if there are more sedans, it probably voted Democrat (88% chance) and if there are more pickup trucks, it probably voted Republican (82% chance).
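Written out as code, the quoted rule of thumb is strikingly small. This is only a sketch of the rule as stated in the quote, not the authors' model; the function name `guess_vote` is hypothetical, and the probabilities are the figures quoted above, not values estimated here.

```python
from typing import Optional, Tuple


def guess_vote(sedans: int, pickups: int) -> Optional[Tuple[str, float]]:
    """Apply the quoted sedan-versus-pickup rule to one city's counts.

    Returns the guessed party and the chance quoted in the passage,
    or None when the counts are tied (the quote does not cover ties).
    """
    if sedans > pickups:
        return ("Democratic", 0.88)
    if pickups > sedans:
        return ("Republican", 0.82)
    return None


print(guess_vote(sedans=120, pickups=45))
print(guess_vote(sedans=30, pickups=80))
```

Everything that matters, such as how the counts were sampled and which cities the percentages were computed on, sits outside these few lines.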

Also, interesting as the study is, commercial market research vendors such as Experian Automotive can tell you much of the same information without the heavy probabilistic approach. Other research approaches also exist. It is clear there is more than one way to skin a cat, but it’s difficult to know which method will yield the desired results (this analogy is still under development).

Kudos to the research team in the technical domain, but in the context of survey design, and of synthesizing a body of research more generally, they really missed the boat. With the flood of non-traditional data sources available, it is easier than ever to make inferences that lead to cognitive and statistical over-fitting. Chris Anderson’s WIRED essay on the topic from nearly a decade ago was prescient and should be required reading.
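The over-fitting risk is easy to reproduce. The sketch below is a toy simulation, unrelated to the Street View data: it draws a random target and 1,000 equally random candidate features over 20 observations, then reports the strongest correlation found. The function names, sample sizes and seed are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_obs=20, n_features=1000, seed=0):
    """Largest |r| between a noise target and many unrelated noise features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(feature, target)))
    return best


best = strongest_spurious_correlation()
print(f"strongest |r| among 1000 noise features: {best:.2f}")
```

None of the features carries any signal, yet searching enough of them over a small sample turns up a correlation that would look meaningful if reported on its own.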

Key findings from studies that rely on high-dimensional data can be used as hypotheses to further interrogate research where there are questions about data paucity or legitimacy. This is evident in the case of the Argentinian inflation rate, and there are countless examples across the global supply chain, human migration patterns and consumer preferences. Research into big data and novel analytics could be advanced by considering the impact of these proxy indicators on the domain(s) in question. This would compel researchers to be more robust in research design and foster cross-disciplinary thinking.

 


Tags: data science, statistics, surveillance
