Dataconomy

Democratizing Data Assets: Learning From Data, Big and Small

by Kirk Borne
May 15, 2014
in Articles, Artificial Intelligence

When we devote so much time and energy talking about Big Data, are we neglecting the important things that you can do with Small Data?

Maybe, but… probably not.

Looking beyond the Big Data hype helps us to capture real value from advanced analytics on data, big and small.

The drumbeat of Big Data dialogue in social media, in the press, and everywhere else merely highlights the important roles that data and analytics now play in all sectors. While we read, think, and dream about Big Data, we also realize that the “big” in Big Data refers to more than just data volume. We know all about Big Data velocity, variety, value, and more. For example, independent of data volume, you can have lots of variety in your data, you can have very tight real-time (data velocity) constraints, and you can derive huge value from your data assets. So I believe that the deluge of Big Data discussions is not actually diverting our attention from small data. Instead, it is democratizing data assets, causing us to give more attention to how we can learn from data, both big and small.

So, how do we “learn from data”?

In the field of Machine Learning, algorithms are usually categorized as Supervised, Semi-Supervised, or Unsupervised. The first and second of these usually require historical training data to build and improve classification and predictive models. It is fair to say that, in most cases, the bigger the training data set, the better (more complete, accurate, robust) our predictive analytics models will be.
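
To make the "bigger training set, better model" point concrete, here is a minimal sketch of a supervised learner on synthetic data. Everything in it is an assumption chosen for illustration: the two Gaussian classes, the nearest-centroid classifier, and the function names are not from any particular library or data set. It trains the same classifier on 2, 20, and 200 points per class and reports the accuracy averaged over many repeated trials.

```python
import random

def sample(rng, n_per_class):
    """Draw labeled 1-D points: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    return [(rng.gauss(2.0 * label, 1.0), label)
            for label in (0, 1) for _ in range(n_per_class)]

def train_centroids(training):
    """'Training' here is simply computing the mean of each class."""
    return {label: sum(x for x, y in training if y == label) /
                   sum(1 for _, y in training if y == label)
            for label in (0, 1)}

def accuracy(centroids, test):
    """Fraction of test points whose nearest centroid matches their label."""
    hits = sum(1 for x, y in test
               if min(centroids, key=lambda c: abs(x - centroids[c])) == y)
    return hits / len(test)

def mean_accuracy(n_per_class, trials=200, seed=0):
    """Average held-out accuracy over many independently drawn training sets."""
    rng = random.Random(seed)
    test = sample(rng, 500)  # fixed held-out set, reused for every trial
    scores = [accuracy(train_centroids(sample(rng, n_per_class)), test)
              for _ in range(trials)]
    return sum(scores) / trials

for n in (2, 20, 200):
    print(f"{n:>3} training points per class -> mean accuracy {mean_accuracy(n):.3f}")
```

With very few training points the estimated class centers are noisy, so the decision boundary lands in the wrong place more often. The gain from more data also levels off once the centers are well estimated, which is one reason small data can still go a long way.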

The third category of Machine Learning (Unsupervised Learning) is essentially the purest form of Data Mining (in my opinion): it is data-driven, evidence-based, unfettered by models or preconceived notions regarding the patterns in the data. It is used to discover the patterns, anomalies, categories, correlations, and features in the data, both BIG and SMALL. This is true knowledge discovery from data. One article (Shabalin et al. 2009) described it this way: “unsupervised exploratory analysis plays an important role in the study of large, high-dimensional datasets that arise in a variety of applications”. When performed with rigorous systematic scientific methodology (as opposed to random “fishing expeditions”), the data mining application of Unsupervised Machine Learning algorithms becomes “powerful Jedi” Data Science.

“Discovery from Data” and “Learning from Data” certainly become more effective when the data set is big, but most types of unsupervised learning work just fine with relatively small data. There are four broad categories of unsupervised learning that you can apply to your data (big and small). These are:

  1. Novelty Discovery (also known as outlier or anomaly detection, but I prefer to call this “Surprise Discovery” — finding the rare, unexpected, surprising thing in your data).
  2. Correlation Discovery (finding the patterns, relationships, and correlations in data — e.g., using principal component analysis, or independent component analysis, or the maximal information coefficient).
  3. Class Discovery (finding new classes of events or behaviors, including finding new subclasses of previously known classes, or discovering improved rules for distinguishing and disambiguating known classes and subclasses).
  4. Association Discovery (discovering connections between different things, people, or events; or finding unusual, improbable co-occurring combinations of attributes, things, or objects). This type of algorithm is usually at the heart of Recommender Engines and is sometimes called Link Mining, Market Basket Analysis, or Graph Mining. Several fun examples of this are discussed in the final 7 minutes of the TEDx talk “Big Data, Small World”.
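
The four categories above can each be sketched in a few lines on genuinely small data. The following is only an illustration under stated assumptions: the function names are hypothetical, and the specific techniques (a median-absolute-deviation score, Pearson correlation, a tiny 1-D k-means, and pairwise lift) are simple stand-ins chosen for brevity, not the only or the best method for each category.

```python
import statistics
from collections import Counter
from itertools import combinations

def surprise_discovery(values, threshold=3.5):
    """Novelty discovery: flag values far from the median (MAD-based score)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

def correlation_discovery(xs, ys):
    """Correlation discovery: Pearson correlation between two attributes."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def class_discovery(values, k=2, iterations=20):
    """Class discovery: tiny 1-D k-means that groups values around k centers."""
    ordered = sorted(values)
    # Spread the initial centers across the sorted values (deterministic start).
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [statistics.fmean(g) if g else c for g, c in zip(groups, centers)]
    return groups

def association_discovery(baskets, min_count=2):
    """Association discovery: rank item pairs by lift = P(a, b) / (P(a) * P(b))."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(pair for b in baskets
                          for pair in combinations(sorted(set(b)), 2))
    lifts = {pair: (count / n) /
                   ((item_counts[pair[0]] / n) * (item_counts[pair[1]] / n))
             for pair, count in pair_counts.items() if count >= min_count}
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)

readings = [10, 11, 9, 10, 12, 10, 11, 48]
print(surprise_discovery(readings))
print(correlation_discovery([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(class_discovery([1.0, 1.2, 0.9, 5.1, 4.8, 5.0]))
print(association_discovery([
    ["bread", "butter"], ["bread", "butter", "milk"], ["milk", "cereal"],
    ["bread", "milk"], ["cereal", "milk"], ["bread", "butter"], ["milk"],
]))
```

None of these needs more than a handful of records to produce something worth looking at, which is the point: the same ideas scale up to big data, but they already work on small data.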

Yes, of course, knowledge discovery from data (i.e., Learning From Data) is FUN, especially if it is unsupervised!


Kirk is a data scientist, top big data influencer, and professor of astrophysics and computational science at George Mason University. He spent nearly 20 years supporting NASA projects, including NASA’s Hubble Space Telescope as Data Archive Project Scientist, NASA’s Astronomy Data Center, and NASA’s Space Science Data Operations Office. He has extensive experience in large scientific databases and information systems, including expertise in scientific data mining. He is currently working on the design and development of the proposed Large Synoptic Survey Telescope (LSST), for which he is contributing in the areas of science data management, informatics and statistical science research, galaxies research, and education and public outreach.


