Data Hoarding and Alternative Data In Finance – How to Overcome the Challenges

Financial institutions have become data hoarders

Banks, hedge funds, and asset managers have become data hoarders. However, many of these firms find it difficult to make use of all of this data. They need tools that can extract information from internal unstructured content (legal documents, emails, instant messages, news archives, analyst reports, and so on) and democratise its use.

An increasing number of firms are now embracing the cloud, making it easier for vendors to come in and analyse proprietary content on their behalf. This new trend is primarily driven by the more sophisticated hedge funds and asset managers, since banks are often more restricted by compliance requirements.

But it is challenging to make use of that data. The big data craze inspires firms to save every possible bit of data, under the misconception that the more data you have, the better. Firms must keep some data for compliance purposes, and often aren’t sure what other information they need to keep. Having more data is not necessarily a good thing when you are not sure how it is going to accumulate or how to manage it. There is hope, however, that data hoarding will eventually bear fruit when it comes to alpha generation, with the right help that is!

Catching the Alternative Data Wave

Much of the data hoarding actually comes from alternative data sources. The proliferation of social networks, mobile devices, IoT, low-cost sensors, and image processing has led to an explosion of new and potential data sources. It is creating some interesting opportunities and new ways of harvesting signals for investors. A lot of this information is new. Financial institutions have been used to building models based on market and fundamental data. Alternative sources now offer a new way of getting insights into fundamentals, often on a real-time basis.

But what are these alternative data sources?

News & Social Media – traditional news, microblogs, or unstructured data firehoses to understand what’s happening in the world. This is the most mature of these alternative datasets and has been around for a while. Machine-readable news and social media have already made their way into the quantitative process as a proven source of alpha

Credit Card Transactions – anonymous aggregate transaction data to capture trends in consumer purchasing habits that can offer a daily reading on (expected) company revenues

Satellite Data – image data from orbiting satellites to do things like measure farm health based on the color of crops, or estimate how many people are shopping at Walmart or other retail stores by counting the number of cars in a parking lot

Internet of Things (IoT) – data collected from smart grids, smart cities, and shipping/transportation systems to measure supply and demand of resources or services in real time

Crowdsourced data – opinions from large groups of people, especially from online communities and specialized social networks, offering insights from the “wisdom of the crowd”

Location/Foot Traffic Data – where consumers shop by measuring foot traffic via check-ins, mobile phone traffic, video analysis, etc.

Local Prices – what’s happening to prices and inflation, obtained by aggregating measurements taken by people on the ground; especially useful in remote areas where it’s more difficult to get data for crops or prices of specific services

Peer lending data – lending/borrowing transactions for a more timely view of supply of capital or overindebtedness

App Data – data from web/mobile to understand how people are interacting with their devices

Weather Data – information from sensors used to gauge how weather will influence our daily lives and choices; sensors are even placed inside buildings to capture how it really feels to be in certain places

Alternative Data comes with its challenges

It’s NOT about finding that one Big Data factor that you can simply plug into your model and you’re good to go. There are basically three challenges to overcome:

Value: is there value in the data?

Some of these datasets are so new that there is no professional or academic research on them, so we don’t yet know if they work
A lot of the information is at the product or service level, and not easily mapped to tradeable securities
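
The mapping problem can be made concrete with a small sketch. It assumes brand-level daily spend records and a hand-maintained brand-to-ticker table; the brand names, tickers, and the `aggregate_by_ticker` helper are all hypothetical and only illustrate the idea. A real mapping would also have to handle subsidiaries, private companies, and ownership changes over time.

```python
from collections import defaultdict

# Hypothetical brand-to-ticker table (illustrative names only).
BRAND_TO_TICKER = {
    "BrandA": "AAA",
    "BrandB": "AAA",  # two brands owned by the same listed company
    "BrandC": "CCC",
}


def aggregate_by_ticker(records, mapping):
    """Roll brand-level spend up to (date, ticker) totals.

    records: iterable of (date, brand, amount) tuples.
    Returns (totals, unmapped), where unmapped holds the brands
    that could not be tied to a tradeable security.
    """
    totals = defaultdict(float)
    unmapped = set()
    for date, brand, amount in records:
        ticker = mapping.get(brand)
        if ticker is None:
            unmapped.add(brand)  # keep track of what falls through
            continue
        totals[(date, ticker)] += amount
    return dict(totals), unmapped


records = [
    ("2024-01-02", "BrandA", 100.0),
    ("2024-01-02", "BrandB", 50.0),
    ("2024-01-02", "BrandC", 30.0),
    ("2024-01-02", "BrandX", 10.0),  # no known listed parent
]
totals, unmapped = aggregate_by_ticker(records, BRAND_TO_TICKER)
```

Tracking the unmapped brands separately is a deliberate choice here: the share of spend that cannot be mapped is itself a measure of how usable the dataset is.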

Relevance: can you use it as part of your investment process?

The data is unstructured: text requires NLP, and images require specialized AI-based processing
The history of these datasets is limited (even if we have started to hoard data), so the historical archive is not always large enough for proper backtesting
We need to wait and accumulate data until it’s testable
Content integrity: providers were not originally contemplating selling the data, so we need to normalize datasets and put them into a useful format

Capacity: does the data have the capacity to be used, and how much can you actually trade?

Niche data: coverage is limited to a small number of stocks (e.g. Twitter only covers stocks that people talk about) or is focused on sectors such as retail, healthcare, or tech
Value erosion: the more users these niche datasets have, the more likely their basic value will be arbitraged away, hence the need for sophisticated models
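
One simple way to put a number on the niche-data concern is to measure how much of an investable universe a dataset actually covers. The sketch below is illustrative only: the `coverage` helper and the tickers are hypothetical, and a real capacity analysis would also weight names by liquidity or market capitalization.

```python
def coverage(universe, covered):
    """Return (ratio, missing): the share of the universe that the
    dataset covers, and the sorted list of tickers with no data."""
    universe = set(universe)
    hit = universe & set(covered)
    ratio = len(hit) / len(universe) if universe else 0.0
    return ratio, sorted(universe - hit)


# Hypothetical investable universe and dataset coverage.
universe = ["AAA", "BBB", "CCC", "DDD"]
dataset_tickers = ["AAA", "CCC", "ZZZ"]  # ZZZ lies outside the universe

ratio, missing = coverage(universe, dataset_tickers)
print(f"coverage: {ratio:.0%}, missing: {missing}")
```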

But there are also many opportunities with Alternative Data

It gives a way to:

Innovate and develop differentiated portfolios, improve scalability and avoid crowded trades
Explain things that we can’t currently understand with the market data and fundamentals that we all have
Measure new and interesting estimates, to create new factors or economic indicators
Connect the dots between different data points by looking at what people are saying about an event, a competitor, a supplier, i.e. contagion effects across an entire network of tradeable securities
And most importantly, predict more accurately than we do today

This post appeared originally here


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.