Weeks before the release of their paper, “Quantifying the semantics of search behavior before stock market moves”, Dataconomy met with Dr. Suzy Moat and Dr. Tobias Preis to discuss their research on predicting the stock market by analyzing Google and Wikipedia searches. The initial two studies, which asked “Is there a relationship between what people are looking for on Google and Wikipedia and subsequent stock market moves?”, were released in April 2013 and received considerable media attention. Now the two researchers, along with H. Eugene Stanley and Chester Curme, have come out with a follow-up study that looks at their original findings from a different angle: essentially, which particular topics (such as politics, food or sports) might have a relationship to stock market moves?

The Initial Study: Google Trends, Wikipedia and Anticipating Stock Market Moves

The research from the two professors postulates that a historical analysis of key search terms on Google can predict how stock prices will rise and fall in the coming weeks.

After selecting 98 keywords, ranging from “debt” to “housing,” Dr. Preis and Dr. Moat analyzed whether changes in search volume for a particular term bore any relationship to changes in stock prices in the subsequent weeks. By managing a theoretical Dow Jones Industrial Average index-weighted portfolio from 2004 to 2012, they found that changes in searches for financially related terms (judged by their prevalence in the Financial Times over the eight-year span), such as “debt”, “NASDAQ” and “stocks”, displayed a significant relationship to subsequent stock market movements:

“We investigated whether there were more searches for a given term this week, compared to previous weeks. If we found an increase in search volume, we sold the market in the coming week. And vice versa; if the number of people looking for a certain keyword went down, then we bought the market,” Dr. Preis told Dataconomy.
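
As a rough illustration of the rule Dr. Preis describes, here is a minimal Python sketch. It is not the researchers' code: the function names, the three-week comparison window and the sample figures are all assumptions made for illustration, and selling the market is simplified to earning the negative of the market's weekly return.

```python
from statistics import mean


def search_trend_signals(volumes, window=3):
    """Return one signal per week, starting at index `window`.

    -1 means "sell the market next week" (search volume rose),
    +1 means "buy the market next week" (search volume fell),
     0 means no change, so no position is taken.
    The window length is an assumed parameter, not a value from the article.
    """
    signals = []
    for t in range(window, len(volumes)):
        # Compare this week's volume with the average of the previous weeks.
        baseline = mean(volumes[t - window:t])
        change = volumes[t] - baseline
        if change > 0:
            signals.append(-1)
        elif change < 0:
            signals.append(1)
        else:
            signals.append(0)
    return signals


def strategy_return(volumes, prices, window=3):
    """Compound weekly returns of the hypothetical strategy.

    prices[t] is the market level at the end of week t. The signal
    computed from week t is applied to the move from prices[t] to
    prices[t + 1].
    """
    wealth = 1.0
    for offset, signal in enumerate(search_trend_signals(volumes, window)):
        t = window + offset
        if t + 1 >= len(prices):
            break  # no following week to trade on
        weekly_return = prices[t + 1] / prices[t] - 1.0
        wealth *= 1.0 + signal * weekly_return
    return wealth - 1.0


# Made-up weekly search volumes and index levels, for illustration only.
example_volumes = [10, 10, 10, 14, 8, 10]
example_prices = [100, 100, 100, 100, 90, 99]
print(search_trend_signals(example_volumes))
print(round(strategy_return(example_volumes, example_prices), 4))
```

In this toy data, searches jump in the fourth week and the market falls the week after, so the sketch sells ahead of the drop and then buys ahead of the rebound. Real data would be far noisier, and the sketch ignores trading costs.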

By employing this “Google Trends Strategy,” Dr. Moat and Dr. Preis saw a 326% investment return based on how often the term “debt” alone was searched — compared to the 6.2% return (after fees) hedge funds generated for their investors in 2012.

Similar results were found by analysing changes in interest in Wikipedia pages, for example pages relating to companies listed in the Dow Jones, or general economic concepts such as “capital”, “wealth”, and “macroeconomics”. Again, the financial relevance of these pages appeared important: data on how often people viewed pages relating to actors and filmmakers was found to be of no value in developing a trading strategy.

The Follow-Up Study: Quantifying Meaning with Wikipedia

To get a better understanding of a search term’s relevance for the stock market, the two researchers worked with Chester Curme, a Research Fellow in the Data Science Lab at Warwick Business School. Their aim was to determine which topics people searched for before stock market moves.

In essence, Curme used an algorithm known as Latent Dirichlet Allocation, or LDA, to parse Wikipedia and identify which words tend to appear in the same articles as other words. In so doing, he was able to define the meaning of a word based on its relationship to others and identify which significant “topics,” or groups of words, had a positive correlation with subsequent stock market moves. The researchers were now able to recognise which topics, beyond financial ones, might also perform similarly well with their initial strategy.
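
To give a feel for the raw signal that topic models work from, the sketch below counts how often pairs of words appear in the same article. This is deliberately not LDA, which fits a probabilistic model over such patterns; it is only a simple co-occurrence count, and the function name and the three miniature "articles" are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count, for each unordered word pair, how many articles contain both words."""
    counts = Counter()
    for text in articles:
        # Use each distinct word once per article, in sorted order so that
        # a pair is always stored as (earlier_word, later_word).
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Invented miniature "articles", for illustration only.
sample_articles = [
    "debt bank crisis",
    "bank debt loan",
    "senate vote election",
]
pair_counts = cooccurrence_counts(sample_articles)
print(pair_counts[("bank", "debt")])    # both words share two articles
print(pair_counts[("debt", "senate")])  # these words never share an article
```

Words that keep turning up together, such as "bank" and "debt" here, are the kind of grouping a topic model would place in the same topic, while "debt" and "senate" would fall into different ones.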

The findings showed that terms the algorithm identified as business-related performed well with the researchers’ strategy, as they had already established. What was interesting, however, was that terms relating to politics, especially U.S. politics, also had a positive correlation with stock market moves. So it appears that an increase in searches for political terms, and not only financial ones, had a significant relationship to ensuing stock market activity.

As Dr. Moat describes, “By mining these datasets, we were able to identify a historic link between rises in searches for terms for both business and politics, and a subsequent fall in stock market prices. No other topic was linked to returns that were significantly higher than those generated by randomly buying and selling. The finding that political terms were of use in our trading strategies, as well as more obvious financial terms, provides evidence that valuable information may be contained in search engine data for keywords with less obvious semantic connections to events of interest. Our method provides a new approach for identifying such keywords.”

Implications?

What can we do with such predictions besides outperforming the stock market? Dr. Preis and Dr. Moat hope that their work will help us understand “how we can better forecast patterns in complex human behaviour, and perhaps better control certain aspects of this complexity.” “We are not only interested in financial behaviour,” Dr. Preis reminded me throughout the interview.

As such, Dr. Moat and Dr. Preis’s work is more of a behavioral science experiment than a real-life strategy for stock market investment, as they are both ready to point out: “We are not suggesting that the strategy we describe in these papers will continue to generate profit in the future.”

Instead, their research is more about using online data as an additional resource – alongside traditional surveys, censuses, opinion polls, etc. – to identify patterns and make predictions about everything from the stock market, to natural disaster prevention and even traffic flow.

“By our everyday use of Google, Twitter, Flickr and other online services, we are generating massive records of human behaviour,” explains Dr. Preis. “Now, more than ever, it is possible to analyse human collective experience with a computer. With these calculations, we can improve our understanding of the probability with which certain events may occur in the future.”


(Image Credit: Jason Devaun)
