Making Big Data Meaningful with Graph Technology
Big Data – Bigger Challenge
The volume of net new data being created each year is growing exponentially, a trend that is set to continue for the foreseeable future. As volumes grow, data becomes more complex, and generating insight and value from it becomes more challenging. But increased volume isn’t the only force we are facing today: on top of this staggering growth, we are also seeing an increase in both the amount of semi-structure and the degree of connectedness present in that data.
Companies including Google, Facebook, Twitter, Adobe and American Express have turned to graph technologies to tackle this complexity at the heart of Big Data. A recent article by Dr. Roy Martsen outlines how Google started the graph analysis trend in the modern era, using links between documents on the Web to understand their semantic context. Google has since continued to write history: its graph-centric approach has seen the company deliver innovation at scale and dominate not only its core search market, but also the wider information management space.
Graph Technology – Unlocking the Meaning of Big Data
Graphs are a new way of thinking about, and explicitly modelling, the factors that make today’s big data so complex: semi-structure and connectedness. In a nutshell, a graph database is an online transactional system that allows you to store, manage and query your data in the form of a graph. It lets you represent any kind of data in a highly accessible, elegant way using nodes and relationships, both of which may hold properties. The key thing about such a model is that it makes relationships first-class citizens of the data, rather than treating them as metadata. As real data points, they can be queried and understood in their variety, weight and quality.
Another very good thing about graph technology: it’s available to everyone, right off the shelf. For example, the Neo4j project is a mature open-source graph database used in production at all kinds of organisations, from Global 2000s like Walmart, Lufthansa, and Cisco to innovative start-ups like FiftyThree, Medium, and CrunchBase. Graph databases like Neo4j have risen to prominence. Just recently Neo Technology was listed as a “Cool Vendor” in the Gartner Cool Vendor in DBMS 2014. And as 451 Research analyst Matt Aslett notes, graphs are moving out from under the general NOSQL umbrella into a category in their own right. Bearing this in mind, it comes as no surprise that Forrester Research estimates that over 25 percent of enterprises will use graph databases by 2017.
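To make the model concrete, here is a minimal sketch of nodes and relationships that both carry properties. It is plain Python, not Neo4j’s API or storage format, and the class names, people, company and property values are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a label and arbitrary key/value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed relationship that carries its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example data, not taken from any real system.
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
acme = Node("Company", {"name": "Acme"})

relationships = [
    Relationship(alice, "KNOWS", bob, {"since": 2009, "weight": 0.8}),
    Relationship(bob, "WORKS_AT", acme, {"role": "engineer"}),
]

# Because relationships are data, they can be filtered like any other record:
# here, every KNOWS relationship with a weight above 0.5.
strong_ties = [
    (r.start.properties["name"], r.end.properties["name"])
    for r in relationships
    if r.rel_type == "KNOWS" and r.properties.get("weight", 0) > 0.5
]
print(strong_ties)
```

The point of the sketch is that the relationship is a record in its own right, with a type, a direction and properties, so it can be queried by weight or quality just as a node can.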
Graphs are Eating The World
Graphs don’t only provide a competitive advantage in search. Twitter and Facebook use the social graph to dominate their markets, and Google’s Knowledge Graph and Facebook’s Graph Search are already geared up for the next wave of hyper-accurate and hyper-personal recommendations. Beyond these, graphs are becoming very widely deployed in a host of other industries. One great example is eBay: owing to its recent acquisition of Shutl, eBay provides a same-day delivery service that uses graphs to compute fast, localized door-to-door delivery of goods between buyers and sellers, scaling its business to include the supply chain. Incidentally, eBay observed that before turning to graphs, the latency of its longest query was higher than its shortest physical delivery, both around 15 minutes. That can no longer happen now that an average query is powered by a graph database and takes 1/50th of a second!
The eBay example is not isolated. Organisations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens more areas.
The Power of Graph Technology
So how do these companies succeed with graphs? Well, just over a year ago, my colleague Max de Marzi undertook a little exercise to show just how easy it is to answer difficult questions with a graph. Max built a version of Facebook’s Graph Search that can answer even more questions than the original, over a single weekend, using a graph database as his backend! You can take a look at the full story at: http://maxdemarzi.com/2013/01/28/facebook-graph-search-with-cypher-and-neo4j/
More to the point, the story shows how far graph technology has matured in recent years: such powerful graph-based systems can now be built over a weekend. Even bringing physical objects into the mix is straightforward: with the burgeoning Internet of Things it is easy to add nodes into the graph that represent physical assets and add spatial indexes (which are themselves graphs) to find their location.
The power of a graph database is like having a mini-web inside your application. You crawl that “web” of nodes via named, directed relationships until you find your goal, and that can be anything. You may want to know where exactly you put your keys or where your long-lost college buddy is working, or you may want to find evidence about the efficacy of a clinical trial or check access permissions for computer systems (all graph problems, by the way). The graph database’s role is to store that data safely, and to make querying it fast and easy. Using Neo4j, for example, we write a Cypher query that visually describes the graph structure we’re looking for (a pattern) and let the database find matches for that pattern among the network of data it holds.
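The idea of crawling named, directed relationships to match a pattern can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how Neo4j evaluates Cypher; the data and the helper names `follow` and `match_path` are hypothetical. The pattern it looks for corresponds roughly to a Cypher query such as `MATCH (a {name: 'Alice'})-[:FRIEND]->(friend)-[:WORKS_AT]->(company)`.

```python
# Hypothetical data: (start, relationship type, end) triples.
EDGES = [
    ("Alice", "FRIEND", "Bob"),
    ("Alice", "FRIEND", "Carol"),
    ("Bob", "WORKS_AT", "Acme"),
    ("Carol", "WORKS_AT", "Initech"),
    ("Dave", "WORKS_AT", "Acme"),
]


def follow(start, rel_type, edges=EDGES):
    """Return the nodes reached from `start` over outgoing `rel_type` edges."""
    return [end for (src, rel, end) in edges if src == start and rel == rel_type]


def match_path(start, rel_types, edges=EDGES):
    """Match a chain of directed relationship types beginning at `start`.

    Returns every complete path as a list of node names.
    """
    paths = [[start]]
    for rel_type in rel_types:
        # Extend each partial path by one hop; paths with no match drop out.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in follow(path[-1], rel_type, edges)
        ]
    return paths


# Where do Alice's friends work?
friend_employers = match_path("Alice", ["FRIEND", "WORKS_AT"])
for path in friend_employers:
    print(" -> ".join(path))
```

A real graph database does the same kind of hop-by-hop matching against stored relationships rather than scanning a list, which is what keeps such queries fast as the data grows.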
We at Neo4j see that creating and analysing graphs will bring us to answers, and that when we let data connect itself, meaning will emerge. We also believe that our ability to understand graphs is greatly enhanced with the right tools, and we’re very excited about where graph technology is heading. You should be too!
Emil Eifrem is CEO of Neo Technology and co-founder of the Neo4j project. Before founding Neo, he was the CTO of Windh AB, where he headed the development of highly complex information architectures for Enterprise Content Management Systems. Committed to sustainable open source, he guides Neo along a balanced path between free availability and commercial reliability. Emil is a frequent conference speaker and author on NOSQL databases.
(Image Credit: stockholminnovation)