I recently sat down with Professor Viktor Mayer-Schönberger, co-author of the 2013 bestseller Big Data: A Revolution and Chair of Internet Governance and Regulation at the Oxford Internet Institute, to talk about the meaning, implications and possibilities of ‘Big Data’ today. In our exchange, we touched on the debate between causality and correlation, the relationship between data and faith, and the idea of ‘digital forgetting’, among other things that reflect the role of data in our digital age.
It’s a pleasure to interview you, Professor Mayer-Schönberger. First off, let’s address the concept of ‘Big Data’. What’s your take on it?
I start by looking at the basic nature of humanity: we are empirical creatures, and since time immemorial we have wanted to make sense of the world around us. In the past, we did this by observing phenomena and deducing patterns from our surroundings. And we did this by collecting and analysing data – sure, it was small data, but it was data nevertheless. Gradually, however, as the amount of data grew, so did the cost in money and time, so humans had to start finding ways to mitigate these limitations.
In a small data age, we designed various structures and institutions to explain everyday occurrences. Take the idea of randomised sampling, for instance: it helped us tremendously in terms of quality control, public opinion polling etc., but as a short-cut it comes with limitations, which I usually explain with an analogy to photography: Say if I were to take a picture of you right now, I would have to first decide what to focus on: you, or the books stacked behind you? With comprehensive data, however, we can now look at the data first, and then do the analysis which will then inspire us to ask more novel questions – ones that we never even thought about before starting our analysis.
Hence, in the big data paradigm, we can not only provide answers to questions we already have in mind but, more significantly, use the increase in data to come up with new questions, and this greatly stimulates human discovery. Also, we can use the newly collected data to test, challenge, prove or disprove hypotheses without repeating the tedious process of individual sampling.
This all sounds incredibly optimistic, but in an age where data leakage and digital hacking happen on a regular basis, a lot of people remain quite wary about the ‘bigness’ and intrusiveness of data. What do you make of this apprehension?
To me, the point about Big Data is to offer a more comprehensive and detailed view of what reality is, which I don’t think is a bad thing in itself. This pursuit underlies the whole Enlightenment, down to the Scientific Revolution and onwards in 20th-century history. Our task should be attempting to grasp, and not reduce, reality. Take Newton’s law of gravity, for instance: when Einstein came along we were told that relativity and space-time complicate old stable notions of how gravity works. And so intellectual advancement can only happen with the constant challenging of old mindsets, and we find that reality in general is really more all-encompassing and nuanced than any simplified rules would have us believe. Essentially, the project of Big Data is just that: to understand the reality around us in its detailed entirety. I believe that this is definitely better than settling for ignorance.
It is also important to remember that most data is non-personal. Your Facebook posts aren’t personal, the machine data of my car engine isn’t personal, but the bottom line is that all data is incredibly valuable. I’m not interested in collecting data to find out any information about you as a specific individual, but in evaluating human behaviour in its multiple differentiations. In principle, it’s more about being inquisitive about humanity as a whole, rather than being intrusive towards any singular individuals.
One key binary that I picked up from your bestseller Big Data: A Revolution (2013) is the distinction between ‘causality’ and ‘correlation’. Specifically, you say that with the rise of big data “princely causation must share the podium with humble correlation”. Can you elaborate on that?
We must bear in mind that data is always about what we can and want to do with it. I am not arguing that causality is no longer important, but that correlation – understanding how things relate to rather than explain each other – is the new kid on the block with something insightful to offer. As a first step, it allows us to clarify what is going on in the picture before making any ‘causal links’. The quest for causality isn’t dead, but we must first be humble about our limitations in knowing. Too often, we have taken spurious correlations to be products of so-called ‘causality’. As a filtering mechanism which allows us to better investigate causality, correlation asks whether a lot of things just happen by pure chance, and whether certain ‘master narratives’ about war or brutality, of capitalism winning over authoritarianism etc. perhaps don’t exist at all. Basically, there is a lot of humility built into the big data approach, since it continually pushes the boundaries of human knowledge and never settles for a totalising ‘closure’ or conclusion.
What do you think this attitude of ‘humble scepticism’ implies for religion and faith? Take the recent wave of Islamophobia in Dresden, for example: a lot of anti-Islamist protesters have been spreading fear about rapid Muslim migration, when a look at the statistics shows that the Muslim population there in fact remains an insignificant minority. Do you think that Big Data can be clarifying in such cases?
First, it is important to note that Big Data does leave room for the existence of faith. It only offers a close approximation of reality, and not a full reflection of it. Hence, its ‘incompleteness’ permits faith to do its share of explanation for believers. At the same time, however, it can help debunk claims of religious bigotry precisely by showing the lack of correlation between certain views and reality. In the case of Islamophobia, big data would expose the scaremongering nature of anti-Muslim rhetoric by pointing out that of 1.6 billion Muslims worldwide, only about 16 thousand are ISIS followers, which is hardly a significant proportion.
But then if some people choose to wilfully ignore such evidence, doesn’t Big Data then fail as a solution?
Well, that’s the goal, really: Big Data cannot claim to be a cure-all, but what it can do is help rationalise human debate. By encouraging a more empirical and factual approach to argument, it exposes the problems of cherry-picking facts from different sources and then misrepresenting them as ‘causal’ links. So now if we want to come up with a legitimate claim, we have to actually do the big data analysis to corroborate it. This then helps us challenge preconceptions and prejudices.
In your book, you say that while today “the ideas and skills seem to hold the greatest worth, eventually most value will be in the data itself”. By extension of this, surely whoever gets to own and analyse such data will also hold great ‘value’. What do you think are the implications of Big Data for political and corporate power?
It is ironic that nowadays, many companies that hold lots of data actually have no clue what to do with the information. Data alone isn’t going to help much if you don’t know how to use it. Just think of the Stasi, for example. But yes, data is definitely a fundamental element of power, a source from which more power will be derived in the future. It seems to me that open access to scientific results is really yesteryear’s debate, however, and that today our focus should be on who has open access to scientific data. Rather than focusing on the journals and research papers written with the help of existing data, why not shift our sights to releasing the raw data to the public? Say a company gets a research grant or a tax break from the state: it would then have to release its data to society at large. And this concern with data justice is exactly the kind of debate we should be having in a new big data age.
This topic of fair access to information relates to your 2009 work Delete: The Virtue of Forgetting in the Digital Age, in which you argue for the importance of ‘forgetting’ digital data. Isn’t it quite a counter-intuitive claim to make, though, that somehow humans who are already liable to forgetting things would need technology to help them forget?
In the book, I argue for the enabling of data eradication. It is the choice that makes all the difference, and it is important that we humans have the option to discard irrelevant memories. If we undo this ‘forgetting’ by means of digital preservation, we will very likely be constrained in our ability to decide and act in the present with a forward-looking mindset, simply because the past and its overwhelming irrelevancies tether us in our decision-making process.
So who gets to decide what should expire when?
This is a very important question, and in Delete, I argue that we should devise digital tools, rather than designate any human agents, to enable this forgetting. The problem isn’t so much someone wielding power over me, but my own past imposing this shapeless, but no less influential, clout on the way I think. Digital incentives can help reduce this domination, and people have so far responded well to this idea – just look at Snapchat. Because one of its main features encourages ephemerality, people use it to share something which can go away in a flash and leave no trace. What matters is less that you possess a certain piece of information than that this information isn’t residing somewhere in the mass digital database where people can gain access to it.
It sounds like this has obvious benefits for rehabilitative justice. Can Big Data help give others a second chance at life then?
Absolutely. Take juvenile delinquents, for example. Learning is about experimentation, and experimentation is in a way about breaking rules. Leaving a teenager’s criminal record for petty theft in the Google database for easy public access, however, is a sign of an unforgetful society. And indeed, this unforgetfulness then becomes a symptom of an unforgiving society. The option to digitally ‘forget’, then, becomes crucial in changing this dynamic: for example, I know that there are two Guardian journalists who will respond to requests for self-anonymisation, meaning that they will take a person’s name off an article they’ve written if asked to. This isn’t to say that his or her record will vanish from the archive – it still remains in its physical form at the Guardian headquarters. But what it does provide is a sort of ‘speed bump’ that does not completely erase history but instead helps people to get another shot at life.
What do you have to say to those who are still unsure about ‘Big Data’?
To them I say: The only way that you can be in power and not be ‘controlled’ by data is to first understand its power and then use it to your own advantage. Otherwise, you will always be at the receiving end in the balance of power between the ‘big guys’ who analyse data and the ‘every man’ who supplies data – whether consciously or inadvertently.