Beginner’s Guide to the History of Data Science
“Big data” and “data science” may be some of the biggest buzzwords of this decade, but they aren’t necessarily new concepts. The idea of data science spans many different fields, and has been slowly making its way into the mainstream for over fifty years. In fact, many considered last year the fiftieth anniversary of its official introduction. While many proponents have taken up the cause and raised new assertions and challenges along the way, there are a few names and dates you need to know.
1962. John Tukey writes “The Future of Data Analysis.” Published in The Annals of Mathematical Statistics, a major venue for statistical research, the paper brought the relationship between statistics and analysis into question. One famous quote has since struck a chord with modern data lovers:
“For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt…I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”
1974. After Tukey, there is another important name that any data enthusiast should know: Peter Naur. He published the Concise Survey of Computer Methods, which surveyed data processing methods across a wide variety of applications. More importantly, the very term “data science” is used repeatedly. Naur offers his own definition of the term: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.” It would take some time for the ideas to really catch on, but the general push toward data science surfaced more and more often after its publication.
1977. The International Association for Statistical Computing (IASC) was founded. Its mission was to “link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.” In the same year, Tukey also published a second major work: “Exploratory Data Analysis.” In it, he argues that emphasis should be placed on using data to suggest hypotheses for testing, and that exploratory data analysis should work side-by-side with confirmatory data analysis.
1989. The first Knowledge Discovery in Databases (KDD) workshop was organized. It would later become the annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
In 1994, early forms of modern marketing began to appear. One example comes from the Business Week cover story “Database Marketing.” Here, readers get the news that companies are gathering all kinds of data in order to start new marketing campaigns. While companies had yet to figure out what to do with all of the data, the ominous line “still, many companies believe they have no choice but to brave the database-marketing frontier” marked the beginning of an era.
In 1996, the term “data science” appeared for the first time in the title of a conference, at the meeting of the International Federation of Classification Societies in Japan. The topic? “Data science, classification, and related methods.” The next year, in 1997, C.F. Jeff Wu gave an inaugural lecture titled simply “Statistics = Data Science?”
Already in 1999, we get a glimpse of the burgeoning field of big data. Jacob Zahavi, quoted in “Mining Data for Nuggets of Knowledge” in Knowledge@Wharton, offered insight that would only prove true over the following years:
“Conventional statistical methods work well with small data sets. Today’s databases, however, can involve millions of rows and scores of columns of data… Scalability is a huge issue in data mining. Another technical challenge is developing models that can do a better job analyzing data, detecting non-linear relationships and interaction between elements… Special data mining tools may have to be developed to address web-site decisions.”
And this was only in 1999! 2001 brought even more, including the first usage of “software as a service,” the fundamental concept behind cloud-based applications. Data science and big data seemed to grow right alongside the developing technology. Another important name from this period is William S. Cleveland. He co-edited Tukey’s collected works, developed valuable statistical methods, and published the paper “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” Cleveland put forward the notion that data science was an independent discipline and named six areas in which he believed data scientists should be educated: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
2008. The term “data scientist” is often attributed to Jeff Hammerbacher and DJ Patil, of Facebook and LinkedIn respectively, because they carefully chose it. Attempting to describe their teams and work, they settled on “data scientist,” and a buzzword was born. (Oh, and Patil continues to make waves as the current Chief Data Scientist at the White House Office of Science and Technology Policy.)
2010. The term “data science” has fully infiltrated the vernacular. Between 2011 and 2012 alone, “data scientist” job listings increased 15,000%. There has also been an increase in conferences and meetups devoted solely to data science and big data. By this point, data science has not only become popular; it has become highly developed and incredibly useful.
2013 was the year data got really big. IBM shared statistics that showed 90% of the world’s data had been created in the preceding two years alone.
2016 may have only just begun, but predictions are already being made for the upcoming year. Data science is entrenched in machine learning, and many expect this to be the year of deep learning. With access to vast amounts of data, deep learning will be key to moving into new areas. This will go hand-in-hand with opening up data and creating open source data solutions that enable non-experts to take part in the data science revolution.
In the past decade, the idea of data science has exploded and become what we recognize today. One vital point analysts understand is that data science and big data are not simply a matter of “scaling up” data. Instead, they mean a shift in study and analysis. Data science seems almost completely ordinary in today’s world, like something that could not possibly be removed from research and study, but its nature and importance were not always so clear, and its exact nature will continue to develop alongside technology.
image credit: Jer Thorp