Behind the Scenes at the Berlin Big Data Center
When we think of “governments” and “data”, the thoughts that spring to mind aren’t often positive. But we’ve seen government-backed and -funded initiatives around the globe that aim to harness data science for positive growth and change. From fighting fires in Israel to amplifying the voice of the electorate in India, governments working with big data isn’t all bad news. Here in Dataconomy’s home city of Berlin, the German ministry of research and economics is funding the Berlin Big Data Center – an institution committed to advancing technology and innovation at home and abroad. We recently spoke to Dr. Stefan Edlich, one of the center’s Principal Investigators, about the institution and what we can expect from it in the future.
Could you give us a brief introduction to yourself and the Berlin Big Data Center?
In recent years the German ministry of research and economics has put big data on its agenda, which resulted in several funding programmes in 2014. Two centres won the race for funds: one in Leipzig/Dresden called ScaDS, and the other in Berlin. With three universities, many research institutes and a vibrant start-up community, Berlin is a perfect environment for cutting-edge research. For this reason we have many partners in bbdc.berlin working together on research: TU-Berlin, Zuse Institut Berlin, Fritz-Haber-Institut, Max-Planck-Gesellschaft, DFKI and Beuth Hochschule.
Talk us through the conception of BBDC.
The bbdc.berlin has technical and non-technical tasks to solve. Let’s start with the non-technical aspects. The first is education. Fortunately, Germany and the government have come to the conclusion that data science on big data is an important field and one of the key success factors for the economy. That’s why TU-Berlin now offers a master’s programme in this area, together with international partners. But of course the vision is much bigger: we need better funding, more master’s programmes, a deeper integration with companies and much more. This goes hand in hand with the support of young researchers. Furthermore, our research has to deliver innovation and sustainability. Finally, Germany needs successful and visible application examples as lighthouse projects to motivate the industry to create similar projects.
What are the goals for the BBDC?
As researchers, some of our main goals are technical, and here we are focused on a fascinating field: we are trying to bring together scalable data processing with scalable machine learning. The heart of the project, scalable machine learning, is still in its early days. Many people in industry do this with Hadoop or with languages such as R for statistical analysis, but this is often neither scalable nor real-time. For this reason we need new systems, new libraries and comprehensive experience across many application areas. We have many research areas around this topic, such as declarative programming models, debugging of such systems, adaptive big data processing, system integration and much more.
The other interesting part is that this core must be surrounded by application areas. Here we have strong partners doing research in material sciences, video and text mining, and image analysis in medicine. These are really exciting fields.
Do you have some examples of applications?
Let’s talk about two fields. Imagine what would happen if all videos (not only YouTube) could be analyzed in a way that gives you the complete metadata of the film: the complete transcript and the complete action plot. For example, at 5:45 in a film the computer could automatically derive that a green-clothed man gets into a car and tells his neighbour to send best wishes to his wife. And this for the entire film! This would be a huge step towards real-time knowledge of any video/audio stream, with interesting implications.
Another cool area is material sciences. Here you normally have a terabyte of data for just one material, with its features and its interactions with other materials. Previously you would have had to run many thousands of costly experiments to gain new insights about materials and their interactions in combination. But with a next-generation big data processing system that has superior performance and strong machine learning capabilities, you are suddenly able to predict material features, which can save you a lot of money and put you ahead in global competition.
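The workflow described here, training a model on materials that have been measured and then predicting properties of ones that haven’t, can be sketched in miniature. This toy example uses entirely hypothetical numbers and a one-feature least-squares fit; real materials-science models are vastly richer, so this only illustrates the “learn from known, predict the unknown” idea.

```python
# Toy sketch with hypothetical data: fit a linear model on a few
# "measured" materials, then predict a property for an unmeasured one.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical measurements: one feature value -> one measured property.
measured_feature = [1.0, 2.0, 3.0, 4.0]
measured_property = [2.1, 3.9, 6.2, 7.8]

a, b = fit_line(measured_feature, measured_property)

# Predict the property of a "material" we never measured (feature = 2.5).
predicted = a * 2.5 + b
print(round(predicted, 2))
```

The point is the shape of the pipeline, not the model: at terabyte scale, both the feature extraction and the fitting step need a system that distributes the work.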
What system are you using?
Thanks to generous funding by institutions such as the DFG, several universities, including TU, HU, HPI and others, have built a superior big data processing system called Stratosphere. During the lifetime of the bbdc.berlin project (which runs until 2018) we will leverage this to produce the strongest system in this area. Several successful steps have already been made since the start of bbdc.berlin: the project advanced to an Apache top-level project in record time and is now called Apache Flink. Anyone can now download it and use it within ten minutes!
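Flink’s canonical getting-started example is a word count, expressed as a dataflow of a flatMap step followed by grouping and aggregation. The same pattern can be sketched in plain Python; this mirrors only the programming model, not Flink’s actual Java/Scala API, and the helper names here are illustrative.

```python
from collections import Counter

def flat_map(lines):
    """Split each input line into words (the flatMap step of the dataflow)."""
    for line in lines:
        for word in line.lower().split():
            yield word

def word_count(lines):
    """Group identical words and sum their occurrences (groupBy + sum)."""
    return Counter(flat_map(lines))

lines = ["big data in Berlin", "big data processing"]
counts = word_count(lines)
print(counts["big"], counts["data"], counts["berlin"])  # prints: 2 2 1
```

In a system like Flink the same two logical steps run in parallel over partitions of a much larger input, with the framework handling distribution and fault tolerance.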
What’s in store for the BBDC in 2015?
There is too much to enumerate! Some highlights: we will improve our education activities and connect with industry and with further research projects, up to the EU level. And I am sure the first practical results will be available this year. Another opportunity for everyone is the events we will be launching this year. One of the most visible will be a conference around Apache Flink in mid-October, which will attract many people from industry and research, and everyone interested in big data processing.