Data Natives Berlin speaker Allan Hanbury is Senior Researcher and Privatdozent at the TU Wien, Austria. Since 2010, he has coordinated EU-funded research and development projects on analysis and search of medical text and image data. This led to the recent founding of a start-up, ContextFlow, which is bringing the developed radiology image search technology to the market. His team at the TU Wien also works on the analysis and search of unstructured data in technical documents (patents, scientific publications), financial documents (bank and company annual reports), and enterprise social networks. He also leads a project to develop the fundamentals of a Data-Services Ecosystem in Austria.
Tell us a few words about yourself
My name is Allan Hanbury, and I am Senior Researcher at Technische Universität Wien (Vienna University of Technology). I work with the Institute of Software Technology and Interactive Systems, and will be speaking at Data Natives Berlin 2016.
What topic will you be discussing during Data Natives Berlin?
My talk focuses on analyzing and searching unstructured medical data.
How did you get involved with medical data analysis?
I started off with a degree in physics, but I discovered that I had a lot more fun analyzing the data than actually setting up and doing the experiments to collect it, so I ended up in computer science. Here I have been concentrating on the analysis of unstructured data, which is the most challenging data to work on.
I got involved in medical data analysis after discussions with people working in medicine and related areas revealed an unmet technology need: the analysis of multilingual medical text and medical images. With this information, I wrote a proposal for an EU-funded project, which was granted.
How is data being applied to change the medical field?
The most visible case in the medical field has been the discovery of major drug side effects by mining medical records. If a correlation between a drug and a specific side effect is discovered by mining the information in a huge number of anonymised patient records (i.e. a certain drug and a certain side effect are mentioned together in many medical records), then this provides grounds for investigating whether the drug is actually causing the side effect. This has led to at least one drug being taken off the market, because heart attacks were found to be a side effect.
A huge amount of data collected during routine medical care is simply archived and never analyzed. Mining this data (in an anonymised form) has huge potential to uncover the implicit information it contains.
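To make the co-occurrence idea concrete, here is a minimal Python sketch. It is not the method used in any real pharmacovigilance study: the records, the term names (`drug_a`, `heart_attack`, and so on) and the choice of "lift" as the score are all assumptions made for illustration. It assumes each anonymised record has already been reduced to a set of extracted terms, and it measures how much more often a drug and a side effect appear together than would be expected if they were unrelated.

```python
from collections import Counter
from itertools import product


def cooccurrence_lift(records, drugs, effects):
    """Return {(drug, effect): lift} for every pair seen together at least once.

    lift = P(drug and effect) / (P(drug) * P(effect)); values well above 1
    mean the pair is mentioned together more often than chance would suggest.
    """
    n = len(records)
    drug_counts = Counter()
    effect_counts = Counter()
    pair_counts = Counter()
    for record in records:
        present_drugs = drugs & record      # drugs mentioned in this record
        present_effects = effects & record  # side effects mentioned in this record
        drug_counts.update(present_drugs)
        effect_counts.update(present_effects)
        pair_counts.update(product(present_drugs, present_effects))
    return {
        (d, e): (count / n) / ((drug_counts[d] / n) * (effect_counts[e] / n))
        for (d, e), count in pair_counts.items()
    }


# Hypothetical toy data: each record is the set of terms found in one patient file.
records = [
    {"drug_a", "heart_attack"},
    {"drug_a", "heart_attack"},
    {"drug_a", "headache"},
    {"drug_b", "headache"},
    {"drug_b"},
    {"drug_b", "heart_attack"},
    {"drug_a", "heart_attack"},
    {"drug_b", "headache"},
]
lift = cooccurrence_lift(records, {"drug_a", "drug_b"}, {"heart_attack", "headache"})
```

In this toy data, `lift[("drug_a", "heart_attack")]` is 1.5, while the same side effect paired with `drug_b` scores 0.5. As the answer above stresses, a high score is only a reason to investigate, not proof that the drug causes the effect.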
What do you hope to gain or learn during Data Natives Berlin?
I hope to meet many interesting people, and learn more about what is currently happening in data analytics in medicine and other areas.
What data-driven technologies are of particular interest to you and why?
I am interested in technologies that allow analysis and search of unstructured data, such as text, and which can find the relations between unstructured and structured data. For example, other work in my team is looking at predicting the evolution of financial indicators, based on the analysis of company and bank annual reports. At present, Word Embedding technologies are proving to be very powerful in text analysis, but they are basically mathematical tools that allow a text representation to be transformed in a way that is potentially useful – there is still plenty of investigation to be done on their limits and capabilities in practice.
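As a rough illustration of what "transforming a text representation" means for word embeddings, the Python sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are hand-made for this example and the helper names are hypothetical; real embeddings are learned from large text collections and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to that of `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return max(scored, key=lambda pair: pair[1])[0]


# Hand-made toy vectors (an assumption for illustration, not learned embeddings).
toy_embeddings = {
    "tumour": [0.9, 0.1, 0.0],
    "lesion": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

nearest = most_similar("tumour", toy_embeddings)
```

Here `nearest` is `"lesion"`, because the two medical terms were given vectors that point in almost the same direction. Whether learned embeddings capture the relations that matter for a given task is exactly the kind of open question mentioned above.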
Do you believe that Germany is a strategic market for showcasing data-driven technologies? If so, why?
Yes. Germany is a huge market that has a history of being innovative.
How is your field of interest driving the data revolution?
There is plenty of unstructured data stored everywhere, simply because it is easier to create unstructured data than structured data. It is easier to type text into a report than to go through the process of defining and using a data structure. Reading text is also easier for people, but of course it is challenging for automated analysis. The need to analyze the text out there is leading to really interesting technological approaches, but there are still many challenges.
Can you offer advice for other Data Natives wanting to get involved in this particular field?
It’s not necessary to get a first degree in computer science. Any field that involves data analysis could be a starting point (such as physics, mathematics, statistics…). People coming from other fields into Data Science often bring useful skills and unexpected insights into the problems that need to be solved.
Image: Kate Ter Haar