As a research scientist at the German online retail giant Zalando, Dr. Alan Akbik is an expert in Natural Language Processing and Data Extraction. In his work for the company, which at any given moment is handling massive numbers of online transactions in multiple languages, Akbik helps unveil unique insights into the very structure of human language by observing and analyzing huge sets of multilingual text data. Here’s what he had to say about the possibilities for both business and the study of language that NLP is bringing online.
What first inspired you to pursue a career as a data scientist?
My love for human languages! In a sense, all of humankind’s knowledge is stored in written language in books, the Web, and elsewhere. Our hope is that data science – and in particular natural language processing (NLP) – can help computers and us make sense of all this textual data.
What’s a particular pattern or insight that you have uncovered while at Zalando that you think could also help companies working outside of the retail sector?
Leverage your textual data! I think many companies might be surprised by how much textual data they have available, and how much value they can get out of it.
At the moment, which new technology's capabilities have you most excited?
We are currently working with recurrent neural networks (RNNs) of all flavors, and I am very excited about their language modeling and sequence labeling capabilities. I believe these techniques may, in the near future, lead to important breakthroughs in modeling and automatically capturing semantics in human language.
What tools are you most heavily relying on in your day to day work? How do you make sense of multilingual data?
We are researching a technique called “annotation projection” that can automatically transfer NLP methods that work for one language (such as English) to another (such as German). This helps us immediately scale our NLP across the many European languages relevant for us and our customers. We even released an open source framework for this technique, called ZAP. Do try it out!
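To make the idea of annotation projection concrete, here is a minimal sketch in Python. It is not ZAP's API and not necessarily how Zalando implements the technique. It assumes that a word alignment between an English sentence and its German translation is already available (here it is written by hand), and it simply copies each English token's label onto the aligned German token. The function name `project_labels`, the label set, and the example sentences are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def project_labels(
    source_labels: List[str],
    alignment: Dict[int, int],
    target_length: int,
    default: str = "O",
) -> List[str]:
    """Copy each source token's label onto the target token it is aligned with.

    `alignment` maps a source token index to a target token index.
    Target tokens without an aligned source token keep the `default` label.
    """
    target_labels = [default] * target_length
    for src_idx, tgt_idx in alignment.items():
        # Ignore alignment links that point outside either sentence.
        if 0 <= src_idx < len(source_labels) and 0 <= tgt_idx < target_length:
            target_labels[tgt_idx] = source_labels[src_idx]
    return target_labels


# Hypothetical English sentence with labels from an existing English tagger.
english = ["Anna", "bought", "red", "shoes"]
english_labels = ["PERSON", "O", "COLOR", "PRODUCT"]

# German translation and a hand-written word alignment (an assumption here;
# in practice the alignment would come from an automatic word aligner).
german = ["Anna", "hat", "rote", "Schuhe", "gekauft"]
alignment = {0: 0, 1: 4, 2: 2, 3: 3}

german_labels = project_labels(english_labels, alignment, len(german))
for token, label in zip(german, german_labels):
    print(f"{token}\t{label}")
```

In this toy setup the German tokens "rote" and "Schuhe" inherit the COLOR and PRODUCT labels from their English counterparts, while "hat", which has no aligned English token, keeps the default label. Real alignments are noisy, so a practical system would also need to filter or correct the projected labels.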
How has your time spent getting familiar with Information Extraction made you a more effective data scientist? What skill or field of knowledge would you like to augment it with?
Information Extraction (IE) is the core task of extracting structured information from text data, and it is therefore hugely important for any data science that involves such data. I am interested in databases, machine learning and computational linguistics, because they are important fields of knowledge for IE.
What is something you know about customer behavior or the way we use language that you didn’t know before working at Zalando? Has anything surprised you?
I am (continuously) surprised by the many particularities of informal language usage on the Web, especially in the domain of fashion, where new words (for trends, looks, etc.) are invented seemingly every day. It nicely shows the creativity and enthusiasm of the fashion community and presents us with interesting research challenges for NLP and data science.