Berlin Buzzwords, ‘Germany’s most exciting conference on storing, processing and searching large amounts of digital data’, is back for a sixth year! The conference will take place from May 31st to June 3rd, and this year it will be held at Postbahnhof. It will feature presentations on large-scale computing projects, ranging from beginner-friendly talks to in-depth technical sessions on various technologies.
Here is our pick of this year’s Berlin Buzzwords talks:
Teaming up heterogeneous sports data – Building a 360º view of the FIFA World Cup
Jochen works as Principal Technologist at MarkLogic GmbH. His main areas of interest are Software Architecture, Software Development, and Data Modeling and Data Processing with NoSQL technologies. Jochen helps organizations develop ideas and concepts and implement solutions and applications based on heterogeneous and disparate data. In his free time he enjoys his family, sports and music.
Hive on Spark
Szehon Ho is a software engineer at Cloudera and an Apache Hive PMC member, based in Palo Alto. Prior to this, he was a principal software engineer at Informatica, gaining experience in enterprise software development and the field of data integration. He holds a BS in EECS from UC Berkeley.
Compression in Lucene
Ryan Ernst is an Apache Lucene/Solr committer and PMC member. He is an Elasticsearch developer and enjoys working on anything with bits. Prior to Elasticsearch, he worked on Amazon’s Product Search and AWS CloudSearch.
Recommendation at Scale
Simon Dollé joined Criteo in early 2015 as a Senior Software Engineer. He is working on the recommendation system that chooses which products to display on the ad banners. Previously he was Technical Lead at LTU Technologies, developing a large-scale image fingerprinting system. He also worked at Botify, where he designed data-mining algorithms to identify patterns in website structures. He holds an M.Sc. in Computer Science from the Ecole Centrale Paris (France) and an M.Sc. with a major in intelligent systems from Technische Universität Dresden (Germany).
Signatures, patterns and trends: Timeseries data mining at Etsy
Andrew joined Etsy in 2014 and lives in London, making him their first data scientist outside the USA. He has been involved in the redesign of their Kale platform for anomaly detection and pattern matching since he came on board. Prior to Etsy, he spent almost 15 years designing machine learning workflows and building search and analytics services in academia, startups and enterprises, and in an ever-growing list of research areas including biomedical informatics, computational linguistics, social analytics, and educational gaming. These days he’s interested in probabilistic algorithms and data structures, online learning, deep learning, data visualization, and the convergence of search and recommender systems. He can count to over 1000 on his fingers but doesn’t know how to drive a car.
A complete Tweet index on Apache Lucene
Michael Busch is an architect in Twitter’s Search & Content organization. He designed and implemented Twitter’s current search index, which is based on Apache Lucene and optimized for realtime search. Prior to Twitter, Michael worked at IBM on search and eDiscovery applications. Michael has been a Lucene committer and Apache member for many years.