PyData at Booking.com: Deep Learning, Statistical Models and NLP

Published in Booking.com Data Science, Jun 27, 2017

by Melanie JI Mueller & Karlijn Zaanen

In April, Booking.com hosted PyData Amsterdam 2017. The Booking.com headquarters was filled with 330 Python developers and data scientists from all over Europe, who gathered for a weekend full of talks and discussions all about using and evolving Python for Data Science applications. The atmosphere was wonderful, with interesting presentations, people meeting others from the PyData community, sharing experiences, problems and solutions, discussing future developments, and everything in between. As the Dutch would say: gezellig!

We had 32 talks at the conference covering a wide array of PyData-related subjects, from Deep Learning, to Data Visualization, to the Ethics of Machine Learning. Booking.com itself contributed three talks to the conference, which were similarly diverse: applying Deep Learning in production, diagnosing statistical models, and using NLP on song lyrics.

Deep Learning

Deep Learning is currently a hot topic, so it was no surprise that it was featured in almost a third of all the talks at this year’s PyData conference.

Representing Booking.com, Emrah Tasli and Stas Girkin dived into the complex problem of image understanding. Emrah showed how Booking.com’s unique corpus of millions of tagged photos enables us to train a deep convolutional neural network specialised to output image labels that are relevant to our exact problem. Stas took us through the technical details of scaling this to work for our millions of users daily, and how to test the direct benefits to our customers via A/B testing.

The range of Deep Learning topics covered at the PyData conference was very broad, and really gave us a sense of just how powerful this tool can be. For example, Mark-Jan Harte talked us through an application in the medical domain with his inspiring talk on “Training a TensorFlow model to detect lung nodules on CT scans”. Dafne van Kuppevelt covered a wide range of applications in her talk “Deep learning for time series made easy”, namely ecology/classifying bird activity, movement sensing/classifying human activity, and classifying epilepsy from EEG. As the title of the talk suggests, it was certainly refreshing to see a more beginner-friendly talk on the subject.

Diagnosing statistical models

Ever wonder why all your coefficients in your linear model turn up insignificant? Wonder no more! Lucas Bernardi shared some of his pragmatic Data Science tricks to diagnose statistical models in a clear-cut way. He elaborated on one of the possible reasons for the insignificance of coefficients: features that are not independent of each other (multicollinearity). As Lucas stressed, this problem should be tackled especially when the main goal is understanding and interpreting the model, rather than focusing on accurate predictions. He explained how to use a clustered correlation plot to find and deal with the multicollinearity of features in explanatory models.
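To make the clustered correlation idea concrete, here is a minimal sketch. It is not Lucas's code: the distance measure (one minus the absolute correlation), the average linkage, the function name `clustered_correlation` and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform


def clustered_correlation(X, feature_names):
    """Reorder the correlation matrix so that correlated features sit together."""
    corr = np.corrcoef(X, rowvar=False)
    # Assumption: treat strongly (anti-)correlated features as "close".
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # remove floating-point asymmetry
    condensed = squareform(dist, checks=False)
    # Hierarchical clustering; the leaf order groups similar features.
    order = leaves_list(linkage(condensed, method="average"))
    ordered_names = [feature_names[i] for i in order]
    return corr[np.ix_(order, order)], ordered_names


# Synthetic example: two independent signals, each with a noisy near-copy.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
X = np.column_stack([
    a,
    b,
    a + 0.1 * rng.normal(size=n),
    b + 0.1 * rng.normal(size=n),
])
names = ["a", "b", "a_copy", "b_copy"]
corr_sorted, names_sorted = clustered_correlation(X, names)
print(names_sorted)
```

Plotted as a heatmap, the reordered matrix shows blocks of mutually correlated features along the diagonal. Each block is a candidate for keeping a single representative feature when the goal is to interpret the coefficients.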

The second topic Lucas covered was monitoring and diagnosing a classification model that is used in production. As an example, he chose the “Business vs Leisure” model used on the Booking.com website. In short: when a user does not indicate in the search box whether they are travelling for leisure or business, we still want to predict the probability that they are a business booker. In order to optimise the user experience, we might show different versions of our website depending on what this model predicts. The challenge in this live environment is that our data could be:

incomplete (not all the data ends up labelled, so there’s no way to evaluate all data against a ground truth);
delayed (the visitor might book only some time later);
dynamic (the label and feature space distributions change over time).

In this real-world scenario, how can we monitor model performance and diagnose any trouble? Lucas advocated the use of “Response Distribution Analysis”, which means looking at the distribution of the model’s output probabilities over all of the presented examples. In other words, it is the distribution of the predicted probability of belonging to the positive class. Ideally, we want this to be a bimodal distribution, and use the “valley” between the peaks as the threshold value. To learn about the interpretation of more patterns in the response distribution, watch the recording of Lucas’s talk.
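The response distribution idea can be sketched without any labels at all. The snippet below is an illustration only: the scores are simulated from two Beta distributions rather than taken from a real model, and the helper `valley_threshold`, with its crude "tallest bin in each half" peak picking, is a hypothetical simplification rather than the method from the talk.

```python
import numpy as np


def valley_threshold(scores, bins=20):
    """Histogram the scores and return the centre of the emptiest bin between the two peaks."""
    counts, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    centres = (edges[:-1] + edges[1:]) / 2.0
    # Assumption: one peak lies in each half of [0, 1]; take the tallest bin per half.
    half = bins // 2
    left_peak = int(np.argmax(counts[:half]))
    right_peak = half + int(np.argmax(counts[half:]))
    # The valley is the lowest bin between the two peaks.
    valley = left_peak + int(np.argmin(counts[left_peak:right_peak + 1]))
    return float(centres[valley]), counts


# Simulated model outputs: mostly "leisure-like" scores near 0, some "business-like" near 1.
rng = np.random.default_rng(42)
scores = np.concatenate([
    rng.beta(2, 8, size=7000),
    rng.beta(8, 2, size=3000),
])
threshold, counts = valley_threshold(scores)
print(f"suggested threshold: {threshold:.3f}")
```

Because this only needs the model outputs, it still works when labels are incomplete or delayed. Tracking the histogram over time is one way to notice when the distribution shifts.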

NLP on heavy metal lyrics

In a full schedule of 2 days of talks from 9 to 6, we were happy to have a few lightweight, fun and far-out talks too! Jon Paton showed us how English looks to non-English people in his talk on character level Markov models, “Simulate your language. ish.” Another talk in this line was Rogier van der Geer’s “Risk Analysis”. Contrary to what any financial analysts in the audience might have hoped for, we learned from Rogier’s talk how to win the board game Risk using genetic algorithms. Even one of the keynotes had a fun edge; in his presentation “Python versus Orangutan”, Dirk Gorissen shared his experiences with using Python to train drones to find orangutans in the rainforests of Borneo.

For Booking.com, Iain Barr showed that our Data Scientists don’t just care about holiday travel. Iain explained how he applied NLP to the song lyrics of metal bands. We’ll never forget his definition of the “metal-ness” of a word.

It turns out that this simple idea, comparing each word’s frequency in metal lyrics to its frequency in normal English, gives a pretty good measure of what we’d intuitively mean by “metal-ness”. So: the most metal word in the English language is ‘burn’, closely followed by ‘cries’, ‘veins’, and ‘eternity’. Want to know the least metal words? If you’re particularly (hint) interested then you can relatively (hint) easily check out Iain’s talk.
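As a rough sketch of the frequency comparison described above, one option is to score each word by the logarithm of the ratio of its relative frequency in metal lyrics to its relative frequency in a reference corpus. The log-ratio form, the `min_count` cut-off, the function name `metalness` and the tiny made-up token lists are assumptions for illustration, not Iain's actual code or data.

```python
import math
from collections import Counter


def metalness(metal_tokens, reference_tokens, min_count=2):
    """Score words by log(relative frequency in metal / relative frequency in reference)."""
    metal = Counter(metal_tokens)
    ref = Counter(reference_tokens)
    n_metal = sum(metal.values())
    n_ref = sum(ref.values())
    scores = {}
    for word, count in metal.items():
        # Skip rare words: their frequency ratios are too noisy to rank.
        if count < min_count or ref[word] < min_count:
            continue
        scores[word] = math.log((count / n_metal) / (ref[word] / n_ref))
    return scores


# Made-up toy corpora, purely to show the mechanics.
metal_tokens = "burn burn burn the the night burn the eternity eternity".split()
reference_tokens = (
    "the the the the the the night night burn burn "
    "particularly particularly eternity eternity"
).split()

scores = metalness(metal_tokens, reference_tokens)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {score:+.2f}")
```

Words with a positive score are over-represented in the metal corpus, and words with a negative score are under-represented. The minimum count keeps words seen only once or twice from dominating either end of the ranking.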

More PyData

You can find recordings of all talks from PyData Amsterdam 2017 here. Overall, the PyData Amsterdam 2017 conference was a great success and a learning experience for us. We learned a lot about Data Science and Python, and about hosting and organising a conference, and we had a lot of fun too. Here’s to PyData Amsterdam 2018!

Written by Melanie Mueller