Bol.com is an online retail portal based out of the Netherlands. They sell over 8 million products, stocking everything from entertainment to electronics, from books to jewellery. Recommending the perfect product from this vast range can be like finding a needle in a haystack; at Berlin Buzzwords we caught up with Barrie Kersbergen, Expert Software Engineer at bol.com, to find out the science behind the perfect recommendation.
Could you start by telling us a little bit more about yourself, your company and your work?
I’m Barrie Kersbergen, and I’m an Expert Software Engineer at bol.com. I have a background in Computer Science; I worked at the University of Utrecht for 12 years, where I developed mathematical educational software at the Freudenthal Institute. In 2010, I thought it was time for something completely different, so I joined bol.com as part of the team that specialises in personalising content for customers. One of the systems I helped to design and build was the recommender system, which aims to personalise content and recommend the ‘right’ products, in other words products that inspire customers.
And how do you recommend the right products?
I’m not claiming that I have the ultimate answer, but I have developed some strategies for recommending and inspiring customers. Most data scientists carry out offline evaluations: they test how accurately their algorithms predict behaviour compared with the true behaviour of the customers on their site. This has its advantages, but if you’re able to predict something, the customer probably already intended to do it anyway, because they came to the site for a specific reason. So if I recommend a book to you which you were already going to buy, the recommendation adds no value.
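As an editorial illustration of what such an offline evaluation can look like, here is a minimal Python sketch that scores recommendations against behaviour held back from the training data. The metric (precision at k), the function names and the data layout are assumptions chosen for the example; the interview does not say which metrics bol.com uses.

```python
from typing import Dict, List, Set


def precision_at_k(recommended: List[str], actual: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items the customer really interacted with."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in actual)
    return hits / k


def mean_precision_at_k(recs: Dict[str, List[str]],
                        held_out: Dict[str, Set[str]],
                        k: int) -> float:
    """Average precision@k over all customers that received recommendations."""
    scores = [precision_at_k(items, held_out.get(user, set()), k)
              for user, items in recs.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

A high score here only says the predictions match what customers did anyway, which is exactly the limitation described above: it measures accuracy, not added value.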
At bol.com we carry out such offline evaluations because we want to know how accurate our predictions are, but we also do long-term analysis to find out the impact of the items we show you over the long term, that is, across multiple visitor sessions. What we certainly want to know is: do visitors return in the long term?
Another thing we do is live-user experiments. Behind the scenes, without visitors knowing, we are experimenting to see if new algorithms and parameters are performing better, and adding more value to what our visitors are doing right now on our website.
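One common way to run this kind of live-user experiment without visitors noticing is to assign each visitor to a variant deterministically, so the same visitor always sees the same algorithm. The sketch below is an illustration only: the hashing scheme, the `assign_variant` name and the experiment identifiers are assumptions, not a description of bol.com's system.

```python
import hashlib
from typing import List


def assign_variant(visitor_id: str, experiment: str, variants: List[str]) -> str:
    """Deterministically map a visitor to one variant of an experiment."""
    if not variants:
        raise ValueError("variants must not be empty")
    # Hashing experiment and visitor together keeps the assignment stable
    # across visits and independent between different experiments.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

For example, `assign_variant("visitor-42", "new-ranking", ["control", "treatment"])` returns the same variant on every call, so the click behaviour of the two groups can be compared afterwards.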
What technologies are you using for this?
We use Hadoop for batches, and we have our own custom-built technology for the real-time part. We’re also evaluating new technologies which could add value for us in terms of doing real-time calculations. The difficulty with real-time is that we only have a limited time window in which we can recommend items to the users. What we can’t do is say to the visitors “Please wait five minutes, then we’ll calculate the ultimate product for you”. We only have a few milliseconds, approximately 80, to make custom recommendations. In this time span we have to calculate the most relevant recommendations and decide how to present them to the customer. This is something we currently do with custom technology; in the future we want to do deeper analysis using more data, and we’re looking into which technology we should use for this. At the moment, Storm and Cassandra seem like viable options.
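To make the time constraint concrete, here is a minimal Python sketch of a recommendation call that gives up after a fixed budget and returns fallback content instead (the interview mentions fallback content later on). Everything in it is an assumption for illustration: the thread pool, the `recommend_within_budget` name and the shape of the result are not bol.com's custom technology, only the roughly 80 millisecond figure comes from the interview.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, List

BUDGET_SECONDS = 0.080  # the roughly 80 ms window mentioned in the interview

# Shared worker pool; the size is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=4)


def recommend_within_budget(compute: Callable[[], List[str]],
                            fallback: List[str],
                            budget: float = BUDGET_SECONDS) -> List[str]:
    """Return compute()'s result if it arrives within the budget, else the fallback."""
    future = _executor.submit(compute)
    try:
        return future.result(timeout=budget)
    except FutureTimeout:
        # Best effort only: a computation that is already running is not interrupted.
        future.cancel()
        return fallback
```

The design choice worth noting is that the caller never waits longer than the budget; a slow computation costs the visitor a personalised result, not page load time.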
How important is visualisation for your customers? What work have you been doing to optimise this?
I tend to think it is important. If we don’t show a product image and just show a box with “No Image”, that’s not a good idea. But the optimal visualisation is also in itself a kind of recommendation. For some customers, we show really technical details, because we have concluded from statistical analysis that showing technical details really adds value to that customer’s experience. But some other customers get confused by all of the details; they might think “I just wanted to buy a pair of pants, why do I need to see all of these details? I don’t know what they mean, it makes me nervous!” So if you show the right combination of product attributes, it can really add a lot of value. This is something we’ve proven using live-user experiments.
So aside from getting the recommendations in real-time, what are some of the other challenges faced by bol.com in terms of getting the recommendations out there?
The main challenge is the whole puzzle of personalising the full website. We can recommend almost anything; we can recommend products, but we can also recommend search queries. We can also recommend categories that might be of interest to you, and we can recommend trending items within these specific categories.
There’s also the issue of how often you should repeat a recommendation. Not repeating it might be a missed opportunity; you may have missed the recommendation because you were looking at other things and didn’t notice it. How should we interpret this? Does it mean the recommendation is flawed? Or does it just mean that the visitor hasn’t looked at it? In my opinion it’s OK to repeat the recommendation once in a while, but you shouldn’t overdo it, because then it could get really annoying to the customer. So managing this is the tricky part.
So what we’re basically doing is recommending algorithms to you, and the outcome of those algorithms is shown to you. Then we measure the effect on you as a customer. If you’re clicking on and interacting with these recommendations, we’re probably doing a good job, because we’re inspiring you and adding value to your website journey. But if you’re not interacting, then we’re not doing such a good job, and it might be time to switch strategy and try something new.
Trial and error?
Through experimentation we improve our software. This could be seen as a form of trial and error; however, it’s not entirely random. It does mean we sometimes do things that seem irrational but may add value, for instance just trying a random recommendation algorithm on your profile and seeing if you like the outcome.
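The combination described here, mostly using what works and occasionally trying a random algorithm, resembles what is commonly called an epsilon-greedy strategy. The Python sketch below illustrates that general technique; it is an assumption that this framing fits, and the class name, the click-rate scoring and the `epsilon` parameter are hypothetical rather than taken from bol.com's software.

```python
import random
from typing import Dict, List, Optional


class AlgorithmSelector:
    """Epsilon-greedy choice between recommendation algorithms, scored by click rate."""

    def __init__(self, algorithms: List[str], epsilon: float = 0.1,
                 rng: Optional[random.Random] = None) -> None:
        if not algorithms:
            raise ValueError("at least one algorithm is required")
        self.algorithms = list(algorithms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shown: Dict[str, int] = {a: 0 for a in self.algorithms}
        self.clicked: Dict[str, int] = {a: 0 for a in self.algorithms}

    def click_rate(self, algorithm: str) -> float:
        """Observed clicks per impression; 0.0 while the algorithm is untried."""
        shown = self.shown[algorithm]
        return self.clicked[algorithm] / shown if shown else 0.0

    def choose(self) -> str:
        # With probability epsilon, try a random algorithm (exploration).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.algorithms)
        # Otherwise use the algorithm with the best observed click rate so far.
        return max(self.algorithms, key=self.click_rate)

    def record(self, algorithm: str, clicked: bool) -> None:
        """Store the outcome of showing one algorithm's recommendations."""
        self.shown[algorithm] += 1
        if clicked:
            self.clicked[algorithm] += 1
```

Each page view would call `choose()`, show that algorithm's output, and later call `record()` with whether the visitor interacted, so a strategy that stops earning clicks gradually loses out to the alternatives.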
Has this yielded any surprising successes?
I’d love to say that it has! But sometimes we’re not entirely sure what the real intent of the customer is, so it is difficult to measure whether a change of strategy is more or less successful. Lots of visitors are anonymous to us, so we have little information about who they are, and we don’t know what their intent is on our website. We need to figure all of these things out in a few milliseconds, which is quite challenging. What we don’t want to do is run lots of calculations and miss our window of opportunity, because then the customer will see fallback content. So we want to make sure that we meet the deadline.
Moving forward, what is bol.com working on right now?
What I want to do in the future is have more data available to do real-time analytics on. Why did specific systems show content to you on our website? For instance, what did the search engine need to do to show the search result? All of the metadata involved tells me something about why you were shown these specific products and attributes. Combining all of this data opens up new opportunities for us. I don’t know what these opportunities will be, but we will definitely incorporate this.
This means that we will need more computational power, which is why we need a real-time framework that simplifies working on larger datasets in milliseconds and helps us out with the low-level infrastructure. What we don’t want to do is spend a lot of time on the low-level infrastructure, because it’s not our core business; it’s fun from a technical perspective, but it doesn’t add value to what we’re trying to do. We’re trying to inspire and personalise content for our customers.
Moving forward in terms of the larger picture, where do you think big data is headed?
Big data is just there; it’s available to everyone. We started using big data in 2009, when it was hi-tech, somewhat unstable and all new, and nobody knew what to expect from it. Now it’s all commoditised. Everyone in our company is using it without even knowing that it’s special; we just use the technology without caring that it’s supposed to be special. For us it’s just mainstream. Within 3 years, every company will be using this technology; it will be accepted as the norm next to relational databases.
That’s interesting to hear; a lot of people are talking about “Big data taking over”, but at bol.com it sounds like it was just part of the process.
Yeah, it was just part of the process. It gives us new abilities and it’s affordable technology, but distributed computing is very old. We call big data “commercially-applied distributed computing”; it’s nothing special. It’s been around for 30-40 years. However, now it’s affordable: even with a small budget you can install a new cluster and start doing distributed computing. Also, the technologies themselves are much easier to use. In the past, you really needed expert computer scientists to develop this software; now, your average Joe could program this and do these things for you.
Established in 1999 with 26 employees, bol.com has since grown to 650 employees and become one of the largest European online retailers.