I’ve previously written a lot on data mining in the abstract; now I want to start taking you through some practical applications. Welcome to the fascinating world of the recommendation engine. This post will walk through the concepts, and later posts will teach you how to implement your own.
What we will learn:
I’ll begin our tour by answering four basic questions:
- What is a recommendation engine?
- What is the difference between real-life recommendation engines and online recommendation engines?
- Why should we use recommendation engines?
- What are the different types of recommendation engines?
What is a Recommendation Engine?
Wiki Definition: Recommendation engines are a subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item.
dataaspirant Definition: A recommendation engine is a black box which analyzes some set of users and shows the items which a single user may like.
Offline Recommendation Engines
In the external world, we can think of the people around us as recommendation engines.
- Your family and friends as clothes recommendation engines: With the thousands of style options now available to us, we often rely on friends and family to recommend stores and styles, and to tell us what looks good on us.
- Your professors as book recommendation engines: When we want to research or better understand a concept, our professors can lead us to the titles which best suit our needs.
- Your friends as movie recommendation engines: If you have friends who know your cinematic tastes well, you’re likely to trust their movie recommendations over a random stranger’s picks.
Notice that all of these “offline recommenders” know something about you. They know your style, taste or area of study, and thus can make more informed decisions about which recommendations would benefit you most. It is this personalisation, based on getting to “know” you, that online recommenders aim to emulate.
Online Recommendation Engines
Facebook: “People You May Know”
Facebook uses a recommender system to suggest Facebook users you may know offline. The system is trained on personal data such as mutual friends, where you went to school, places of work and mutual networks (pages, groups, etc.) to learn who might be in your online and offline network.
Netflix: “Other Movies You Might Enjoy”
When you fill out your Taste Preferences or rate movies and TV shows, you’re helping Netflix to filter through the thousands of selections to get a better idea of what you might like to watch. Factors that Netflix’s algorithm uses to make such recommendations include:
- The genre of movies and TV shows available
- Your streaming history, and previous ratings you’ve made.
- The combined ratings of all Netflix members who have similar tastes in titles to you.
LinkedIn: “Jobs You May be Interested In”
The Jobs You May Be Interested In feature shows jobs posted on LinkedIn that match your profile in some way. These recommendations are shown based on the titles and descriptions in your previous experience, and the skills other users have “endorsed”.
Amazon: “Customers Who Bought This Item Also Bought…”
Amazon’s algorithm crunches data on all of its millions of customer baskets to figure out which items are frequently bought together. This can lead to huge returns. For example, if you’re buying an electrical item and see a recommendation for the cables or batteries it requires beneath it, you’re very likely to purchase both the core product and the accessories from Amazon.
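To make the “frequently bought together” idea concrete, here is a minimal sketch that counts how often pairs of items share a basket. It is not Amazon’s actual algorithm, which the post does not describe; the baskets, item names and the helper functions `count_pairs` and `also_bought` are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer baskets; the items are invented for this example.
baskets = [
    {"camera", "batteries", "memory card"},
    {"camera", "batteries"},
    {"camera", "tripod"},
    {"batteries", "flashlight"},
]


def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (a, b) order.
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def also_bought(item, baskets, top_n=3):
    """Return the items that most often share a basket with `item`."""
    scores = Counter()
    for (a, b), n in count_pairs(baskets).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


print(also_bought("camera", baskets))
```

With these toy baskets, batteries come out on top for the camera because the two share a basket twice. A real system would work on far more data and would usually normalise for how popular each item is overall.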
Why Should We Use Recommendation Engines?
In the immortal words of Steve Jobs: “A lot of times, people don’t know what they want until you show it to them.” Customers may love your movie, your product or your job opening, but they may not know it exists. The job of the recommender system is to open the customer up to a whole range of new products and possibilities which they would not think to search for directly themselves.
What Are the Different Types of Recommendation Engines?
Let me introduce you to three very important types of recommender systems:
- Collaborative Filtering
- Content-Based Filtering
- Hybrid Recommendation Systems
Collaborative Filtering
Collaborative filtering methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences, and predicting what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content, and therefore it is capable of accurately recommending complex items such as movies without requiring an “understanding” of the item itself. Many algorithms have been used to measure user similarity or item similarity in recommender systems, for example the k-nearest neighbor (k-NN) approach and the Pearson correlation.
Content-Based Filtering
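As a rough illustration of how user similarity can drive a prediction, the sketch below computes the Pearson correlation between two users over the items both have rated, then predicts a rating as a similarity-weighted average over the k most similar users. The ratings, user names and the functions `pearson` and `predict` are assumptions made for this example, and the weighting scheme is deliberately simplified.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; users and scores are invented.
ratings = {
    "alice": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Casablanca": 2, "Dune": 5},
    "carol": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1},
}


def pearson(u, v):
    """Pearson correlation over the items that both users have rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = sqrt(sum((u[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user, item, ratings, k=2):
    """Predict a rating from the k most similar users who rated `item`."""
    neighbors = []
    for other, their in ratings.items():
        if other != user and item in their:
            sim = pearson(ratings[user], their)
            if sim > 0:  # ignore users with opposite or unrelated tastes
                neighbors.append((sim, their[item]))
    top = sorted(neighbors, reverse=True)[:k]
    if not top:
        return None  # nobody similar has rated this item
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


print(predict("alice", "Dune", ratings))
```

Here alice’s tastes track bob’s and run opposite to carol’s, so only bob’s rating of Dune contributes to the prediction. Note that nothing in the code looks at what the movies are about; it relies purely on rating patterns.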
Content-based filtering methods are based on a description of the item and a profile of the user’s preferences. In a content-based recommendation system, keywords are used to describe the items; in addition, a user profile is built to indicate the type of item this user likes. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user, and the best-matching items are recommended. This approach has its roots in information retrieval and information filtering research.
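The keyword-and-profile idea can be sketched in a few lines. In this example each item is described by a set of keywords, the user profile is the keyword count over the items the user liked, and candidates are ranked by cosine similarity to that profile. The catalogue, its keyword tags and the functions `build_profile`, `cosine` and `recommend` are hypothetical; cosine similarity is one common choice of matching score, not the only one.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalogue: each title is described by a few keywords.
items = {
    "Alien": {"sci-fi", "horror", "space"},
    "Gravity": {"sci-fi", "space", "drama"},
    "Notting Hill": {"romance", "comedy"},
    "Event Horizon": {"sci-fi", "horror", "space"},
}


def build_profile(liked, items):
    """User profile: how often each keyword appears in the liked items."""
    profile = Counter()
    for title in liked:
        profile.update(items[title])
    return profile


def cosine(profile, keywords):
    """Cosine similarity between a keyword-count profile and a keyword set."""
    dot = sum(profile[k] for k in keywords)
    norm_profile = sqrt(sum(c * c for c in profile.values()))
    norm_item = sqrt(len(keywords))
    if not norm_profile or not norm_item:
        return 0.0
    return dot / (norm_profile * norm_item)


def recommend(liked, items, top_n=2):
    """Rank the items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked, items)
    scored = [
        (cosine(profile, keywords), title)
        for title, keywords in items.items()
        if title not in liked
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [title for _, title in scored[:top_n]]


print(recommend(["Alien"], items))
```

Unlike the collaborative approach, this needs no other users at all, but it does need a usable description of every item.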
Hybrid Recommendation Systems
Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering, could be more effective in some cases. Hybrid approaches can be implemented in several ways: by making content-based and collaborative predictions separately and then combining them, by adding content-based capabilities to a collaborative approach (and vice versa), or by unifying the approaches into one model. Several studies empirically compare the performance of the hybrid with the pure collaborative and content-based methods, and demonstrate that hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommendation systems, such as cold start and the sparsity problem.
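The first of these options, making the two predictions separately and then combining them, can be sketched as a weighted blend of two score tables. The function `hybrid_scores`, the `weight` parameter and the example scores are assumptions for illustration; real systems combine signals in more sophisticated ways.

```python
def hybrid_scores(collab, content, weight=0.5):
    """Blend collaborative and content-based scores per item.

    `weight` is the share given to the collaborative score. An item that
    is missing from one source simply contributes 0 from that source.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    titles = set(collab) | set(content)
    return {
        title: weight * collab.get(title, 0.0)
        + (1.0 - weight) * content.get(title, 0.0)
        for title in titles
    }


# Hypothetical scores in the range 0-1. "C" is a brand-new item that
# nobody has rated yet, so it has no collaborative score.
collab = {"A": 0.9, "B": 0.4}
content = {"A": 0.2, "B": 0.6, "C": 0.8}

blended = hybrid_scores(collab, content)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

Item "C" still receives a score from its content description even though it has no ratings, which is a small example of how a hybrid can soften the cold start problem mentioned above.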
Netflix is a good example of a hybrid system. They make recommendations by comparing the watching and searching habits of similar users (i.e. collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).
I hope you liked today’s post. In the next installment, we’re going to learn about these three recommendation systems in the bigger picture, and learn how to implement them. Any questions? Leave a comment below.
Featured Image Credit: clasesdeperiodismo / Foter / CC BY-SA
Body Image Credits: Yuriy Trubitsyn / dataaspirant
Original source can be found here.
Hi, great article thank you – can’t wait for the next installment.
You mention “common problems in recommendation systems such as cold start and the sparsity problem” – will these (and others) be covered in future installments?
Nice overview. This early in the process we’re all still pretty surprised just to see a list of the things we actually say, do, and like. The predictive aspect of this doesn’t even have to impress, and indeed doesn’t, but many users are just now getting the idea that there is a rigorous way somehow to observe and track their behavior in a way that will result in being able to find what they’re looking for. Systems that allow you to audit their choices are helpful for user education. Once semantic web gets going in earnest and online information is ordered even more formally than it already is, we’ll reach the next plateau, and after that, who knows?
Great article for a starter! Cheers..