Recommendation Engines

Maya Alexandera
4 min read · Nov 11, 2020

Recently, I completed my first beginner-level React/Rails e-commerce application, and in my retrospective I decided to work on some of the features that didn’t quite make the cut. The one feature I promised myself I would come back to is a ‘style quiz’. In my research, I quickly crossed paths with recommendation engines, which I found to be both fascinating and worth sharing. Here I will share my findings: first the three main types of recommendations, then the recommendation process.

Figure (pie chart): What goes into a good recommendation engine?
source: blogs.sas.com

Types of Recommendations

Something I found quite useful right off the bat was how the existing recommendation ecosystem is organized. In my initial brainstorming session I had written down the beginnings of the following types of recommendations, but I ultimately decided to build upon the existing groundwork.

There are three main types of recommendations: collaborative filtering, content-based filtering, and hybrid recommendation systems, which use both collaborative and content-based strategies.

Collaborative filtering collects and analyzes the behaviors, activities, and preferences of users, and predicts what a given user will like based on their similarity to other users. What’s nice about the collaborative approach is that it doesn’t rely on any machine analysis of the items themselves: products are chosen from users’ behavior, which allows you to make accurate and complex recommendations without a lot of data about the items. Comparing the likeness of two users (user-to-user filtering) is very effective; however, a major drawback is the amount of time and resources it takes to compare every pair of users. An alternative is to compare items rather than users (item-to-item filtering). This requires fewer resources and is widely used, including by large companies like Amazon.
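
To make the item-to-item idea concrete, here is a minimal Python sketch. The ratings, user names, and function names are all made up for illustration, and cosine similarity is just one common way to measure likeness; I am not claiming it is what Amazon or anyone else actually runs.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana":   {"boots": 5, "scarf": 4, "jeans": 1},
    "ben":   {"boots": 4, "scarf": 5},
    "chloe": {"jeans": 5, "tee": 4},
    "dev":   {"boots": 1, "jeans": 4, "tee": 5},
}


def item_vectors(ratings):
    """Flip user -> item ratings into item -> {user: rating}."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[user] * b[user] for user in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_items(item, ratings, top_n=2):
    """Rank the other items by how similarly users rated them."""
    vectors = item_vectors(ratings)
    scores = [
        (other, cosine(vectors[item], vec))
        for other, vec in vectors.items()
        if other != item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# People who rated boots highly also rated the scarf highly,
# so "scarf" ranks first for "boots".
boots_neighbors = similar_items("boots", ratings)
```

Notice that nothing in this sketch looks at what the items are; it only uses who rated what, which is the point of the collaborative approach.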

Content-based filtering compares an item’s product description with a profile of the user’s preferences. This requires the user profile to state the types of items the user likes, using keywords that can be matched against those in a product’s description. The core idea is that if a user likes one item, they will also respond to a ‘similar’ item. The drawback arises when you try to use the same filter to translate preferences from one product type to another, such as from news to books. It’s easy to imagine how much value a recommendation system loses if it can’t work across product categories.
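
The keyword matching described above can be sketched in a few lines of Python. The profile, catalog, and function names are hypothetical, and Jaccard overlap (shared keywords divided by all keywords) is my assumption for the matching rule; a real system would likely use something richer.

```python
def jaccard(a, b):
    """Share of keywords two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical user profile built from keywords the user has liked
user_profile = {"linen", "neutral", "minimal", "summer"}

# Hypothetical catalog: product name -> description keywords
catalog = {
    "linen shirt": {"linen", "neutral", "summer", "casual"},
    "wool coat": {"wool", "neutral", "winter"},
    "neon sneakers": {"neon", "sporty"},
}


def recommend(profile, catalog, top_n=2):
    """Rank products by keyword overlap with the profile, dropping non-matches."""
    scored = [(name, jaccard(profile, keywords)) for name, keywords in catalog.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)[:top_n]


picks = recommend(user_profile, catalog)
```

This also shows the cross-category problem: a profile built from clothing keywords has nothing to match against a book or a news article.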

The Hybrid Solution

The hybrid recommendation strategy is fairly self-explanatory: it is a combination of the previous two filtering methods. It runs collaborative and content-based filtering separately and then combines their results to produce the final recommendation. The clearest example of a hybrid system is Netflix, whose personalized recommendation system is estimated to be worth around $1 billion per year and whose recommendation engine accounts for about 80% of all views.
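
As a rough illustration of the combining step, here is a Python sketch that blends two sets of scores with a weighted average. The scores, the `weight` parameter, and the function names are assumptions of mine; weighted averaging is only one possible way to combine results, and I have no insight into how Netflix actually does it.

```python
def hybrid_scores(collab, content, weight=0.5):
    """Blend two score dicts; `weight` is the share given to collaborative scores.

    An item missing from one source is treated as scoring 0.0 there.
    """
    items = set(collab) | set(content)
    return {
        item: weight * collab.get(item, 0.0) + (1 - weight) * content.get(item, 0.0)
        for item in items
    }


def top_items(scores, n=3):
    """Return the n highest-scoring item names."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical outputs of the two filters, each already scaled to 0-1
collab = {"scarf": 0.9, "jeans": 0.2}
content = {"scarf": 0.3, "linen shirt": 0.6}

blended = hybrid_scores(collab, content)
ranking = top_items(blended)
```

Raising `weight` leans on other users’ behavior, while lowering it leans on item descriptions, which could be handy for new products that nobody has rated yet.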

Recommendation Process

So far we’ve covered the two filtering methods, as well as the hybrid approach that combines them. Now, let’s look into the actual process, the ‘how’ of a recommendation engine.

We have actually already covered the final step of the recommendation process: filtering. The remaining three steps, in order, are collecting, storing, and analyzing.

The first step in creating a recommendation engine is gathering data, and lots of it. This data can be either explicit or implicit. In the context of e-commerce, explicit data is provided directly by the user and is typically some expression of opinion: product reviews, ratings, and comments. Implicit data would be a user’s order and return history, cart events, and search history. A dataset is generated for each user and linked to that user. Implicit data can also be thought of as behavior data; it is easy to collect since it requires no additional action on the user’s part, and it improves in accuracy over time as the system ‘learns’ about the user. The downside is that this data is harder to analyze and is generally less useful than explicit feedback collected from the user.
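
One way to picture the collected data is as a stream of small event records, some explicit and some implicit. The Python sketch below is only an illustration: the `Event` fields, the event kinds, and especially the weights in `IMPLICIT_WEIGHTS` are numbers I invented to show how behavior could be turned into a per-user score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    item_id: str
    kind: str           # explicit: "rating", "review"; implicit: "search", "cart_add", "order", "return"
    value: float = 0.0  # only meaningful for explicit ratings


# Assumed weights: how strongly each implicit action hints at interest
IMPLICIT_WEIGHTS = {"search": 0.5, "cart_add": 1.0, "order": 3.0, "return": -2.0}


def implicit_scores(events):
    """Sum implicit signals into user -> {item: score}, ignoring explicit events."""
    scores = defaultdict(lambda: defaultdict(float))
    for event in events:
        if event.kind in IMPLICIT_WEIGHTS:
            scores[event.user_id][event.item_id] += IMPLICIT_WEIGHTS[event.kind]
    return {user: dict(items) for user, items in scores.items()}


events = [
    Event("u1", "boots", "search"),
    Event("u1", "boots", "cart_add"),
    Event("u1", "boots", "order"),
    Event("u1", "jeans", "order"),
    Event("u1", "jeans", "return"),
    Event("u1", "boots", "rating", 5.0),  # explicit, so not counted here
]

scores = implicit_scores(events)
```

Having to guess at weights like these is a good example of why implicit data is harder to analyze than a plain star rating.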

Storage

Not to state the obvious, but the more information you make available to your filtering algorithm, the better the resulting recommendations will be. I am saving this topic for more research, since it ventures into another area I am excited to learn more about: databases. From what I know now, the type of data you’re storing informs which type of database you choose.

Analyzing

An important factor in deciding which analysis method to use is how soon the recommendations need to be available to the user. In order of immediacy, the options are real-time, near-real-time, and batch analysis. Real-time analysis requires tools that can process streams of events as they happen; companies such as SAS offer these types of services. Near-real-time analysis gathers data quickly and refreshes recommendations every few minutes or so. Batch analysis waits until there is enough data to make relevant recommendations, which is useful for situations like sending an email at a later date.
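
The difference between batch and real-time analysis is easier to see in code. In this Python sketch both approaches compute the same thing, a simple popularity count, but at different moments. The class and function names are hypothetical, and a popularity count is a stand-in for whatever analysis a real engine would run.

```python
from collections import Counter


def batch_popularity(event_log):
    """Batch: recompute counts from the full log in one pass, e.g. on a schedule."""
    return Counter(item_id for _user_id, item_id in event_log)


class StreamingPopularity:
    """Real time: update counts one event at a time as events arrive."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, user_id, item_id):
        self.counts[item_id] += 1

    def top(self, n=3):
        return [item for item, _count in self.counts.most_common(n)]


# Hypothetical (user, item) view events
event_log = [("u1", "boots"), ("u2", "boots"), ("u1", "scarf")]

stream = StreamingPopularity()
for user_id, item_id in event_log:
    stream.on_event(user_id, item_id)  # results are usable after every event

nightly = batch_popularity(event_log)  # results exist only once the job has run
```

Near-real-time analysis sits in between: in terms of this sketch, it would be like running the batch function every few minutes over the most recent events.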
