UNIT – 5: RECOMMENDATION SYSTEM
COLLABORATIVE FILTERING
In collaborative filtering, we ignore the features of an individual item. Instead, we focus on a group
of similar users who interact with the item and recommend other items that the group likes.
Similar users are grouped into small clusters, and each user is recommended new items according to
the preferences of their cluster. Let’s understand this with an easy movie recommendation example:
What we can infer from this user-item matrix is:
➢ Users 1 and 2 liked Movie 1. Since User 1 also liked Movies 2 and 4 a lot, there’s a high chance
that User 2 will enjoy them too.
➢ Users 1 and 3 have opposite tastes.
➢ Users 3 and 4 both disliked Movie 2, so there’s a high chance User 4 will also dislike Movie
4.
➢ User 3 might dislike Movie 1.
Types of collaborative filtering: There are two approaches to collaborative filtering:
➢ Memory-based collaborative approach
➢ Model-based collaborative approach
1. Memory-based collaborative approach: In memory-based collaborative filtering, only the user-item
interaction matrix is used to make new recommendations to users. The whole process is based on the
users’ previous ratings and interactions. Memory-based filtering consists of two methods: user-based
collaborative filtering and item-based collaborative filtering.
(i) User-based collaborative filtering: To suggest new recommendations to a particular user, a
group of similar users (nearest neighbors) is formed based on the interactions of the target user.
The items that are most popular in this group, but new to the target user, are used for the suggestions.
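The user-based method can be sketched in Python as follows. This is only an illustration under stated assumptions: NumPy is used for the arithmetic, the ratings matrix R is hypothetical (0 means "not rated"), cosine similarity is chosen as the similarity measure, and the function names and the neighborhood size k are made up for this example.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def recommend_user_based(ratings, target, k=2, top_n=2):
    """Rank items the target user has not rated by popularity among the k nearest users."""
    sims = np.array([cosine(ratings[target], ratings[u]) for u in range(len(ratings))])
    sims[target] = -np.inf                      # the target is never its own neighbor
    neighbors = np.argsort(sims)[::-1][:k]      # indices of the k most similar users
    # Mean neighbor rating per item; an unrated item counts as 0,
    # so items rated highly by many neighbors score best.
    scores = ratings[neighbors].mean(axis=0)
    unseen = np.flatnonzero(ratings[target] == 0)
    ranked = unseen[np.argsort(scores[unseen])[::-1]]
    return ranked[:top_n].tolist()

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
R = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [1, 0, 5, 4],
    [4, 5, 5, 1],
], dtype=float)

recommended = recommend_user_based(R, target=0)
print(recommended)
```

Here the nearest neighbors of user 0 are users 1 and 3, so the item they both rated highly (column 2) is ranked first among the items user 0 has not seen.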
(ii) Item-based collaborative filtering: In item-based filtering, new recommendations are selected
based on the past interactions of the target user. First, all the items that the user has already liked
are considered. Then, the items most similar to them (nearest neighbors) are computed and grouped
into clusters. New items from these clusters are suggested to the user.
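A minimal sketch of the item-based method follows. It rests on several assumptions: the ratings matrix R is hypothetical, a rating of 4 or more is taken to mean "liked", similarity between items is the cosine of their rating columns, and the function names are invented for this illustration.

```python
import numpy as np

def item_similarity(ratings):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0                     # avoid division by zero for unrated items
    unit = ratings / norms                      # scale each column to unit length
    return unit.T @ unit

def recommend_item_based(ratings, target, like_threshold=4, top_n=1):
    """Rank unseen items by their total similarity to the items the target user liked."""
    sim = item_similarity(ratings)
    liked = np.flatnonzero(ratings[target] >= like_threshold)
    unseen = np.flatnonzero(ratings[target] == 0)
    scores = sim[np.ix_(unseen, liked)].sum(axis=1)
    ranked = unseen[np.argsort(scores)[::-1]]
    return ranked[:top_n].tolist()

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
R = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [1, 0, 5, 4],
    [4, 5, 5, 1],
], dtype=float)

suggestion = recommend_item_based(R, target=0)
print(suggestion)
```

Unlike the user-based method, the neighbors here are items: the similarity matrix depends only on item columns, so it can be computed once and reused for every user.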
2. Model-based collaborative approach: In the model-based approach, machine learning models are
used to predict and rank interactions between users and the items they haven’t interacted with yet.
These models are trained on the information already available in the interaction matrix, using
algorithms such as matrix factorization, deep learning, and clustering.
(i) Matrix factorization: Matrix factorization generates latent features by decomposing the sparse
user-item interaction matrix into two smaller, dense matrices: one for users and one for items.
Since not all the movies are viewed and rated by every user, we end up with a sparse matrix. To
create a model for our matrix, we can assume that:
➢ There exist some latent features that can differentiate between good and bad movies.
➢ These features can help us understand user choices (the higher the value, the higher the preference).
We do not provide these features explicitly, but let the model discover the useful features and build
its user and item matrices. Because the features are learned rather than provided, they are
mathematically meaningful but usually have no intuitive interpretation.
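One common way to learn the two matrices is stochastic gradient descent on the observed ratings. The sketch below illustrates that technique under stated assumptions and is not necessarily the method these notes have in mind: the matrix R, the number of latent features, the learning rate, the regularization strength, and the function names are all chosen for this example.

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user matrix P and item matrix Q so that P @ Q.T approximates the observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))   # latent user features
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))   # latent item features
    observed = np.argwhere(ratings > 0)                    # train only on known ratings
    for _ in range(epochs):
        for u, i in observed:
            err = ratings[u, i] - P[u] @ Q[i]
            p_old = P[u].copy()                            # use the pre-update value for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (non-zero) ratings only."""
    mask = ratings > 0
    return float(np.sqrt(np.mean((ratings - P @ Q.T)[mask] ** 2)))

# Hypothetical sparse ratings: 0 means the user has not rated the movie.
R = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [1, 0, 5, 4],
    [4, 5, 5, 1],
], dtype=float)

P, Q = factorize(R)
predicted = P @ Q.T          # dense matrix: includes estimates for the missing entries
print(np.round(predicted, 2))
print("training RMSE:", round(rmse(R, P, Q), 3))
```

The entries of `predicted` at positions where R is 0 are the model's estimates for movies the user has not rated yet, and those estimates are what gets ranked to produce recommendations.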
CONTENT-BASED FILTERING
As the name suggests, content-based filtering is a machine learning technique that uses the content,
or features, of items to recommend similar ones. The most relevant items are fetched from the
dataset based on the user's observed behavior. Every item in the dataset is assigned keywords or
attributes that describe it. The user's likes and dislikes are then recorded in terms of these
attributes, and items that match them are recommended.
Method to Perform Content-Based Filtering
1. Identification of attributes and features - Based on the user's searches, browsing history, and
purchases, an inventory of attributes or features is compiled.
2. Feature Matrix - The feature matrix maps products to their features and assigns each entry a
numerical or binary value based on its resemblance to the searched product. This sets up the basis
for accepting the product for recommendation or rejecting it.
3. Judging acceptance or rejection - Either the assigned binary values or the dot product of the
feature vectors decides whether the product is considered. A higher value indicates acceptance, and
a lower one indicates rejection.
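The three steps above can be sketched in a few lines of Python. Everything concrete here is hypothetical and chosen only for illustration: the attribute names, the item names, the user profile, and the acceptance threshold of 1.

```python
import numpy as np

# Step 1: a hypothetical inventory of attributes.
features = ["action", "comedy", "sci-fi"]

# Step 2: a binary feature matrix (rows are products, columns are attributes).
items = ["Movie A", "Movie B", "Movie C"]
item_matrix = np.array([
    [1, 0, 1],   # Movie A: action, sci-fi
    [0, 1, 0],   # Movie B: comedy
    [1, 1, 0],   # Movie C: action, comedy
])

# Hypothetical user profile: 1 where the user has shown interest in the attribute.
user_profile = np.array([1, 0, 1])

# Step 3: the dot product counts how many liked attributes each product has.
scores = item_matrix @ user_profile
threshold = 1                                  # assumed cut-off for acceptance
accepted = [name for name, s in zip(items, scores) if s >= threshold]

print(dict(zip(items, scores.tolist())))
print(accepted)
```

A product that shares no attribute with the profile scores 0 and is rejected; raising the threshold makes the recommendations stricter.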
Content Filtering Using Item Data: Item-based content filtering assesses each attribute of every
item in the feature matrix and recommends items accordingly. Pandas is a Python library that can
assist in calculating the matrix values for the recommendation.
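The similarity between two products is the cosine of the angle θ between their feature vectors A
and B. Since A · B = ‖A‖ ‖B‖ cos θ, it follows that
Similarity(A, B) = cos θ = (A · B) ÷ (‖A‖ ‖B‖)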
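As a small illustration, the cosine similarity of two feature vectors can be computed as follows. NumPy is an assumption here (the notes mention only Pandas), and the function name is chosen for this example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return cos(theta) = (a . b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(a @ b / denom)

# Vectors pointing the same way score 1, perpendicular vectors score 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

Because the result depends only on the angle between the vectors, two items with the same mix of attributes are judged identical even if one vector has larger values overall.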
The predicted rating is calculated as a similarity-weighted average of the user's existing ratings -
Predicted Rating = ∑(User rating × Similarity) ÷ ∑(Similarity)
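A worked example of this formula, with hypothetical numbers: suppose the user rated two items 5 and 3, and their similarities to the item being predicted are 0.9 and 0.1. The function name below is chosen for this illustration.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings the user has already given."""
    total_similarity = sum(similarities)
    if total_similarity == 0:
        return 0.0                       # no similar items, so no basis for a prediction
    weighted = sum(r * s for r, s in zip(user_ratings, similarities))
    return weighted / total_similarity

# (5 * 0.9 + 3 * 0.1) / (0.9 + 0.1) = 4.8
prediction = predict_rating([5, 3], [0.9, 0.1])
print(round(prediction, 2))
```

The prediction lands close to 5 because the item rated 5 is far more similar to the target than the item rated 3.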