Recommender System Literature Review 2019–2023

Feb 5, 2024

In the ever-expanding realm of digital information, recommendation systems have become the unsung heroes that guide us through the vast seas of content. Whether it’s suggesting our next favorite movie, helping us discover a new book, or recommending products tailored to our preferences, these systems play a pivotal role in shaping our online experiences.

This post summarizes a literature review on recommender systems that I recently read. The authors restrict their selection to research published from 2019 to 2023, focusing on novel approaches, evaluation techniques, and common problems we might face when building a recommender system.

Reference:
I. Saifudin and T. Widiyaningtyas, “Systematic Literature Review on Recommender System: Approach, Problem, Evaluation Techniques, Datasets,” in IEEE Access, doi: 10.1109/ACCESS.2024.3359274.

Background

The paper selects 72 studies covering different approaches. Collaborative Filtering (CF) was the most common approach, used in 46 studies, while Content-Based Filtering (CBF) and Hybrid Filtering (HBF) were used in 11 and 15 studies respectively.

Figure 1: Distribution of References by Year (taken from [1])

The following four research questions served as the foundational framework for the entire literature review (1):

What approach is suitable for solving problems in recommender systems?
What problems arise in solving problems in recommender systems?
What evaluation techniques are suitable for solving problems in recommender systems?
What datasets are currently popular for research in recommender system?

Approach

The review separates the recommender systems built in the 72 studies into three major categories: Content-Based Filtering (CBF), Collaborative Filtering (CF), and Hybrid Filtering (HBF). The authors highlight three associated studies for each approach; I will delve into only one from each category.

Content Based Filtering

The Content-Based Filtering (CBF) technique leverages product descriptions and user profiles, conducting product characteristics analysis to generate recommendations. User profiles are shaped by both interaction history and individual preferences. CBF research spans diverse domains like movies and tourist attractions, utilizing a similarity metric in vector space to calculate item similarity for precise recommendations. While CBF stands out for its flexibility, not relying on additional user information, it demands domain expertise for optimal results.

CBF’s reliance on product features and user profiles, coupled with its independence from changing user information, underscores its flexibility compared to other methods. However, this approach comes with the caveat of requiring domain-specific skills to achieve optimal performance, showcasing the dual nature of advantages and challenges within the realm of CBF research.
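To make the "similarity metric in vector space" concrete, here is a minimal sketch of content-based ranking with cosine similarity. The movie names and genre vectors are made up for illustration and do not come from the reviewed studies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def recommend_similar(item_features, liked_item, top_n=2):
    """Rank every other item by its similarity to one the user liked."""
    target = item_features[liked_item]
    scores = [
        (name, cosine_similarity(target, vec))
        for name, vec in item_features.items()
        if name != liked_item
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical genre vectors: [action, comedy, romance, sci-fi]
ITEM_FEATURES = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [1, 0, 0, 0],
    "Movie C": [0, 1, 1, 0],
    "Movie D": [0, 0, 1, 0],
}

if __name__ == "__main__":
    # "Movie B" shares the action genre with "Movie A", so it ranks first.
    print(recommend_similar(ITEM_FEATURES, "Movie A"))
```

Note that only item features are needed here, which is why CBF does not depend on other users' data. The quality of the result depends entirely on how well the feature vectors are designed, which is where the domain expertise comes in.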

In the study “Movie Popularity and Target Audience Prediction Using the Content-Based Recommender System”, S. Sahu et al. (2) proposed a multiclass CNN movie recommendation model. The proposed model has three major components:

Content-based recommendation model
Movie hit prediction model
Audience targeting model

Figure 2: Content-based recommendation model (taken from [2])

The first module preprocesses the data and builds a multi-dimensional KNN cluster from the movie metadata using cosine similarity. Its output is passed to the second module, which predicts the movie rating with a 1D-CNN. The audience targeting model then generates recommendations based on the user’s age group and biography.

Utilizing a multiclass classification model, researchers have attained an impressive accuracy of 96.8%, surpassing benchmarks and emphasizing the capabilities of predictive and prescriptive data analysis in bolstering industrial decisions within information systems. Nonetheless, the literature reviewer acknowledges certain limitations of the model, particularly the necessity for an expert system to predict a movie’s probability of success with reasonable accuracy, a requirement recognized by both moviemakers and researchers in the field.

Reference:
S. Sahu, R. Kumar, M. S. Pathan, J. Shafi, Y. Kumar, and M. F. Ijaz, “Movie Popularity and Target Audience Prediction Using the Content-Based Recommender System,” IEEE Access, vol. 10, pp. 42030–42046, 2022, doi: 10.1109/ACCESS.2022.3168161.

Collaborative Filtering

Unlike the content-based approach, Collaborative Filtering (CF) estimates user preferences from a utility matrix built from user-item interactions, so an item profile is not required. Collaborative filtering considers the collective input of all users and identifies individuals with comparable tastes in order to suggest new, tailored products to the target user. The approach assumes that people with similar item interactions have similar tastes.

Figure 4: Users with similar profiles are used to predict the result
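The idea can be sketched as a tiny user-based CF predictor. This is a simplified illustration with made-up users and ratings, not the method of any reviewed paper: it predicts a missing rating as the similarity-weighted average of other users' ratings.

```python
import math

def user_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(utility, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, ratings in utility.items():
        if other == user or item not in ratings:
            continue
        sim = user_similarity(utility[user], ratings)
        weighted_sum += sim * ratings[item]
        weight_total += sim
    if weight_total == 0:
        return None  # no similar user has rated this item
    return weighted_sum / weight_total

# Hypothetical utility matrix stored as user -> {item: rating}
UTILITY = {
    "alice": {"m1": 5, "m2": 1},
    "bob": {"m1": 5, "m2": 1, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

if __name__ == "__main__":
    # Alice's tastes match Bob's, so the prediction leans toward Bob's 4.
    print(round(predict_rating(UTILITY, "alice", "m3"), 2))  # 3.44
```

Every prediction loops over all other users, which hints at why running time and scalability come up as problems for CF on larger datasets.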

CF is a powerful approach in recommender systems, but it often suffers from long execution times and poor scalability, which makes matrix decomposition algorithms essential. Using their proposed SVD-WPR approach, T. Widiyaningtyas et al. (3) were able to process the MovieLens 100k dataset in 13.502 seconds.

The study uses Singular Value Decomposition (SVD) to predict unrated items in the user-item utility matrix. To reduce running time, a rating aggregation method, Weight Point Rank (WPR), is then used to obtain the recommended items.
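For readers who want the idea behind the SVD step, a generic rank-k truncated SVD can be written as below. This is the textbook formulation and not necessarily the exact variant used in SVD-WPR. The notation is introduced here for illustration: R is the m × n user-item utility matrix, k is the number of latent factors kept, U_k is m × k, V_k is n × k, and Σ_k is the diagonal matrix of the k largest singular values σ_1, …, σ_k. How unrated cells are filled before the decomposition is left open.

```latex
R \approx \hat{R} = U_k \Sigma_k V_k^{\top},
\qquad
\hat{r}_{ui} = \sum_{f=1}^{k} U_{uf}\, \sigma_f\, V_{if}
```

The predicted rating of user u for item i is the entry \hat{r}_{ui} of the low-rank reconstruction, so unrated cells receive a value from the latent factors learned on the rated ones.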

The WPR process consists of four steps:

Calculating the equal rating items
Calculating item points
Calculating weight points
Calculating weight points rating

The rating aggregation method is clever and very effective, but I will not go into the mathematical details here.

Reference:
T. Widiyaningtyas, M. I. Ardiansyah, and T. B. Adji, “Recommendation Algorithm Using SVD and Weight Point Rank (SVD-WPR),” Big Data and Cognitive Computing, vol. 6, no. 4, 2022, doi: 10.3390/bdcc6040121.

Hybrid Filtering

Unlike single-method approaches such as Content-Based or Collaborative Filtering, Hybrid Filtering combines several approaches, often incorporating both user-item interactions and item characteristics. H. Tahmasebi et al. (4) implemented a recommender system based on an autoencoder neural network.

In the research, deep learning models in recommender systems are categorized into two main types: neural network models and deep hybrid models. The former includes diverse architectures such as multilayer perceptron (MLP), autoencoder (AE), convolutional neural network (CNN), recurrent neural network (RNN), restricted Boltzmann machine (RBM), neural autoregressive distribution estimation (NADEs), attentional model (AM), adversarial network (AN), and deep reinforcement learning (DRL). On the other hand, hybrid models combine two or more deep learning techniques, utilizing the flexibility of deep neural networks to create a more robust recommendation system. The range of options showcases the adaptability and versatility of deploying deep learning in recommender systems.

The study indicates that the autoencoder network recommender system is able to maintain high accuracy and labeled data is not required. These networks excel at automatically extracting valuable features from the data they process. By reconstructing data and diminishing data dimensions through learned relations, autoencoders prove to be efficient in uncovering meaningful patterns and representations without the need for explicit labeling.

The authors proposed an autoencoder neural network approach called SRDNet (Social Recommender Deep Autoencoder Network), which consists of two recommender engines: collaborative filtering and content-based filtering. The model gathers data from MovieTweetings, Twitter and IMDB to build a social-influence item profile based on movie details, number of tweets, number of retweets and number of likes.

The user-movie utility matrix is constructed using the social influence score, so user ratings and interests with higher social influence have a larger impact on the training of the deep autoencoder network. The autoencoder is mainly used in the CF module, and the final recommendation is computed as a weighted sum of the outputs of the collaborative filtering and content-based filtering modules.
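The final blending step can be illustrated with a small sketch. The function names, the example scores, and the default weight of 0.7 are assumptions for illustration only; the ratios actually used by SRDNet are not given in this summary.

```python
def blend_scores(cf_scores, cbf_scores, cf_weight=0.7):
    """Weighted sum of two per-item score dicts; a missing score counts as 0."""
    if not 0.0 <= cf_weight <= 1.0:
        raise ValueError("cf_weight must be between 0 and 1")
    items = set(cf_scores) | set(cbf_scores)
    return {
        item: cf_weight * cf_scores.get(item, 0.0)
        + (1.0 - cf_weight) * cbf_scores.get(item, 0.0)
        for item in items
    }

def top_items(scores, n=3):
    """Item names ordered by blended score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

if __name__ == "__main__":
    # Hypothetical module outputs on the same rating scale.
    cf = {"m1": 4.0, "m2": 2.0}
    cbf = {"m1": 2.0, "m2": 5.0}
    blended = blend_scores(cf, cbf, cf_weight=0.5)
    print(top_items(blended, n=1))  # ['m2']
```

Changing `cf_weight` shifts the ranking toward one engine or the other, which is the knob a hybrid system can tune on validation data.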

Reference:
H. Tahmasebi, R. Ravanmehr, and R. Mohamadrezaei, “Social movie recommender system based on deep autoencoder network using Twitter data,” Neural Comput Appl, vol. 33, no. 5, pp. 1607–1623, 2021, doi: 10.1007/s00521-020-05085-1.

Problem

The authors identify five major problems faced by the reviewed studies:

Accuracy
Sparsity
Cold Start
Running Time
Scalability

The findings from the review highlight several crucial challenges in building recommendation systems. The Cold Start Problem is a significant hurdle: the system lacks sufficient information about a new user or item, which degrades overall performance. The Sparsity Problem arises because users rate only a small fraction of items, leading to sparse matrices, difficulty finding similar users or items, and weak recommendations. Scalability becomes a concern because computation grows linearly with the number of users and items, limiting the system’s ability to produce satisfactory recommendations as the dataset grows. To address this, dimension reduction techniques like Singular Value Decomposition (SVD) are proposed. The Running Time Problem emphasizes the need for algorithmic improvements to speed up recommendations. Finally, the Accuracy Problem is a recurring challenge. The majority of issues are reported for Collaborative Filtering, followed by Content-Based Filtering and Hybrid Filtering, with ongoing efforts to improve accuracy through diverse research methods. The review underscores the dynamic nature of recommendation system research, where addressing these challenges is an ongoing pursuit.

Figure 5: Distribution of problem found in studies (taken from [1])
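To give a feel for the sparsity problem, the short sketch below computes the fraction of empty cells in a utility matrix. The helper name is my own. The example numbers are the published size of MovieLens 100k (100,000 ratings from 943 users on 1,682 movies).

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of user-item cells that have no rating."""
    total_cells = num_users * num_items
    if total_cells == 0:
        raise ValueError("the matrix must have at least one cell")
    return 1.0 - num_ratings / total_cells

if __name__ == "__main__":
    # MovieLens 100k: roughly 94% of the utility matrix is empty.
    print(f"{sparsity(100_000, 943, 1_682):.1%}")  # 93.7%
```

Even a small, well-studied dataset leaves most of the matrix unobserved, which is why similarity search between users or items is hard.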

Evaluation Method

A recommender system is a complex system that can involve many different approaches, datasets, and kinds of contextual information about user profiles or item details. However, recommendations depend on user preference, and in practice there is no single correct answer. How to evaluate the results is therefore an essential question for all recommender system research.

According to the review, the majority of the studies use Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).

Figure 6: Distribution of Evaluation Methods (taken from [1])

Probability theory offers a principled way to choose between RMSE and MAE: each metric is optimal under a particular error distribution, although in practice it may still be necessary to report more than one metric. RMSE is optimal when errors are normally distributed, while MAE is optimal for Laplacian errors, not only for uniformly distributed ones as is sometimes assumed. When errors follow neither distribution, other options can work better: refining the model, transforming the data, using robust statistics, or constructing a better likelihood. The last option is the most versatile, although pragmatic considerations may favor the others.
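Both metrics are easy to compute by hand. The sketch below uses made-up ratings to show how they differ: two sets of predictions with the same MAE can have different RMSE, because RMSE penalizes large individual errors more heavily.

```python
import math

def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")

def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared error."""
    _check(actual, predicted)
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

if __name__ == "__main__":
    actual = [4, 3, 5, 2]             # hypothetical true ratings
    one_big_miss = [4, 3, 5, 6]       # errors: 0, 0, 0, 4
    many_small_misses = [5, 4, 4, 3]  # errors: 1, 1, 1, 1

    print(mae(actual, one_big_miss), rmse(actual, one_big_miss))            # 1.0 2.0
    print(mae(actual, many_small_misses), rmse(actual, many_small_misses))  # 1.0 1.0
```

Which behavior is preferable depends on whether one large miss is worse for your users than several small ones, which is one reason studies often report both metrics.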

Dataset

Recommendation systems are often tailored to specific problems, for example book, music, or even stock recommendation. Because different datasets contain different contextual information, a recommendation system is usually not generic across problems.

Among the 72 studies in the literature review, 44 use the MovieLens dataset.

Final words

The authors highlight many recommender system studies, a large share of which propose novel approaches. Even so, accuracy remains the major concern across recommender system research.

Thank you for reading, and have a nice day!
Please let me know if you have any concerns about attribution in this post.

Reference

I. Saifudin and T. Widiyaningtyas, “Systematic Literature Review on Recommender System: Approach, Problem, Evaluation Techniques, Datasets,” in IEEE Access, doi: 10.1109/ACCESS.2024.3359274.
S. Sahu, R. Kumar, M. S. Pathan, J. Shafi, Y. Kumar, and M. F. Ijaz, “Movie Popularity and Target Audience Prediction Using the Content-Based Recommender System,” IEEE Access, vol. 10, pp. 42030–42046, 2022, doi: 10.1109/ACCESS.2022.3168161.
T. Widiyaningtyas, M. I. Ardiansyah, and T. B. Adji, “Recommendation Algorithm Using SVD and Weight Point Rank (SVD-WPR),” Big Data and Cognitive Computing, vol. 6, no. 4, 2022, doi: 10.3390/bdcc6040121.
H. Tahmasebi, R. Ravanmehr, and R. Mohamadrezaei, “Social movie recommender system based on deep autoencoder network using Twitter data,” Neural Comput Appl, vol. 33, no. 5, pp. 1607–1623, 2021, doi: 10.1007/s00521-020-05085-1.