Kupondole-10, Lalitpur, Nepal 9840143772 infographytech9@gmail.com

Basics of Recommendation Systems

The purpose of a recommender system is to suggest relevant items to users. To achieve this task, there exist two major categories of methods: collaborative filtering methods and content-based methods.

Collaborative filtering methods

Collaborative methods for recommender systems are methods based solely on the past interactions recorded between users and items in order to produce new recommendations. These interactions are stored in the so-called “user-item interactions matrix”.

The main idea behind collaborative methods is that these past user-item interactions are sufficient to detect similar users and/or similar items, and to make predictions based on these estimated proximities.
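To make the “user-item interactions matrix” concrete, here is a minimal sketch with made-up ratings: rows are users, columns are items, and a missing value marks an item the user has never interacted with. Collaborative filtering aims to fill in exactly those missing entries.

```python
import numpy as np

# Hypothetical user-item interactions matrix (all numbers invented for
# illustration). Rows = users, columns = items, entries = ratings (1-5).
# np.nan marks items a user has not interacted with yet.
interactions = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

n_users, n_items = interactions.shape
observed = ~np.isnan(interactions)          # True where a rating exists
print(n_users, n_items, int(observed.sum()))  # 4 users, 4 items, 11 ratings
```

In real systems this matrix is extremely large and sparse, which is why it is usually stored in a sparse format rather than as a dense array.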


The class of collaborative filtering algorithms is divided into two sub-categories, generally called memory-based and model-based approaches. Memory-based approaches work directly with the values of recorded interactions, assuming no model, and are essentially based on nearest-neighbour search (for example, find the users closest to a user of interest and suggest the most popular items among these neighbours). Model-based approaches assume an underlying “generative” model that explains the user-item interactions and try to discover it in order to make new predictions.
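The memory-based, nearest-neighbour idea above can be sketched in a few lines. This is a toy user-based variant with invented ratings (0 marks a missing interaction): find the users most similar to a target user by cosine similarity, then average their ratings of the target item, weighted by similarity.

```python
import numpy as np

# Hypothetical ratings matrix; 0.0 marks a missing interaction.
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 5.0, 4.0],
])

def cosine_sim(u, v):
    # Cosine similarity between two users' rating vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def predict(user_idx, item_idx, k=2):
    # Rank all other users by similarity, keep the k nearest who actually
    # rated the item, and return their similarity-weighted average rating.
    sims = np.array([cosine_sim(ratings[user_idx], ratings[j])
                     for j in range(len(ratings))])
    sims[user_idx] = -1.0  # exclude the user themself
    raters = [j for j in np.argsort(sims)[::-1]
              if ratings[j, item_idx] > 0][:k]
    weights = sims[raters]
    return float(np.dot(weights, ratings[raters, item_idx]) / weights.sum())

print(round(predict(1, 1), 2))  # predicted rating of user 1 for item 1
```

Production systems replace the brute-force similarity loop with approximate nearest-neighbour indexes, but the underlying logic is the same.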


Content-based methods

Unlike collaborative methods, which rely only on user-item interactions, content-based approaches use additional information about users and/or items. If we consider the example of a movie recommender system, this additional information can be, for example, the age, the sex, the job or any other personal information for users, as well as the category, the main actors, the duration or other characteristics for the movies (items).




The idea of content-based methods is then to build a model, based on the available “features”, that explains the observed user-item interactions. Still considering users and movies, we would try, for example, to model the fact that young women tend to rate some movies better, that young men tend to rate some other movies better, and so on. If we manage to obtain such a model, making new predictions for a user is quite easy: we simply look at this user's profile (age, sex, etc.) and, based on this information, determine relevant movies to suggest.
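A minimal sketch of this idea, with entirely invented features and ratings: concatenate user features and item features into one vector per observed interaction, fit a simple linear model to the ratings with least squares, and score a new user-item pair with the learned weights.

```python
import numpy as np

# Each row pairs hypothetical user features [age (normalized), is_female]
# with item features [is_action, duration_hours]; y holds observed ratings.
X = np.array([
    [0.25, 1.0, 0.0, 1.5],   # young woman, drama, 1.5 h
    [0.25, 1.0, 1.0, 2.0],   # young woman, action, 2 h
    [0.30, 0.0, 1.0, 2.0],   # young man, action, 2 h
    [0.60, 0.0, 0.0, 1.8],   # older man, drama, 1.8 h
])
y = np.array([4.5, 2.0, 4.8, 3.5])

# Fit a linear model via least squares (with a bias column appended).
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Score a new (hypothetical) pair: a young man and a 2.1 h action movie.
new_pair = np.array([0.28, 0.0, 1.0, 2.1, 1.0])
print(round(float(new_pair @ w), 2))
```

Any supervised model (trees, factorization machines, neural networks) can stand in for the linear fit; the point is only that features, not past interactions alone, drive the prediction.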


Basics of Reinforcement Learning


Reinforcement learning (RL) is the area of machine learning that deals with sequential decision-making. A key aspect of RL is that an agent learns a good behavior: it modifies or acquires new behaviors and skills incrementally. Another important aspect is that RL relies on trial-and-error experience (as opposed to, e.g., dynamic programming, which assumes full knowledge of the environment a priori). Thus, the RL agent does not require complete knowledge or control of the environment; it only needs to be able to interact with the environment and collect information.


Formal framework


The reinforcement learning setting

The general RL problem is formalized as a discrete time stochastic control process where an agent interacts with its environment in the following way: the agent starts in a given state of its environment s0 ∈ S by gathering an initial observation ω0 ∈ Ω. At each time step t, the agent takes an action at ∈ A. As illustrated in the figure below, this has three consequences: (i) the agent obtains a reward rt ∈ R, (ii) the state transitions to st+1 ∈ S, and (iii) the agent obtains an observation ωt+1 ∈ Ω.





Fig. 1: Agent-environment interaction in RL

The Markov property

For the sake of simplicity, let us first consider the case of Markovian stochastic control processes.

Definition: A discrete time stochastic control process is Markovian (i.e., it has the Markov property) if

  • P(ωt+1 | ωt, at) = P(ωt+1 | ωt, at, . . . , ω0, a0), and

  • P(rt | ωt, at) = P(rt | ωt, at, . . . , ω0, a0).


The Markov property means that the future of the process depends only on the current observation, so the agent has no interest in looking back at the full history.

A Markov Decision Process (MDP) (Bellman, 1957a) is a discrete time stochastic control process defined as follows:

Definition: An MDP is a 5-tuple (S, A, T, R, γ) where:

  • S is the state space,
  • A is the action space,
  • T : S × A × S → [0, 1] is the transition function (set of conditional transition probabilities between states),
  • R : S × A × S → R is the reward function, where R is a continuous set of possible rewards in a range Rmax ∈ R+ (e.g., [0, Rmax]),
  • γ ∈ [0, 1) is the discount factor.
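A tiny MDP can make this 5-tuple concrete. The sketch below invents a two-state, two-action MDP (all probabilities and rewards are illustrative), stores T and R as arrays indexed by (s, a, s′), and runs value iteration — one standard way to exploit a known MDP — to estimate the optimal state values.

```python
import numpy as np

# A toy MDP as the 5-tuple (S, A, T, R, γ); all numbers are invented.
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor γ ∈ [0, 1)

# T[s, a, s'] = probability of moving from s to s' under action a.
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.0, 1.0], [0.5, 0.5]],   # transitions from state 1
])

# R[s, a, s'] = reward for that transition, here in [0, Rmax] with Rmax = 1.
R = np.array([
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 0.0], [1.0, 0.0]],
])

# Value iteration: V(s) <- max_a Σ_s' T(s,a,s') [R(s,a,s') + γ V(s')].
V = np.zeros(n_states)
for _ in range(200):
    Q = (T * (R + gamma * V)).sum(axis=2)  # Q[s, a]
    V = Q.max(axis=1)
print(np.round(V, 3))
```

With γ close to 1 the agent values long-term reward; with γ close to 0 it becomes myopic, which is why the discount factor is part of the problem definition rather than of the algorithm.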


By: Anku Jaiswal




© InfographyTechnologies. All Rights Reserved.