Thursday, September 4, 2014

Movie Recommender System using recommenderlab

Recommender systems are part of day-to-day life: they are used in book search, online shopping, movie search, and social networking, to name a few. A recommender system applies statistical and knowledge discovery techniques to recommend new items to a user based on previously recorded data. The recommendations can be used to increase customer retention, promote cross-selling, and add value to the buyer-seller relationship.

Broadly, recommender systems are classified into two categories:

  • Content-based: recommending items that share common attributes, based on the user's preferences.
  • Collaborative filtering: recommending items based on users who share common preferences.
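To make the collaborative filtering idea concrete, here is a minimal base R sketch of a user-based prediction. The toy `ratings` matrix and the helper names `cosine_sim()` and `predict_rating()` are made up for illustration; this is not how recommenderlab implements it internally, only the general idea of weighting other users' ratings by how similar they are to the target user.

```r
# Toy ratings: rows are users, columns are items, NA means "not rated".
# (Hypothetical data, for illustration only.)
ratings <- matrix(
  c(5, 4, NA, 1,
    4, 5, 2, NA,
    1, NA, 5, 4,
    5, 4, 1, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("item", 1:4))
)

# Cosine similarity between two users over the items both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict a user's rating for an item as the similarity-weighted
# average of the ratings from the other users who rated that item.
predict_rating <- function(ratings, user, item) {
  others <- setdiff(rownames(ratings), user)
  others <- others[!is.na(ratings[others, item])]
  if (length(others) == 0) return(NA_real_)
  sims <- sapply(others, function(o) cosine_sim(ratings[user, ], ratings[o, ]))
  if (sum(abs(sims)) == 0) return(NA_real_)
  sum(sims * ratings[others, item]) / sum(abs(sims))
}

pred <- predict_rating(ratings, "u1", "item3")
print(pred)
```

Here u1 has not rated item3. The users most similar to u1 (u2 and u4) rated it low, so the prediction lands near the low end of the scale even though u3 rated it 5.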

Commonly used metrics to quantify the performance of recommender systems are Root Mean Squared Error (RMSE), precision, and recall.
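As a quick illustration of these metrics, the following base R sketch computes them by hand. The function names `rmse()` and `precision_recall()` and the sample values are hypothetical, not part of recommenderlab.

```r
# Root Mean Squared Error between predicted and actual ratings.
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual)^2))
}

# Precision and recall of a recommendation list against the set of
# items the user actually liked.
#   precision = hits / number of recommended items
#   recall    = hits / number of relevant items
precision_recall <- function(recommended, relevant) {
  hits <- length(intersect(recommended, relevant))
  c(precision = hits / length(recommended),
    recall    = hits / length(relevant))
}

err <- rmse(predicted = c(4, 3, 5), actual = c(5, 3, 4))
pr  <- precision_recall(recommended = c("A", "B", "C", "D"),
                        relevant    = c("B", "D", "E"))
print(err)
print(pr)
```

RMSE applies when the recommender predicts ratings, while precision and recall apply when it returns a top-N list, which is the setting evaluated in the rest of this post.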

R has a nice package, recommenderlab, that provides infrastructure to develop and test recommender algorithms. recommenderlab focuses on recommender algorithms based on collaborative filtering.

I used recommenderlab to gain insight into collaborative filtering algorithms and to evaluate the performance of the different algorithms available in the framework on the MovieLens 100k dataset. The dataset was downloaded from here.

###### Recommender System algorithm implementation on MovieLens 100k data ###

## load libraries ####
library(recommenderlab)
library(reshape2)


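# Load MovieLens data
# (readData() and the other helpers used below are user-defined functions
#  from the complete R code, not part of recommenderlab)
dataList<- readData()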
# data cleansing and preprocessing
ratingDF<- preProcess(dataList$ratingDF, dataList$movieDF)
# create movie rating matrix
movieRatingMat<- createRatingMatrix(ratingDF)
# evaluate models
evalList <- evaluateModels(movieRatingMat)
## RANDOM run 
##   1  [0.01sec/0.47sec] 
## POPULAR run 
##   1  [0.04sec/0.09sec] 
## UBCF run 
##   1  [0.02sec/20.99sec]
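
The bodies of helpers such as readData() and evaluateModels() are not shown in this post. As a rough stand-in, the sketch below runs a similar comparison using only documented recommenderlab functions. It assumes the MovieLense dataset that ships with the package in place of the downloaded files, and the split settings (`train`, `given`, `goodRating`) and the seed are guesses for illustration, not the values used to produce the output above.

```r
library(recommenderlab)

# MovieLens 100k ratings bundled with recommenderlab as a realRatingMatrix.
data(MovieLense)

set.seed(1)

# Hold out 10% of users for testing; for each test user, 10 ratings are
# given to the recommender and the rest are used for evaluation.
# Ratings of 4 or higher count as "good" (assumed threshold).
scheme <- evaluationScheme(MovieLense, method = "split", train = 0.9,
                           given = 10, goodRating = 4)

# The three algorithms compared in this post, with default parameters.
algorithms <- list(
  RANDOM  = list(name = "RANDOM",  param = NULL),
  POPULAR = list(name = "POPULAR", param = NULL),
  UBCF    = list(name = "UBCF",    param = NULL)
)

# Evaluate top-N lists for several list lengths.
results <- evaluate(scheme, algorithms, type = "topNList",
                    n = c(1, 3, 5, 10, 15, 20))

# ROC curve and precision/recall curve for all three algorithms.
plot(results, annotate = TRUE)
plot(results, "prec/rec", annotate = TRUE)

# Train UBCF on the full matrix and get the top 5 items for the first user.
rec  <- Recommender(MovieLense, method = "UBCF")
top5 <- as(predict(rec, MovieLense[1, ], n = 5), "list")
print(top5)
```

Because the split is random and the settings are assumed, the numbers from this sketch will not match the output printed in this post exactly.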

The plot comparing the “Random”, “Popular”, and “UBCF” recommender algorithms is shown below:

# plot evaluation result
visualise(evalList)

[Two plots: evaluation results for the “Random”, “Popular”, and “UBCF” algorithms]

The visualisation shows that the “UBCF” algorithm has the highest precision, so I picked “UBCF” to predict the top 5 recommendations for the user with userID = 1.

## on visualization, looks like UBCF has highest precision.
# get Confusion matrix for "UBCF"
getConfusionMatrix(evalList[["UBCF"]])[[1]][,1:4]
##        TP      FP    FN   TN
## 1  0.4316  0.5579 50.80 1602
## 3  1.3684  1.6000 49.86 1601
## 5  2.0000  2.9474 49.23 1600
## 10 3.6632  6.2316 47.57 1597
## 15 4.9368  9.9053 46.29 1593
## 20 6.0947 13.6947 45.14 1589
## run "UBCF" recommender
rec_model <- createModel(movieRatingMat, "UBCF")
userID <- 1
topN <- 5
recommendations(movieRatingMat, rec_model, userID, topN)
## [[1]]
## [1] "Glory (1989)"             "Schindler's List (1993)" 
## [3] "Close Shave, A (1995)"    "Casablanca (1942)"       
## [5] "Leaving Las Vegas (1995)"
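
The TP, FP, and FN columns of the confusion matrix printed above are enough to recover precision and recall for each list length. The sketch below copies those averaged counts into a data frame by hand (the name `cm` is just for illustration) and applies the standard definitions.

```r
# Averaged counts copied from the UBCF confusion matrix above;
# n is the length of the top-N list.
cm <- data.frame(
  n  = c(1, 3, 5, 10, 15, 20),
  TP = c(0.4316, 1.3684, 2.0000, 3.6632, 4.9368, 6.0947),
  FP = c(0.5579, 1.6000, 2.9474, 6.2316, 9.9053, 13.6947),
  FN = c(50.80, 49.86, 49.23, 47.57, 46.29, 45.14)
)

# precision = TP / (TP + FP): share of recommended items that were relevant
# recall    = TP / (TP + FN): share of relevant items that were recommended
cm$precision <- cm$TP / (cm$TP + cm$FP)
cm$recall    <- cm$TP / (cm$TP + cm$FN)

print(round(cm[, c("n", "precision", "recall")], 3))
```

For the top 10 list this gives a precision of about 0.37. Recall grows as the list gets longer, which is the usual trade-off between the two metrics.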

The complete R code can be found here.