#MachineLearning #RecommendationEngine
By Billy Gustave
Business challenge/requirement:
BookRent is the largest online and offline book rental chain in India. The company charges a fixed fee per month plus a rental fee per book, so it makes more money when users rent more books.
The company is still unprofitable and is looking to improve both revenue and profit.
Goal:
You, as an ML expert, have to model a recommendation engine so that each user gets book recommendations based on the behavior of similar users. This will ensure that users are renting books based on their individual taste.
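import numpy as np
import pandas as pd
# 'ansi' is a Windows-only codec alias; 'latin-1' is used here as a portable substitute
# (assumption: the CSV files use a single-byte Western encoding)
# ratings: load only the first 10,000 rows
df_user_bx_rtng = pd.read_csv('BX-Book-Ratings.csv', encoding='latin-1', nrows=10000)
print(df_user_bx_rtng.shape)
df_user_bx_rtng.head()
# books
df_bx = pd.read_csv('BX-Books.csv', encoding='latin-1', low_memory=False)
print(df_bx.shape)
df_bx.head()
# users
df_user = pd.read_csv('BX-Users.csv', encoding='latin-1', low_memory=False)
print(df_user.shape)
df_user.head()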
We won't be using BX-Users data since it doesn't have any information valuable for this project.
We will use only the books whose ISBN appears in both the ratings data and the books data:
# merge the ratings and books data on ISBN (inner join by default)
df = pd.merge(df_user_bx_rtng,df_bx,on='isbn')
print(df.shape)
df.head()
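To make the effect of the merge concrete, here is a minimal sketch on made-up data. The toy frames and the `book_title` column are assumptions for illustration; only the `isbn` key and the default inner join come from the cell above.

```python
import pandas as pd

# Hypothetical toy frames; the real notebook loads these from the BX CSV files.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "isbn": ["A1", "B2", "C3"],
    "rating": [5, 0, 8],
})
books = pd.DataFrame({
    "isbn": ["A1", "B2", "D4"],
    "book_title": ["Book A", "Book B", "Book D"],
})

# pd.merge defaults to how='inner': only ISBNs present in both frames survive.
merged = pd.merge(ratings, books, on="isbn")
print(merged)

# An outer join with indicator=True shows which rows had no match on either side.
audit = pd.merge(ratings, books, on="isbn", how="outer", indicator=True)
print(audit["_merge"].value_counts())
```

Ratings for ISBNs missing from the books file are silently dropped by the inner join, so comparing the shapes before and after the merge is a quick sanity check.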
Checking for missing values
df.info()
No missing values
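`df.info()` reports non-null counts per column; an explicit count of missing values can be easier to read. A minimal sketch on a made-up frame (the data below is an assumption, not the notebook's):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two deliberately missing entries.
toy = pd.DataFrame({
    "user_id": [1, 2, 3],
    "isbn": ["A1", None, "C3"],
    "rating": [5, np.nan, 8],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(toy.isnull().values.any())
print("Any missing values:", has_missing)
```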
Train-Test-Split
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, random_state=7)
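The split above relies on scikit-learn's defaults. As a rough illustration on made-up data, with no `test_size` given the test set gets 25% of the rows, and a fixed `random_state` makes the split reproducible:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy ratings table with 8 rows.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "isbn": list("ABCDEFGH"),
    "rating": [5, 3, 8, 0, 7, 9, 4, 6],
})

# Default test_size is 0.25 when neither test_size nor train_size is set.
train_toy, test_toy = train_test_split(toy, random_state=7)
print(len(train_toy), len(test_toy))

# The same seed reproduces the same split.
train_again, test_again = train_test_split(toy, random_state=7)
print(train_toy.index.equals(train_again.index))
```

Because rows are split at random, a user may end up with ratings only in the test set, in which case that user's row in the training matrix is all zeros.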
Rating Matrix
# get users and books
n_users = df.user_id.unique().shape[0]
n_books = df.isbn.unique().shape[0]
print('Num. of Users: ' + str(n_users))
print('Num. of Books: ' + str(n_books))
# Build the matrix as a DataFrame first, since ISBNs are alphanumeric labels,
# and convert it to a NumPy array after filling it
train_to_convert = pd.DataFrame(0.0, index=df.user_id.unique(), columns=df.isbn.unique())
test_to_convert = pd.DataFrame(0.0, index=df.user_id.unique(), columns=df.isbn.unique())
# fill in the matrices; with itertuples, line[1] is user_id, line[2] is isbn and
# line[3] is the rating (assuming that column order in df)
for line in train.itertuples():
    train_to_convert.loc[line[1], line[2]] = line[3]
for line in test.itertuples():
    test_to_convert.loc[line[1], line[2]] = line[3]
# convert to array
train_matrix = train_to_convert.values
test_matrix = test_to_convert.values
print(train_matrix)
print(test_matrix)
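The row-by-row fill can be slow on larger samples. One possible alternative, sketched here on made-up data, is to let pandas build the user-by-book matrix with `pivot_table`. The column names `user_id`, `isbn` and `rating` are assumed to match the notebook's data, and `all_users` / `all_books` are hypothetical stand-ins for `df.user_id.unique()` and `df.isbn.unique()`.

```python
import pandas as pd

# Hypothetical training ratings.
toy_train = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "isbn": ["A1", "B2", "A1", "C3"],
    "rating": [5, 3, 8, 7],
})

# Full label sets, including user 4 who has no rating in the training split.
all_users = [1, 2, 3, 4]
all_books = ["A1", "B2", "C3"]

matrix_df = (
    toy_train
    .pivot_table(index="user_id", columns="isbn", values="rating", aggfunc="mean")
    .reindex(index=all_users, columns=all_books)  # same shape for train and test
    .fillna(0)                                    # 0 marks "no rating"
)
matrix = matrix_df.to_numpy(dtype=float)
print(matrix)
```

Note one difference from the loop: if a (user, ISBN) pair occurs more than once, `pivot_table` averages the ratings, while the loop keeps the last one seen.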
Creating Similarities
from sklearn.metrics import pairwise_distances, mean_squared_error
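# Note: pairwise_distances returns cosine distances (1 - cosine similarity),
# so the values stored below are distances despite the variable names
user_similarity = pairwise_distances(train_matrix, metric='cosine')
book_similarity = pairwise_distances(train_matrix.T, metric='cosine')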
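To see what `pairwise_distances(..., metric='cosine')` returns, here is a small sketch on a made-up rating matrix. It shows that the result is a distance, and that a similarity can be obtained as one minus that distance; whether to apply this conversion in the notebook is a modelling choice left open here.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical user-by-book matrix: users 0 and 1 rate in the same proportions,
# user 2 has rated a completely different book.
toy = np.array([
    [5.0, 3.0, 0.0],
    [10.0, 6.0, 0.0],
    [0.0, 0.0, 7.0],
])

dist = pairwise_distances(toy, metric="cosine")  # 0 = same direction, 1 = orthogonal
sim = 1.0 - dist                                 # cosine similarity

print(np.round(dist, 3))
print(np.round(sim, 3))
```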
Prediction
# user prediction
mean_user_rating = train_matrix.mean(axis=1)[:,np.newaxis]
ratings_diff = (train_matrix - mean_user_rating)
user_pred = mean_user_rating + user_similarity.dot(ratings_diff)/np.array([np.abs(user_similarity).sum(axis=1)]).T
# book prediction
book_pred = train_matrix.dot(book_similarity)/np.array([np.abs(book_similarity).sum(axis=1)])
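The user-based formula above predicts each rating as the user's mean plus a weighted average of other users' deviations from their own means. The sketch below checks the vectorized expression against an explicit loop for one entry. The ratings and the weight matrix are made-up numbers, not output from the notebook.

```python
import numpy as np

# Hypothetical 3 users x 3 books rating matrix (0 = no rating).
ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 2.0],
    [0.0, 1.0, 5.0],
])
# Hypothetical symmetric user-user weights (stand-in for user_similarity).
weights = np.array([
    [0.0, 0.4, 0.9],
    [0.4, 0.0, 0.7],
    [0.9, 0.7, 0.0],
])

# Vectorized form, same structure as the notebook's user_pred.
# As in the notebook, the mean is taken over all books, zeros included.
mean_user = ratings.mean(axis=1)[:, np.newaxis]
diff = ratings - mean_user
pred = mean_user + weights.dot(diff) / np.abs(weights).sum(axis=1)[:, np.newaxis]

# Explicit computation for user u and book b.
u, b = 0, 2
numerator = sum(weights[u, v] * (ratings[v, b] - ratings[v].mean()) for v in range(3))
denominator = sum(abs(weights[u, v]) for v in range(3))
manual = ratings[u].mean() + numerator / denominator

print(pred[u, b], manual)
```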
Error test
# keep only the entries that have a rating in the test matrix
test_idx = test_matrix.nonzero()
test_actual = test_matrix[test_idx]
user_pred_test = user_pred[test_idx]
book_pred_test = book_pred[test_idx]
from math import sqrt
print('User RMSE: ', sqrt(mean_squared_error(test_actual, user_pred_test)))
print('Book RMSE: ', sqrt(mean_squared_error(test_actual, book_pred_test)))
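As a worked example of this error measure, the sketch below computes the RMSE over only the rated (nonzero) test entries of a made-up matrix. The numbers are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical test ratings (0 = no rating) and model predictions.
actual = np.array([
    [5.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
])
predicted = np.array([
    [4.0, 2.0, 1.0],
    [0.5, 5.0, 0.0],
])

# Only positions with a test rating take part in the error.
mask = actual.nonzero()
rmse = np.sqrt(mean_squared_error(actual[mask], predicted[mask]))

# Errors are (5 - 4) = 1 and (3 - 5) = -2, so MSE = (1 + 4) / 2 = 2.5.
print(rmse)
```

Predictions at positions without a test rating are ignored, so a model is not penalized for whatever it outputs for unrated books.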
About the same results for both UBCF (user-based collaborative filtering) and IBCF (item-based collaborative filtering).