Recommendation System

Example of how recommendations look on Carwale

Recommendations are shown on every model page under the heading Alternatives for CarWale and Similar Bikes for BikeWale. For every model, all the other models are sorted by their recommendation_score in descending order. This sorted list is what appears on the final sites.

The recommendation_score is made up of two parts: collaborative_score and content_based_score.

Collaborative Filtering

Collaborative filtering is used to calculate the collaborative_score. Collaborative filtering is a technique that picks out items a user might like based on the reactions of similar users.

To find the collaborative score, we first find sets of users who have -

compared cars,

seen a car's price quote and

seen models

and store them in tables. Let’s call these tables compareUsersList, pqUsersList and modelUsersList. Each one of these tables would look like

list(model, list(users))

Eg.

model	users
m1	usr1, usr2, usr4, usr5, usr6
m2	usr1, usr2, usr7, usr8
m3	usr1
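As a rough illustration of how such a table could be built, the sketch below groups raw (user, model) events into a model-to-users mapping. The event format and the name users_by_model are assumptions made for this example; the source does not describe how the events are stored.

```python
from collections import defaultdict

def users_by_model(events):
    """Group (user, model) events into {model: set(users)}.

    `events` is a hypothetical iterable of (user, model) tuples; a set is
    used so that a user who appears several times is counted only once.
    """
    table = defaultdict(set)
    for user, model in events:
        table[model].add(user)
    return dict(table)

# Hypothetical comparison events, shaped like the example table above.
compare_events = [
    ("usr1", "m1"), ("usr2", "m1"), ("usr1", "m2"),
    ("usr2", "m2"), ("usr1", "m3"), ("usr1", "m1"),
]
compare_users_list = users_by_model(compare_events)
```

The same helper could be reused for pqUsersList and modelUsersList by feeding it price quote and model view events instead.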

Then we form every unique pair of rows in compareUsersList. The pairs would look like

model1	list(users) associated with model1	model2	list(users) associated with model2
m1	usr1, usr2, usr4, usr5, usr6	m2	usr1, usr2, usr7, usr8
m2	usr1, usr2, usr7, usr8	m3	usr1
m1	usr1, usr2, usr4, usr5, usr6	m3	usr1

Let’s call the list(users) associated with model1 l1 and the one associated with model2 l2.

For each such pair of rows, two values are calculated: jc, which is 10 ^ (Jaccard coefficient of l1, l2), and commonCount, which is the common count of l1, l2.

Where the Jaccard coefficient of l1, l2 is

(number of users in the intersection of l1, l2)/(number of users in the union of l1, l2)

and the common count of l1, l2 is

number of users in (intersection of l1, l2)

These scores along with the pair of models are stored in another table called compareScores. This is repeated for pqUsersList and modelUsersList and the results are stored in tables pqScores and modelScores respectively.

Each one of these tables would look like

list(model1, model2, jc, commonCount)

Eg. compareScores would look like -

model1	model2	jc	commonCount
m1	m2	10^(2/7) = 1.93	2
m2	m3	10^(1/4) = 1.77	1
m1	m3	10^(1/5) = 1.58	1
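The calculation of jc and commonCount can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name pair_scores and the dictionary-of-sets input are made up for the example, and returning a Jaccard coefficient of 0 for two empty user lists is a choice the source does not specify.

```python
from itertools import combinations

# Hypothetical input mirroring the compareUsersList example.
compare_users_list = {
    "m1": {"usr1", "usr2", "usr4", "usr5", "usr6"},
    "m2": {"usr1", "usr2", "usr7", "usr8"},
    "m3": {"usr1"},
}

def pair_scores(users_by_model):
    """Return rows of (model1, model2, jc, commonCount) for each unique pair."""
    rows = []
    for model1, model2 in combinations(sorted(users_by_model), 2):
        l1, l2 = users_by_model[model1], users_by_model[model2]
        common = l1 & l2
        union = l1 | l2
        # Assumption: two empty lists get a Jaccard coefficient of 0.
        jaccard = len(common) / len(union) if union else 0.0
        rows.append((model1, model2, 10 ** jaccard, len(common)))
    return rows

compare_scores = pair_scores(compare_users_list)
for row in compare_scores:
    print(row)
```

For the pair m1, m2 this gives a Jaccard coefficient of 2/7, so jc is about 1.93 with a commonCount of 2, matching the first row of the example.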

Next, compareScores, pqScores and modelScores are joined on the pair of models: rows are matched when model1 is the same in all three tables and model2 is the same in all three tables.

The rows of this table would look like -

model1	model2	compareScores.jc	compareScores.commonCount	pqScores.jc	pqScores.commonCount	modelScores.jc	modelScores.commonCount
m1	m2	1.93	2	1.9	3	1.6	2
m2	m3	1.77	1	1.6	1	1.6	1
m1	m3	1.58	1	1.5	1	1.3	1

From each row we compute a value called similarity, where

similarity = 0.6*compareScores.jc + 0.25*pqScores.jc + 0.15*modelScores.jc

The similarity is stored along with the models in a table called finalMatrix, which would look like -

list(model1, model2, similarity)

Eg.

model1	model2	similarity
m1	m2	0.6*1.93 + 0.25*1.9 + 0.15*1.6 = 1.87
m2	m3	0.6*1.77 + 0.25*1.6 + 0.15*1.6 = 1.70
m1	m3	0.6*1.58 + 0.25*1.5 + 0.15*1.3 = 1.52
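The join and the weighted sum can be sketched as follows. The dictionaries keyed by (model1, model2) and the function name build_final_matrix are assumptions for illustration. The sketch also assumes an inner join, i.e. a pair is kept only if it appears in all three score tables; the source does not say how missing pairs are handled.

```python
# Hypothetical score tables keyed by (model1, model2); values are (jc, commonCount).
compare_scores = {("m1", "m2"): (1.93, 2), ("m2", "m3"): (1.77, 1), ("m1", "m3"): (1.58, 1)}
pq_scores = {("m1", "m2"): (1.9, 3), ("m2", "m3"): (1.6, 1), ("m1", "m3"): (1.5, 1)}
model_scores = {("m1", "m2"): (1.6, 2), ("m2", "m3"): (1.6, 1), ("m1", "m3"): (1.3, 1)}

# Weights for compareScores.jc, pqScores.jc and modelScores.jc, as given in the text.
JC_WEIGHTS = (0.6, 0.25, 0.15)

def build_final_matrix(compare, pq, model):
    """Return {(model1, model2): similarity} for pairs present in all tables."""
    final_matrix = {}
    # Assumption: inner join on the model pair.
    for pair in compare.keys() & pq.keys() & model.keys():
        jcs = (compare[pair][0], pq[pair][0], model[pair][0])
        final_matrix[pair] = sum(w * jc for w, jc in zip(JC_WEIGHTS, jcs))
    return final_matrix

final_matrix = build_final_matrix(compare_scores, pq_scores, model_scores)
```

Rounded to two decimals, the three similarities come out as 1.87, 1.70 and 1.52, the same values as in the finalMatrix example.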

We then find the average and the standard deviation of the similarity over the whole finalMatrix and store them in avgSimilarity and stdDevSimilarity. These values are used to find the collaborative_score -

collaborative_score = (similarity - avgSimilarity)/stdDevSimilarity

Eg.

avgSimilarity = 1.696, stdDevSimilarity = 0.142

model1	model2	collaborative_score
m1	m2	1.225
m2	m3	0.028
m1	m3	-1.239

In this way the collaborative_score for each pair of models is found.
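The standardisation step can be sketched as below. One assumption needs flagging: the source does not say whether the population or the sample standard deviation is used. The sketch uses the population form (statistics.pstdev) because, for the three example similarities, it gives roughly 0.143, close to the 0.142 quoted above, whereas the sample form gives roughly 0.175. Returning 0 for every pair when the standard deviation is 0 is also a choice made only for this sketch.

```python
from statistics import mean, pstdev

# Similarities from the finalMatrix example.
final_matrix = {("m1", "m2"): 1.87, ("m2", "m3"): 1.70, ("m1", "m3"): 1.52}

def collaborative_scores(similarities):
    """Standardise similarities: (similarity - avgSimilarity) / stdDevSimilarity."""
    values = list(similarities.values())
    avg_similarity = mean(values)
    # Assumption: population standard deviation.
    std_dev_similarity = pstdev(values)
    if std_dev_similarity == 0:
        # Assumption: identical similarities all map to a score of 0.
        return {pair: 0.0 for pair in similarities}
    return {
        pair: (similarity - avg_similarity) / std_dev_similarity
        for pair, similarity in similarities.items()
    }

scores = collaborative_scores(final_matrix)
```

The resulting scores are close to, but not identical with, the ones in the example table, since the example rounds the average and the standard deviation before dividing.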

Content Based Filtering

In content based filtering, we create a description for every model. Then, for every pair of models in the system, a score is determined based on the similarity of their descriptions. This score is called the content_based_score.

Each raw description contains the following keywords -

Keywords	Used in Carwale	Used in Bikewale
Unique fuel types of all the versions of a model	Yes	Yes
Whether any version is Electric	Yes	Yes
Displacement of the default variant	No	Yes
Segment of the model	Yes	Yes
Whether the model is New or Upcoming	Yes	Yes
BodyStyle of the model	Yes	Yes
Average price of the model	Yes	Yes
Performance of the model	Yes	Yes
Comfort of the model	Yes	Yes
Value for money for the model	Yes	Yes
Model’s Visual appeal	Yes	Yes
Service experience for the model	No	Yes
Reliability of the model	No	Yes
Fuel economy rating	Yes	No
Prices of all the versions of the model	Yes	Yes
Maximum price of the model	Yes	Yes
Minimum price of the model	Yes	Yes
Unique transmission types	Yes	No
Seating Capacity	Yes	No
Sub segment type	Yes	No

Eg. -

'FuelType': 'Petrol',
'Electric': '',
'Displacement': 346.0,
'ModelName': 'Bullet 350',
'MakeName': 'Royal Enfield',
'Segment': 'Executive commuter',
'isNew': True,
'isUpcoming': False,
'BodyStyle': 'Cruiser',
'AvgPrice': 151659,
'Performance': 4.4956,
'Reliability': 4.4558,
'Comfort': 4.6098,
'ServiceExperience': 4.2997,
'ValueForMoney': 0.0,
'VisualAppeal': 4.6482,
'PriceList': '145072, 151659, 160374',
'MinPrice': 145072,
'MaxPrice': 160374,
'Id': 81

The frequency of these keywords in the final description depends on feature weights set by the Product team. For example, if the bodyStyle weight is set to 2 and the segment weight is set to 1, the body style keyword will be repeated twice in the description while the segment keyword will appear only once.
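The weighting by repetition can be illustrated with a small sketch. The function name build_description, the dictionary shapes, and the decision to skip empty keywords and features without a weight are all assumptions made for the example, not details confirmed by the source.

```python
def build_description(keywords, feature_weights):
    """Repeat each keyword as many times as its feature weight.

    `keywords` maps a feature name to its keyword for one model and
    `feature_weights` maps a feature name to an integer weight.
    """
    tokens = []
    for feature, keyword in keywords.items():
        if keyword in ("", None):
            continue  # assumption: empty keywords add nothing
        # Assumption: features without a configured weight are left out.
        tokens.extend([str(keyword)] * feature_weights.get(feature, 0))
    return tokens

# The weights from the example in the text: bodyStyle 2, segment 1.
description = build_description(
    {"bodyStyle": "Cruiser", "segment": "Executive commuter", "electric": ""},
    {"bodyStyle": 2, "segment": 1, "electric": 2},
)
```

Here description contains the body style keyword twice and the segment keyword once, and the empty electric keyword is dropped.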

The feature weights for Carwale are -

Name	Weight
bodyStyle	4
carBucket	1
carTransmission	1
electric	2
fuelType	2
priceRange	2
seatingCapacity	3
segment	4
subSegment	2

The feature weights for Bikewale are -

Name	Weight
BodyStyle	12
AvgPrice	4
Segment	3
BikeBucket	1
FuelType	2
Performance	1
Reliability	1
Comfort	1
ServiceExp	1
ValueForMoney	1
VisualAppeal	1
Displacement	3
Electric	2

Note

Prices are not used directly; they are first put into buckets and then compared. For example, if a price is ₹9,64,000, it might be put in the ₹9,50,000 - ₹9,80,000 bucket. The previous and the next ranges are also included, so that two similarly priced vehicles on either side of a bucket boundary still have a range in common. The same is done for displacement.
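A minimal sketch of bucketing with neighbouring ranges is shown below. It assumes fixed-width buckets of ₹50,000 starting at 0 and tokens named after the lower bound of each bucket; both the width and the token format are hypothetical, and the actual bucket boundaries are not given in the source.

```python
BUCKET_WIDTH = 50_000  # hypothetical width; the real ranges are not documented here

def bucket_tokens(value, width=BUCKET_WIDTH, prefix="AvgPrice"):
    """Return tokens for the value's bucket plus the previous and next bucket."""
    index = int(value // width)
    return [
        f"{prefix}_{i * width}"
        for i in (index - 1, index, index + 1)
        if i >= 0  # there is no bucket below zero
    ]

just_above = bucket_tokens(964_000)
just_below = bucket_tokens(949_000)
shared = set(just_above) & set(just_below)
```

The two prices fall into different buckets, yet they still share tokens because each one also carries its neighbouring ranges. The same helper could be used for displacement by passing a different width and prefix.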

Final description example -

Adventure
Adventure
Adventure
Adventure
Adventure
Adventure
AvgPrice_153044
AvgPrice_181052
AvgPrice_201639
AvgPrice_228858
AvgPrice_287279
... 6 times
PremiumBike
... 3 times
Premium
Petrol
... 2 times
performance_5
reliability_5
comfort_5
serviceexperience_5
visualappeal_5
Displacement_249
Displacement_293
Displacement_342
Displacement_408
Displacement_649
... 3 times

Next, for each pair of models, the final descriptions are compared using TF-IDF to get a content_based_score.
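To make the comparison concrete, here is a small pure-Python sketch of TF-IDF weighting followed by cosine similarity. The source only says that TF-IDF is used, so the cosine similarity, the smoothed IDF formula, and the names tfidf_vectors and cosine_similarity are assumptions for illustration. The three descriptions are made up.

```python
import math
from collections import Counter

def tfidf_vectors(descriptions):
    """Turn {model: [tokens]} into {model: {token: tf-idf weight}}."""
    n_docs = len(descriptions)
    doc_freq = Counter()
    for tokens in descriptions.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for model, tokens in descriptions.items():
        counts = Counter(tokens)
        vectors[model] = {
            # Assumption: raw term count times a smoothed inverse document frequency.
            token: count * (math.log((1 + n_docs) / (1 + doc_freq[token])) + 1.0)
            for token, count in counts.items()
        }
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical final descriptions for three models.
descriptions = {
    "bike_a": ["Cruiser", "Cruiser", "Petrol"],
    "bike_b": ["Cruiser", "Cruiser", "Petrol"],
    "bike_c": ["Scooter", "Scooter", "Petrol"],
}
vectors = tfidf_vectors(descriptions)
score_ab = cosine_similarity(vectors["bike_a"], vectors["bike_b"])
score_ac = cosine_similarity(vectors["bike_a"], vectors["bike_c"])
```

Because keywords with a higher feature weight are repeated more often, they get a larger term count and therefore pull the score more strongly, which is how the feature weights influence the content_based_score in this sketch.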

Combining scores

The content_based_score and collaborative_score of each pair of models are multiplied by weights provided by the Product team and combined into the final recommendation_score for that pair.
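One plausible reading of this step is a weighted sum, sketched below together with the descending sort described at the top of this page. The weighted sum itself, the weight values of 0.5, the content scores and the function names are all assumptions; the actual weights come from the Product team and are not listed here.

```python
def combine_scores(collaborative_score, content_based_score,
                   collaborative_weight, content_weight):
    """Assumed combination: a weighted sum of the two scores."""
    return (collaborative_weight * collaborative_score
            + content_weight * content_based_score)

def rank_alternatives(candidates, collaborative_weight, content_weight):
    """Sort candidate models by recommendation_score, highest first.

    `candidates` maps a model to (collaborative_score, content_based_score).
    """
    scored = [
        (model, combine_scores(collab, content, collaborative_weight, content_weight))
        for model, (collab, content) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical candidates for model m1; the content scores are made up.
candidates_for_m1 = {"m2": (1.225, 0.40), "m3": (-1.239, 0.90)}
ranked = rank_alternatives(candidates_for_m1, collaborative_weight=0.5, content_weight=0.5)
```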

Score Boosting

Score boosting personalises the generated recommendations. Instead of showing every user the same list of items for a given item, we can target each user based on the preferences inferred from their user profile.

The list of recommendations contains items and their corresponding scores. Each item has some attributes (e.g. body style, price). A user profile contains the user's preferences for each of these attributes.

We define an affinity function that takes a user profile, the boosting parameters (BodyStyle, PriceBucket, SubSegment) and an item as input. It returns a score that denotes the affinity of the user towards the given item.

AF(UserProfile, BoostingParams, Item) = Affinity Score

This affinity score is added to the original score of the item. After the scores of all items have been boosted, the list is reordered to give the final personalised recommendations.
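The boosting flow can be sketched as follows. The source does not define how the affinity function is computed, so the version here is purely hypothetical: it assumes the user profile stores a numeric preference per attribute value and sums the preferences that match the item. The names affinity and boost and the data shapes are also assumptions.

```python
def affinity(user_profile, boosting_params, item):
    """Hypothetical AF(UserProfile, BoostingParams, Item).

    `user_profile` maps an attribute name to {attribute value: preference};
    the preferences matching the item's attribute values are summed.
    """
    return sum(
        user_profile.get(param, {}).get(item.get(param), 0.0)
        for param in boosting_params
    )

def boost(recommendations, user_profile, boosting_params):
    """Add the affinity score to each item's score and reorder the list."""
    boosted = [
        (item, score + affinity(user_profile, boosting_params, item))
        for item, score in recommendations
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

# Made-up recommendations and profile for illustration.
recommendations = [
    ({"name": "A", "BodyStyle": "SUV", "PriceBucket": "high"}, 1.0),
    ({"name": "B", "BodyStyle": "Hatchback", "PriceBucket": "low"}, 0.9),
]
user_profile = {"BodyStyle": {"Hatchback": 0.3}, "PriceBucket": {"low": 0.2}}
personalised = boost(recommendations, user_profile, ["BodyStyle", "PriceBucket", "SubSegment"])
```

In this example item B starts below item A but moves to the top after boosting, because the profile prefers hatchbacks in the low price bucket.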

Overrides

Carwale

Models with a seating capacity of 5 have their seatingCapacity weight increased by 3. So if the seatingCapacity feature weight is 4, models with a seating capacity of 5 will have it set to 4 + 3, i.e. 7.

Bikewale

Body styles other than Street, Naked, Scooter and Cruiser have their BodyStyle feature weight set to W/2 instead of W, where W is the configured BodyStyle weight. So if the BodyStyle weight is 10, it will remain 10 for the Street, Naked, Scooter and Cruiser body styles, but for all other body styles it will be 5.
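Both overrides are simple enough to express as small functions. The function names are made up for this sketch, and how an odd BodyStyle weight is rounded after halving is not stated in the source, so the sketch leaves the division unrounded.

```python
FULL_WEIGHT_BODY_STYLES = {"Street", "Naked", "Scooter", "Cruiser"}

def carwale_seating_capacity_weight(base_weight, seating_capacity):
    """Carwale: models with seating capacity 5 get the weight increased by 3."""
    return base_weight + 3 if seating_capacity == 5 else base_weight

def bikewale_body_style_weight(base_weight, body_style):
    """Bikewale: body styles outside the listed four get W/2 instead of W."""
    if body_style in FULL_WEIGHT_BODY_STYLES:
        return base_weight
    # Rounding for odd weights is not specified in the source.
    return base_weight / 2

five_seater_weight = carwale_seating_capacity_weight(4, 5)
adventure_weight = bikewale_body_style_weight(10, "Adventure")
cruiser_weight = bikewale_body_style_weight(10, "Cruiser")
```

With the numbers used in the text, a five-seater gets a weight of 7, an Adventure bike gets 5 and a Cruiser keeps 10.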