Recommendation System
Recommendations are shown on every model page under the heading "Alternatives" on CarWale and "Similar Bikes" on BikeWale. For every model, all the other models are sorted by their recommendation_score in descending order, and this sorted list is what appears on the sites.
The recommendation_score is made up of two parts: collaborative_score and content_based_score.
Collaborative Filtering
Collaborative filtering is used to calculate the collaborative_score. It is a technique that predicts which items a user might like based on the reactions of similar users.
To find the collaborative score, we first find, for each model, the set of users who have -
- compared it with other cars,
- seen its price quote, and
- viewed the model
and store these sets in three tables. Let’s call these tables compareUsersList, pqUsersList and modelUsersList respectively. Each one of these tables would look like
list(model, list(users))
- Eg.
model | users |
m1 | usr1, usr2, usr4, usr5, usr6 |
m2 | usr1, usr2, usr7, usr8 |
m3 | usr1 |
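A minimal sketch of how one of these per-model user tables could be built. It assumes the raw events arrive as hypothetical (user, model) tuples, one per comparison, price quote or model view; the event format and the helper name `build_users_list` are illustrative, not taken from the actual pipeline. Sets are used instead of lists because the Jaccard coefficient computed next is a set operation.

```python
from collections import defaultdict


def build_users_list(events):
    """Group (user, model) events into {model: set of users}.

    `events` is a hypothetical iterable of (user_id, model_id) tuples,
    e.g. one tuple per comparison, price quote or model view.
    """
    users_by_model = defaultdict(set)
    for user, model in events:
        # A set keeps each user once per model, however many events they have.
        users_by_model[model].add(user)
    return dict(users_by_model)


# Hypothetical comparison events that reproduce the example table above.
compare_events = [
    ("usr1", "m1"), ("usr2", "m1"), ("usr4", "m1"), ("usr5", "m1"), ("usr6", "m1"),
    ("usr1", "m2"), ("usr2", "m2"), ("usr7", "m2"), ("usr8", "m2"),
    ("usr1", "m3"), ("usr1", "m3"),  # a repeated event does not duplicate usr1
]
compareUsersList = build_users_list(compare_events)
```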
Then, for each unique pair of rows in compareUsersList, we form a row like the following:
model1 | list(users) associated with model1 | model2 | list(users) associated with model2 |
m1 | usr1, usr2, usr4, usr5, usr6 | m2 | usr1, usr2, usr7, usr8 |
m2 | usr1, usr2, usr7, usr8 | m3 | usr1 |
m1 | usr1, usr2, usr4, usr5, usr6 | m3 | usr1 |
Let l1 be the list of users associated with model1 and l2 the list of users associated with model2.
For each such pair of rows, two values are calculated: jc, which is 10 ^ (Jaccard coefficient of l1, l2), and commonCount, which is the common count of l1, l2.
The Jaccard coefficient of l1, l2 is
(number of users in the intersection of l1, l2)/(number of users in the union of l1, l2)
and the common count of l1, l2 is
the number of users in the intersection of l1, l2
These scores, along with the pair of models, are stored in another table called compareScores. The same is done for pqUsersList and modelUsersList, and the results are stored in tables pqScores and modelScores respectively.
Each one of these tables would look like
list(model1, model2, jc, commonCount)
Eg. compareScores would look like -
model1 | model2 | jc | commonCount |
m1 | m2 | 10^(2/7) = 1.93 | 2 |
m2 | m3 | 10^(1/4) = 1.77 | 1 |
m1 | m3 | 10^(1/5) = 1.58 | 1 |
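The pairwise step can be sketched as follows, assuming each users table is held as a dict of sets (an illustrative representation). `itertools.combinations` yields every unordered pair of models exactly once, which matches "each unique pair of rows"; the function name `pair_scores` is hypothetical.

```python
import itertools


def pair_scores(users_list):
    """Return {(model1, model2): (jc, commonCount)} for every unique model pair.

    jc = 10 ** (Jaccard coefficient of l1, l2); commonCount = size of l1 ∩ l2.
    """
    scores = {}
    # combinations() yields each unordered pair of models exactly once.
    for (m1, l1), (m2, l2) in itertools.combinations(sorted(users_list.items()), 2):
        common = len(l1 & l2)
        union = len(l1 | l2)
        jaccard = common / union if union else 0.0  # guard against two empty sets
        scores[(m1, m2)] = (10 ** jaccard, common)
    return scores


compareUsersList = {
    "m1": {"usr1", "usr2", "usr4", "usr5", "usr6"},
    "m2": {"usr1", "usr2", "usr7", "usr8"},
    "m3": {"usr1"},
}
compareScores = pair_scores(compareUsersList)
```

Rounded to two decimals this gives 1.93, 1.58 and 1.78 for the three pairs; 10^(1/4) is 1.778..., which the table above truncates to 1.77. Since the Jaccard coefficient lies between 0 and 1, jc always lies between 1 and 10.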
Next, compareScores, pqScores and modelScores are joined on the model pair: rows from the three tables are combined when their model1 values match and their model2 values match.
The joined table would look like -
model1 | model2 | compareScores.jc | compareScores.commonCount | pqScores.jc | pqScores.commonCount | modelScores.jc | modelScores.commonCount |
m1 | m2 | 1.93 | 2 | 1.9 | 3 | 1.6 | 2 |
m2 | m3 | 1.77 | 1 | 1.6 | 1 | 1.6 | 1 |
m1 | m3 | 1.58 | 1 | 1.5 | 1 | 1.3 | 1 |
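A sketch of the join, assuming each score table is a dict keyed by the (model1, model2) pair with pairs ordered the same way in all three tables. It uses an inner join, keeping only pairs present in all three tables; the production join type is not stated, so this is an assumption, as is the helper name `join_scores`.

```python
def join_scores(compare, pq, model):
    """Join three {(model1, model2): (jc, commonCount)} tables on the model pair.

    This is an inner join: a pair is kept only if it appears in all three
    tables (an assumption; the join type used in production is not stated).
    """
    shared_pairs = compare.keys() & pq.keys() & model.keys()
    return {
        pair: {
            "compareScores": compare[pair],
            "pqScores": pq[pair],
            "modelScores": model[pair],
        }
        for pair in shared_pairs
    }


compareScores = {("m1", "m2"): (1.93, 2), ("m2", "m3"): (1.77, 1), ("m1", "m3"): (1.58, 1)}
pqScores = {("m1", "m2"): (1.9, 3), ("m2", "m3"): (1.6, 1), ("m1", "m3"): (1.5, 1)}
modelScores = {("m1", "m2"): (1.6, 2), ("m2", "m3"): (1.6, 1), ("m1", "m3"): (1.3, 1),
               ("m3", "m4"): (1.2, 1)}  # hypothetical pair missing from the other tables
joined = join_scores(compareScores, pqScores, modelScores)
```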
From each row we compute a value called similarity:
similarity = 0.6*compareScores.jc + 0.25*pqScores.jc + 0.15*modelScores.jc
The similarity is stored along with the models in a table called finalMatrix, which would look like -
list(model1, model2, similarity)
- Eg.
model1 | model2 | similarity |
m1 | m2 | 0.6*1.93 + 0.25*1.9 + 0.15*1.6 = 1.87 |
m2 | m3 | 0.6*1.77 + 0.25*1.6 + 0.15*1.6 = 1.70 |
m1 | m3 | 0.6*1.58 + 0.25*1.5 + 0.15*1.3 = 1.52 |
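The similarity formula as a small function, applied to the jc values of the joined example. Representing a row as a dict of jc values per table is an illustrative choice; commonCount is carried in the tables but does not enter this formula.

```python
# Weights from the similarity formula above; commonCount is not used.
SIMILARITY_WEIGHTS = {"compareScores": 0.6, "pqScores": 0.25, "modelScores": 0.15}


def similarity(jc_by_table):
    """Weighted sum of the jc values of one joined row."""
    return sum(w * jc_by_table[table] for table, w in SIMILARITY_WEIGHTS.items())


# jc values from the joined example table.
joined_jc = {
    ("m1", "m2"): {"compareScores": 1.93, "pqScores": 1.9, "modelScores": 1.6},
    ("m2", "m3"): {"compareScores": 1.77, "pqScores": 1.6, "modelScores": 1.6},
    ("m1", "m3"): {"compareScores": 1.58, "pqScores": 1.5, "modelScores": 1.3},
}
finalMatrix = {pair: similarity(row) for pair, row in joined_jc.items()}
```

These evaluate to 1.873, 1.702 and 1.518, which round to the values in the example. Because the three weights sum to 1, similarity is a weighted average of jc values and therefore also stays between 1 and 10.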
We then find the average and the standard deviation of similarity over the whole finalMatrix and store them in avgSimilarity and stdDevSimilarity. These are used to standardise each similarity into the collaborative_score:
collaborative_score = (similarity - avgSimilarity)/stdDevSimilarity
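- Eg.
avgSimilarity = (1.87 + 1.70 + 1.52)/3 ≈ 1.69667
stdDevSimilarity ≈ 0.14291 (population standard deviation of the three similarities)
model1 | model2 | collaborative_score |
m1 | m2 | (1.87 - 1.69667)/0.14291 ≈ 1.213 |
m2 | m3 | (1.70 - 1.69667)/0.14291 ≈ 0.023 |
m1 | m3 | (1.52 - 1.69667)/0.14291 ≈ -1.236 |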
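The standardisation step as a sketch. It uses the population standard deviation (`statistics.pstdev`), which for the similarities 1.87, 1.70 and 1.52 gives about 0.1429 and collaborative scores of about 1.213, 0.023 and -1.236; whether production uses the population or the sample deviation is an assumption. The zero-deviation guard is also an addition for the case where every pair has the same similarity.

```python
import statistics


def collaborative_scores(final_matrix):
    """Standardise every similarity: (similarity - avgSimilarity) / stdDevSimilarity.

    Uses the population standard deviation (statistics.pstdev); whether the
    production code uses population or sample deviation is an assumption.
    """
    values = list(final_matrix.values())
    avg_similarity = statistics.mean(values)
    std_dev_similarity = statistics.pstdev(values)
    if std_dev_similarity == 0:
        # All similarities are equal, so no pair stands out (assumed behaviour).
        return {pair: 0.0 for pair in final_matrix}
    return {
        pair: (sim - avg_similarity) / std_dev_similarity
        for pair, sim in final_matrix.items()
    }


finalMatrix = {("m1", "m2"): 1.87, ("m2", "m3"): 1.70, ("m1", "m3"): 1.52}
scores = collaborative_scores(finalMatrix)
```

Standardising centres the scores on 0, so a positive collaborative_score means a pair is more strongly co-viewed than the average pair.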
In this way the collaborative_score for each pair of models is found.
Content Based Filtering
In content-based filtering, we create a description for every model. Then, for every pair of models in the system, a score called the content_based_score is determined from the similarity of their descriptions.
Each raw description contains the following keywords -
Keywords | Used in Carwale | Used in Bikewale |
Unique fuel types of all the versions of a model | Yes | Yes |
Whether any version is Electric | Yes | Yes |
Displacement of the default variant | No | Yes |
Segment of the model | Yes | Yes |
Whether the model is New or Upcoming | Yes | Yes |
BodyStyle of the model | Yes | Yes |
Average price of the model | Yes | Yes |
Performance of the model | Yes | Yes |
Comfort of the model | Yes | Yes |
Value for money for the model | Yes | Yes |
Model’s Visual appeal | Yes | Yes |
Service experience for the model | No | Yes |
Reliability of the model | No | Yes |
Fuel economy rating | Yes | No |
Prices of all the versions of the model | Yes | Yes |
Maximum price of the model | Yes | Yes |
Minimum price of the model | Yes | Yes |
Unique transmission types | Yes | No |
Seating Capacity | Yes | No |
Sub segment type | Yes | No |
Eg. a raw Bikewale description -
'FuelType': 'Petrol',
'Electric': '',
'Displacement': 346.0,
'ModelName': 'Bullet 350',
'MakeName': 'Royal Enfield',
'Segment': 'Executive commuter',
'isNew': True,
'isUpcoming': False,
'BodyStyle': 'Cruiser',
'AvgPrice': 151659,
'Performance': 4.4956,
'Reliability': 4.4558,
'Comfort': 4.6098,
'ServiceExperience': 4.2997,
'ValueForMoney': 0.0,
'VisualAppeal': 4.6482,
'PriceList': '145072, 151659, 160374',
'MinPrice': 145072,
'MaxPrice': 160374,
'Id': 81
The frequency of each keyword in the final description, however, depends on feature weights set by the Product team. For example, if the BodyStyle weight is set to 2 and the Segment weight to 1, the body style keyword is repeated twice in the description while the segment keyword appears only once.
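A sketch of the repetition rule, assuming each feature contributes a list of keywords (a feature such as a price bucket can contribute several) and that the whole group is repeated once per unit of weight, which is how the final description example below lists them. Leaving out features that have no weight (ModelName, Id, and so on) is an assumption, as is the helper name `build_description`.

```python
def build_description(keywords, weights):
    """Repeat each feature's keywords as many times as its feature weight.

    `keywords` maps a feature name to its list of keywords. Features without
    a weight, e.g. ModelName or Id, are left out (an assumption).
    """
    tokens = []
    for feature, values in keywords.items():
        for _ in range(weights.get(feature, 0)):
            tokens.extend(values)  # the whole group is repeated `weight` times
    return tokens


weights = {"BodyStyle": 2, "Segment": 1}  # the weights from the example above
keywords = {
    "BodyStyle": ["Cruiser"],
    "Segment": ["Executive commuter"],
    "ModelName": ["Bullet 350"],  # no weight, so it is not included
}
description = build_description(keywords, weights)
```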
The feature weights for Carwale are -
Name | Weight |
bodyStyle | 4 |
carBucket | 1 |
carTransmission | 1 |
electric | 2 |
fuelType | 2 |
priceRange | 2 |
seatingCapacity | 3 |
segment | 4 |
subSegment | 2 |
The feature weights for Bikewale are -
Name | Weight |
BodyStyle | 12 |
AvgPrice | 4 |
Segment | 3 |
BikeBucket | 1 |
FuelType | 2 |
Performance | 1 |
Reliability | 1 |
Comfort | 1 |
ServiceExp | 1 |
ValueForMoney | 1 |
VisualAppeal | 1 |
Displacement | 3 |
Electric | 2 |
Note
Prices are not used directly; they are first put into buckets, and the buckets are compared. For example, a price of ₹9,64,000 might be put in the ₹9,50,000 - ₹9,80,000 bucket. The previous and next buckets are also included, so that two similarly priced vehicles that fall on either side of a bucket boundary still have at least one bucket in common. The same is done for displacement.
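A sketch of bucketing with neighbouring buckets. The bucket edges here are hypothetical ₹30,000-wide ranges chosen to match the width of the ₹9,50,000 - ₹9,80,000 example; the real edges are not documented and need not be uniform. The token format `<prefix>_<lower bound>` is likewise an assumption modelled on the AvgPrice_... tokens in the final description example.

```python
import bisect


def bucket_tokens(value, edges, prefix):
    """Tokens for the bucket that holds `value` plus the previous and next buckets.

    `edges` is a sorted list of hypothetical bucket lower bounds: bucket i covers
    [edges[i], edges[i + 1]). Values below edges[0] go into the first bucket.
    """
    i = max(0, bisect.bisect_right(edges, value) - 1)
    neighbours = range(max(0, i - 1), min(len(edges), i + 2))
    return [f"{prefix}_{edges[j]}" for j in neighbours]


# Hypothetical ₹30,000-wide price buckets around the example price.
price_edges = [890000, 920000, 950000, 980000, 1010000]
tokens_a = bucket_tokens(964000, price_edges, "Price")
tokens_b = bucket_tokens(948000, price_edges, "Price")
```

Here ₹9,64,000 and ₹9,48,000 fall into different buckets (950000 and 920000), yet their token sets still share two buckets because each includes its neighbours, which is exactly the case the note describes.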
Final description example -
Adventure
Adventure
Adventure
Adventure
Adventure
Adventure
AvgPrice_153044
AvgPrice_181052
AvgPrice_201639
AvgPrice_228858
AvgPrice_287279
... 6 times
PremiumBike
... 3 times
Premium
Petrol
... 2 times
performance_5
reliability_5
comfort_5
serviceexperience_5
visualappeal_5
Displacement_249
Displacement_293
Displacement_342
Displacement_408
Displacement_649
... 3 times
Next, for each pair of models, the final descriptions are compared using TF-IDF to get a content_based_score.
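A pure-Python sketch of a TF-IDF comparison. How the TF-IDF vectors are compared is not stated; cosine similarity is assumed here because it is the usual pairing. Term frequency is the raw count, so keywords repeated by their feature weight count more, and idf = ln(N / df) + 1 is one common variant; both formulas are assumptions, and the example descriptions are made up.

```python
import math
from collections import Counter


def tfidf_vectors(descriptions):
    """Turn token lists into sparse TF-IDF vectors ({token: weight}).

    tf is the raw count, so a keyword repeated by its feature weight counts
    more; idf = ln(N / df) + 1. Both formulas are common choices, assumed here.
    """
    n = len(descriptions)
    df = Counter(token for tokens in descriptions for token in set(tokens))
    return [
        {t: count * (math.log(n / df[t]) + 1) for t, count in Counter(tokens).items()}
        for tokens in descriptions
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


descriptions = [
    ["Cruiser"] * 2 + ["Petrol"] * 2 + ["Displacement_349"] * 3,
    ["Cruiser"] * 2 + ["Petrol"] * 2 + ["Displacement_408"] * 3,
    ["Scooter"] * 2 + ["Electric"] * 2,
]
vectors = tfidf_vectors(descriptions)
content_based_score = cosine(vectors[0], vectors[1])
```

The idf factor down-weights keywords shared by many models, so a rare shared keyword contributes more to the score than a common one such as a widely used fuel type.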
Combining scores
The content_based_score and collaborative_score of each pair of models are multiplied by weights provided by the Product team and combined into the final recommendation_score for that pair.
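A sketch of combining the two scores and producing the sorted list shown on a model page. The weighted sum, the weights `w_collab` and `w_content`, and treating a missing score as 0 are all assumptions, since the Product team's actual weights and combination rule are not given here. The input scores are illustrative values, not real data.

```python
def recommendation_scores(collab, content, w_collab, w_content):
    """Combine the two scores of every model pair as a weighted sum.

    `w_collab` and `w_content` stand in for the Product team's weights; the
    weighted sum and treating a missing score as 0 are assumptions.
    """
    pairs = collab.keys() | content.keys()
    return {
        p: w_collab * collab.get(p, 0.0) + w_content * content.get(p, 0.0)
        for p in pairs
    }


def recommendations_for(model, scores):
    """All other models, sorted by recommendation_score in descending order."""
    candidates = {}
    for (a, b), score in scores.items():
        if a == model:
            candidates[b] = score
        elif b == model:
            candidates[a] = score
    return sorted(candidates, key=candidates.get, reverse=True)


collab = {("m1", "m2"): 1.213, ("m2", "m3"): 0.023, ("m1", "m3"): -1.236}
content = {("m1", "m2"): 0.4, ("m2", "m3"): 0.9, ("m1", "m3"): 0.7}  # made-up values
scores = recommendation_scores(collab, content, w_collab=0.5, w_content=0.5)
```

Both the Jaccard-based and TF-IDF-based scores are symmetric, so each unordered pair is stored once and looked up from either side.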
Score Boosting
Score boosting personalises the generated recommendations. Instead of showing every user the same list of recommendations for a given item, we can tailor the list to each user based on his/her preferences inferred from the user profile.
The list of recommendations contains items and their corresponding scores. Each item has some attributes (e.g. body style, price), and a user profile contains his/her preferences for each of these attributes.
We define an Affinity function which takes as input a user profile, the boosting parameters (BodyStyle, PriceBucket, SubSegment) and an item, and returns a score denoting the affinity of the user towards that item.
AF(UserProfile, BoostingParams, Item) = Affinity Score
This affinity score is added to the item's original score. After the scores of every item have been boosted, the list is reordered to give the final personalised recommendations.
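A sketch of boosting and reordering. The internals of AF are not described, so this affinity is hypothetical: the user profile is assumed to map each attribute to {value: preference}, each boosting parameter to carry a weight, and the affinity to be the weighted sum of the user's preference for the item's values. The attribute values and weights in the example are made up.

```python
def affinity(user_profile, boosting_params, item):
    """Hypothetical AF(UserProfile, BoostingParams, Item).

    `user_profile` maps an attribute to {value: preference}; `boosting_params`
    maps an attribute (e.g. "BodyStyle") to a boost weight. The affinity is the
    weighted sum of the user's preference for the item's value of each attribute.
    """
    return sum(
        weight * user_profile.get(attr, {}).get(item.get(attr), 0.0)
        for attr, weight in boosting_params.items()
    )


def boost(recommendations, user_profile, boosting_params):
    """Add each item's affinity to its score, then reorder, highest first."""
    boosted = [
        (item, score + affinity(user_profile, boosting_params, item))
        for item, score in recommendations
    ]
    return sorted(boosted, key=lambda entry: entry[1], reverse=True)


recommendations = [
    ({"Id": 1, "BodyStyle": "SUV", "PriceBucket": "b3", "SubSegment": "s1"}, 0.9),
    ({"Id": 2, "BodyStyle": "Sedan", "PriceBucket": "b3", "SubSegment": "s2"}, 0.8),
]
user_profile = {"BodyStyle": {"Sedan": 1.0, "SUV": 0.2}}  # this user prefers sedans
boosting_params = {"BodyStyle": 0.5, "PriceBucket": 0.3, "SubSegment": 0.2}
personalised = boost(recommendations, user_profile, boosting_params)
```

For this user the sedan's score rises from 0.8 to 1.3 and the SUV's from 0.9 to 1.0, so the sedan moves to the top even though its original score was lower.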
Overrides
- Carwale
- Models with a seating capacity of 5 have their seatingCapacity weight increased by 3. So if the seatingCapacity feature weight is 4, models with seating capacity 5 will have it set to 4 + 3, i.e. 7.
- Bikewale
- Models whose body style is not Street, Naked, Scooter or Cruiser have their BodyStyle feature weight set to W/2 instead of W. So if the BodyStyle weight is 10, it remains 10 for Street, Naked, Scooter and Cruiser models, but is 5 for all other body styles.
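The two overrides as small functions. The function names are hypothetical, and using integer division for W/2 is an assumption: a keyword can only be repeated a whole number of times, and the rounding for odd weights is not stated.

```python
STANDARD_BODY_STYLES = {"Street", "Naked", "Scooter", "Cruiser"}


def carwale_seating_weight(weight, seating_capacity):
    """seatingCapacity feature weight after the CarWale override."""
    return weight + 3 if seating_capacity == 5 else weight


def bikewale_body_style_weight(weight, body_style):
    """BodyStyle feature weight after the BikeWale override.

    Integer division is an assumption: a keyword can only be repeated a
    whole number of times, and the rounding for odd weights is not stated.
    """
    return weight if body_style in STANDARD_BODY_STYLES else weight // 2
```

With the Bikewale BodyStyle weight of 12 from the table, an Adventure bike gets a weight of 6, which matches the six Adventure lines in the final description example.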