Bhrigu Events¶
From Wikipedia:
Maharishi Bhrigu was one of the seven great sages, the Saptarshis, one of the many Prajapatis (the facilitators of Creation) created by Brahma (The God of Creation), the first compiler of predictive astrology.
This system is named after the sage Bhrigu, who was the first to introduce predictive analysis. The vision of the system is to analyse user behaviour, predict the user's next actions and deliver customised content, all in real time.
Event Ingestion¶
Every user event has to be recorded in the system. There are 3 methods for ingesting event data:
- Client side tracking using pixel
- Server side tracking using gRPC
- Mobile App tracking using POST method
Client Side Tracking (pixel.gif)¶
To track events from different hosts and browsers, use the following tracking URL with the tracking host for your platform.
https://www.carwale.com/bhrigu/pixel.gif?t=1499424326318&cat=CATEGORY_VALUE&lbl=LABEL_VALUE&act=ACTION_VALUE&pi=PAGEURL&ref=REFERRER_VALUE
Note
Platforms and their corresponding Bhrigu tracking hosts:
Platform | Tracking Host |
---|---|
CarWale | https://bhrigu.carwale.com/ (or) https://www.carwale.com/bhrigu/ |
CarWale Staging/Testing | https://bhrigustg.carwale.com/ (or) https://staging.carwale.com/bhrigu/ |
BikeWale | https://bhrigu.bikewale.com/ (or) https://www.bikewale.com/bhrigu/ |
BikeWale Staging/Testing | https://bhrigustg.bikewale.com/ (or) https://staging.bikewale.com/bhrigu/ |
Nomenclature¶
There are certain notations we have defined to track any generic event data.
t
(mandatory for Client Side tracking)- It denotes the current timestamp, so that browsers don't cache the response and every request reaches the server. Otherwise many events would be missed due to browser caching of the pixel.gif image. This is not needed for Server Side tracking using gRPC.
cat
(mandatory)- It denotes the Category to broadly categorise the particular kind of events, ex: To track all the events on Quotation Page category value would be QuotationPage
lbl
(mandatory)- It denotes the label, a field where you can send any custom data as Key-Value pairs in a format like lbl=modelid=1003|versionid=4542|cityid=1. If there is no data to be sent, just send lbl=NA
act
(optional)- It denotes the action. For any event you want to track in Bhrigu, the combination of Category and Action should be unique
pi
(optional)- It denotes the current page URL of the event (Shouldn’t include query string in page URL)
qs
(optional)- It denotes the query string of the current page of the event. The format of qs is qs=a=1|b=2|c=3, that is, & should be replaced by |
ref
(optional)- It denotes the referrer page URL (CarWale/BikeWale URL path) value with respect to the current event
src
(optional)- It denotes the source of the session. In ideal case, this should be sent in any event whose document referrer is a 3rd party URL
Warning
- If you are tracking events from the client side (i.e., a web browser), make sure the _cwv cookie is set for either the .carwale.com or the www.carwale.com domain (or the respective domain for BikeWale)
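As a rough illustration of how the parameters above fit together, here is a minimal Python sketch that assembles a pixel URL. The helper names `format_pairs` and `build_pixel_url` are hypothetical and not part of Bhrigu. The sketch also assumes that `pixel.gif` sits directly under the tracking host and that parameter values are percent-encoded, which the sample URL above does not show explicitly.

```python
import time
from urllib.parse import urlencode


def format_pairs(pairs):
    """Join key-value pairs as k1=v1|k2=v2, or return NA when there is no data."""
    if not pairs:
        return "NA"
    return "|".join(f"{key}={value}" for key, value in pairs.items())


def build_pixel_url(host, cat, lbl=None, act=None, pi=None, qs=None,
                    ref=None, src=None, now_ms=None):
    """Build a pixel.gif tracking URL (hypothetical helper, not a Bhrigu API)."""
    params = {
        # t is only a cache-buster: the current time in milliseconds
        "t": int(time.time() * 1000) if now_ms is None else now_ms,
        "cat": cat,
        "lbl": format_pairs(lbl),
    }
    optional = {
        "act": act,
        "pi": pi,  # page URL without its query string
        "qs": format_pairs(qs) if qs else None,  # & replaced by |
        "ref": ref,
        "src": src,
    }
    # Only send the optional parameters that were actually provided
    params.update({k: v for k, v in optional.items() if v is not None})
    # Assumption: values are percent-encoded before being sent
    return host.rstrip("/") + "/pixel.gif?" + urlencode(params)


url = build_pixel_url(
    "https://bhrigustg.carwale.com/",
    cat="QuotationPage",
    lbl={"modelid": 1003, "versionid": 4542, "cityid": 1},
    act="VersionChange",
)
```

Using the staging host while experimenting keeps test events out of the production data.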
Server side tracking (gRPC)¶
To track events from the server, you need to call the rpc TrackEvent method, which takes an input message of type TrackingRequest.
See proto file here.
Nomenclature¶
cookieid
(mandatory for server side tracking)- Unique identifier of the user
sessionid
(mandatory)- Unique identifier for that session of the user
category
(mandatory)- Identifier to categorise the events broadly
action
(optional)- It denotes the action. As mentioned earlier combination of Category and Action should be unique.
label
(mandatory)- Here you can send any custom data as Key-Value pairs in a format like modelid=1003|versionid=4542|cityid=1. If there is no data to be sent, just send NA
pageurl
(optional)- Current page URL of the event (Should not include query string of the page)
querystring
(optional)- Query string of the current page of the event
cookie
(optional if cookieid and sessionid are sent)- Otherwise, the _cwv cookie passed from the end user should be forwarded in this field
clientip
(optional)- IP address of the end user should be forwarded in this field
useragent
(optional)- User agent of the end user should be forwarded in this field
referrer
(optional)- Previous page URL of the current event
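The field rules above can be checked before a request is sent. The sketch below is a plain-Python stand-in for the TrackingRequest message, not the class generated from the proto file. The name `TrackingRequestSketch` and its `validate` method are hypothetical, and real code would use the generated gRPC stubs instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackingRequestSketch:
    """Hypothetical mirror of the fields listed above (not the generated proto class)."""
    category: str
    label: str = "NA"  # send NA when there is no custom data
    cookieid: Optional[str] = None
    sessionid: Optional[str] = None
    action: Optional[str] = None
    pageurl: Optional[str] = None
    querystring: Optional[str] = None
    cookie: Optional[str] = None
    clientip: Optional[str] = None
    useragent: Optional[str] = None
    referrer: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks fine."""
        errors = []
        if not self.category:
            errors.append("category is mandatory")
        if not self.label:
            errors.append("label is mandatory (send NA when there is no data)")
        # Either both identifiers are sent, or the _cwv cookie is forwarded
        has_ids = bool(self.cookieid and self.sessionid)
        if not has_ids and not self.cookie:
            errors.append("send cookieid and sessionid, or forward the _cwv cookie")
        if self.pageurl and "?" in self.pageurl:
            errors.append("pageurl should not include the query string")
        return errors


request = TrackingRequestSketch(
    category="QuotationPage",
    action="VersionChange",
    label="modelid=1003|versionid=4542|cityid=1",
    cookieid="BMYXeGeNeL8J8txIovMVKd8iz",
    sessionid="pB6IhoY5Aw",
)
problems = request.validate()
```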
Mobile App tracking (POST method)¶
Sending one request per event from a mobile device would use a lot of network and battery. To reduce that, we have implemented a POST method that lets you send multiple events in a single request. As mentioned for Client and Server side tracking, you need to send the CookieId and SessionId, which are mandatory to identify a unique user and his/her session.
Note
Applications and their corresponding Bhrigu tracking URLs:
Platform | Tracking Host |
---|---|
CarWale App | https://www.carwale.com/bhrigu/events/ |
CarWale App Testing | https://staging.carwale.com/bhrigu/events/ |
BikeWale App | https://www.bikewale.com/bhrigu/events/ |
BikeWale App Testing | https://staging.bikewale.com/bhrigu/events/ |
Nomenclature¶
Cookie (Header)
(mandatory) The request header should contain the unique identifier of the app user in cid, followed by ;<space> as the separator. Similarly, sid (SessionId), pf (Platform) and appver (app version) should be sent as key-value pairs in this header. Example:
cid=sajTzxiq5L32ridFAP5AHJMyY; sid=Ba3Xjx; pf=43; appver=4.3.2
cat
(mandatory)Identifier to categorise the events broadly
act
(optional)It denotes the action. As mentioned earlier combination of Category and Action should be unique.
lbl
(mandatory) It denotes the label, a field where you can send any custom data as Key-Value pairs in a format like modelid=1003|versionid=4542|cityid=1. If there is no data to be sent, just send NA
pi
(optional)It denotes the current screen name of the event
ref
(optional)Previous screen name of the current event
ts
(mandatory) There are two Unix timestamps. The root-level ts denotes when the request was finally posted to the server, whereas the ts inside each event denotes when the user performed that event
{
"ts": 1526553927021,
"events": [{
"cat": "AppNotification",
"act": "Impression",
"lbl": "id=1234|title=Baleno Facelift|<Key>=<Value>",
"ts": 1526553790566
},
{
"cat": "AppNotification",
"act": "Click",
"lbl": "id=1234|title=Baleno Facelift|<Key>=<Value>",
"ts": 1526553801917
},
{
"cat": "CompareCars",
"act": "Comparison",
"lbl": "modelid=1003,1003|versionid=4545,4547|source=21|isorganic=0|<Key>=<Value>",
"pi": "<CurrentScreenName>",
"ref": "<PreviousScreenName>",
"ts": 1526553808717
},
{
"cat": "Images",
"act": "ImageView",
"lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|timespent=7069|imageid=95634|<Key>=<Value>",
"pi": "<CurrentScreenName>",
"ref": "<PreviousScreenName>",
"ts": 1526553817277
},
{
"cat": "Images",
"act": "ImageView",
"lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|timespent=7069|imageid=95635|imagetype=360|<Key>=<Value>",
"pi": "<CurrentScreenName>",
"ref": "<PreviousScreenName>",
"ts": 1526553821182
},
{
"cat": "Videos",
"act": "VideoView",
"lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|versionid=7069|videoid=95635|<Key>=<Value>",
"pi": "<CurrentScreenName>",
"ref": "<PreviousScreenName>",
"ts": 1526553826797
},
{
"cat": "Reviews",
"act": "ExpertReview",
"lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|versionid=7069|reviewid=25134|<Key>=<Value>",
"pi": "<CurrentScreenName>",
"ref": "<PreviousScreenName>",
"ts": 1526553841150
},
{
"cat": "News",
"act": "ExpertReview",
"lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|title=Audi A4 launch|newid=3456|<Key>=<Value>",
"pi": "<CurrentScreenName>",
"ref": "<PreviousScreenName>",
"ts": 1526553849391
}
]
}
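A payload like the one above, together with its Cookie header, could be assembled as in the following Python sketch. The helper names `build_cookie_header` and `build_batch` are hypothetical. The sketch only builds the header value and the JSON body, and leaves the HTTP client and any further required headers to the app.

```python
import json
import time


def build_cookie_header(cid, sid, pf, appver):
    """Build the Cookie header value: key=value pairs separated by '; '."""
    fields = (("cid", cid), ("sid", sid), ("pf", pf), ("appver", appver))
    return "; ".join(f"{key}={value}" for key, value in fields)


def build_batch(events, posted_at_ms=None):
    """Wrap buffered events in the request body (hypothetical helper)."""
    for event in events:
        # cat, lbl and ts are mandatory for every event
        missing = [name for name in ("cat", "lbl", "ts") if name not in event]
        if missing:
            raise ValueError(f"event is missing mandatory fields: {missing}")
    # Root-level ts: when the batch is posted, not when the events happened
    root_ts = int(time.time() * 1000) if posted_at_ms is None else posted_at_ms
    return json.dumps({"ts": root_ts, "events": list(events)})


header = build_cookie_header("sajTzxiq5L32ridFAP5AHJMyY", "Ba3Xjx", 43, "4.3.2")
body = build_batch([
    {"cat": "AppNotification", "act": "Impression",
     "lbl": "id=1234|title=Baleno Facelift", "ts": 1526553790566},
    {"cat": "AppNotification", "act": "Click",
     "lbl": "id=1234|title=Baleno Facelift", "ts": 1526553801917},
])
```

Buffering events locally and posting them together is what saves network and battery. Each event keeps its own ts, so the time of the user action is not lost.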
Usage¶
This system is designed to be used for OLAP (Online Analytical Processing). So, you can use it to track
- Any user activity
- Product changes impact
- Performance related KPI
- Log data, etc.
Moreover, the same data can be queried (BigQuery) and visualised in real time (Events Tracking).
Conventions¶
As this is a generic event logging system, we recommend following some conventions:
- Key-Value pairs in lbl should be = (preferred) or : separated. Moreover, multiple Key-Value pairs should be | separated. Ex: lbl=modelid=1003|versionid=4542|cityid=1
- Make sure you send both cid and sid in case of Mobile App or Server side tracking
- Use Pascal case for Category, Action
- In case of server side tracking, make sure to forward the User-Agent header, as information like Browser, OS, Device etc. will be retrieved from it
- In case of server side tracking, send the user IP in the Client-IP header, otherwise the server's IP will be recorded for all these events
- src value should be the 3rd party URL of the document referrer of the current page if the user comes from outside, or direct if the user comes directly to the website. Henceforth, if the document referrer is our own website, you will need to send the ref parameter, which is the referrer URL of the current event
- For referrer, just send the relative URL (Ex: /m/audi-cars)
- In any event, either src or ref will be present but not both
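The label and naming conventions above can be checked mechanically. The following Python sketch parses a label into a dictionary, accepting either = or : between key and value, and tests whether a Category or Action is in Pascal case. `parse_label` and `is_pascal_case` are hypothetical helper names, and the Pascal case pattern is one plausible reading of the convention.

```python
import re

# One or more words, each starting with an uppercase letter
PASCAL_CASE = re.compile(r"(?:[A-Z][a-z0-9]*)+")


def parse_label(label):
    """Parse 'k1=v1|k2=v2' (or 'k1:v1|k2:v2') into a dict; NA means no data."""
    if not label or label == "NA":
        return {}
    pairs = {}
    for part in label.split("|"):
        # Split on the first = or : only, so values may contain them
        pieces = re.split(r"[=:]", part, maxsplit=1)
        if len(pieces) != 2:
            raise ValueError(f"not a key-value pair: {part!r}")
        pairs[pieces[0]] = pieces[1]
    return pairs


def is_pascal_case(name):
    """Return True for names such as QuotationPage or VersionChange."""
    return PASCAL_CASE.fullmatch(name) is not None


fields = parse_label("modelid=1003|versionid=4542|cityid=1")
```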
Storage¶
Data sent to the API server has to be stored in persistent storage. The rate of event inflow is very high, so relational databases don't work well here because their write latency would be too high. For better write latency, we have used a NoSQL database.
Components¶
For storage we have used open source technology Apache Cassandra as NoSQL database.
Schema¶
The tables stored in the Cassandra database have the following schemas:
eventdata or stgeventdata: Data ingested from bhrigu.carwale.com or bhrigu.bikewale.com is stored in the eventdata table, and staging data in stgeventdata.
Column | DataType | Format | Example |
---|---|---|---|
partitionkey | string | YYYY-MM-DD-HH-X | 2017-11-18-15-2 |
category | string | SomeCategory | QuotationPage |
cookieid | string | SomeUniqueID | BMYXeGeNeL8J8txIovMVKd8iz |
sessionid | string | SomeSessionID | pB6IhoY5Aw |
logdatetime | bigint | unixtimestamp | 1499281733000 |
label | string | key1=val1\|key2=val2 | NewVersion:1934\|OldVersion:1933 |
action | string | SomeAction | VersionChange |
browser | string | Browser Name | Chrome |
browserversion | string | Browser Version | 57.0.2987.132 |
cookie | string | c1:v1; c2:v2 | CWC:155DyWrzWGcglLiTEm4c2ZaPp; _cwv:155DyWrzWGcglLiTEm4c2ZaPp.6bgghlJKP.1499452517.1499452517.1499452517.1 |
ip | string | xxx.xxx.xxx.xxx | 169.149.134.21 |
isbot | boolean | true/false | false |
ismobile | boolean | true/false | true |
os | string | OS Name | Android 6.0.1 |
pageinfo | string | Page URL | https://www.carwale.com/m/tata-cars/ |
platform | string | Device Info | Linux |
query | string | key1=val1\|key2=val2 | makeId=45\|modelId=560\|versionId=5268 |
referrer | string | Referrer URL | /m/new |
rendorengine | string | Render Engine Name | AppleWebKit |
rendorengineversion | string | Render Engine Version | 537.36 |
source | string | Source | direct |
useragent | string | User Agent Raw String | Mozilla/5.0 (X11; Linux x86_64) |
dailyuserprofilescw or dailyuserprofilesbw: This table has day-wise data of user profiles generated from event data for CW & BW.
Column | DataType | Format | Example |
---|---|---|---|
logdate | string | YYYY-MM-DD | 2017-06-25 |
cookieid | string | SomeUniqueID | BMYXeGeNeL8J8txIovMVKd8iz |
userdata | bigmap | {Key1:{k1:v1,k2:v2}, Key2:{k3:v3,k4:v4}} | {'models':{'amaze':3,'nano':2},'bodystyle':{'sedan':3,'hatchback':2}} |
agguserprofilescw or agguserprofilesbw: This table has user profile data aggregated over a period of 90 days, generated from event data for CW & BW.
Column | DataType | Format | Example |
---|---|---|---|
cookieid | string | SomeUniqueID | BMYXeGeNeL8J8txIovMVKd8iz |
userdata | map<text,map<text,int>> | {Key1:{k1:v1,k2:v2}, Key2:{k3:v3,k4:v4}} | {'models':{'amaze':3,'nano':2},'bodystyle':{'sedan':3,'hatchback':2}} |
userpreference | map<text,text> | {Key1:Val1,Key2:Val2,Key3:Val3} | {'models':'amaze,nano','bodystyle':'sedan,hatchback','carpreference':'new'} |
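To make the relationship between the daily and the aggregated profile tables concrete, here is a minimal Python sketch. It assumes that aggregation means summing the per-day counts and that a preference string lists keys by descending count. Both are assumptions made for illustration, not documented Bhrigu behaviour, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_profiles(daily_profiles):
    """Sum nested counts across daily userdata maps (assumed aggregation rule)."""
    totals = defaultdict(Counter)
    for userdata in daily_profiles:
        for group, counts in userdata.items():
            totals[group].update(counts)  # Counter.update adds the counts
    return {group: dict(counts) for group, counts in totals.items()}


def derive_preference(userdata):
    """Turn each count map into a comma-separated string, highest count first."""
    return {
        group: ",".join(sorted(counts, key=lambda key: (-counts[key], key)))
        for group, counts in userdata.items()
    }


day1 = {"models": {"amaze": 2, "nano": 1}, "bodystyle": {"sedan": 2, "hatchback": 1}}
day2 = {"models": {"amaze": 1, "nano": 1}, "bodystyle": {"sedan": 1, "hatchback": 1}}
aggregated = aggregate_profiles([day1, day2])
preference = derive_preference(aggregated)
```

In production this kind of roll-up would more likely run as a Spark job over the daily table. The plain-Python version only shows the shape of the data.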
Processing¶
Storing data in NoSQL databases is easy; processing it is the more challenging part.
Components¶
We have used Apache Spark for processing the data.
Apache Spark™ is a fast and general engine for large-scale data processing. Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python and R shells.
Pipelines¶
We have developed a system for querying the big data using SQL query. We’ll discuss more in detail further, see here. There are two pipelines for data processing.
- Batch Processing
  - Generally for creating reports and batch analysis
- Real-Time Processing
  - Required for real-time calculation of metrics
Event Tracking¶
We have developed a panel on CW-OPR for monitoring events that come to us. This will help POs, TLs and others to:
- View Events
- Add new Events
- Update Event details
Event tracking system is created to regulate events being sent to Bhrigu. There are many unwanted events that are sent to bhrigu which can be moved to other sources like GA, DBLogging etc. Now, events being sent to bhrigu will only be accepted if they are approved. Otherwise, they will be discarded. The OPR panel creates a system which enables users to add/update events and send them for approval.
The complete logical flow of Event Tracking is shown below.
Some information regarding fields required for an event is shown below:
Field Name | Type | Optional | Unique |
---|---|---|---|
Category | String | No | Yes |
Action | String | Yes | Yes |
Label Keys | List | No |
For a detailed explanation of which events should be sent to Bhrigu, GA, DBLogging etc., please read the guidelines here.
System Architecture¶
- To better explain the architecture:
- Events from different sources are received by Bhrigu Service.
- Bhrigu Service sends these events to Kafka.
- Kafka is basically a streaming platform which can easily handle huge amounts of data in real-time.
- Bhrigu Consumer consumes this data from Kafka, keeps approved events and discards unapproved events.
- These events are then stored in Cassandra.
- CW OPR reads these events from Cassandra and lets users add, update and otherwise manage events on the go.
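The filtering step performed by the Bhrigu Consumer can be sketched in a few lines of Python. This is an illustration only: it assumes approval is keyed on the (Category, Action) combination, it uses the short field names cat and act, and it leaves out Kafka and Cassandra entirely. `split_events` is a hypothetical name.

```python
def split_events(events, approved):
    """Separate events into (kept, discarded) by their (cat, act) combination."""
    kept, discarded = [], []
    for event in events:
        key = (event.get("cat"), event.get("act"))
        # Only approved Category/Action combinations are stored
        (kept if key in approved else discarded).append(event)
    return kept, discarded


approved = {("AppNotification", "Click"), ("Images", "ImageView")}
incoming = [
    {"cat": "AppNotification", "act": "Click", "lbl": "id=1234"},
    {"cat": "Debug", "act": "Ping", "lbl": "NA"},
]
kept, discarded = split_events(incoming, approved)
```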
Any event addition or update goes through an approval process as shown below.
There are 3 roles given on the OPR side. Each role has been given certain permissions:
Role Name | Can View? | Can Add/Update? | Can Approve/Reject? |
---|---|---|---|
event-tracking-viewer | Yes | No | No |
event-tracking-manager | Yes | Yes | No |
event-tracking-authorizer | Yes | Yes | Yes |
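The role table above maps directly onto a lookup structure. The sketch below encodes it as a Python dictionary. The permission names view, add_update and approve_reject are made up for this example and do not come from the OPR panel.

```python
# Permissions per role, transcribed from the table above
PERMISSIONS = {
    "event-tracking-viewer": {"view"},
    "event-tracking-manager": {"view", "add_update"},
    "event-tracking-authorizer": {"view", "add_update", "approve_reject"},
}


def is_allowed(role, permission):
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in PERMISSIONS.get(role, set())


can_manager_approve = is_allowed("event-tracking-manager", "approve_reject")
```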
Adding an Event¶
Before adding an event, make sure that you read the guidelines.
There are 3 things required for adding an event:
Field Name | Type | Optional | Unique |
---|---|---|---|
Category | String | No | Yes |
Action | String | Yes | Yes |
Label Keys | List | No |
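The constraints in this table can be expressed as a small in-memory registry. This is a hedged Python sketch: `EventDefinition` and `EventRegistry` are hypothetical names, and uniqueness is enforced on the combination of Category and Action, as described in the nomenclature sections.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EventDefinition:
    category: str                  # required
    label_keys: Tuple[str, ...]    # required list of keys expected in the label
    action: Optional[str] = None   # optional


class EventRegistry:
    """In-memory stand-in for the event store behind the OPR panel."""

    def __init__(self):
        self._events = {}

    def add(self, definition):
        if not definition.category:
            raise ValueError("Category is required")
        # The Category/Action combination must be unique
        key = (definition.category, definition.action)
        if key in self._events:
            raise ValueError(f"duplicate Category/Action combination: {key}")
        self._events[key] = definition
        return definition


registry = EventRegistry()
registry.add(EventDefinition("QuotationPage", ("modelid", "versionid"), "VersionChange"))
registry.add(EventDefinition("QuotationPage", ("modelid",)))
```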