Bhrigu Events

From Wikipedia:

Maharishi Bhrigu was one of the seven great sages, the Saptarshis, one of the many Prajapatis (the facilitators of Creation) created by Brahma (The God of Creation), the first compiler of predictive astrology.

This system is named after the great sage Bhrigu, who was the first to introduce predictive analysis. The vision of the system is to analyse user behaviour, predict his/her next actions and deliver customised content, all in real time.

Event Ingestion

Every user event has to be recorded in the system. There are 3 methods for ingesting event data into the system:

  • Client side tracking using pixel
  • Server side tracking using gRPC
  • Mobile App tracking using POST method

Client Side Tracking (pixel.gif)

To track events from different hosts and browsers, use the following tracking URL with the tracking host of your platform.

https://www.carwale.com/bhrigu/pixel.gif?t=1499424326318&cat=CATEGORY_VALUE
&lbl=LABEL_VALUE&act=ACTION_VALUE&pi=PAGEURL&ref=REFERRER_VALUE

Note

The platforms and their corresponding Bhrigu tracking hosts are listed below.

Platform                  Tracking Host
CarWale                   https://bhrigu.carwale.com/ (or) https://www.carwale.com/bhrigu/
CarWale Staging/Testing   https://bhrigustg.carwale.com/ (or) https://staging.carwale.com/bhrigu/
BikeWale                  https://bhrigu.bikewale.com/ (or) https://www.bikewale.com/bhrigu/
BikeWale Staging/Testing  https://bhrigustg.bikewale.com/ (or) https://staging.bikewale.com/bhrigu/

Nomenclature

There are certain notations we have defined to track any generic event data.

  • t (mandatory for Client Side tracking)
    It denotes the current timestamp value, so that browsers do not cache the response and send every request to the server. Otherwise, many events would be missed because the browser caches the pixel.gif image. This is not needed for server-side tracking using gRPC.
  • cat (mandatory)
    It denotes the category used to broadly group a particular kind of event. For example, to track all events on the Quotation Page, the category value would be QuotationPage
  • lbl (mandatory)
    It denotes the label, a field where you can send any custom data as Key-Value pairs in the format like lbl=modelid=1003|versionid=4542|cityid=1. If there is no data to be sent then just send lbl=NA
  • act (optional)
    It denotes the action. For any event you want to track in Bhrigu, the combination of Category and Action should be unique
  • pi (optional)
    It denotes the current page URL of the event (Shouldn’t include query string in page URL)
  • qs (optional)
    It denotes the query string of the current page of the event. Format of qs is qs=a=1|b=2|c=3. & should be replaced by |
  • ref (optional)
    It denotes the referrer page URL (CarWale/BikeWale URL path) value with respect to the current event
  • src (optional)
    It denotes the source of the session. Ideally, this should be sent with any event whose document referrer is a 3rd-party URL
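
As an illustration of the parameters above, the following is a minimal Python sketch that assembles a pixel.gif tracking URL. The function names (encode_label, build_pixel_url) are hypothetical and not part of Bhrigu. It also assumes that the server accepts standard percent-encoded query values, which this document does not state.

```python
import time
from urllib.parse import urlencode


def encode_label(pairs):
    """Join key-value pairs as key=value|key=value; return "NA" when empty."""
    if not pairs:
        return "NA"
    return "|".join(f"{key}={value}" for key, value in pairs.items())


def build_pixel_url(host, cat, lbl=None, act=None, pi=None, qs=None,
                    ref=None, src=None, t=None):
    """Build a pixel.gif tracking URL (hypothetical helper, not a Bhrigu API)."""
    params = {
        # t is only a cache-buster: current time in milliseconds unless given.
        "t": t if t is not None else int(time.time() * 1000),
        "cat": cat,
        "lbl": encode_label(lbl),
    }
    optional = {
        "act": act,
        "pi": pi,
        # qs uses the same |-separated format as lbl.
        "qs": encode_label(qs) if qs else None,
        "ref": ref,
        "src": src,
    }
    # Optional parameters are added only when a value was supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    # Assumption: values are percent-encoded, so "=" and "|" become %3D and %7C.
    return host.rstrip("/") + "/pixel.gif?" + urlencode(params)


url = build_pixel_url(
    "https://www.carwale.com/bhrigu/",
    cat="QuotationPage",
    lbl={"modelid": 1003, "cityid": 1},
    act="VersionChange",
    t=1499424326318,
)
```

Passing the staging host from the table above instead of the production host is enough to send the same event to the testing environment.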

Warning

  • If you are tracking events from the client side (i.e., a web browser), make sure the _cwv cookie is set for either the .carwale.com or the www.carwale.com domain (or the respective domain for BikeWale)

Server side tracking (gRPC)

To track events from a server, you need to implement the rpc TrackEvent method, which takes an input message of type TrackingRequest. See the proto file here.

Nomenclature

  • cookieid (mandatory for server side tracking)
    Unique identifier of the user
  • sessionid (mandatory)
    Unique identifier for that session of the user
  • category (mandatory)
    Identifier to categorise the events broadly
  • action (optional)
    It denotes the action. As mentioned earlier combination of Category and Action should be unique.
  • label (mandatory)
    In this you can send any custom data as Key-Value pairs in the format like modelid=1003|versionid=4542|cityid=1. If there is no data to be sent then just send NA
  • pageurl (optional)
    Current page URL of the event (Should not include query string of the page)
  • querystring (optional)
    Query string of the current page of the event
  • cookie
    Optional if cookieid and sessionid are sent; otherwise the _cwv cookie passed from the end user should be forwarded in this field
  • clientip (optional)
    IP address of the end user should be forwarded in this field
  • useragent (optional)
    User agent of the end user should be forwarded in this field
  • referrer (optional)
    Previous page URL of the current event
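
The field rules above can be checked before a request is sent. The sketch below is only an illustration: the real TrackingRequest message is defined in the proto file, which is not reproduced here, so the TrackingFields class is a hypothetical stand-in that mirrors the field names listed above. It reads the cookie rule as "either cookieid and sessionid, or the forwarded _cwv cookie", which is an interpretation of the description rather than confirmed behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackingFields:
    """Hypothetical stand-in for the TrackingRequest fields listed above."""
    category: str
    label: str = "NA"  # send NA when there is no custom data
    cookieid: Optional[str] = None
    sessionid: Optional[str] = None
    action: Optional[str] = None
    pageurl: Optional[str] = None
    querystring: Optional[str] = None
    cookie: Optional[str] = None
    clientip: Optional[str] = None
    useragent: Optional[str] = None
    referrer: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the fields look complete."""
        problems = []
        if not self.category:
            problems.append("category is mandatory")
        if not self.label:
            problems.append("label is mandatory (send NA when there is no data)")
        # Assumed rule: ids may be omitted only when the _cwv cookie is forwarded.
        has_ids = bool(self.cookieid and self.sessionid)
        if not has_ids and not self.cookie:
            problems.append("send cookieid and sessionid, or forward the _cwv cookie")
        if self.pageurl and "?" in self.pageurl:
            problems.append("pageurl should not include the query string")
        return problems


ok = TrackingFields(category="QuotationPage", label="modelid=1003",
                    cookieid="BMYXeGeNeL8J8txIovMVKd8iz", sessionid="pB6IhoY5Aw")
bad = TrackingFields(category="QuotationPage",
                     pageurl="https://www.carwale.com/m/tata-cars/?a=1")
```

Here ok.validate() returns an empty list, while bad.validate() reports the missing user identification and the query string left in pageurl.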

Mobile App tracking (POST method)

Sending one request per event from a mobile device would use a lot of network and battery. To reduce that, we have implemented a POST method that lets you send multiple events in a single POST request. As mentioned for client-side and server-side tracking, you need to send the CookieId and SessionId, which are mandatory to identify a unique user and his/her session.

Note

Applications and their corresponding Bhrigu tracking URLs:

Platform              Tracking Host
CarWale App           https://www.carwale.com/bhrigu/events/
CarWale App Testing   https://staging.carwale.com/bhrigu/events/
BikeWale App          https://www.bikewale.com/bhrigu/events/
BikeWale App Testing  https://staging.bikewale.com/bhrigu/events/

Nomenclature

  • Cookie (Header) (mandatory)

    The request header should contain the unique identifier of the app user as cid, followed by ;<space> as the separator. Similarly, sid (SessionId), pf (Platform) and appver (app version) should be sent as key-value pairs in this header.

    Example :

    cid=sajTzxiq5L32ridFAP5AHJMyY; sid=Ba3Xjx; pf=43; appver=4.3.2
    
  • cat (mandatory)

    Identifier to categorise the events broadly

  • act (optional)

    It denotes the action. As mentioned earlier combination of Category and Action should be unique.

  • lbl (mandatory)

    It denotes the label, a field where you can send any custom data as Key-Value pairs in the format like modelid=1003|versionid=4542|cityid=1. If there is no data to be sent then just send NA

  • pi (optional)

    It denotes the current screen name of the event

  • ref (optional)

    Previous screen name of the current event

  • ts (mandatory)

    There are two Unix timestamps. The root-level ts denotes when the request was finally posted to the server, whereas the ts inside each event denotes when the user performed the event

{
        "ts": 1526553927021,
        "events": [{
                        "cat": "AppNotification",
                        "act": "Impression",
                        "lbl": "id=1234|title=Baleno Facelift|<Key>=<Value>",
                        "ts": 1526553790566
                },
                {
                        "cat": "AppNotification",
                        "act": "Click",
                        "lbl": "id=1234|title=Baleno Facelift|<Key>=<Value>",
                        "ts": 1526553801917
                },
                {
                        "cat": "CompareCars",
                        "act": "Comparison",
                        "lbl": "modelid=1003,1003|versionid=4545,4547|source=21|isorganic=0|<Key>=<Value>",
                        "pi": "<CurrentScreenName>",
                        "ref": "<PreviousScreenName>",
                        "ts": 1526553808717
                },
                {
                        "cat": "Images",
                        "act": "ImageView",
                        "lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|timespent=7069|imageid=95634|<Key>=<Value>",
                        "pi": "<CurrentScreenName>",
                        "ref": "<PreviousScreenName>",
                        "ts": 1526553817277
                },
                {
                        "cat": "Images",
                        "act": "ImageView",
                        "lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|timespent=7069|imageid=95635|imagetype=360|<Key>=<Value>",
                        "pi": "<CurrentScreenName>",
                        "ref": "<PreviousScreenName>",
                        "ts": 1526553821182
                },
                {
                        "cat": "Videos",
                        "act": "VideoView",
                        "lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|versionid=7069|videoid=95635|<Key>=<Value>",
                        "pi": "<CurrentScreenName>",
                        "ref": "<PreviousScreenName>",
                        "ts": 1526553826797
                },
                {
                        "cat": "Reviews",
                        "act": "ExpertReview",
                        "lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|versionid=7069|reviewid=25134|<Key>=<Value>",
                        "pi": "<CurrentScreenName>",
                        "ref": "<PreviousScreenName>",
                        "ts": 1526553841150
                },
                {
                        "cat": "News",
                        "act": "ExpertReview",
                        "lbl": "makename=Volkswagen|modelname=Polo|modelid=1003|title=Audi A4 launch|newid=3456|<Key>=<Value>",
                        "pi": "<CurrentScreenName>",
                        "ref": "<PreviousScreenName>",
                        "ts": 1526553849391
                }
        ]
}
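
A mobile client could assemble the Cookie header and the batched body shown above along the following lines. This is a minimal Python sketch: the helper names (build_cookie_header, EventBatch) are hypothetical, and the actual HTTP call is left out because the client library used by the apps is not described in this document.

```python
import json
import time


def build_cookie_header(cid, sid, pf, appver):
    """Build the Cookie header value: key=value pairs separated by '; '."""
    return f"cid={cid}; sid={sid}; pf={pf}; appver={appver}"


class EventBatch:
    """Collect events on the device and serialise them into one POST body."""

    def __init__(self):
        self.events = []

    def add(self, cat, lbl="NA", act=None, pi=None, ref=None, ts=None):
        # Per-event ts: when the user performed the event (milliseconds).
        event = {
            "cat": cat,
            "lbl": lbl,
            "ts": ts if ts is not None else int(time.time() * 1000),
        }
        # Optional fields are included only when a value was supplied.
        for key, value in (("act", act), ("pi", pi), ("ref", ref)):
            if value is not None:
                event[key] = value
        self.events.append(event)

    def to_json(self, posted_at=None):
        # Root-level ts: when the batch is posted to the server (milliseconds).
        ts = posted_at if posted_at is not None else int(time.time() * 1000)
        return json.dumps({"ts": ts, "events": self.events})


header = build_cookie_header("sajTzxiq5L32ridFAP5AHJMyY", "Ba3Xjx", 43, "4.3.2")
batch = EventBatch()
batch.add("AppNotification", "id=1234|title=Baleno Facelift",
          act="Impression", ts=1526553790566)
batch.add("AppNotification", "id=1234|title=Baleno Facelift",
          act="Click", ts=1526553801917)
body = batch.to_json(posted_at=1526553927021)
```

The resulting body has the same shape as the JSON above and would be posted, together with the Cookie header, to the events URL of the relevant app.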

Usage

This system is designed to be used for OLAP (online analytical processing). So, you can use this system to track

  • Any user activity
  • Product changes impact
  • Performance related KPI
  • Log data, etc.

Moreover, the same data can be queried (BigQuery) and visualised in real time (Events Tracking).

Conventions

As this is a generic event logging system, we recommend following these conventions:

  • Key-Value pairs in lbl should be = (preferred) or : separated. Moreover, multiple Key-Value pairs should be | separated. Ex: lbl=modelid=1003|versionid=4542|cityid=1
  • Make sure you send both cid and sid in case of Mobile App or Server side tracking
  • Use Pascal case for Category, Action
  • In case of server side tracking, make sure to forward the User-Agent header, as information like browser, OS, device etc. will be retrieved from it
  • In case of server side tracking, send the user IP in the Client-IP header; otherwise the IP of the server will be recorded for all these events
  • src value should be the 3rd-party URL of the document referrer of the current page if the user comes from outside, or direct if the user comes directly to the website. If the document referrer is our own website, send the ref parameter instead, which is the referrer URL of the current event
  • For referrer, just send relative url (Ex: /m/audi-cars)
  • In any event, either src or ref will be present but not both
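
The src/ref conventions above can be sketched as a small decision function. This is an illustration only: the name referrer_params and the OWN_HOSTS list are assumptions, and the function simply treats any carwale.com or bikewale.com host as "our own website".

```python
from urllib.parse import urlsplit

# Assumption: these domains (and their subdomains) count as our own website.
OWN_HOSTS = ("carwale.com", "bikewale.com")


def referrer_params(document_referrer):
    """Return either {'src': ...} or {'ref': ...}, never both."""
    if not document_referrer:
        # No referrer: the user came directly to the website.
        return {"src": "direct"}
    parts = urlsplit(document_referrer)
    host = (parts.hostname or "").lower()
    if any(host == own or host.endswith("." + own) for own in OWN_HOSTS):
        # Internal navigation: send only the relative URL path as ref.
        return {"ref": parts.path or "/"}
    # External referrer: send the 3rd-party URL as src.
    return {"src": document_referrer}


internal = referrer_params("https://www.carwale.com/m/audi-cars?x=1")
external = referrer_params("https://www.google.com/")
direct = referrer_params("")
```

With these inputs, internal is {"ref": "/m/audi-cars"}, external carries the full 3rd-party URL as src, and direct is {"src": "direct"}.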

Storage

Data is sent to the API server, but it must be kept in persistent storage. The rate of event inflow is very high, so relational databases do not work well in this case, as write latency would be high. For better write latency, we use a NoSQL database.

Components

For storage, we use Apache Cassandra, an open-source NoSQL database.

Schema

The tables stored in the Cassandra database have the following schemas:

  • eventdata or stgeventdata :

    Data ingested from bhrigu.carwale.com or bhrigu.bikewale.com is stored in the eventdata table; staging data is stored in stgeventdata.

    Column               DataType  Format                    Example
    partitionkey         string    YYYY-MM-DD-HH-X           2017-11-18-15-2
    category             string    SomeCategory              QuotationPage
    cookieid             string    SomeUniqueID              BMYXeGeNeL8J8txIovMVKd8iz
    sessionid            string    SomeSessionID             pB6IhoY5Aw
    logdatetime          bigint    unixtimestamp             1499281733000
    label                string    key1=val1|key2=val2       NewVersion:1934|OldVersion:1933
    action               string    SomeAction                VersionChange
    browser              string    Browser Name              Chrome
    browserversion       string    Browser Version           57.0.2987.132
    cookie               string    c1:v1; c2:v2              CWC:155DyWrzWGcglLiTEm4c2ZaPp; _cwv:155DyWrzWGcglLiTEm4c2ZaPp.6bgghlJKP .1499452517.1499452517.1499452517.1
    ip                   string    xxx.xxx.xxx.xxx           169.149.134.21
    isbot                boolean   true/false                false
    ismobile             boolean   true/false                true
    os                   string    OS Name                   Android 6.0.1
    pageinfo             string    Page URL                  https://www.carwale.com/m/tata-cars/
    platform             string    Device Info               Linux
    query                string    key1=val1|key2=val2       makeId=45|modelId=560|versionId=5268
    referrer             string    Referrer URL              /m/new
    rendorengine         string    Rendering Engine Name     AppleWebKit
    rendorengineversion  string    Rendering Engine Version  537.36
    source               string    Source                    direct
    useragent            string    User Agent Raw String     Mozilla/5.0 (X11; Linux x86_64)
  • dailyuserprofilescw or dailyuserprofilesbw :

    This table has day-wise data of userprofiles generated from event data for CW & BW

    Column    DataType  Format                                    Example
    logdate   string    YYYY-MM-DD                                2017-06-25
    cookieid  string    SomeUniqueID                              BMYXeGeNeL8J8txIovMVKd8iz
    userdata  bigmap    {Key1:{k1:v1,k2:v2}, Key2:{k3:v3,k4:v4}}  {'models':{'amaze':3,'nano':2},'bodystyle':{'sedan':3,'hatchback':2}}
  • agguserprofilescw or agguserprofilesbw :

    This table has aggregated data of userprofiles over a period of 90 days generated from event data for CW & BW

    Column          DataType                 Format                                    Example
    cookieid        string                   SomeUniqueID                              BMYXeGeNeL8J8txIovMVKd8iz
    userdata        map<text,map<text,int>>  {Key1:{k1:v1,k2:v2}, Key2:{k3:v3,k4:v4}}  {'models':{'amaze':3,'nano':2},'bodystyle':{'sedan':3,'hatchback':2}}
    userpreference  map<text,text>           {Key1:Val1,Key2:Val2,Key3:Val3}           {'models':'amaze,nano','bodystyle':'sedan,hatchback','carpreference':'new'}
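
To make the relationship between the daily and aggregated profile tables concrete, here is a minimal Python sketch that sums several daily userdata maps into one aggregated map and derives a comma-separated preference string per attribute. The function names are hypothetical, and the ordering rule (most frequent value first) is an assumption; this document does not specify how the real jobs compute userpreference, and fields such as carpreference are not derivable this way.

```python
from collections import Counter, defaultdict


def aggregate_profiles(daily_profiles):
    """Sum the nested counts of several daily userdata maps."""
    totals = defaultdict(Counter)
    for userdata in daily_profiles:
        for attribute, counts in userdata.items():
            # Counter.update adds counts instead of replacing them.
            totals[attribute].update(counts)
    return {attribute: dict(counts) for attribute, counts in totals.items()}


def derive_preference(userdata):
    """Join each attribute's values into a string, most frequent first.

    Assumed rule for illustration; ties are broken alphabetically.
    """
    return {
        attribute: ",".join(sorted(counts, key=lambda v: (-counts[v], v)))
        for attribute, counts in userdata.items()
    }


day1 = {"models": {"amaze": 2, "nano": 2}, "bodystyle": {"sedan": 2}}
day2 = {"models": {"amaze": 1}, "bodystyle": {"sedan": 1, "hatchback": 2}}
aggregated = aggregate_profiles([day1, day2])
preference = derive_preference(aggregated)
```

For these two days, aggregated matches the userdata example in the table above, and preference yields 'amaze,nano' for models and 'sedan,hatchback' for bodystyle.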

Processing

Storing data in NoSQL databases is easy, but processing it is more challenging.

Components

We have used Apache Spark for processing the data.

Apache Spark™ is a fast and general engine for large-scale data processing. Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python and R shells.

Pipelines

We have developed a system for querying this big data using SQL queries; it is discussed in more detail further on, see here. There are two pipelines for data processing.

  • Batch Processing
    Generally for creating reports and batch analysis
  • Real-Time Processing
    Required for real-time calculation of metrics

Event Tracking

We have developed a panel on CW-OPR for monitoring events that come to us. This will help POs, TLs and others to:

  • View Events
  • Add new Events
  • Update Event details

The event tracking system was created to regulate the events being sent to Bhrigu. Many unwanted events are sent to Bhrigu that could be moved to other sources such as GA, DBLogging etc. Events sent to Bhrigu are now accepted only if they are approved; otherwise, they are discarded. The OPR panel lets users add/update events and send them for approval.

The complete logical flow of Event Tracking is shown below.

Event Tracking Logical flow

Some information regarding fields required for an event is shown below:

Field Name  Type    Optional  Unique
Category    String  No        Yes
Action      String  Yes       Yes
Label Keys  List    No

For a detailed explanation of which events should be sent to Bhrigu, GA, DBLogging etc., please read the guidelines here.

System Architecture

Event Tracking Architecture

To better explain the architecture:
  • Events from different sources are received by Bhrigu Service.
  • Bhrigu Service sends these events to Kafka.
  • Kafka is basically a streaming platform which can easily handle huge amounts of data in real-time.
  • Bhrigu Consumer consumes this data from Kafka, keeps approved events and discards unapproved events.
  • These events are then stored in Cassandra.
  • CW OPR makes use of these events from Cassandra and helps users add, update, etc. events on the go.
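
The approve-or-discard step performed by Bhrigu Consumer can be sketched as below. This is a simplified illustration in plain Python, not the actual consumer: it leaves out Kafka and Cassandra entirely and assumes that an approved event is identified by its (category, action) pair, in line with the uniqueness rule for Category and Action. The function name and event dictionaries are hypothetical.

```python
def filter_approved(events, approved):
    """Split events into (kept, discarded) using approved (category, action) pairs."""
    kept, discarded = [], []
    for event in events:
        # Action is optional, so a missing action is looked up as None.
        key = (event.get("category"), event.get("action"))
        (kept if key in approved else discarded).append(event)
    return kept, discarded


approved = {("QuotationPage", "VersionChange"), ("AppNotification", "Click")}
incoming = [
    {"category": "QuotationPage", "action": "VersionChange", "label": "NewVersion:1934"},
    {"category": "QuotationPage", "action": "Scroll", "label": "NA"},
    {"category": "AppNotification", "action": "Click", "label": "id=1234"},
]
kept, discarded = filter_approved(incoming, approved)
```

Only the events in kept would go on to be stored; the unapproved Scroll event ends up in discarded.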

Any event addition/update goes through an approval process as shown below.

Event Approval Process

There are 3 roles given on the OPR side. Each role has been given certain permissions:

Role Name                  Can View?  Can Add/Update?  Can Approve/Reject?
event-tracking-viewer      Yes        No               No
event-tracking-manager     Yes        Yes              No
event-tracking-authorizer  Yes        Yes              Yes
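
The role table can be expressed as a simple permission lookup. The sketch below is illustrative only; the permission names ("view", "edit", "approve") and the function can are assumptions and do not describe how OPR implements its checks.

```python
# Permissions per role, mirroring the table above.
ROLE_PERMISSIONS = {
    "event-tracking-viewer": {"view"},
    "event-tracking-manager": {"view", "edit"},
    "event-tracking-authorizer": {"view", "edit", "approve"},
}


def can(role, permission):
    """Return True if the role grants the permission; unknown roles grant nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())


manager_can_edit = can("event-tracking-manager", "edit")
manager_can_approve = can("event-tracking-manager", "approve")
```

Here manager_can_edit is True and manager_can_approve is False, matching the second row of the table.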

Adding an Event

Before adding an event, make sure that you read the guidelines.

Event Creation Demo

There are 3 things required for adding an event:

Field Name  Type    Optional  Unique
Category    String  No        Yes
Action      String  Yes       Yes
Label Keys  List    No

Updating an Event

You can only update the label keys of a particular event.

Event Updation demo
