Google Analytics Customer Revenue Prediction


  • Around 75% of pageview counts lie between 1 and 4. The log of “transactionRevenue” follows a normal distribution against “pageviews”, but there is no direct correlation between them.
  • Around 75% of users spend less than 4 minutes (244 seconds) on a session.
    The log of “transactionRevenue” also follows a normal distribution against “timeOnSite”. (A sketch of this kind of check follows the list below.)
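
A minimal sketch of this kind of distribution check, assuming train_df with the flattened 'totals.pageviews' and 'totals.transactionRevenue' columns; the plotting code here is illustrative, not the exact code used for the figures:

import numpy as np
import matplotlib.pyplot as plt

# Sessions with non-zero revenue only; column names from the flattened GA dataset.
rev = train_df[train_df['totals.transactionRevenue'].astype(float) > 0]
plt.scatter(rev['totals.pageviews'].astype(float),
            np.log1p(rev['totals.transactionRevenue'].astype(float)),
            alpha=0.3, s=10)
plt.xlabel('pageviews')
plt.ylabel('log(transactionRevenue)')
plt.title('Log revenue vs pageviews (non-zero revenue sessions)')
plt.show()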

5.4 Channel Grouping analysis

Transactions and revenue across channels
  • Most transactions come from Referral and Organic Search; these channels also generate high revenue.
  • The number of transactions from Direct sources is low, but their revenue generation is high, on par with Referral and Organic Search. (A sketch of the per-channel aggregation follows below.)
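
A minimal sketch of the per-channel aggregation behind these observations, assuming train_df has a 'channelGrouping' column and numeric 'totals.transactions' and 'totals.transactionRevenue' columns; the same pattern applies to the browser and city analyses below:

# Total transactions and revenue per channel, sorted by revenue.
channel_stats = (
    train_df
    .groupby('channelGrouping')
    .agg(transactions=('totals.transactions', 'sum'),
         revenue=('totals.transactionRevenue', 'sum'))
    .sort_values('revenue', ascending=False)
)
print(channel_stats)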

5.5 Web browser analysis

Transactions and revenue across web browsers
  • The number of transactions and the revenue generated are highest for Chrome.
  • Firefox and Safari users have far fewer transactions than Chrome users, but their revenue generation is close to that of Chrome.
  • Marketing teams can focus on Chrome users to maximise revenue generation.

5.6 City Analysis

Transactions and revenue across cities
  • A lot of city data is missing in the dataset (58%).
  • New York, Mountain View and San Francisco are the three highest revenue-generating cities and also have the most transactions.

Multivariate analysis

5.7 Grouping OS and browsers to see their impact on transactionRevenue

Transactions and revenue across Browser-OS
  • Both Windows and Mac users have more transactions and higher total revenue when using the Chrome browser.
  • Across all operating systems, Chrome users have the highest number of transactions.
  • This supports the earlier conclusion that Chrome users generate more revenue than users of other browsers. (A sketch of the Browser-OS grouping follows below.)
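
A minimal sketch of the two-way Browser-OS grouping, assuming the flattened 'device.operatingSystem' and 'device.browser' columns and a numeric revenue column:

# Total revenue for each operating system / browser combination.
os_browser_revenue = train_df.pivot_table(
    index='device.operatingSystem',
    columns='device.browser',
    values='totals.transactionRevenue',
    aggfunc='sum',
    fill_value=0,
)
print(os_browser_revenue)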

6. Feature engineering

6.1 Data imputation

We will impute null values with 0 for all numerical variables. The following code snippet shows an example of imputation for the target variable “transactionRevenue”:

%%time
print('Count of nan values:-')
print(f"Before Imputation:{train_df['totals.transactionRevenue'].isnull().sum()}")
# we will impute 'nan' with 0
train_df['totals.transactionRevenue'].fillna(0, inplace=True)
test_df['totals.transactionRevenue'].fillna(0, inplace=True)
print(f"After Imputation:{train_df['totals.transactionRevenue'].isnull().sum()}")

6.2 Delete non-useful features

We will delete features with more than 85% missing data, which are unlikely to contain anything useful for predicting the target variable.

# list of columns to drop due to a high proportion of missing data
cols_to_drop = ['trafficSource.adContent', 'trafficSource.adwordsClickInfo.adNetworkType',
                'trafficSource.adwordsClickInfo.slot', 'trafficSource.adwordsClickInfo.page',
                'trafficSource.adwordsClickInfo.gclId', 'hits', 'totals.totalTransactionRevenue']
train_df.drop(cols_to_drop, axis=1, inplace=True)

# delete all columns in the test dataframe that are not present in train
tl = [col for col in test_df.columns if col not in train_df.columns]
test_df.drop(tl, axis=1, inplace=True)

6.3 Standardising Numeric features

We will scale the numeric features to the [0, 1] range using MinMaxScaler() from scikit-learn, as sketched below.
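
A minimal sketch of this step; num_cols is a hypothetical list of numeric columns, and fitting the scaler on train before transforming test is one reasonable way to apply it:

from sklearn.preprocessing import MinMaxScaler

# Hypothetical list of numeric columns; the actual set depends on the dataframe.
num_cols = ['totals.hits', 'totals.pageviews', 'totals.timeOnSite']

scaler = MinMaxScaler()
# Fit on the training data only, then apply the same scaling to the test data.
train_df[num_cols] = scaler.fit_transform(train_df[num_cols].astype(float))
test_df[num_cols] = scaler.transform(test_df[num_cols].astype(float))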

6.4 Label encoding Categorical features

We will encode categorical features using LabelEncoder() from scikit-learn.
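
A minimal sketch of the label encoding, with cat_cols as a hypothetical list of categorical columns; fitting each encoder on the union of train and test values avoids unseen-label errors:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical list of categorical columns.
cat_cols = ['channelGrouping', 'device.browser', 'device.operatingSystem', 'geoNetwork.city']

for col in cat_cols:
    le = LabelEncoder()
    le.fit(pd.concat([train_df[col], test_df[col]]).astype(str))
    train_df[col] = le.transform(train_df[col].astype(str))
    test_df[col] = le.transform(test_df[col].astype(str))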

6.5 Time window features

Main idea:
https://www.kaggle.com/c/ga-customer-revenue-prediction/discussion/81542

  • Inspired by the above discussion thread. The author explains that the problem is essentially a time-window-to-time-window prediction.
  • He created time windows using 15-day overlapping windows. Instead, we can try creating windows of 168 days, since the test data provided by Kaggle covers 168 days of sessions. (A sketch of this windowing follows the list below.)
  • We need to make sure that the target variable for each window has a gap period of 45 days.
  • The target period should be 2 months, the same as the private leaderboard period on Kaggle.
  • Another key idea is not to do hyperparameter tuning.
  • Kaggle data:
    TEST DATA: transactions from May 1st 2018 to October 15th 2018 (168 days)
    KAGGLE PRIVATE DATA: December 1st 2018 to January 31st 2019 (2 months)
    GAP PERIOD: the time interval between the test data and the private data (45 days)
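
A hedged sketch of how such windows might be cut from the session data; the 'date' column name, the STEP between windows, and the exact period lengths are illustrative assumptions, not the author's exact code:

import pandas as pd

# Assumes train_df['date'] is already a pandas datetime column.
WINDOW = pd.Timedelta(days=168)  # feature window, same length as the Kaggle test data
GAP = pd.Timedelta(days=45)      # gap between the feature window and the target period
TARGET = pd.Timedelta(days=62)   # ~2 months, same as the private leaderboard period
STEP = WINDOW                    # non-overlapping windows; a smaller STEP would give overlapping ones

windows = []
start = train_df['date'].min()
while start + WINDOW + GAP + TARGET <= train_df['date'].max():
    feat_mask = (train_df['date'] >= start) & (train_df['date'] < start + WINDOW)
    tgt_start = start + WINDOW + GAP
    tgt_mask = (train_df['date'] >= tgt_start) & (train_df['date'] < tgt_start + TARGET)
    windows.append((train_df[feat_mask], train_df[tgt_mask]))
    start += STEP

print(f'Number of windows: {len(windows)}')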

7. Data splitting: Train, validation and test sets

  • We will use the first 8 windows for training and the last 2 windows for validation (we will not do parameter tuning, but this validation set will be used to compare models, so it can also be thought of as test data). A sketch of the split follows below.
Train set
Validation set
  • We will use the test data provided by Kaggle for the submission and for obtaining the private leaderboard score.
Test set for Kaggle submission
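
A minimal sketch of the split, assuming the windows list from the sketch above; the per-user feature aggregation here is purely illustrative:

import numpy as np
import pandas as pd

# First 8 windows for training, last 2 for validation / model comparison.
train_windows, val_windows = windows[:8], windows[8:]

def make_xy(feature_df, target_df):
    # Aggregate session-level features to one row per user (illustrative aggregation).
    X = feature_df.groupby('fullVisitorId').agg(
        pageviews=('totals.pageviews', 'sum'),
        hits=('totals.hits', 'sum'),
    )
    # Target: log of the user's total revenue in the target period (0 if none).
    y = np.log1p(
        target_df.groupby('fullVisitorId')['totals.transactionRevenue'].sum()
    ).reindex(X.index, fill_value=0)
    return X, y

train_parts = [make_xy(f, t) for f, t in train_windows]
val_parts = [make_xy(f, t) for f, t in val_windows]
X_train = pd.concat([x for x, _ in train_parts])
y_train = pd.concat([y for _, y in train_parts])
X_val = pd.concat([x for x, _ in val_parts])
y_val = pd.concat([y for _, y in val_parts])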

8. Machine learning models

As discussed in the feature engineering section and suggested in the discussion thread referenced above, we will not do any hyperparameter tuning for our models.

8.1 LightGBM (Light Gradient Boosting Machine)

  • This model gave the best Kaggle private score, 0.884, among all the different models I tried.
Kaggle private score
  • The snippet after this list sketches how the model can be trained and the submission file created.
  • Feature importance according to the LightGBM model.
Feature importance using LightGBM model
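
A hedged sketch of training LightGBM and writing the submission file, assuming X_train/y_train/X_val/y_val from the split above and an X_test frame aggregated per fullVisitorId from test_df in the same way; the parameters are illustrative defaults and no tuning is done:

import lightgbm as lgb
import numpy as np
import pandas as pd

model = lgb.LGBMRegressor(n_estimators=1000, objective='regression')
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    eval_metric='rmse',
    callbacks=[lgb.early_stopping(100), lgb.log_evaluation(100)],
)

# Predictions are log revenue per user; clip negative values to 0.
preds = np.clip(model.predict(X_test), 0, None)

submission = pd.DataFrame({
    'fullVisitorId': X_test.index.astype(str),
    'PredictedLogRevenue': preds,
})
submission.to_csv('submission.csv', index=False)

# Feature importances behind the plot above.
print(pd.Series(model.feature_importances_, index=X_train.columns)
      .sort_values(ascending=False))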

8.2 Random Forest model

  • Random Forest gave a score of 0.9373 on the private leaderboard. A sketch of the model setup follows below.
Feature importance using Random Forest model
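
A minimal sketch of the Random Forest baseline under the same no-tuning policy; n_estimators and random_state are illustrative defaults:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)

# Feature importances behind the plot above.
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(20))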

9. Results

The following table summarises the results.

Final results

10. Future work

In future work, we can try the following:

  • Try ensembling different models.
  • Try various time-window sizes; we did not use overlapping windows, which could also be tried.

11. References

12. Profile
