Around 75% of pageviews lie between 1 and 4. The log of “transactionRevenue” follows a normal distribution against “pageviews”, but there is no direct correlation between them. Around 75% of users spend about 4 minutes (244 seconds) or less on a session. The log of “transactionRevenue” also follows a normal distribution against “timeOnSite”.
5.4 Channel Grouping analysis
Transactions and revenue … [Read more...] about Google Analytics Customer Revenue Prediction
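The two kinds of summary mentioned in this excerpt (the share of sessions inside a pageview range, and a log transform of revenue) can be sketched roughly as follows. This is only an illustration on made-up data: the `sessions` list, the `share_between` helper, and the use of `log1p` are assumptions, not the original post's code.

```python
import math

# Hypothetical per-session records: (pageviews, transactionRevenue).
# These numbers are invented for illustration only.
sessions = [
    (1, 0.0), (2, 0.0), (3, 0.0), (4, 25.0),
    (1, 0.0), (2, 0.0), (9, 120.0), (15, 480.0),
]


def share_between(values, low, high):
    """Return the fraction of values that fall in the closed range [low, high]."""
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


pageviews = [p for p, _ in sessions]
share = share_between(pageviews, 1, 4)

# Log-transform the revenue of sessions that actually had a transaction.
# log1p(x) = log(1 + x), which stays defined for very small revenues.
log_revenue = [math.log1p(r) for _, r in sessions if r > 0]

print(f"share of sessions with 1-4 pageviews: {share:.2f}")
print("log revenue:", [round(x, 3) for x in log_revenue])
```

On real data the same two steps would be applied to the full “pageviews” and “transactionRevenue” columns before plotting one against the other.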
Machine Learning
Save Time on ML Data Collection by Being a Musician
I used CreateML for this. The software is great and easy to use, but data collection can take forever. This method applies to any collection of time-series motion data, so feel free to read on even if you aren’t using the same software. When I first started the data collection process, my plan was to use os_log to output the data from my Apple Watch, process …
CatBoost regression in 6 minutes
The objective of this tutorial is to provide hands-on experience with CatBoost regression in Python. In this simple exercise, we will use the Boston Housing dataset to predict Boston house prices, but the logic applied to this data also carries over to more complex datasets. So let’s get started. First, we need to import the required libraries along with the dataset: import …
Use Case #1: A Glimpse Into The World of Retail
Plan of action: evaluation metrics, baselines, models, performance, example.
Before we jump in, let’s first establish a few ground rules!
★ Prerequisites: An item is the combination of a category and a subcategory. Returns are removed from the transactions and only the purchase record is kept. A customer may return a product that doesn’t suit his needs, but he is probably still interested in …
Top 10 Deep Learning Concepts You Should Know
Most Common Deep Learning Questions
If you are a junior data scientist like me, you probably find deep learning to be an extremely vast topic. However, there are some key concepts that come up very often in entry-level interviews and that you should understand clearly. In this article I will give you a quick guide to some of the most important concepts in deep learning, you …
Optimizing Cash Recycler Machine Location
Machine learning, K-Means Clustering
Background
Cited from the ARCA website: “A cash recycler is a complex machine that handles a couple of simple, but important tasks: accepting and dispensing cash. It also stores money securely, keeps an accurate accounting of cash on hand, and automates the cash cycle. Generally, you’ll find them in banks, credit unions and back-office retail cash …
Best practices for GANs
Generative Adversarial Networks are known to require good hardware and a lot of time to train. Because of this, certain optimizations and conventions have emerged to reduce GAN training time. In this blog I will list a few of these tips that help create better GANs faster. Most of these come from books and videos such as Jason Brownlee’s Generative Adversarial Networks …
Security & Privacy Consideration for Artificial Intelligence / Machine Learning Based Solutions
With AI/ML booming, aptly supported by the robust cloud ecosystem that has sprung up in the last 18 to 24 months, many organizations are in the early stages of adopting and integrating ML infrastructure with their solutions. Organizations see immense value in taking this path, as they regard it as the future in the post-pandemic world. Having said that, it is …
Paper Reading Notes: Understanding the Informer Model Architecture (AAAI 2021 best paper)
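Paper title: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Paper link: https://arxiv.org/pdf/2012.07436.pdf
Purpose of the paper: to address the problem in long-sequence time-series forecasting where, because the distances involved are so long, dependencies are not captured well and prediction quality suffers. Recent research shows that the transformer does better here than earlier work such as RNNs, but the transformer has problems of its own: high time complexity, high memory usage, and the inherent limitations of the encoder-decoder structure. To solve these problems, the paper proposes a transformer-based model called Informer. Informer has three characteristics: (1) the proposed ProbSparse …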
What is Data Integration
The four components of data preprocessing are data integration, data transformation, data reduction, and data cleaning.
Data Integration
Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution delivers trusted data from various sources to support a …
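As a rough illustration of combining data from disparate sources, the sketch below merges two record sets on a shared key. Everything in it is assumed for the example: the two hypothetical sources (`crm_rows`, `billing_rows`), the `customer_id` key, and the `integrate` helper are not taken from the article, and a real integration pipeline would also handle conflicts, types, and data quality.

```python
# Two hypothetical sources describing overlapping sets of customers.
crm_rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
billing_rows = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 3, "total_spend": 40.0},
]


def integrate(primary, secondary, key):
    """Outer-join two lists of dicts on `key`, filling missing fields with None."""
    merged = {}
    for row in primary:
        merged.setdefault(row[key], {}).update(row)
    for row in secondary:
        merged.setdefault(row[key], {}).update(row)
    # Give every output record the same set of fields.
    fields = sorted({field for row in merged.values() for field in row})
    return [
        {field: row.get(field) for field in fields}
        for _, row in sorted(merged.items())
    ]


unified = integrate(crm_rows, billing_rows, "customer_id")
for record in unified:
    print(record)
```

Records that appear in only one source keep their key and get `None` for the fields the other source would have supplied, which makes the gaps visible to the later cleaning step.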