That’s enough talk. Let’s get to the interview!
Some of the tools that he used as an analyst are:
- Google Sheets, for collaboration with stakeholders
- Tableau, for data visualization
He also emphasized the importance of SQL as a tool for data extraction at scale. In fact, in his opinion, it is more important to learn SQL than Python.
He also compared R and Python. While R is friendlier to non-technical folks, it is harder to productionize and integrate with existing systems than Python.
Overall, he pointed out that it doesn’t matter exactly which tool you use. As he commented:
“I wouldn’t say you have to learn python or R, because tools always change. But being able to execute data processing procedures in a scalable way is necessary in the current day and age of data economy.”
Thinking about problems in terms of probability
As Cliff was looking to jump into the data science field from economics, he evaluated his chances in probabilistic terms by looking at market trends and his existing skill set.
It was evident that anyone interested in becoming a data analyst would most probably need to know how to code, so he picked up programming.
He first built up his programming fundamentals in Python, starting from basic concepts like number and string manipulation.
These building blocks then helped him build projects to consolidate his learning. Combining his interest in basketball with his knowledge of web scraping, he scraped and cleaned data from the official NBA website and ran it through models like logistic regression, SVM and random forest. Eventually, he was able to build a pipeline to scrape data and make daily predictions. The end-to-end flow took three years.
Speaking about a lesson that he learnt from this project, he lamented how dirty the data was, even from an official source (the NBA), and offered this advice:
Don’t trust your data source and always check your data.
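To make that advice concrete, here is a minimal sketch of the kind of sanity checks one might run on scraped game data before modelling. The rows, field names and rules below are hypothetical illustrations, not the NBA website's actual schema or the checks Cliff used.

```python
from collections import Counter

# Hypothetical scraped rows; field names are illustrative only.
rows = [
    {"game_id": "g1", "home": "LAL", "away": "BOS", "home_pts": 102, "away_pts": 99},
    {"game_id": "g2", "home": "GSW", "away": "NYK", "home_pts": None, "away_pts": 110},
    {"game_id": "g2", "home": "GSW", "away": "NYK", "home_pts": None, "away_pts": 110},
    {"game_id": "g3", "home": "MIA", "away": "MIA", "home_pts": -5, "away_pts": 95},
]


def check_rows(rows):
    """Return (game_id, problem) pairs for rows that look suspicious."""
    problems = []

    # Duplicate identifiers often mean the scraper fetched a page twice.
    counts = Counter(row["game_id"] for row in rows)
    for game_id, n in counts.items():
        if n > 1:
            problems.append((game_id, "duplicate game_id"))

    for row in rows:
        # Scores must be present and non-negative.
        for key in ("home_pts", "away_pts"):
            value = row[key]
            if value is None:
                problems.append((row["game_id"], f"missing {key}"))
            elif value < 0:
                problems.append((row["game_id"], f"negative {key}"))
        # A team cannot play against itself.
        if row["home"] == row["away"]:
            problems.append((row["game_id"], "team plays itself"))

    return problems


issues = check_rows(rows)
for game_id, problem in issues:
    print(game_id, problem)
```

Checks like these are cheap to run on every scrape, and they surface problems before they quietly distort a model's predictions.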
Whether a role is sexy or not depends on a person’s personality, Cliff quipped. Regardless, he shared some highlights from his roles.
Interacting with stakeholders
As part of the digital marketing team at Carousell, he interacted with multiple stakeholders, technical or otherwise. As an analyst, he was the connector between technical and non-technical people, creating the pipeline from engineers to the end stakeholders.
He also spent time educating non-technical stakeholders on the definitions of technical terms like “real-time” or “A/B testing” and conveying the limitations of existing data. The key to such communication lies in conveying technical information in a non-technical manner, or as Cliff put it,
“you have to explain to them in a way that makes them understand instead of simply throw them technical terms.”
Dealing with uncertainty
Uncertainty in Cliff’s role comes in different forms. One form is uncertainty over the outcome of statistical tests.
“I get asked: why are all your AB tests failing? [The answer is] because they are tests; we are not proposing the truth, but simply hypotheses.”
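One way to see why many tests "fail" is to run a significance test on a small observed lift. The sketch below uses a two-sided two-proportion z-test, which is one common choice for comparing conversion rates. The interview does not say which test Cliff's team used, and the function name and numbers here are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant B converts at 11% versus 10% for A.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=110, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.2f}")
if p_value >= 0.05:
    print("Not significant at the 5% level: the lift could be noise.")
```

With 1,000 users per variant, a one-point lift is not distinguishable from chance at the 5% level, which is exactly the sort of result that looks like a "failed" test to stakeholders.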
Another form is uncertainty over the cleanliness of the data. Using multiple sources of data, say Facebook, Google and internal data, gives rise to discrepancies. Thus, it is important to know how to explain the limitations of the data to stakeholders when such discrepancies occur.
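As a rough illustration of how such discrepancies might be surfaced before a stakeholder spots them, the sketch below compares daily counts from each source against a baseline and flags large gaps. The dates, numbers, the choice of internal data as the baseline and the 10% tolerance are all assumptions made for the example.

```python
# Hypothetical daily conversion counts reported by each source.
reports = {
    "2024-01-01": {"facebook": 120, "google": 118, "internal": 121},
    "2024-01-02": {"facebook": 95, "google": 140, "internal": 98},
}


def flag_discrepancies(reports, baseline="internal", tolerance=0.10):
    """Flag (day, source, relative difference) where a source strays from the baseline."""
    flagged = []
    for day, counts in sorted(reports.items()):
        base = counts[baseline]
        if base == 0:
            continue  # a relative difference is undefined for a zero baseline
        for source, value in counts.items():
            if source == baseline:
                continue
            rel_diff = abs(value - base) / base
            if rel_diff > tolerance:
                flagged.append((day, source, round(rel_diff, 3)))
    return flagged


flagged = flag_discrepancies(reports)
for day, source, rel_diff in flagged:
    print(f"{day}: {source} differs from internal data by {rel_diff:.1%}")
```

A report like this does not say which source is right, but it gives the analyst a concrete list of days and sources to explain.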
In an age where companies want to move fast and break things, it is understandable that documentation is swept under the rug. However, documentation is important for at least two reasons.
Firstly, documentation reminds you of what you have done. Cliff organizes his work in folders and pulls them up when he needs a reminder.
Secondly, it allows others to build on your work quickly.
The ideal candidate would be:
- Humble and coachable, since there is a lot to learn and unlearn.
- Intellectually curious enough to understand concepts from statistics and programming.
- Good with people, technical or otherwise.
- Able to work with uncertainty and incomplete information.
Learn SQL (Structured Query Language)
Cliff said that SQL is the most important thing to learn, as it lets you extract data efficiently.
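For readers who have not seen SQL, here is a small sketch of the idea using Python's built-in sqlite3 module. The table, columns and values are hypothetical. The point is that the database does the grouping and summing, so only the summary rows come back instead of every raw record.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, channel TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (channel, amount) VALUES (?, ?)",
    [("facebook", 20.0), ("google", 30.0), ("facebook", 15.0), ("organic", 50.0)],
)

# Aggregate inside the database rather than pulling every row into Python.
query = """
    SELECT channel, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY channel
    ORDER BY revenue DESC
"""
summary = conn.execute(query).fetchall()
conn.close()

for channel, n_orders, revenue in summary:
    print(channel, n_orders, revenue)
```

The same query shape carries over to larger databases, although the dialect and performance characteristics differ between systems.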
Learn Python and/or R
In deciding whether to learn Python or R, Cliff encouraged listeners to research the preferred tools in their respective industries.
For instance, the finance industry might require analysts to have some proficiency in VBA, while other industries might prefer analysts who know SPSS.
Know your comparative advantage
It is important to know what your advantages over other candidates are and how to position yourself. For instance, someone from finance has a higher probability of breaking into data roles in the fintech space than others because of their prior knowledge.
When Cliff was looking for a role, he positioned himself as having a mix of domain and technical knowledge:
“I know more domain knowledge than a computer scientist and more statistics than a business person.”
Value opportunities over big names
When looking for a role, read the job description carefully and seek out growth opportunities rather than chasing big names. This is especially true for fresh graduates.