3 Things I Did to Become a Data Scientist

Introduction
Master Popular Machine Learning Algorithms
Perform An End-to-End Case Study
Use Data Analytics to Master Data Processing
Summary
References

This article is for people who are currently Data Analysts and want to make the career change to Data Science. From the outside, the two roles can look similar, but despite some overlap, they require quite different skills. So, how can you bridge the gap between these roles? That is what I will explain here. While plenty of material recommends graduate school, online courses, and tutorials, I want to focus on the more specific and unique things you can do to transition to Data Science, from the perspective of someone who was a Data Analyst first and is now a professional Data Scientist. Keep reading if you would like to know three things that you can do to become a Data Scientist.

Photo by Markus Spiske on Unsplash [2].

While mastering popular algorithms may seem like obvious advice, I found that at first I was studying primarily Logistic Regression, Regression, and Decision Trees. The reason I say ‘popular’ is that every year or so a new algorithm comes out that course creators have not yet developed material for, so it is up to you to go out of your way to learn the newest and best algorithms and libraries. There are two sides to this topic: mastering the code and mastering the theory. Oftentimes, after some education, you dive deeper and deeper into the code and stray from what actually makes the algorithms work. Therefore, it is important to be able to explain the top 10–20 algorithms in a descriptive way, and not just in a programmatic way. As you move toward becoming a Data Scientist, you will realize that most of the libraries for these algorithms work the same way and that the actual code for them is fairly easy. You start working by trial and error, and then realize you have forgotten some of the theory behind what makes one algorithm different from another and how it actually works on a conceptual level.

That being said, here are some ways that you can master these algorithms.

As an action item, I personally recommend writing the algorithm name on the front of a notecard, and on the back a description of how it works and how you will explain it to yourself and others moving forward.
As a next action item, you can draw out how the algorithm works, a Decision Tree for example. When you physically write and draw something out, you tend to remember the material better.
On the programmatic side, one library that is especially useful for comparing a wide range of Machine Learning algorithms is PyCaret, by Moez Ali.
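To make the theory side concrete, here is a minimal sketch of how a Decision Tree might choose a single split, using Gini impurity as the criterion. It is an illustration only, not how any particular library implements it; the function names (`gini`, `best_threshold`) and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: the chance that two random draws from `labels` disagree."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try each midpoint between sorted unique values and keep the split
    with the lowest weighted Gini impurity."""
    best = (None, gini(labels))  # (threshold, impurity) when no split is made
    unique = sorted(set(values))
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical toy data: hours studied vs. whether the exam was passed.
hours = [1, 2, 3, 6, 7, 8]
passed = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(hours, passed)
print(threshold, impurity)
```

Being able to walk through a loop like this out loud ("try every cut point, measure how mixed each side is, keep the cleanest cut") is the kind of descriptive explanation the notecards are meant to build.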

Even after you learn the basic algorithms, you will realize once you are a professional Data Scientist that there is plenty more to practice, so it is best to study them now, as a Data Analyst. A lot of courses, tutorials, and educational material start off with the same few basic algorithms, but oftentimes a statistics book or beginners’ Machine Learning course does not cover newer and more popular algorithms like XGBoost and CatBoost.

Photo by Campaign Creators on Unsplash [3].

For the end-to-end case study, the goal is not only to work through a process that resembles what you will do once you land the job, but also to share the case study so that hiring managers, recruiters, and future coworkers can see some of what you are capable of. As a Data Analyst, you may already be familiar with defining a business problem and investigating the data surrounding it, which means you will have a leg up on others who are not as used to this process.

What you will want to do is the following:

find a common problem, like predicting the stock market
obtain some free mock data
identify key features that you think would be important to include in the model
test around 10 algorithms on the same data and compare how each one performs
summarize your results with a visualization, which is what you will do in the professional setting
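The comparison step above can be sketched as follows. This is a minimal example that assumes scikit-learn is installed (the article itself names PyCaret for this job), uses synthetic data in place of real mock data, and compares only three models to stay short; the model choices and parameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the free mock data; swap in your own features and target.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Candidate algorithms, all evaluated on exactly the same data.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy per model.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()

# Print a small leaderboard, best model first.
for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {score:.3f}")
```

The `results` dictionary is also a convenient input for the summary visualization, for example a simple bar chart of score per model.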

This process can be showcased in a variety of ways. You can show it all in a Jupyter Notebook or a similar tool, saving your plots and discussion within the notebook's markdown, or you can create separate summary visuals in Tableau, Excel, or Google Data Studio. The most common way of presenting your case study is to post it on GitHub. Most engineers, scientists, and managers are used to this format and tool, so sharing there is preferred. As a Data Analyst, you may also have a leg up with organizing data, identifying business metrics or KPIs (Key Performance Indicators), and visualizing results.

Photo by Myriam Jessier on Unsplash [4].

Perhaps the biggest pain point of Data Science is the preprocessing of data, and this step often takes the longest as well. As a Data Analyst, you can leverage your data skills to ensure that the dataset used for your model is in its best form. Knowing which algorithms to use, as discussed above, can also save you a lot of time. For example, missing data can be a hassle to handle yourself, while some algorithms handle it automatically.

Here are some of the ways that you can leverage Data Analytics skills in your Data Science preprocessing step:

imputing missing data in a variety of ways, like mean, min, or max
merging CSV files together to create your final dataset
utilizing SQL to query your company’s tables, including groupings, cases, and filters
reassigning data types to certain features (‘object’, int, float, ‘category’, etc.)
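The four steps above can be sketched in one short script. This is a minimal example assuming pandas and NumPy are installed; the tables, column names, and the in-memory SQLite database are hypothetical stand-ins for real CSV files and company tables.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical tables standing in for two CSV files (pd.read_csv would load real ones).
orders = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "amount": [20.0, np.nan, 35.0, 50.0],
    "channel": ["web", "store", "web", "store"],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["east", "west", "east", "west"],
})

# 1. Impute the missing amount with the column mean (.min() or .max() work the same way).
orders["amount"] = orders["amount"].fillna(orders["amount"].mean())

# 2. Merge the two sources into one dataset on their shared key.
dataset = orders.merge(customers, on="customer_id", how="left")

# 3. Query with SQL: a grouping, a CASE expression, and a filter.
conn = sqlite3.connect(":memory:")
dataset.to_sql("orders", conn, index=False)
query = """
    SELECT region,
           SUM(CASE WHEN channel = 'web' THEN 1 ELSE 0 END) AS web_orders,
           AVG(amount) AS avg_amount
    FROM orders
    WHERE amount >= 20
    GROUP BY region
    ORDER BY region
"""
summary = pd.read_sql_query(query, conn)
conn.close()

# 4. Reassign data types: low-cardinality text columns become 'category'.
dataset["channel"] = dataset["channel"].astype("category")
dataset["region"] = dataset["region"].astype("category")

print(summary)
print(dataset.dtypes)
```

Mean imputation is used here only because it is the simplest option to show; which strategy is appropriate depends on the feature and the model.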

As you can see, being a Data Analyst now gives you certain advantages when pursuing Data Science. It is often said that the majority of the difficult and stressful parts of Data Science are actually data processing, so if you can master that first, or at least be familiar with it, you may have the edge in becoming a Data Scientist.

If you want to become a Data Scientist, make sure you know what you are getting yourself into. While building algorithms is usually the more interesting part, it is important to keep in mind that data analysis is also a huge part of the process. Another thing I did to land a job as a Data Scientist was studying the theory of Machine Learning algorithms, beyond just the usual ones that are used as examples in older textbooks and courses. You may also have the edge when you explore the newest algorithms because, most likely, they improve on what the previous ones did poorly, such as speed, accuracy, and the handling of data types and missing values. Lastly, having a portfolio of a case study or two can be incredibly advantageous, both for your own practice and for making you more appealing to hiring managers and recruiters.

To summarize, here are the three things I did to become a Data Scientist, and hopefully you can apply these actions as well:

* Master Popular Machine Learning Algorithms
* Perform An End-to-End Case Study
* Use Data Analytics to Master Data Processing

I hope you found my article both interesting and useful! Please feel free to comment down below if you have leveraged any of your Data Analytics skills when becoming a Data Scientist, and which ones. Has it helped you in your Data Science career now? Do you agree or disagree, and why?

Please feel free to check out my profile and other articles, as well as reach out to me on LinkedIn. I have no affiliations.

Thank you for reading!

[1] Photo by Nathan Dumlao on Unsplash, (2018)

[2] Photo by Markus Spiske on Unsplash, (2016)

[3] Photo by Campaign Creators on Unsplash, (2018)

[4] Photo by Myriam Jessier on Unsplash, (2020)
