Yes, you read that right. 99.9% of articles I come across tell you to start with programming. Bad move!
Data Science is a math-laden field, and to understand its many constructs, you'll need some familiarity with math. Now, you don't need a math degree for this, but you should at least read up on the fundamentals.
Three topics are absolutely essential: Linear Algebra, Calculus, and Statistics. For the most part, you can get away with just Statistics, but either way, it's good to know the concepts that drive the field.
Linear Algebra
This branch of math is used (almost) everywhere in Data Science. Datasets are usually stored as matrices, and much of what your computer does when it processes them boils down to Linear Algebra operations such as matrix multiplication. Deep neural networks are represented and processed the same way: their weights are matrices and their inputs are vectors. Quite frankly, you're missing out on a lot if you don't have at least a basic understanding of the subject.
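To make the neural-network point concrete, here is a minimal sketch of the core of one dense layer: a matrix-vector product plus a bias vector, y = Wx + b. It is written in Python with plain lists so it runs without extra libraries. The weights, bias and input are made-up numbers for illustration. The helper names mat_vec and dense_layer are hypothetical and are not taken from any library.

```python
# A minimal sketch: one dense neural-network layer computed as y = W*x + b,
# using plain Python lists instead of a numerical library.

def mat_vec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def dense_layer(W, x, b):
    """Affine transform W*x + b, the linear-algebra core of a dense layer."""
    return [wx + b_i for wx, b_i in zip(mat_vec(W, x), b)]

# Hypothetical weights for a layer mapping 3 inputs to 2 outputs.
W = [[0.5, -1.0, 2.0],
     [1.0, 0.0, -0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]

print(dense_layer(W, x, b))  # approximately [4.6, -0.7]
```

A real layer would normally pass this result through a nonlinear activation function. In practice, libraries like NumPy perform the same multiplication with heavily optimized routines.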
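Calculus
Like Linear Algebra, Calculus plays a large role in Data Science. But you don't need to be a guru. All you need is a basic understanding of the core principles that affect your models. Above all, that means derivatives and gradients, which tell an optimization algorithm such as gradient descent how to adjust a model's parameters to reduce its error.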
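As a rough illustration of why derivatives matter, the sketch below runs gradient descent on a single parameter. It assumes a made-up loss f(w) = (w - 3)^2, whose derivative is 2(w - 3). Each step moves w a little in the direction that lowers the loss. The function names, the learning rate of 0.1 and the 100 steps are illustrative choices, not a prescribed recipe.

```python
# A minimal sketch of gradient descent on a single parameter.
# The loss f(w) = (w - 3)**2 is a hypothetical example; its minimum is at w = 3.

def loss(w):
    return (w - 3) ** 2

def d_loss(w):
    """Derivative of the loss with respect to w: f'(w) = 2 * (w - 3)."""
    return 2 * (w - 3)

def gradient_descent(start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        # Step against the derivative, i.e. downhill on the loss curve.
        w -= learning_rate * d_loss(w)
    return w

w_best = gradient_descent(start=0.0)
print(f"w = {w_best:.4f}, loss = {loss(w_best):.6f}")
```

With these settings, w ends up very close to 3, the minimum of the loss. Training a real model follows the same idea. The difference is that a real model has many parameters, so it uses a gradient (a vector of partial derivatives) instead of a single derivative.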
Statistics & Probability
This topic will probably take up a significant chunk of your time. Good news: these concepts aren't difficult, so there's no reason you shouldn't master them.
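For a taste of the basics, here is a small sketch that summarizes a sample and estimates a simple probability from it. It uses Python's standard-library statistics module. The sign-up numbers are invented for illustration.

```python
import statistics

# Hypothetical sample: daily sign-ups recorded over one week.
signups = [12, 15, 11, 19, 15, 22, 18]

mean = statistics.mean(signups)      # arithmetic average
median = statistics.median(signups)  # middle value once sorted
stdev = statistics.stdev(signups)    # sample standard deviation (n - 1 denominator)

# Empirical probability that a day has more than 15 sign-ups.
p_over_15 = sum(1 for s in signups if s > 15) / len(signups)

print(f"mean={mean}, median={median}, stdev={stdev:.2f}, P(>15)={p_over_15:.2f}")
```

Note that statistics.stdev computes the sample standard deviation, dividing by n - 1. For the population version, which divides by n, use statistics.pstdev.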
There are other useful topics, such as Graph Theory and Discrete Mathematics. You won't use them daily as a beginner, but expect to encounter them as you gain experience.
Still, they're worth a quick read if you have the time.
If you are terrified at the mere mention of “math”, you’re probably not going to have much fun as a Data Scientist. However, if you’re willing to invest time to improve your familiarity with the principles underlying calculus, linear algebra, stats, and probability, nothing — not even math — should get in the way of you becoming a Data Scientist.
PS: Math really is fun. As you go deeper, take time to appreciate the beauty of each concept and how it shapes the models you build. You'll soon feel the same unbridled passion that many mathematicians and Data Scientists share!
Now to the more exciting part: programming. With more than 2.5 exabytes of data being generated every day, it would be absurd not to use computers to analyze that data and find meaningful representations in it.
“How much programming is required in data science, particularly statistical analysis and machine learning?”
A lot. In practice, most data science jobs will require you to code, because most companies need some data cleaning, implementation and productization, and adaptation of algorithms to their own specific purposes. If you can't turn your own solutions into something product-ready, you are a much less useful employee. (Source)
Python & R
Python is, by far, the most widely used programming language in Data Science. In JetBrains' 2016 survey, almost four out of five respondents said Python was their main language.
While Python may suffice for most of your tasks, you'll need R in your toolkit as well to consider yourself a “well-rounded” Data Scientist. I recommend you focus on Python and spend a little time on R too.
Computer Vision
Computer Vision today is at the forefront of many exciting developments in fields such as autonomous vehicles, medical image analysis, and much more. The field is concerned with deriving useful insights from visual data: primarily images, although video is used to some extent.
While many Computer Vision libraries like OpenCV and Torchvision exist today, I highly recommend you use Caer, my Computer Vision library — it’s fast, lightweight, and extremely beginner-friendly.
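To show what "image data" means in practice, here is a minimal sketch in plain Python. It does not depend on Caer, OpenCV or Torchvision. An image is a grid of pixels, and each pixel is a (red, green, blue) triple from 0 to 255. Converting the image to grayscale is a weighted sum per pixel. The 2x2 image is hypothetical. The weights 0.299, 0.587 and 0.114 are the common ITU-R BT.601 luma coefficients, though other conventions exist.

```python
# A minimal sketch: an image is just a grid of numbers.
# Here a hypothetical 2x2 RGB image is stored as nested lists of (R, G, B)
# tuples with values in 0-255, then converted to grayscale.

# ITU-R BT.601 luma weights for red, green and blue.
WEIGHTS = (0.299, 0.587, 0.114)

def to_grayscale(image):
    """Return a grid of grayscale intensities (0-255) for an RGB image."""
    return [
        [round(sum(w * c for w, c in zip(WEIGHTS, pixel))) for pixel in row]
        for row in image
    ]

image = [
    [(255, 0, 0), (0, 255, 0)],      # red, green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]

gray = to_grayscale(image)
print(gray)  # [[76, 150], [29, 255]]
```

Computer Vision libraries also store images as numeric arrays. That is why the Linear Algebra and Statistics covered earlier carry straight over to image work.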
Machine Learning & Deep Learning
Today, Machine Learning and Deep Learning algorithms are at the core of Data Science. For most job openings, this is where the demand lies.
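As a first taste of what "fitting a model" means, here is a minimal sketch of simple linear regression by ordinary least squares, in plain Python. The hours-versus-score data and the helper names fit_line and predict are hypothetical. Real projects would typically use a library such as scikit-learn rather than writing this by hand.

```python
# A minimal sketch of one of the simplest machine-learning models:
# fitting a straight line y = slope * x + intercept by ordinary least squares.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form slope: summed cross-deviations over summed squared
    # deviations of x (equivalently, covariance(x, y) / variance(x)).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

model = fit_line(hours, scores)
print(model, predict(model, 6))
```

The slope and intercept are exactly the values that minimize the sum of squared errors on the training data. Much of Machine Learning generalizes this idea to models with far more parameters, which are often fitted with gradient descent rather than a closed-form formula.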