It is true that data science jobs have mushroomed. At the same time, landing a decent position in this field remains notoriously challenging, especially for novices. The reason is the subtle difference between data science in theory and real-life data science, which is tied to the problems businesses deal with on a day-to-day basis.
In academia, there is a great emphasis on Python with regard to data science. Professors and instructors teach how to leverage Python libraries such as NumPy, Pandas, and Scikit-learn to make sense of data.
While Python alone is sufficient for some data science work, in the corporate world it is unfortunately just one piece of the puzzle when businesses process large volumes of data.
To understand why Python cannot cover every data science stage, from data extraction to model evaluation, it helps to know where businesses store their data in the first place.
Most companies store their data in databases on servers. These databases must serve many concurrent users and operations while remaining efficient and keeping the data available.
Unfortunately, this task exceeds Python's capabilities, and this is where SQL (Structured Query Language) comes into play. That is why, understandably, SQL appears in almost all data science-related job posts, for example for roles such as data analyst, business analyst, and data scientist.
In addition, hiring managers test candidates' SQL proficiency before getting down to the nitty-gritty of data science, such as machine learning and deep learning.
The reason is that without SQL, one cannot even get the required data to process. Thus, from a recruiter's perspective, experience in SQL outweighs experience in Python.
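To make this division of labor concrete, here is a minimal sketch of SQL fetching the data and Python processing it. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a company server; the `orders` table, its columns, and the `amounts_for_region` helper are hypothetical names chosen only for illustration.

```python
import sqlite3
from statistics import mean

# Hypothetical in-memory database standing in for a company server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 200.0), ("south", 40.0)],
)
conn.commit()


def amounts_for_region(connection, region):
    """Use SQL to fetch only the rows needed and return them as a Python list."""
    rows = connection.execute(
        "SELECT amount FROM orders WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    return [amount for (amount,) in rows]


# SQL does the extraction; Python takes over for the analysis.
north = amounts_for_region(conn, "north")
print(north, mean(north))
```

The filtering happens inside the database, so only the rows that matter ever reach Python. On a real server the connection would come from that database's own driver rather than sqlite3.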
SQL is also more popular among professionals than Python, according to the 2020 Stack Overflow survey on the most important coding languages, in which 47,184 professional developers took part.
A look at the online courses that teach SQL leads to the following observations:
First, online courses on SQL are scarce compared to those on Python. For example, searching edX for “SQL” returns 31 courses, whereas “Python” returns 94.
Second, there is a gap between course levels. Simply put, a course is either too introductory or too advanced. The former only scratches the surface of SQL (SELECT, INSERT, and UPDATE queries), whereas the latter can appear cryptic and hopeless to follow.
Although SQL is well regarded among professional developers, it is strangely underrated on most online learning platforms. This negatively affects self-taught individuals and hinders their progress toward becoming data scientists.
One case where SQL industry experience comes into play is the “race condition”.
A race condition is a recurring problem in Relational Database Management Systems (RDBMS). It happens when several operations act on the same data at the same time and the result depends on the order in which they happen to run. The risk grows when millions or billions of operations hit a database at once.
An example is a post on social media that goes viral: millions of people interact with it simultaneously, resulting in intertwined operations on the database, which can lead to constraint violations and many other unwanted results.
Without going too deep into the details, one possible solution among others is to use “locks”. Of course, each solution has its own use cases and limitations.
Knowing which solution to use and when to use it is a skill that can be honed only through professional experience.
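As a rough illustration of a race condition and of a lock, the sketch below replays a "lost update" on a like counter and then repeats it with a write lock. It is only one way to show the idea, under several assumptions: it uses SQLite through Python's built-in sqlite3 module, not a production server; the `posts` table and all function names are hypothetical; and the two "users" are interleaved by hand in a single thread so that the outcome is reproducible. SQLite's `BEGIN IMMEDIATE` takes the database write lock at the start of the transaction.

```python
import os
import sqlite3
import tempfile


def make_db():
    """Create a throwaway database file holding one post with zero likes."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER NOT NULL)")
    conn.execute("INSERT INTO posts (id, likes) VALUES (1, 0)")
    conn.commit()
    conn.close()
    return path


def connect(path, timeout=5.0):
    # isolation_level=None: no implicit transactions, BEGIN/COMMIT are issued by hand.
    return sqlite3.connect(path, timeout=timeout, isolation_level=None)


def read_likes(conn):
    cur = conn.execute("SELECT likes FROM posts WHERE id = 1")
    rows = cur.fetchall()
    cur.close()
    return rows[0][0]


def write_likes(conn, value):
    conn.execute("UPDATE posts SET likes = ? WHERE id = 1", (value,))


def unsafe_interleaving(path):
    """Both users read the same count before either writes: one like is lost."""
    a, b = connect(path), connect(path)
    seen_by_a = read_likes(a)
    seen_by_b = read_likes(b)
    write_likes(a, seen_by_a + 1)
    write_likes(b, seen_by_b + 1)
    total = read_likes(a)
    a.close()
    b.close()
    return total


def locked_like(conn):
    """Take the write lock first, so the read and the write form one unit."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        write_likes(conn, read_likes(conn) + 1)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def locked_interleaving(path):
    """Each user holds the lock for a full read-and-write, so no like is lost."""
    a, b = connect(path), connect(path)
    locked_like(a)
    locked_like(b)
    total = read_likes(a)
    a.close()
    b.close()
    return total


def second_writer_is_blocked(path):
    """While one connection holds the write lock, another cannot acquire it."""
    a, b = connect(path), connect(path, timeout=0)
    a.execute("BEGIN IMMEDIATE")
    try:
        b.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError:
        blocked = True
    else:
        blocked = False
        b.execute("ROLLBACK")
    a.execute("ROLLBACK")
    a.close()
    b.close()
    return blocked


print(unsafe_interleaving(make_db()))
print(locked_interleaving(make_db()))
print(second_writer_is_blocked(make_db()))
```

The lock trades throughput for correctness: the second writer has to wait, or fail and retry, while the first one finishes. For a simple counter, a single `UPDATE posts SET likes = likes + 1` statement would avoid the problem without an explicit lock, which is one example of why the right solution depends on the use case.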
Python remains an interesting coding language to learn, especially for aspiring data scientists. Its importance in data science should not be overlooked or underestimated. But SQL remains the dark horse because of the edge it gives one over other candidates when competition for a position is fierce.
Learning SQL is not a walk in the park. The query language not only requires an industry-like setup (a server, if working locally), but working with SQL also involves more than one database system: MySQL, SQL Server, and PostgreSQL, to name a few.
The SQL syntax differs slightly between these systems. This can be annoying, since not all businesses use the same software. Hence, one should learn at least the most widely used ones, such as MySQL and SQL Server.
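One well-known example of such a difference is limiting a result to the first few rows: MySQL and PostgreSQL use a `LIMIT` clause at the end of the query, while SQL Server uses `TOP` right after `SELECT`. The sketch below builds the query text for each system; the `first_rows_query` helper, the dialect labels, and the `customers` table are hypothetical and serve only to put the variants side by side.

```python
def first_rows_query(table, n, dialect):
    """Return the text of a 'first n rows' query for the given SQL dialect."""
    # Accept only simple names, since the table name is placed into the text.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    n = int(n)
    if dialect in ("mysql", "postgresql"):
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP ({n}) * FROM {table}"
    raise ValueError(f"unsupported dialect: {dialect!r}")


for name in ("mysql", "postgresql", "sqlserver"):
    print(name, "->", first_rows_query("customers", 5, name))
```

The request is identical in all three cases, and only the spelling changes. This is why moving from one system to another is usually a matter of looking up syntax, not of relearning SQL.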