Touted as the platform of choice for the largest data science community, Kaggle offers a unique peek into the data science industry through its annual survey.
The survey covers significant areas in data science, such as:
· Programming languages
· Machine learning algorithms
· Diversity, salary, and education
The latest survey, conducted in 2020, included nearly 24,000 users from across the globe and captured their opinions, behavior, and demographics.
Every year, Kaggle conducts the survey to explore the topics and trends that matter most to specific groups.
Since we're already at the beginning of another year, we will specifically discuss the surveys from the previous four years: 2017, 2018, 2019, and 2020.
The results may surprise you.
With new techniques and algorithms emerging at breakneck speed, the survey helps show whether newer techniques are replacing older ones or are being adopted alongside them.
· Since the surveys began, the number of respondents has remained high, ranging between 17,000 and 24,000 every year.
· Of these, roughly 2,400 to 4,100 respondents identified "data scientist" as their job title.
Besides data scientists, respondents reported other job titles such as "data analyst" and "business analyst"; here, both titles have been merged into one category. For various reasons, job titles such as "machine learning engineer" appeared only in the 2017 and 2020 surveys, so you won't see them in the 2018 and 2019 results.
The survey helps users analyze trends and technologies in the data science and big data analytics industry. Moreover, such surveys can help aspiring data science professionals, machine learning engineers, and big data analysts better understand these trends.
Let us delve deeper and briefly discuss the significant areas covered in these surveys.
Big data analysts are often unsure which programming language to learn to get into data science.
According to the surveys, most data scientists preferred Python.
More than 78 percent of data scientists, machine learning engineers, and software engineers reported that they were comfortable with Python.
Among business and data analysts, Python usage grew steadily from 61 percent to 87 percent.
Another language popular among data scientists was R. However, the share of data scientists using R fell from 64 percent to 23 percent, a drop of 41 percentage points.
Overall, Python remained the most preferred programming language among data scientists, machine learning engineers, business analysts, and data analysts over the past four years.
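The R and Python figures above involve two kinds of change that are easy to confuse: percentage-point change (the plain difference between two percentages) and relative change (that difference as a share of the starting value). The minimal Python sketch below illustrates the distinction, assuming only the percentages quoted in the text (64 to 23 for R, 61 to 87 for Python) rather than Kaggle's raw survey data; the helper function names are hypothetical and chosen for illustration.

```python
def percentage_point_change(start_pct: float, end_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return end_pct - start_pct


def relative_change(start_pct: float, end_pct: float) -> float:
    """Change expressed as a percent of the starting value."""
    return (end_pct - start_pct) / start_pct * 100


# Percentages as quoted in the article (not recomputed from raw survey data)
r_start, r_end = 64, 23      # share of data scientists using R
py_start, py_end = 61, 87    # share of business/data analysts using Python

print(f"R: {percentage_point_change(r_start, r_end):+} pp, "
      f"{relative_change(r_start, r_end):+.1f}% relative")
print(f"Python: {percentage_point_change(py_start, py_end):+} pp, "
      f"{relative_change(py_start, py_end):+.1f}% relative")
```

For R, the decline is 41 percentage points but roughly 64 percent in relative terms, which is why it helps to state the unit explicitly when comparing survey years.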
Techniques for data analysis and prediction are at the heart of data science.
Over these four years, the surveys asked about the general techniques used and the timeline of data science workflows. One question in particular appeared in three of the four surveys: "What types of machine learning algorithms are used?"
Tech professionals looking to make their way into the data science industry need to know which machine learning algorithms are commonly used in practice.
The most common algorithms include:
· Linear Regression
· Logistic Regression
· Decision Trees
· Gradient Boosting
· Neural Networks
Another category of machine learning algorithms missing from the image is unsupervised learning, which includes clustering and dimensionality reduction.
In terms of diversity, most of the surveyed data science professionals were male.
Although the percentage of non-male professionals in the data science realm has improved significantly over the years, the share of men in the field was still high (over 80 percent).
Over the years, salary compensation for data scientists also increased, although this was not the case for every job title; software engineers were an exception.
On the education front, the share of respondents with neither a Ph.D. nor a Master's degree grew slightly, from 27 percent to 32 percent.
This may be due to the steady proliferation of MOOCs, online education programs, and data science certification programs.
Organizations today focus more on candidates' practical skills than on theoretical knowledge alone. Such skills can be gained through certification programs that include projects and real-world problems to solve.
You may need to wait another year to see the data science trends of 2021. We hope Kaggle continues the survey in the coming years.