NEO Share

My Best Method to Learn New Tools in Data Science

February 10, 2021 by systems

And how I practice it

Soner Yıldırım

Photo by Element5 Digital on Unsplash

Data science has experienced tremendous growth in recent years. Advancements in data collection, storage, and processing have contributed to this growth.

The potential to create value using data has attracted many industries. More and more businesses have adopted data-centric strategies and processes in their operations.

The ever-growing demand has also motivated developers and the open-source community to create new tools for data science. Thus, people who work in the field of data science have many libraries, frameworks, and tools to do their work.

Some of these tools are designed to perform the same tasks, just in a different programming language. Some are more efficient than others. Some focus on a particular task. The undeniable truth is that we have many tools to choose from.

You may argue that it is better to stick to one tool for a particular task. I, however, prefer to have at least a couple of options. I also like to be able to compare tools.

In this article, I will try to explain how I learn new tools. My strategy is based on comparison. I focus on how a given task can be accomplished with different tools.

Working this way, I can clearly see the differences as well as the similarities between the tools. It also helps me build an intuition about how the creators of such tools approach particular problems.

Let’s say I’m comfortable with the Pandas library in Python and want to learn the dplyr library in R. I try to perform the same tasks with both libraries.

Suppose we have the following dataset about a marketing campaign.

marketing (image by author)

I would like to create a new column that contains the ratio of the amount spent to the salary. Here is how it can be done using both Pandas and dplyr.

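# pandas (Python): adds the new column to subset in place
subset['spent_ratio'] = subset['AmountSpent'] / subset['Salary']

# dplyr (R): mutate returns a new data frame, so assign the result back
subset <- mutate(subset, spent_ratio = AmountSpent / Salary)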
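
To try the Pandas side end to end, here is a minimal sketch. It assumes a small made-up data frame standing in for the marketing data: the values are hypothetical, and only the column names come from the example above. It also shows `DataFrame.assign`, which is arguably the closer counterpart to `mutate` because it returns a new data frame rather than changing the existing one.

```python
import pandas as pd

# Hypothetical values; only the column names come from the article's example.
subset = pd.DataFrame({
    "AmountSpent": [755, 1318, 296],
    "Salary": [47500, 63600, 13500],
})

# mutate-like style: assign returns a new data frame and leaves `subset` unchanged.
with_ratio = subset.assign(spent_ratio=subset["AmountSpent"] / subset["Salary"])

# In-place style: add the column directly to the existing data frame.
subset["spent_ratio"] = subset["AmountSpent"] / subset["Salary"]

print(with_ratio)
```

Both styles produce the same column; the difference is only whether the original data frame is modified.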

Let’s do another example that compares Pandas and SQL. Suppose we have a dataset that contains groceries and their prices.

(image by author)

We want to calculate the average item price for each store. This task can be accomplished with both Pandas and SQL as follows.

#Pandas
items[['store_id','price']].groupby('store_id').mean()

             price
store_id
1         1.833333
2         3.820000
3         3.650000

#SQL
mysql> select store_id, avg(price)
    -> from items
    -> group by store_id;
+----------+------------+
| store_id | avg(price) |
+----------+------------+
|        1 |   1.833333 |
|        2 |   3.820000 |
|        3 |   3.650000 |
+----------+------------+
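
The same comparison can be run side by side from a single Python script. The sketch below is only an illustration under stated assumptions: the grocery rows are made up (the actual rows are not shown here), and it uses an in-memory SQLite database from Python's standard library instead of the MySQL session above, so no database server is needed.

```python
import sqlite3

import pandas as pd

# Hypothetical grocery rows; only the table and column names follow the example above.
items = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 3],
    "price": [1.5, 2.5, 4.0, 3.0, 3.5],
})

# Pandas: group by store and average the price column.
pandas_avg = items.groupby("store_id")["price"].mean()

# SQL: load the same rows into an in-memory SQLite database (swapped in for MySQL)
# and run the equivalent GROUP BY query.
conn = sqlite3.connect(":memory:")
items.to_sql("items", conn, index=False)
sql_avg = pd.read_sql_query(
    "SELECT store_id, AVG(price) AS price FROM items "
    "GROUP BY store_id ORDER BY store_id",
    conn,
    index_col="store_id",
)["price"]
conn.close()

print(pandas_avg)
print(sql_avg)
```

Running both on identical data makes it easy to confirm that `groupby(...).mean()` and `GROUP BY` with `AVG` express the same operation.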

Filed Under: Artificial Intelligence

Copyright © 2025 NEO Share
