
1. Convert_dtypes
For an efficient data analysis process, it is essential to use the most appropriate data types for variables.
Some functions require a specific data type. For instance, pandas does not treat a column of numbers stored with object data type as numeric, so operations that target numeric columns may skip it or fail. Similarly, the dedicated string data type is often preferable to object for text columns, because it can only hold strings (and missing values) and makes the intent of the column explicit.
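A minimal sketch of the object-versus-numeric point, assuming a small made-up DataFrame whose column names (weight_obj, weight_int) are hypothetical. The same five weights are stored once as object and once as int64, and select_dtypes(include="number") returns only the column pandas actually recognizes as numeric.

```python
import pandas as pd

# The same five weights stored twice: once as object, once as a real integer type
df = pd.DataFrame({
    "weight_obj": pd.Series([83, 63, 66, 74, 64], dtype="object"),
    "weight_int": pd.Series([83, 63, 66, 74, 64], dtype="int64"),
})

# select_dtypes(include="number") keeps only columns pandas treats as numeric
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['weight_int']
```

The object column holds exactly the same values, yet it is silently left out of any numeric selection.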
Pandas offers several ways to handle data type conversions. The convert_dtypes function converts every column to the best possible data type that supports missing values (pd.NA), which is clearly more practical than converting each column separately.
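To make the comparison with column-by-column conversion concrete, here is a small sketch on a hypothetical two-column DataFrame. The manual route passes a dict of target types to astype, so you must already know the right type for each column; convert_dtypes infers the same types in a single call.

```python
import pandas as pd

# Two object columns, similar to the weight and enroll columns used in this section
df = pd.DataFrame({
    "weight": pd.Series([83, 63, 66], dtype="object"),
    "enroll": pd.Series([True, True, False], dtype="object"),
})

# Manual route: pick a target type for every column yourself
manual = df.astype({"weight": "Int64", "enroll": "boolean"})

# Automatic route: let convert_dtypes infer the target types
auto = df.convert_dtypes()

print(manual.dtypes.astype(str).tolist())  # ['Int64', 'boolean']
print(auto.dtypes.astype(str).tolist())    # ['Int64', 'boolean']
```

With two columns the difference is small, but the manual dict grows with every column, whereas the convert_dtypes call stays the same.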
Let’s create a sample dataframe that contains columns with object data type.
import numpy as np
import pandas as pd

# height, weight and enroll are deliberately stored as object
name = pd.Series(['John','Jane','Emily','Robert','Ashley'])
height = pd.Series([1.80, 1.79, 1.76, 1.81, 1.75], dtype='object')
weight = pd.Series([83, 63, 66, 74, 64], dtype='object')
enroll = pd.Series([True, True, False, True, False], dtype='object')
team = pd.Series(['A','A','B','C','B'])

df = pd.DataFrame({
    'name': name,
    'height': height,
    'weight': weight,
    'enroll': enroll,
    'team': team
})
The data type of all columns is object, which is not the optimal choice.
df.dtypes
name object
height object
weight object
enroll object
team object
dtype: object
We can use the convert_dtypes function as below:
df_new = df.convert_dtypes()
df_new.dtypes
name string
height float64
weight Int64
enroll boolean
team string
dtype: object
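Types such as Int64, boolean and string in the listing are pandas' nullable extension types, which represent missing values with pd.NA. A minimal sketch of why that matters, using a made-up three-value Series: with a missing weight, a plain NumPy-backed column has to fall back to float64, while convert_dtypes notices that every non-missing value is a whole number and keeps the column as integers.

```python
import numpy as np
import pandas as pd

# One missing weight: NumPy has no integer NaN, so pandas stores the Series as float64
weight = pd.Series([83, np.nan, 66])
print(weight.dtype)  # float64

# convert_dtypes sees that all non-missing values are whole numbers and picks Int64
converted = weight.convert_dtypes()
print(converted.dtype)  # Int64

# The missing entry is kept as a missing value (pd.NA) rather than forcing floats
print(converted.isna().tolist())  # [False, True, False]
```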
The data types are converted to the best possible option. A useful feature of the convert_dtypes function is that it can represent the boolean values as 1 and 0 instead of True and False, which can be more convenient for numerical analysis.
We just need to set the convert_boolean parameter to False:
df_new = df.convert_dtypes(convert_boolean=False)
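Because the type that enroll ends up with when the boolean conversion is switched off may differ between pandas versions, it is worth printing the result rather than assuming it. The sketch below rebuilds just the weight and enroll columns from the sample data and compares the default call with convert_boolean=False; only the boolean conversion is skipped, and the other conversions (such as weight to Int64) still run.

```python
import pandas as pd

df = pd.DataFrame({
    "weight": pd.Series([83, 63, 66, 74, 64], dtype="object"),
    "enroll": pd.Series([True, True, False, True, False], dtype="object"),
})

# Default behaviour: enroll becomes the nullable boolean type
print(df.convert_dtypes().dtypes)

# convert_boolean=False skips only the boolean conversion
df_no_bool = df.convert_dtypes(convert_boolean=False)
print(df_no_bool.dtypes)

# Inspect how enroll is represented in your pandas version
print(df_no_bool["enroll"].tolist())
```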