23. Subset observations (rows)
data[data.Length > 10] # Extract rows that meet a logical criterion.
df.sample(frac=0.5) # Randomly select a fraction of rows.
df.sample(n=10) # Randomly select n rows.
df.iloc[10:20] # Select rows by position.
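As a minimal sketch of the four operations above, the snippet below uses a small made-up frame. The column names and values are assumptions for illustration only ("Length" stands in for any numeric column), and random_state is added just to make the random draws repeatable.

```python
import pandas as pd

# Hypothetical toy data, not the Pokemon dataset.
df = pd.DataFrame({
    "Name": list("abcdefgh"),
    "Length": [4, 12, 7, 15, 9, 11, 3, 20],
})

# Boolean mask: keeps only rows where the condition is True.
long_rows = df[df["Length"] > 10]

# Random subsets; random_state fixes the draw so it can be repeated.
half = df.sample(frac=0.5, random_state=0)
three = df.sample(n=3, random_state=0)

# Positional slice: rows at positions 2, 3 and 4 (the end is exclusive).
middle = df.iloc[2:5]

print(long_rows)
print(middle)
```

Note that df["Length"] and df.Length select the same column; the bracket form also works for column names that contain spaces.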
24. Count values
We can count how many times each value occurs in a column of the dataset.
data["Type 2"].value_counts()
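A short sketch of what value_counts() returns, using a made-up "Type 2" column (the values are assumptions for illustration). By default missing values are not counted; passing dropna=False includes them.

```python
import pandas as pd

# Hypothetical column with two missing entries.
types = pd.Series(
    ["Poison", None, "Flying", "Poison", None, "Flying", "Poison"],
    name="Type 2",
)

# Counts per distinct value, sorted from most to least frequent.
counts = types.value_counts()

# Same, but missing values get their own entry.
counts_with_nan = types.value_counts(dropna=False)

print(counts)
print(counts_with_nan)
```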
25. Select rows and columns using labels (loc):
You can access the rows and columns of a pandas data frame by their labels.
Select a single row by label:
data1.loc[0]
Accessing multiple rows by label:
data.loc[[0,1]]
Accessing by row and column:
data.loc[0,"Type 2"]
Selecting a single row and multiple columns:
data1.loc[1,['Name','Attack','Defense']]
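The examples above work with 0 and 1 because the default index labels happen to be integers. The sketch below uses a small made-up frame with index labels 10, 20, 30 (an assumption for illustration) to show that loc looks up labels, not positions.

```python
import pandas as pd

# Hypothetical frame with a non-default index.
data = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
        "Type 2": ["Poison", "Poison", "Poison"],
        "Attack": [49, 62, 82],
        "Defense": [49, 63, 83],
    },
    index=[10, 20, 30],
)

row = data.loc[20]                          # single row, returned as a Series
cell = data.loc[10, "Type 2"]               # single value
subset = data.loc[20, ["Name", "Attack"]]   # one row, selected columns

# Label 0 does not exist here, so loc raises KeyError.
try:
    data.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(row)
print(cell)
print(missing_label_raises)
```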
26. Select by index position (iloc):
Row by index location:
data1.iloc[1] # second row; positions start at zero
Column by index location:
data.iloc[:, 3] # 'Type 2' column in the Pokemon data
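To make the difference from loc concrete, here is a small sketch on a made-up frame whose index labels (10, 20, 30) are an assumption for illustration. iloc ignores the labels and counts positions from zero.

```python
import pandas as pd

# Hypothetical frame; labels deliberately differ from positions.
data = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Ivysaur", "Venusaur"],
        "Type 1": ["Grass", "Grass", "Grass"],
        "Type 2": ["Poison", "Poison", "Poison"],
    },
    index=[10, 20, 30],
)

second_row = data.iloc[1]     # row at position 1, whose label is 20
third_col = data.iloc[:, 2]   # column at position 2, i.e. "Type 2"

print(second_row)
print(third_col.name)
```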
27. Slicing rows and columns using labels:
Slice rows by labels:
data.loc[1:3, :]
Slice rows and columns by labels:
data.loc[1:3, 'Type 1':'Type 2']
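One detail worth checking when slicing with loc is that both ends of a label slice are included. The sketch below shows this on a small made-up frame (the rows are an assumption for illustration, with the default integer index).

```python
import pandas as pd

# Hypothetical five-row frame with default labels 0..4.
data = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire", "Fire"],
    "Type 2": ["Poison", "Poison", "Poison", None, None],
    "HP": [45, 60, 80, 39, 58],
})

# Labels 1, 2 and 3 are all returned: the end label is inclusive.
rows = data.loc[1:3, :]

# Column slices are inclusive too: both "Type 1" and "Type 2" come back.
block = data.loc[1:3, "Type 1":"Type 2"]

print(rows)
print(block)
```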
28. Slice rows and columns using position (iloc):
Positions run from 0 to (number of rows/columns - 1).
Slice rows by index position:
data.iloc[0:3,:]
Slice columns by index position:
data.iloc[:,1:3]
Slice rows and columns by index position:
data1.iloc[1:2,1:3]
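Unlike label slices, position slices with iloc follow the usual Python rule: the start is included and the end is excluded. A minimal sketch on a made-up numeric frame (column names A to D are assumptions for illustration):

```python
import pandas as pd

# Hypothetical 5x4 frame of consecutive integers.
data = pd.DataFrame({
    "A": range(5),
    "B": range(5, 10),
    "C": range(10, 15),
    "D": range(15, 20),
})

first_rows = data.iloc[0:3, :]   # rows at positions 0, 1, 2
mid_cols = data.iloc[:, 1:3]     # columns at positions 1, 2 ("B" and "C")
single = data.iloc[1:2, 1:3]     # one row, two columns, still a DataFrame

print(first_rows.shape)
print(mid_cols.columns.tolist())
print(single)
```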
29. Handling missing values
In pandas, the dropna() function removes rows or columns that contain Null/NaN values. It is used a lot when a data frame has missing data.
The syntax of the dropna() function is shown below:
dropna(self, axis=0, how="any", thresh=None, subset=None, inplace=False)
- axis: possible values are 0 or 1; the default is 0. If 0, drop rows; if 1, drop columns.
- how: possible values are 'any' or 'all'; the default is 'any'. If 'any', drop the row/column when at least one of its values is null. If 'all', drop the row/column only when all of its values are null.
- thresh: an int giving the minimum number of non-null values a row/column must have to be kept.
- subset: labels along the other axis to check for null values (for example, column names when dropping rows).
- inplace: a boolean value. If True, the source DataFrame is modified and the call returns None.
Dropping all the rows that contain null values: after removing them, 414 rows are left in the data frame.
Dropping the columns with missing values:
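A minimal sketch of both calls, plus the how, thresh and subset arguments described above. The frame is made up for illustration (it is not the Pokemon data, so the row counts differ from the 414 mentioned above).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in "Type 2" and "HP".
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Type 1": ["Grass", "Fire", "Water", "Bug"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "HP": [45, np.nan, 44, 40],
})

rows_dropped = df.dropna()               # drop rows with at least one NaN
cols_dropped = df.dropna(axis=1)         # drop columns with at least one NaN
all_only = df.dropna(how="all")          # drop rows only if every value is NaN
at_least_3 = df.dropna(thresh=3)         # keep rows with 3 or more non-null values
by_subset = df.dropna(subset=["HP"])     # only look at "HP" when deciding

print(rows_dropped)
print(cols_dropped.columns.tolist())
```

None of these calls changes df itself, because inplace is left at its default of False; each one returns a new DataFrame.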
30. Merging datasets
Merge or join operations combine data sets by linking rows using one or more keys.
Main arguments of pd.merge and their descriptions:
left – DataFrame to be merged on the left side.
right – DataFrame to be merged on the right side.
how – One of ‘inner’, ‘outer’, ‘left’ or ‘right’. ‘inner’ by default.
on – Column name(s) to join on. Must be found in both DataFrame objects.
We create two data frames to walk through each of these merge operations.
df1 = pd.DataFrame({'key': ['a', 'b', 'c', 'd', 'e','f' , 'g'], 'data1': range(7)})
df1
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],'data2': range(3)})
df2
Left join – join matching rows from df2 to df1.
Returns a data frame containing all the rows of the left data frame, with values from the right data frame where the key matches and NaN where it does not.
pd.merge(df1, df2,how='left', on='key')
Right join – join matching rows from df1 to df2.
All the rows of the right data frame are kept as they are, together with the matching rows of the left data frame.
pd.merge(df1, df2,how='right', on='key')
Inner join – returns a data frame with only the rows whose keys appear in both data frames.
pd.merge(df1, df2,how='inner', on='key')
Outer join – retains all the rows from both data frames, filling in NaN where a key has no match on the other side.
pd.merge(df1, df2,how='outer', on='key')
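To compare the four join types side by side, the sketch below collects the shape of each result for df1 and df2. Because every key in df2 also appears in df1, the right and inner joins return the same rows here, and so do the left and outer joins. The extra frame df3, with a key 'z' that is not in df1, is a hypothetical addition to make the difference visible, and indicator=True is used to show where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c", "d", "e", "f", "g"], "data1": range(7)})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": range(3)})

# Shape of each join result on the frames from the text.
shapes = {
    how: pd.merge(df1, df2, how=how, on="key").shape
    for how in ("left", "right", "inner", "outer")
}
print(shapes)

# Hypothetical frame with a key ("z") that df1 does not contain.
df3 = pd.DataFrame({"key": ["a", "z"], "data3": [10, 20]})

# indicator=True adds a "_merge" column: left_only, right_only or both.
outer = pd.merge(df1, df3, how="outer", on="key", indicator=True)
inner = pd.merge(df1, df3, how="inner", on="key")

print(outer)
print(inner)
```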
Ref:
[1] https://www.kaggle.com/abcsds/pokemon
[2] https://www.oreilly.com/library/view/python-for-data/9781449323592/