Indexing
A string can be considered a sequence of characters. The characters do not have to form words or meaningful text; for instance, “foo123” is a valid string.
Indexing or slicing strings is a highly practical operation in data manipulation: we can extract any part of a string by providing the indices of its characters. Both pandas and tidyverse provide functions to extract part of a string based on character indices.
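Because a string is just a sequence of characters, plain Python indexing on the “foo123” example shows how positions work before any pandas code is involved. In this minimal sketch, positive indices count from 0 at the start, negative indices count from -1 at the end, and the end of a slice is exclusive.

```python
s = "foo123"

# Positive indices count from the start, beginning at 0
first_char = s[0]      # "f"
# Negative indices count from the end, beginning at -1
last_char = s[-1]      # "3"
# A slice [start:end] stops just before the end position
first_three = s[0:3]   # "foo"
# Leaving the end blank runs the slice to the end of the string
last_two = s[-2:]      # "23"

print(first_char, last_char, first_three, last_two)  # prints: f 3 foo 23
```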
Let’s create a new column by extracting the last two characters of the “ctg2” column.
Pandas:
Indices can be counted from the beginning, starting at 0, or from the end, starting at -1. Since we need the last two characters, counting from the end is more convenient.
cities["sub_ctg2"] = cities.ctg2.str[-2:]
The start and end of the desired part of the string are specified by passing the corresponding indices to the str accessor. The start index is the second position from the end (-2). The end index is left blank, so the slice runs to the end of the string.
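The cities data itself is not shown here, so the sketch below builds a small hypothetical DataFrame with made-up “ctg2” values to show the slice applied to every row. It also uses Series.str.slice, the explicit method form of the same operation, to confirm that both give identical results.

```python
import pandas as pd

# Hypothetical stand-in for the cities DataFrame; the ctg2 values are made up
cities = pd.DataFrame({"ctg2": ["abc-10", "xyz-25", "foo123"]})

# str[-2:] slices each value: start two from the end, run to the end
cities["sub_ctg2"] = cities.ctg2.str[-2:]

# str.slice(start=-2) is the explicit method form of the same slice
via_slice = cities.ctg2.str.slice(start=-2)

print(cities)
```

The new “sub_ctg2” column holds "10", "25" and "23" for the three rows.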
Tidyverse:
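The same operation can be done with the str_sub function from the stringr package. Its start and end positions are both inclusive, so -2 (the second character from the end) and -1 (the last character) select the last two characters: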
cities <- mutate(cities, sub_ctg2 = str_sub(ctg2, -2, -1))
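Python slices exclude the end position, whereas str_sub includes it and counts from 1 at the start; that is why the R call ends at -1 while the pandas slice leaves the end blank. To make the difference concrete, the sketch below defines a hypothetical helper, str_sub_like, that maps str_sub-style positions onto a Python slice. It is an illustration only, not part of stringr or pandas, and it assumes nonzero positions.

```python
def str_sub_like(s: str, start: int = 1, end: int = -1) -> str:
    """Hypothetical helper mimicking str_sub-style positions (inclusive, 1-based).

    Assumes start and end are nonzero.
    """
    n = len(s)
    # Convert 1-based positive or negative positions to 0-based indices
    start_idx = start - 1 if start > 0 else n + start
    end_idx = end - 1 if end > 0 else n + end
    # Clamp the indices to the bounds of the string
    start_idx = max(start_idx, 0)
    end_idx = min(end_idx, n - 1)
    # An end before the start selects nothing
    if end_idx < start_idx:
        return ""
    # Python slices exclude the stop index, so add 1 to keep the end character
    return s[start_idx:end_idx + 1]


print(str_sub_like("foo123", -2, -1))  # prints: 23
print(str_sub_like("foo123", 1, 3))    # prints: foo
```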