
Before learning the collections module, we need to understand two things: container data types and abstract data types such as the queue and the stack. A container data type is a data type that holds a collection of values, which may be of the same or of different kinds.
For the second, we should know how a collection of data is laid out. Consider an imaginary block of memory with ten locations: 100, 200, …, 1000. Now take a linked list of three elements a1, a2, and a3, stored in different locations of this imaginary memory. For instance, a1 is in 100, a2 in 400, and a3 in 900. In a linked list, each location holds two things: the value itself and the location of the next value. So location 100 holds a1 together with the location of a2, i.e., 400, and location 400 holds a2 together with the location of a3, i.e., 900. Chained together, these form a linked list.
Let us replace the values a1, a2, and a3 with two different things: a line of people at a ticket counter and a pile of dinner plates. At the ticket counter, a1 arrives first and gets the ticket first (first in, first out). In the pile of plates, the plate that is added last is the first one to be taken out (last in, first out).
Both scenarios use the same linked list as their base. To model them, we have to remove some functionality from the linked list or add some to it. We don’t want to give a ticket to the last person in the queue before the others, so removing an item from the back must be excluded from the queue data type. This is how abstract data types work: they are defined by the operations they allow, not by how the data is stored.
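The idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names Node, Queue, and Stack and their method names are hypothetical, chosen for this example. Both types are built on the same linked node, and each one exposes only the operations its scenario allows.

```python
class Node:
    """One memory cell: a value plus a reference to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class Queue:
    """First in, first out: add at the tail, remove only from the head."""
    def __init__(self):
        self.head = None
        self.tail = None

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:      # the queue is empty
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:      # the queue became empty
            self.tail = None
        return node.value


class Stack:
    """Last in, first out: add and remove only at the top."""
    def __init__(self):
        self.top = None

    def push(self, value):
        self.top = Node(value, self.top)

    def pop(self):
        if self.top is None:
            raise IndexError("pop from empty stack")
        node = self.top
        self.top = node.next
        return node.value


people = Queue()   # the ticket counter
plates = Stack()   # the pile of dinner plates
for item in ("a1", "a2", "a3"):
    people.enqueue(item)
    plates.push(item)

print(people.dequeue())  # a1: the first person in line is served first
print(plates.pop())      # a3: the last plate added is taken first
```

Note that Queue has no method for removing the last person in line; leaving that operation out is what makes it a queue.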
Now we need to relate this concept to the data types in Python. Take the dictionary data type. Before Python 3.7, a dictionary did not guarantee any order for the items it stored (newer versions preserve insertion order). But what if we need an ordered set of items without giving up the dictionary interface? This is where the collections module helps. It provides a specialized data type called OrderedDict that allows all the operations of a dictionary but maintains the elements in insertion order.
Data types in collections
The specialized data types available in the collections module are listed below.
OrderedDict
defaultdict
deque()
namedtuple()
Counter
ChainMap
UserDict
UserList
UserString
The most useful of these data types are explained below with examples.
1. OrderedDict
OrderedDict and the regular dictionary behave in the same way. What differentiates them is how they treat the order of the elements. Before Python 3.7, the regular dictionary had no feature that kept the order of items (since 3.7 it preserves insertion order), whereas OrderedDict always keeps the order and adds methods for rearranging it. Let me demonstrate with two examples, a dictionary and an ordered dictionary containing the same items.
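A minimal sketch of the regular-dictionary case, assuming the four entries are added in the order Tom, Alan, Michael, Jhon (the variable name my_dict is also an assumption). On Python 3.7 and later this prints the entries in insertion order; a scrambled order is what older interpreters could produce.

```python
# Phone-book entries added one by one, in the order Tom, Alan, Michael, Jhon
my_dict = {}
my_dict['Tom'] = 9865765634
my_dict['Alan'] = 8765656343
my_dict['Michael'] = 9573455871
my_dict['Jhon'] = 9653542156

print(my_dict)
```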
Output
{'Jhon': 9653542156, 'Michael': 9573455871, 'Tom': 9865765634, 'Alan': 8765656343}
The output above shows that the elements of the dictionary are not stored in the order in which they were added. Now let us try the same with OrderedDict from the collections module. To use this data type, we must import it from the collections module using the import keyword.
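A minimal sketch of the OrderedDict case, again assuming the entries are added in the order Tom, Alan, Michael, Jhon and that the variable is called my_dict. The exact text that print produces for an OrderedDict varies slightly between Python versions, but the order of the entries does not.

```python
from collections import OrderedDict

# Same entries as before, added in the same order
my_dict = OrderedDict()
my_dict['Tom'] = 9865765634
my_dict['Alan'] = 8765656343
my_dict['Michael'] = 9573455871
my_dict['Jhon'] = 9653542156

print(my_dict)
```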
Output
OrderedDict([('Tom', 9865765634), ('Alan', 8765656343), ('Michael', 9573455871), ('Jhon', 9653542156)])
In this program, all the elements in the dictionary maintain the order that we followed to add elements.
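Apart from this, all the methods available on the regular dictionary object can be applied to the OrderedDict data type. There are two important dedicated methods for the OrderedDict data type:
popitem() method
move_to_end(key) method
popitem() is also present in the regular dictionary but works in a different way: in OrderedDict it accepts a last argument, so popitem(last=True) removes the most recently added item and popitem(last=False) removes the oldest one. move_to_end(key) moves an existing key to the end of the order, or to the beginning when called with last=False.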
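The two methods can be tried on a small ordered dictionary. This is a sketch with hypothetical keys a to d; the comments show the state after each call.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys(['a', 'b', 'c', 'd'])  # keys a, b, c, d; values None

od.move_to_end('a')              # move 'a' to the right end
print(list(od))                  # ['b', 'c', 'd', 'a']

od.move_to_end('d', last=False)  # move 'd' to the left end
print(list(od))                  # ['d', 'b', 'c', 'a']

print(od.popitem())              # ('a', None): removes the last item
print(od.popitem(last=False))    # ('d', None): removes the first item
print(list(od))                  # ['b', 'c']
```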
2. defaultdict()
defaultdict is a subclass of the regular dictionary, but it does not raise a KeyError when a missing key is accessed. Let’s consider the following program, which contains a statement that tries to access a value with a key that is not present in the dictionary.
Output
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    print(my_dict['Paul'])
KeyError: 'Paul'
When we try to access a value with this kind of key, the dictionary raises the KeyError exception. We can avoid this using exception handling. But the collections module provides an easier way to overcome this problem. defaultdict takes a function (any callable that needs no arguments) as its first argument; when the user accesses the dictionary with a new key, it calls that function, stores the result under the key, and returns it. The following code will help you to understand the working of the defaultdict data type.
Output
7855434561
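To see the two behaviours side by side, here is a minimal sketch. The phone-book entries, the variable names regular and book, and the default value 'unknown' are all assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical phone book; the names and numbers are illustrative only
regular = {'Tom': 9865765634, 'Alan': 8765656343}

try:
    regular['Paul']
except KeyError as err:
    print('KeyError:', err)   # a plain dict raises for a missing key

# The first argument is called with no arguments to build a missing value
book = defaultdict(lambda: 'unknown', regular)
print(book['Tom'])            # 9865765634: existing keys behave as usual
print(book['Paul'])           # unknown: the default is returned instead
print('Paul' in book)         # True: the lookup also stored the new key
```

Because the lookup stores the new key, reading from a defaultdict can make it grow; use the get() method when the key should not be added.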
3. deque()
deque is a class used to create a double-ended queue. As we discussed earlier, in a normal queue elements are added at one end and removed only from the other. In a double-ended queue, we can add and remove elements at both ends. Creating a double-ended queue is very simple. The following program shows how to create one.
from collections import deque
my_dou_que = deque([1,2,3])
print(my_dou_que)
Output
deque([1, 2, 3])
There are multiple methods available on the deque object. In the following code, these methods are used on the previously created deque object. Each print statement will show you what changes were made to the object.
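A minimal sketch covering a selection of the commonly used deque methods, applied to the deque created earlier. The particular values added are assumptions; the comments show the result of each call.

```python
from collections import deque

my_dou_que = deque([1, 2, 3])

my_dou_que.append(4)            # add to the right end
print(my_dou_que)               # deque([1, 2, 3, 4])

my_dou_que.appendleft(0)        # add to the left end
print(my_dou_que)               # deque([0, 1, 2, 3, 4])

print(my_dou_que.pop())         # 4, removed from the right end
print(my_dou_que.popleft())     # 0, removed from the left end

my_dou_que.extend([4, 5])       # add several items on the right
my_dou_que.extendleft([0, -1])  # add on the left, one item at a time
print(my_dou_que)               # deque([-1, 0, 1, 2, 3, 4, 5])

my_dou_que.rotate(2)            # shift every item two places to the right
print(my_dou_que)               # deque([4, 5, -1, 0, 1, 2, 3])

my_dou_que.reverse()            # reverse the order in place
print(my_dou_que)               # deque([3, 2, 1, 0, -1, 5, 4])
```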
4. namedtuple()
If you are familiar with the normal tuple data type in Python, you will have accessed its values using the index of each value. A named tuple is a tuple-like object whose values can also be accessed by name. The function namedtuple() creates a new tuple type with named fields. The syntax for using namedtuple() is given below.
Syntax: namedtuple(typename, field_names)
The typename parameter is the name of the new tuple type, and field_names is an iterable that contains a name for each category of data. If this is still confusing, look at the following code.
from collections import namedtuple
vehicle = namedtuple('car',['company','type','price'])
C1 = vehicle("XYZ", "ABC", 120000)
print(C1)
Output
car(company='XYZ', type='ABC', price=120000)
The function creates a tuple type named “car”, and each instance stores its values under the field names company, type, and price. The major advantage of a named tuple is that we can access the values using both index and name. The following code shows how to access the price of the car using both index and name.
from collections import namedtuple
Vehicle = namedtuple('Car',['company','type','price'])
C1 = Vehicle("XYZ", "ABC", 120000)
print(C1[2])
print(C1.price)
Output
120000
120000
5. Counter
The count method of a sequence such as a list returns the number of occurrences of a particular element. The major drawback of the count method is that it returns the count of only one value. If you want to know the count of all the different elements in a list, then the Counter class is a good choice. It returns a dictionary-like object in which each key is an element and each value is its count. The following program shows the use of Counter in Python.
from collections import Counter
my_list = [1, 2, 2, 2, 6, 3, 7, 2, 1, 6, 7, 8]
print(Counter(my_list))
Output
Counter({2: 4, 1: 2, 6: 2, 7: 2, 3: 1, 8: 1})
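The difference from the count method, and a couple of things a Counter adds, can be sketched on the same list. The variable name counts is an assumption for this example.

```python
from collections import Counter

my_list = [1, 2, 2, 2, 6, 3, 7, 2, 1, 6, 7, 8]

print(my_list.count(2))        # 4: list.count handles one value per call

counts = Counter(my_list)      # counts every element in a single pass
print(counts[2])               # 4
print(counts[5])               # 0: a missing element counts as zero
print(counts.most_common(1))   # [(2, 4)]: the most frequent element
```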
6. ChainMap
ChainMap is a class that is used to combine different dictionaries into one data type. Using the ChainMap object, we can access the keys and values of all the dictionaries. When a key is looked up, the dictionaries are searched in the order they were passed in.
from collections import ChainMap
my_dict_1 = {'A':1, 'B':2, 'C':3}
my_dict_2 = {'X':3, 'Y':4, 'Z':2}
C1 = ChainMap(my_dict_1, my_dict_2)
print(C1)
Output
ChainMap({'B': 2, 'A': 1, 'C': 3}, {'Z': 2, 'X': 3, 'Y': 4})
Let us try to print all the keys and values of the combined dictionary in the previous program.
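A minimal sketch of such a program, assuming the keys() and values() methods are used and wrapped in list() for printing. The order in which the keys and values are printed can differ between Python versions.

```python
from collections import ChainMap

my_dict_1 = {'A': 1, 'B': 2, 'C': 3}
my_dict_2 = {'X': 3, 'Y': 4, 'Z': 2}
C1 = ChainMap(my_dict_1, my_dict_2)

print(C1)
print(list(C1.keys()))     # keys from both dictionaries
print(list(C1.values()))   # the values that belong to those keys
print(C1['Y'])             # 4: found in the second dictionary
```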
Output
ChainMap({'B': 2, 'C': 3, 'A': 1}, {'Z': 2, 'X': 3, 'Y': 4})
['B', 'X', 'A', 'Y', 'C', 'Z']
[2, 3, 1, 4, 3, 2]