4 Techniques to Handle NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even Pandas is built around the NumPy array. Although some of these operations may seem a bit dry, they’re the building blocks of many other operations, so it’s worth getting to know them well.

2.1 NumPy array attributes

First, let’s look at some useful array attributes. We’ll define three random arrays: a one-, a two-, and a three-dimensional array. To do this, we’ll use NumPy’s random number generator and seed it with a set value so that the same random arrays are generated each time the code runs:

Image created by the author
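The setup above is shown only as an image. As a minimal sketch, assuming the arrays hold random integers from 0 to 9 and are named x1, x2, and x3, it might look like this. It uses NumPy’s Generator API (np.random.default_rng); the author’s version may use the older np.random.seed instead. make_arrays is a hypothetical helper added only to show that the same seed reproduces the same arrays.

```python
import numpy as np

def make_arrays(seed):
    """Hypothetical helper: build 1-D, 2-D and 3-D random integer arrays."""
    rng = np.random.default_rng(seed)      # seeded generator -> reproducible output
    x1 = rng.integers(10, size=6)          # one-dimensional array, values 0..9
    x2 = rng.integers(10, size=(3, 4))     # two-dimensional array
    x3 = rng.integers(10, size=(3, 4, 5))  # three-dimensional array
    return x1, x2, x3

x1, x2, x3 = make_arrays(seed=0)
y1, y2, y3 = make_arrays(seed=0)  # same seed, second run

# Every array matches its counterpart from the first run
print(all(np.array_equal(a, b) for a, b in [(x1, y1), (x2, y2), (x3, y3)]))  # True
```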

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

Image created by the author
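Reading each attribute is a single attribute access. A minimal sketch, assuming a 3×4×5 random integer array (the shape in the author’s image may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
x3 = rng.integers(10, size=(3, 4, 5))  # three-dimensional random integer array

print("x3 ndim: ", x3.ndim)   # number of dimensions -> 3
print("x3 shape:", x3.shape)  # size of each dimension -> (3, 4, 5)
print("x3 size: ", x3.size)   # total number of elements, 3 * 4 * 5 -> 60
```

Note that size is always the product of the entries in shape.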

2.2 Array indexing: Accessing single elements

If you’re familiar with Python’s standard list indexing, indexing in NumPy will feel very familiar. In a one-dimensional array, you can access the ith value (counting from zero) by specifying the desired index in square brackets, just as with Python lists:

Image created by the author
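A minimal sketch of one-dimensional indexing; the values are made up for illustration and won’t match the random values in the image. As with Python lists, negative indices count from the end of the array:

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])  # made-up values for illustration

print(x1[0])   # first element -> 5
print(x1[4])   # fifth element -> 7
print(x1[-1])  # last element -> 9
print(x1[-2])  # second-to-last element -> 7
```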

In a multidimensional array, you access and modify items using a comma-separated tuple of indices:

Image created by the author
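For a two-dimensional array, the comma-separated index takes the form [row, column], and the same syntax works for assignment. A minimal sketch with made-up values:

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])  # made-up values for illustration

print(x2[0, 0])   # row 0, column 0 -> 3
print(x2[2, 0])   # row 2, column 0 -> 1
print(x2[2, -1])  # row 2, last column -> 7

x2[0, 0] = 12     # the same tuple syntax modifies the element in place
print(x2[0])      # -> [12  5  2  4]
```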

Bear in mind that, unlike Python lists, NumPy arrays have a fixed type. This means that if you try to insert a floating-point value into an integer array, the value will be silently truncated (its fractional part is dropped).
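A minimal sketch of that truncation, using a made-up integer array:

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])  # NumPy infers an integer dtype
x1[0] = 3.14159                    # float assigned into an integer array
print(x1)        # [3 0 3 3 7 9]: the fractional part was dropped
print(x1.dtype)  # still an integer dtype; the array did not become float
```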

2.3 Array slicing: Accessing subarrays

Just as square brackets access individual elements, they can also access subarrays using slice notation, marked by the colon (:) character. NumPy slicing syntax follows that of the standard Python list, so to access a slice of an array x, use:

x[start:stop:step]

If any of these are unspecified, they default to the values start=0, stop=size of dimension, and step=1. Let’s take a look at accessing subarrays in one dimension.

Image created by the author
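A minimal sketch of one-dimensional slicing on np.arange(10). When step is negative, the defaults for start and stop are effectively swapped, which makes x[::-1] a convenient way to reverse an array:

```python
import numpy as np

x = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

first_five = x[:5]    # [0 1 2 3 4]
after_five = x[5:]    # [5 6 7 8 9]
middle = x[4:7]       # [4 5 6]
evens = x[::2]        # every other element: [0 2 4 6 8]
odds = x[1::2]        # every other element, starting at index 1: [1 3 5 7 9]
reversed_x = x[::-1]  # negative step reverses: [9 8 7 6 5 4 3 2 1 0]
print(first_five, after_five, middle, evens, odds, reversed_x)
```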

Multidimensional slices work the same way, with one slice per dimension, separated by commas:

Image created by the author
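A minimal sketch of two-dimensional slicing with made-up values. Each comma-separated slice applies to one dimension, and a lone : selects that entire dimension, which is how you pull out a single row or column:

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])  # made-up values for illustration

top_left = x2[:2, :3]     # first two rows, first three columns
every_other = x2[:, ::2]  # all rows, every other column
flipped = x2[::-1, ::-1]  # rows and columns both reversed
first_col = x2[:, 0]      # first column -> [12  7  1]
first_row = x2[0, :]      # first row (x2[0] is equivalent) -> [12  5  2  4]
print(top_left, every_other, flipped, first_col, first_row, sep="\n\n")
```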

2.4 Reshaping of arrays

Another useful type of operation is reshaping arrays. The most flexible way to do this is with the reshape() method. For instance, to put the numbers 1 through 9 into a 3×3 grid, you can do the following:

Image created by the author
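A minimal sketch of the 3×3 example, assuming the numbers come from np.arange(1, 10):

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))  # 9 elements -> 3 rows x 3 columns
print(grid)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
```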

Note: For reshape() to work, the size of the initial array must match the size of the reshaped array.
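A minimal sketch of what happens when the sizes don’t match: NumPy raises a ValueError. Passing -1 for one dimension asks NumPy to infer that dimension from the total size, which relies on the same rule:

```python
import numpy as np

x = np.arange(1, 10)  # 9 elements

try:
    x.reshape((2, 5))  # 2 * 5 = 10 elements requested, but x has only 9
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

# -1 tells NumPy to infer that dimension: 9 elements / 3 rows = 3 columns
y = x.reshape((3, -1))
print(y.shape)  # (3, 3)
```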

Another common reshaping pattern is converting a one-dimensional array into a two-dimensional row or column matrix. You can do this with the reshape() method or by using np.newaxis within a slice operation:

Image created by the author
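A minimal sketch of both approaches with a made-up three-element array. np.newaxis is simply an alias for None, so x[:, None] works as well:

```python
import numpy as np

x = np.array([1, 2, 3])

row_r = x.reshape((1, 3))  # row vector via reshape
row_n = x[np.newaxis, :]   # row vector via np.newaxis
col_r = x.reshape((3, 1))  # column vector via reshape
col_n = x[:, np.newaxis]   # column vector via np.newaxis

print(row_r.shape, row_n.shape)  # (1, 3) (1, 3)
print(col_r.shape, col_n.shape)  # (3, 1) (3, 1)
```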

2.5 Array concatenation and splitting

All of the previous routines worked on single arrays. As a data scientist, though, you’ll often need to combine multiple arrays into one, or split a single array into several. In NumPy, concatenation, or joining arrays together, is done primarily with one of three routines: np.concatenate, np.vstack, or np.hstack.

Image created by the author
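A minimal sketch of np.concatenate, which takes a list or tuple of arrays. For two-dimensional arrays, the axis argument (0 by default) chooses whether the inputs are joined row-wise or column-wise:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
joined = np.concatenate([x, y])  # [1 2 3 3 2 1]

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
by_rows = np.concatenate([grid, grid])          # axis=0 (default): shape (4, 3)
by_cols = np.concatenate([grid, grid], axis=1)  # axis=1: shape (2, 6)
print(joined, by_rows, by_cols, sep="\n\n")
```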

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical-stack) and np.hstack (horizontal-stack) functions:

Image created by the author
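A minimal sketch with made-up values. Here np.concatenate([x, grid]) would raise a ValueError because x has one dimension and grid has two, whereas np.vstack treats the one-dimensional x as a single row:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

v = np.vstack([x, grid])  # x becomes the first row: shape (3, 3)

y = np.array([[99],
              [99]])
h = np.hstack([grid, y])  # y is appended as an extra column: shape (2, 4)
print(v, h, sep="\n\n")
```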

2.6 Splitting of arrays

The opposite of concatenation is splitting, which can be achieved by using the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:

Image created by the author
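A minimal sketch with a made-up array: the two split points [3, 5] cut the array before index 3 and before index 5, yielding three pieces:

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
first, middle, last = np.split(x, [3, 5])  # two split points -> three subarrays
print(first, middle, last)  # [1 2 3] [99 99] [3 2 1]
```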

Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit work the same way, splitting along columns and rows, respectively:

Image created by the author
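A minimal sketch on a 4×4 grid: np.vsplit splits along rows (axis 0) and np.hsplit along columns (axis 1):

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))  # rows [0..3], [4..7], [8..11], [12..15]

upper, lower = np.vsplit(grid, [2])  # split before row 2
left, right = np.hsplit(grid, [2])   # split before column 2
print(upper, lower, left, right, sep="\n\n")
```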
