TensorFlow: Accelerate NumPy with GPUs

Array Sorting

Let’s start with something simple: sorting an array. NumPy’s sort (in both NumPy & TensorFlow) actually support different sorting algorithms depending on the parameter kind. By default, if your NumPy version is newer than 1.12, it will use introsort (quick link to wiki) or heapsort (quick link to wiki) which has a worst-case complexity of O(n log(n)).

Interface:

import numpy as np
import tensorflow.experimental.numpy as tnp# NumPy APInp.sort(
a: np.ndarray or tnp.ndarray, 
axis: int = -1, 
kind: str = 'quicksort', 
direction: str or list of str, optional = None
)# TensorFlow's NumPy APItnp.sort(
a: np.ndarray or tnp.ndarray, 
axis: int = -1, 
kind: str = 'quicksort', 
direction: str or list of str, optional = None
)

Experiment Setup:

We will be sorting a pre-generated vector with type ndarray with random floats in [0., 1.). Introsort will be used as the sorting algorithm. The average time of 100 runs will be taken as the final reading.

Performance:

Runtime Comparison for Sorting — Image by Author

Speed Comparison for Sorting — Image by Author

Timesheet for Sorting — Image by Author

Relative Speed for Sorting — Image by Author

As TensorFlow has more overhead than NumPy when dispatching operations, TensorFlow’s NumPy API will have a significantly worse performance when compared against NumPy. As a rule of thumb, TensorFlow suggests using NumPy’s API if the operation is expected to finish in about 10 milliseconds.

Indeed, TensorFlow’s suggestion aligns with our observations in the charts above. TensorFlow’s relative speed with a GPU session is higher than NumPy as the array length grow pass 10,000 to 100,000 items depending on whether you pass a np.ndarray or a tnp.ndarray to TensorFlow’s NumPy API.

Another point that would not come as a surprise is that TensorFlow with CPU session provides the worst time complexity in the long run. If TensorFlow cannot use a GPU to speed up, it’s basically running on as many resources as NumPy can leverage by itself. And that is before considering that TensorFlow’s overhead would further reduce the overall throughput and unsurprisingly sinking it to the bottom of the ranking table.

Array Sorting

Footer