The question is whether we can only calculate the convolution with the sparse data efficiently instead of scanning all the image pixels or spatial voxels.

One intuitive thinking is, regular image signals are stored as matrix or tensor. And the corresponding convolution was calculated as dense matrix multiplication. The sparse signals are normally represented as data lists and index lists. We could develop a special convolution schema that uses the advantage of sparse signal representation.

In order to explain the concept of sparse convolution step by step and let it easier to read, I use 2D sparse image processing as an example. Since the sparse signals are represented with data list and index list, 2D and 3D sparse signals have no essential difference.

As Fig.2 illustrated, we have a 5×5 image with 3 channels. All the pixels are (0, 0, 0) except two points P1 and P2. P1 and P2, such nonzero elements are also called **active input sites**, according to [1]. In dense form, the input tensor has a shape [1x3x5x5] with NCHW order. In sparse form, the data list is `[[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]`

, and the index list is `[[1,2], [2, 3]]`

with YX order.

Fig.3 is an example kernel, which has kernel size 3×3. The dark and light colors stand for 2 filters. We define the convolution operation as

`conv2D(`*kernel_size*=3, *out_channels*=2, stride=1, padding=0)

The sparse convolution has 2 kinds of output definitions [1]. One is regular output definition, just like ordinary convolution, calculate the output sites as long as kernel covers an input site. The other one is called the submanifold output definition. the convolution output will be counted only when the kernel center covers an input site.

Fig.4 illustrates the difference between these two kinds of outputs. A1 stands for the active output sites, i.e. the convolution result from P1. Similarly, A2 stands for the active output sites which are calculated from P2. A1A2 stands for the active output sites which are the sum of outputs from P1 and P2. Dark and light colors stand for different output channels.

So in dense form, the output tensor has a shape [1x2x3x3] with NCHW order.

Traditional convolution normally uses *im2col *[5] to rewrite convolution as a dense matrix multiplication problem. However, sparse convolution [1] uses a rule book to schedule all atomic operations instead of *im2col*.

## Build the hash table

The first step is to build hash tables.

In fig.5, the input hash table stores all active input sites. Then we estimated all possible active output sites, considering either regular output definition or submanifold output definition. At last, using the output hash table to record all involved active output sites.

Note, in fig.5 my convention is not very clear, the **v** is more like a hash key, and the **key** is more like a hash value. However, neither of them has duplicate elements. Thus which one should be the key depends on the program.

## Build Rule Book

The second step is to build the *Rulebook*. This is the key part of sparse convolution. The purpose of *Rulebook* is similar to *im2col* [5], which converts convolution from mathematic form to an efficient programmable form. But unlike *im2col*, *Rulebook* collects all involved atomic operation in convolution, then associate them to corresponding kernel elements.

Fig.6 is an example of how to build *Rulebook*. Pᵢₙ has the input sites index. In this example, we have two nonzero data at position (2, 1) and (3, 2). Pₒᵤₜ has the corresponding output sites index. Then we collect the atomic operations from the convolution calculation process, i.e. consider the convolution process as many atomic operations w.r.t kernel elements. At last, we record all atomic operations into *Rulebook*. In Fig.6 the right table is an example of the *Rulebook*. The first column is the kernel element index. The second column is a counter and index about how many atomic operations this kernel element involves. The third and fourth column is about the source and result of this atomic operation.

## Calculation Pipeline for GPU

The last step is the overview of the programming calculation pipeline.