Why do we batch the dataset before training?
As a machine learning practitioner, you have probably wondered why it is standard practice to batch training data before feeding it to a neural network.
A straightforward answer is that training data are batched mainly for memory optimisation. Placing an entire dataset, for example all 60,000 images of the MNIST training set, in a GPU’s memory is very expensive; you would probably run into the infamous “RuntimeError: CUDA error: out of memory”.
To avoid these memory issues, large datasets are split into batches of, say, 16, 32, or 128 samples, with the batch size chosen according to the memory capacity of the available compute resource.
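As a quick illustration, here is a minimal sketch of batching the MNIST training set, assuming PyTorch and torchvision are available; the batch size of 32 and the "data" directory are arbitrary choices for this example, not fixed requirements.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Download MNIST and convert each image to a tensor.
train_data = datasets.MNIST(
    root="data", train=True, download=True, transform=transforms.ToTensor()
)

# batch_size=32 means each iteration yields 32 images
# instead of all 60,000 at once.
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"

for images, labels in train_loader:
    # images has shape [32, 1, 28, 28]; only this batch is moved to the GPU,
    # so the full dataset never has to fit in GPU memory.
    images, labels = images.to(device), labels.to(device)
    # ... forward pass, loss computation, and backward pass would go here ...
    break  # one batch shown for illustration
```

Iterating over the DataLoader this way keeps GPU memory usage roughly constant per step, regardless of how large the dataset is on disk.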