
This is what Jetson looks like now, with a new camera and a new chassis.
Having more valuable pixels flowing into the system should already improve its performance, but that doesn’t mean we should stop there. Our end-to-end vision system is still at a very early stage of development, and there is plenty of potential to make it more robust.
CNN
The core of the visual perception system lies in a Convolutional Neural Network with a resnet18 backbone.
Final Fully Connected Layers
For the simple path following we did in the previous part of the series, resnet18’s capabilities were enough to handle the correspondingly simple dataset.
However, we are now taking a step forward and extending our dataset with additional, more complex scenarios. For the network to be able to handle them, we need to add more layers/neurons.
Inspired by Nvidia’s end-to-end network, and more specifically by the stack of fully connected layers after the convolutional backbone,
we are going to add two additional fully connected layers at the end of the network.
# resnet18 backbone, optionally with pretrained ImageNet weights;
# its single fully connected layer is replaced with a small stack of them
self.network = torchvision.models.resnet18(pretrained=pretrained)
self.network.fc = torch.nn.Sequential(
    torch.nn.Linear(in_features=self.network.fc.in_features, out_features=128),
    torch.nn.Linear(in_features=128, out_features=64),
    torch.nn.Linear(in_features=64, out_features=OUTPUT_SIZE)  # OUTPUT_SIZE: number of model outputs
)
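As a quick sanity check, the same head replacement can be tried on a bare resnet18 outside the model class. The snippet below is only a sketch; the OUTPUT_SIZE of 2 is an assumption for illustration, not necessarily what our network predicts:

import torch
import torchvision

OUTPUT_SIZE = 2  # assumption for illustration only

model = torchvision.models.resnet18(pretrained=False)
model.fc = torch.nn.Sequential(
    torch.nn.Linear(in_features=model.fc.in_features, out_features=128),
    torch.nn.Linear(in_features=128, out_features=64),
    torch.nn.Linear(in_features=64, out_features=OUTPUT_SIZE)
)

# Push one fake 224x224 RGB frame through the network to confirm the output shape.
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 2])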
Adding more layers, and thus increasing the size of the network, is a double-edged sword. On the one hand, it allows the network to learn more complex patterns; on the other hand, if the network is too big, it may lead to overfitting, where instead of learning patterns and generalizing, the network learns the ‘correct answers’ for the training set.
Dropout
One of the possible solutions to the overfitting problem is Dropout. Dropout is a regularization technique that, with a given probability, temporarily removes neurons and their connections during training. It forces the network to spread the learned information across multiple neurons instead of relying on single ones, which prevents the network from simply ‘memorizing’ the training dataset.
(Srivastava et al., “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014)
‘Can’t rely on any one feature, so have to spread out the weights’ (Andrew Ng)
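Here is a sketch of how Dropout could slot into the fully connected head described above; the probability of 0.5 is just a common default, not a value tuned for our dataset, and OUTPUT_SIZE of 2 is again only an assumption:

import torch
import torchvision

OUTPUT_SIZE = 2  # assumption for illustration only

model = torchvision.models.resnet18(pretrained=False)
model.fc = torch.nn.Sequential(
    torch.nn.Linear(in_features=model.fc.in_features, out_features=128),
    torch.nn.Dropout(p=0.5),  # randomly zero 50% of activations during training
    torch.nn.Linear(in_features=128, out_features=64),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(in_features=64, out_features=OUTPUT_SIZE)
)

model.train()  # Dropout is active in training mode...
model.eval()   # ...and automatically disabled at inference time

PyTorch disables Dropout automatically when the model is put into eval() mode, so the full capacity of the network is used at inference time on the robot.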