## Use quantization aware training from TensorFlow’s Model Optimization Toolkit to create models four times smaller without a drop in accuracy.

I recently wrote an article on different TensorFlow libraries, and one of them was TensorFlow’s Model Optimization Toolkit.

The Model Optimization Toolkit provides pruning, quantization, and weight clustering techniques to reduce the size and latency of models. Quantization can be performed both during and after training, and converts models to use 8-bit integers instead of 32-bit floating-point numbers. However, quantization is a lossy process: quantized TfLite models are not as accurate as the original models. Quantization aware training addresses this issue. During training, it simulates 8-bit quantization of the weights (quantizing them and immediately converting them back to 32-bit float), so the quantization error acts like noise that the model learns to compensate for.
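To make the lossy round trip concrete, here is a minimal sketch of the affine int8 quantization scheme in plain Python (the scale and zero point chosen here are illustrative, not what TfLite would compute for a real tensor). The small quantize/dequantize error is exactly the "noise" that quantization aware training exposes the model to:

```python
def quantize(x, scale, zero_point):
    """Map a real value to an int8 level, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Map the int8 level back to a real value."""
    return (q - zero_point) * scale

# Example: represent values in roughly [-1.0, 1.0] with 256 int8 levels.
scale = 2.0 / 255
zero_point = 0

x = 0.1234
q = quantize(x, scale, zero_point)          # 16
x_hat = dequantize(q, scale, zero_point)    # ~0.1255
error = abs(x - x_hat)                      # round-trip error, at most scale / 2
```

The error is bounded by half the scale, so the coarser the quantization grid, the more "noise" the model must learn to tolerate.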

So in the rest of this article, that is what we will do: generate TfLite models both with and without quantization aware training, and then compare them based on their sizes and accuracies.

- Requirements
- Creating quantization aware models
- Converting them to TfLite
- Results

The TensorFlow Model Optimization Toolkit needs to be installed alongside the normal TensorFlow distribution. Both can be installed with pip:

```
pip install tensorflow
pip install -q tensorflow-model-optimization
```

To use quantization aware training, the model needs to be wrapped with `tfmot.quantization.keras.quantize_model`. The whole model can be wrapped, or you can annotate only the specific layers you want. It is suggested to train the model first and then fine-tune the wrapped model; otherwise, the model does not perform very well. I will cover only the minimum required in this article, but this post can be referred to for a detailed readthrough.

Create a simple model with TensorFlow Keras using either the Sequential or the functional (Model) API. Below is an example of a straightforward model for the MNIST dataset built with the functional API and trained for 20 epochs.

```python
inp = tf.keras.layers.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), padding='same', activation='relu')(inp)
x = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), padding='same', activation='relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Conv2D(16, kernel_size=(3, 3), padding='same', activation='relu')(x)
x = tf.keras.layers.Dropout(0.25)(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(10)(x)

model = tf.keras.models.Model(inputs=inp, outputs=x)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=20, validation_split=0.1, batch_size=500)
```

To convert this model to use quantization aware training:

```python
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)

q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])
q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=20, validation_split=0.1)
```
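Since we will later compare the models by size, it is handy to measure them after gzip compression, which is the usual way to report quantized model sizes because int8 weights compress much better than float32 ones. A small helper (the function name is my own, not from the toolkit):

```python
import gzip
import os
import shutil
import tempfile

def get_gzipped_size(file_path):
    """Return the size in bytes of the given file after gzip compression."""
    _, zipped_path = tempfile.mkstemp('.gz')
    with open(file_path, 'rb') as f_in, gzip.open(zipped_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    size = os.path.getsize(zipped_path)
    os.remove(zipped_path)  # clean up the temporary archive
    return size
```

Calling this on the saved float and quantized TfLite files gives a fair size comparison.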

Here’s how their histories look: