TensorFlow Lite (TF Lite) is a lightweight, production-ready, cross-platform framework for deploying ML models and running inference on edge devices like mobile phones and microcontrollers.
This article is for ML engineers who are looking for ways to optimize models for deployment.
Let's take an example: you have created and trained a model, and now you want to run inference with it on edge devices like smartphones, a Raspberry Pi or a Jetson Nano.
To get good predictions on such a device, your model has to meet the following criteria:
- The model should be lightweight, so that it loads quickly and fits within the limited storage capacity of edge devices.
- Edge devices have little memory available. Many of them also run on battery, so power consumption is a big factor.
- The output should have low latency, i.e. the prediction time should be short. If your model architecture is large, like VGG16 or YOLO v4, you need a high-end device to get an inference.
- If your model is deployed on a server instead, it will take a lot of time to make an inference, and the overall power consumption will also increase.
So deploying ML models on edge devices is a big challenge. TF Lite helps in several ways:
- Lower latency: prediction time is really quick.
- TensorFlow Hub provides a great collection of pre-trained models which we can use in our small projects.
- It is privacy-preserving, because the predictions are made on the device and no data is transferred between your device and servers.
- It can work offline.
Examples of products that use TF Lite:
- Google Translate
- AR feature on YouTube
Importing the necessary modules
import os
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sys import getsizeof
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.losses import SparseCategoricalCrossentropy
print(tf.__version__)
2.4.1
In this demonstration, we will be using TensorFlow version 2.4.1.
If you are using TF 1.x, this code might not work as written, so it's better to upgrade to TF 2.x.
def get_file_size(file_path):
    # Size of the file on disk, in bytes
    size = os.path.getsize(file_path)
    return size

def convert_bytes(size, unit=None):
    # Print the size in the requested unit (bytes if no unit is given)
    if unit == "KB":
        return print('File Size: ' + str(round(size/1024, 3)) + 'Kilobytes')
    elif unit == 'MB':
        return print('File Size: ' + str(round(size/(1024*1024), 3)) + 'Megabytes')
    else:
        return print('File Size: ' + str(size) + 'bytes')
These two helper functions read a file's size in bytes and print it in kilobytes (KB) or megabytes (MB).
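The helpers above print the size and return nothing, so the value cannot be reused later. A minimal sketch of a variant that returns the formatted string instead; `format_size` is a hypothetical name and not part of the original code:

```python
def format_size(num_bytes, unit=None):
    """Return a human-readable size string (hypothetical helper)."""
    if unit == "KB":
        return f"{round(num_bytes / 1024, 3)} KB"
    if unit == "MB":
        return f"{round(num_bytes / (1024 * 1024), 3)} MB"
    # Fall back to plain bytes when no unit is requested
    return f"{num_bytes} bytes"


print(format_size(2048, "KB"))
print(format_size(3 * 1024 * 1024, "MB"))
```

Returning the string makes it easy to log the sizes or compare them in a test, while a `print` call on the result gives the same kind of output as before.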
For this demonstration, I will be using the famous Fashion MNIST dataset.
This dataset contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels).
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_lables) = fashion_mnist.load_data()
These are the different classes for our image classification.
class_name = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']
train_images.shape
(60000, 28, 28)
test_images.shape
(10000, 28, 28)
np.unique(train_labels)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
plt.figure()
plt.imshow(train_images[19], cmap="gray")
plt.colorbar()
plt.grid(False)
plt.show()
Now let's normalize the images. The pixel intensity values range from 0 to 255. To build a neural network that converges quickly, we scale these values into the range 0 to 1.
The highest intensity value is 255, so dividing all the values by 255 brings the range to 0 to 1.
train_images = train_images/255.0
test_images = test_images/255.0
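The `model` object used in the following steps has to be built before it can be compiled, and its definition is not shown in this text. A minimal sketch, assuming a small fully connected classifier; the layer sizes are an assumption, chosen only because they fit the `Flatten` and `Dense` imports, the 28 by 28 input and the 10 classes:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten

# Assumed architecture, not confirmed by the article.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),     # one 28x28 grayscale image
    Flatten(),                       # 28x28 image -> 784-element vector
    Dense(128, activation='relu'),   # hidden layer (size is an assumption)
    Dense(10),                       # one raw score (logit) per class
])
```

The last layer has no softmax, so it outputs raw scores. That choice is meant to pair with a loss configured with `from_logits=True`.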
The model is compiled with the following settings:
Optimiser: Adam
Loss function: SparseCategoricalCrossentropy
Metrics: accuracy
For more information visit:
model.compile(optimizer='adam', loss=SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.summary()
It's time to train our model on the training images and training labels for 10 epochs.
h = model.fit(train_images, train_labels, epochs=10)
After fitting the model, let's save it as a Keras file, i.e. an HDF5 (.h5) file.
KERAS_MODEL_NAME='tf_MODEL_FASHION_MNIST.h5'
model.save(KERAS_MODEL_NAME)
Let’s check the accuracy and loss with the test images
test_loss, test_acc = model.evaluate(test_images, test_lables, verbose=2)
print('Test Accuracy:', test_acc)
print('Test Loss:', test_loss)
Test Accuracy: 0.8855000138282776
Test Loss: 0.3289433717727661
Now let's convert the Keras model to a TF Lite model.
TF_LITE_MODEL_FILE_NAME = 'tf_lite_model.tflite'
tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = tf_lite_converter.convert()
tflite_model_name = TF_LITE_MODEL_FILE_NAME
# Write the converted model to disk; the with-block closes the file
with open(tflite_model_name, "wb") as f:
    f.write(tflite_model)
Checking Input Tensor Shape
interpreter = tf.lite.Interpreter(model_path = TF_LITE_MODEL_FILE_NAME)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print("Input Shape:", input_details[0]['shape'])
print("Input Type:", input_details[0]['dtype'])
print("Output Shape:", output_details[0]['shape'])
print("Output Type:", output_details[0]['dtype'])
This is the input and output configuration of the model:
Input Shape: [ 1 28 28]
Input Type: <class 'numpy.float32'>
Output Shape: [ 1 10]
Output Type: <class 'numpy.float32'>
Resizing the tensor shape according to our needs. We are testing on all 10,000 test images, so the batch dimension is changed from 1 to 10000.
interpreter.resize_tensor_input(input_details[0]['index'], (10000, 28, 28))
interpreter.resize_tensor_input(output_details[0]['index'], (10000, 10))
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print("Input Shape:", input_details[0]['shape'])
print("Input Type:", input_details[0]['dtype'])
print("Output Shape:", output_details[0]['shape'])
print("Output Type:", output_details[0]['dtype'])
This is the configuration of the model after resizing:
Input Shape: [10000 28 28]
Input Type: <class 'numpy.float32'>
Output Shape: [10000 10]
Output Type: <class 'numpy.float32'>
Let’s check the type of testing images.
test_images.dtype
dtype('float64')
The test images are of type float64, while the interpreter expects float32 input.
There are many ways to change the type; here we build a new NumPy array with dtype float32.
test_imgs_numpy = np.array(test_images, dtype=np.float32)
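The `float64` type comes from dividing the `uint8` pixel values by `255.0` during normalization. A small sketch on made-up data (the array `fake_images` is hypothetical) showing that dtype change and an equivalent cast with `astype`:

```python
import numpy as np

# Hypothetical stand-in for a batch of two 28x28 uint8 images
fake_images = np.random.randint(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Dividing a uint8 array by a Python float yields float64
scaled = fake_images / 255.0
print(scaled.dtype)

# astype creates a float32 copy, the input type the interpreter reported
as_float32 = scaled.astype(np.float32)
print(as_float32.dtype)
print(as_float32.min() >= 0.0, as_float32.max() <= 1.0)
```

Either `np.array(..., dtype=np.float32)` or `astype(np.float32)` produces a converted copy, so the choice between them is mostly a matter of taste.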
Let’s get the predictions from the test images.
interpreter.set_tensor(input_details[0]['index'], test_imgs_numpy)
interpreter.invoke()
tflite_model_predictions = interpreter.get_tensor(output_details[0]['index'])
print("Prediction results shape:", tflite_model_predictions.shape)
prediction_classes = np.argmax(tflite_model_predictions, axis=1)
acc = accuracy_score(prediction_classes, test_lables)
print('Test accuracy TFLITE model :', acc)
Test accuracy TFLITE model : 0.8855
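Because the model was compiled with `from_logits=True`, the values in `tflite_model_predictions` are most likely raw scores rather than probabilities. This assumes the last layer has no softmax, which the text does not show. `np.argmax` picks the same class either way; a sketch of turning made-up logits into probabilities with a numerically stable softmax:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Hypothetical logits for two images and three classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]], dtype=np.float32)
probs = softmax(logits)

print(probs.sum(axis=1))            # each row sums to about 1
print(np.argmax(logits, axis=1))    # [0 1]
print(np.argmax(probs, axis=1))     # [0 1]
```

Softmax preserves the ordering of the scores, which is why the accuracy computed from the raw outputs is the same as it would be from probabilities.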
convert_bytes(get_file_size(KERAS_MODEL_NAME), "MB")
convert_bytes(get_file_size(TF_LITE_MODEL_FILE_NAME), "KB")
File Size: 1.19Megabytes
File Size: 398.969Kilobytes
The Keras file is around 1.2 MB, and it will grow if we add more layers to the model. The converted TF Lite file is around 400 KB.
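The two sizes can be compared with a little arithmetic. The sketch below uses the printed file sizes; the parameter count additionally assumes a Flatten, Dense(128), Dense(10) network, which is not confirmed by the text:

```python
keras_size_mb = 1.19        # printed size of the .h5 file
tflite_size_kb = 398.969    # printed size of the .tflite file

# How many times smaller the TF Lite file is
ratio = (keras_size_mb * 1024) / tflite_size_kb
print(f"The TF Lite file is about {ratio:.2f}x smaller")

# Assumed architecture: 784 -> 128 -> 10, weights plus biases
params = (784 * 128 + 128) + (128 * 10 + 10)
weights_kb = params * 4 / 1024   # 4 bytes per float32 parameter
print(params, round(weights_kb, 1))
```

Under that assumption the float32 weights alone come to roughly 397.5 KB, which is close to the reported TF Lite size. The larger .h5 file probably also contains the optimizer state, since Keras saves it by default.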
The accuracy of the Keras model on the test images is about 88.55%, and the accuracy of the TF Lite model is 88.55%.
As you can see, the accuracy remains almost the same after converting the model.
If you want to read more about Tensorflow Lite, head over to:
In the next blog, I will be talking about optimizing the tflite model by using quantization. Stay Tuned 🙂
Follow me on Github
Instagram: https://www.instagram.com/sayannath235/
LinkedIn: https://www.linkedin.com/in/sayannath235/
Mail: sayannath235@gmail.com