

Pose estimation is a technique that uses computer vision and machine learning models to determine the pose of a person. It predicts the locations of key body points (keypoints) by passing images through a neural network. The model used in this example, PoseNet, estimates 17 different body keypoints!
Pose estimation is often applied in domains such as animation, augmented reality (AR), and robotics. For these applications, the pose of the person is estimated in 2D or 3D space.
Neural Network Information
The PoseNet model uses a MobileNet backbone, which leverages depthwise separable convolution operations to reduce the number of parameters in the network and thus the model size.
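To make the parameter savings concrete, here is a minimal sketch that counts the weights in a standard convolution and in a depthwise separable one. The layer size (3x3 kernel, 128 input channels, 256 output channels) is an illustrative assumption, not a layer taken from PoseNet, and bias terms are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer size (an assumption, not a real PoseNet layer).
standard = standard_conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")
```

For this layer the separable version needs roughly one ninth of the weights, which is where most of MobileNet's size reduction comes from.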
The model is also quantized, which converts its 32-bit float weights to 8-bit integers. This enables faster processing at the cost of a small drop in accuracy, and it allows the model to run on lightweight hardware like the Raspberry Pi!
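The idea behind quantization can be sketched in a few lines of plain Python. This is a simplified affine (scale plus zero point) scheme over an unsigned 8-bit range, chosen for illustration. It is not necessarily the exact scheme TensorFlow Lite applies to PoseNet, and the weight values are made up.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(qmin - lo / scale)
    quantized = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integers."""
    return [scale * (q - zero_point) for q in quantized]


# Made-up example weights (an assumption, not real PoseNet weights).
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, restored, max_error)
```

Each weight now fits in one byte instead of four, and the round trip only loses a small amount of precision, which is the "small drop in accuracy" mentioned above.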
Software Implementation
This implementation uses the PoseNet model integrated in TensorFlow Lite. Everything is written in Python to run on the Raspberry Pi 4.
The code for this project builds a pipeline that feeds images to the pretrained TensorFlow Lite model, decodes the model output, and draws key points and limbs on the processed images. Post-processing can then be used to convert these images into a video!
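The decoding step can be sketched as follows. This is a minimal single-pose decoder and not the code from the project repository. It assumes the model returns a heatmap of scores per keypoint plus an offsets tensor whose first K channels hold y offsets and whose next K channels hold x offsets, and that each heatmap cell covers `output_stride` input pixels. The function name `decode_keypoints` and the tiny 2x2 example data are hypothetical.

```python
import math


def sigmoid(x):
    """Convert a raw heatmap score to a confidence between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_keypoints(heatmaps, offsets, output_stride=32):
    """Return one (x, y, confidence) tuple per keypoint.

    heatmaps[row][col][k]: raw score of keypoint k at a grid cell.
    offsets[row][col][c]:  y offsets in channels 0..K-1, x offsets in K..2K-1
                           (assumed layout).
    """
    num_keypoints = len(heatmaps[0][0])
    keypoints = []
    for k in range(num_keypoints):
        # Find the grid cell with the highest score for this keypoint.
        best_row, best_col, best_score = 0, 0, float("-inf")
        for r, row in enumerate(heatmaps):
            for c, cell in enumerate(row):
                if cell[k] > best_score:
                    best_row, best_col, best_score = r, c, cell[k]
        # Map the cell back to image coordinates and refine it with the offsets.
        cell_offsets = offsets[best_row][best_col]
        y = best_row * output_stride + cell_offsets[k]
        x = best_col * output_stride + cell_offsets[k + num_keypoints]
        keypoints.append((x, y, sigmoid(best_score)))
    return keypoints


# Tiny made-up example: a 2x2 grid with 2 keypoints instead of PoseNet's 17.
heatmaps = [[[0.0, 0.0], [0.0, 5.0]],
            [[3.0, 0.0], [0.0, 0.0]]]
offsets = [[[0.0, 0.0, 0.0, 0.0], [0.0, 1.5, 0.0, 2.5]],
           [[4.0, 0.0, -2.0, 0.0], [0.0, 0.0, 0.0, 0.0]]]
keypoints = decode_keypoints(heatmaps, offsets)
print(keypoints)
```

The resulting coordinates are in input-image pixels, so they can be drawn directly on the frame, with limbs drawn as lines between pairs of keypoints whose confidence is high enough.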
Hardware Implementation
The project uses a Raspberry Pi 4 (4GB), a Raspberry Pi Camera Module, and a small breadboard with an LED, a resistor, and a push button. Together, these make up the hardware configuration that runs the pipeline.
Code and setup information for this project can be found here: https://github.com/ecd1012/rpi_pose_estimation
Browser Demo
Google provides code to run pose estimation on Android and iOS devices.
Here is a demo of the model if you want to try it for yourself! 🙂
https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html
Additional Resources:
Sources: