NNCF exposes its optimization methods in two ways: through the training samples supplied with the framework, or through integration into custom training code.
Using NNCF within your training code
Let us describe the steps required to modify an existing PyTorch training pipeline to integrate NNCF into it. The use case assumes that the user already has a training pipeline that reproduces model training in floating-point precision and a pre-trained model snapshot. The objective of NNCF is to prepare this model for accelerated inference by simulating compression at training time.
Below are the steps needed to integrate NNCF into an existing PyTorch project:
Step 1: Create an NNCF configuration file.
A JSON configuration file is used for an easier setup of the compression parameters to be applied to your model. Reference configuration files can be found within example training scripts that are packaged with NNCF.
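The exact contents depend on the compression algorithms you want to apply and on the NNCF version, so consult the reference configuration files for the authoritative schema. The snippet below is only a minimal sketch of a possible INT8 quantization configuration, following the input_info/compression layout used in the sample configs:

{
    "input_info": {
        "sample_size": [1, 3, 224, 224]
    },
    "compression": {
        "algorithm": "quantization"
    }
}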
Step 2: Modify the training pipeline code.
NNCF enables compression-aware training by being integrated into regular training pipelines. The framework is designed so that only minor modifications to the original training code are required; the individual changes are listed below, and a consolidated sketch follows the list.
- The following imports are required for NNCF:
import nncf
from nncf import Config, create_compressed_model, load_state
- Load the NNCF JSON configuration file that you prepared during Step 1:
nncf_config = Config.from_json("/path/to/nncf_config.json")
- Right after the model instance is created in the training pipeline and the weights that compression-aware training should start from are loaded, wrap the model via the following call:
compression_ctrl, model = create_compressed_model(model, nncf_config)
where compression_ctrl is a “controller” object that can be used during compressed model training to adjust compression algorithm parameters or gather statistics.
- In the case of multi-GPU training, wrap your model using the DataParallel or DistributedDataParallel PyTorch classes as usual. In the case of the distributed setting, call the compression_ctrl.distributed() method after that as well.
- Call the compression_ctrl.initialize(data_loader) method before the start of your training loop to initialize model parameters related to its compression (e.g. parameters of FakeQuantize layers) by feeding it some data (via the data_loader for the training dataset).
- The following change has to be applied to the training loop code: after the model forward pass on the current training iteration, add the compression loss (using the + operator) to the common task loss. For instance, for the cross-entropy loss in a classification task:
loss = cross_entropy_loss + compression_ctrl.loss()
- Call the compression algorithm scheduler step() after each training iteration: compression_ctrl.scheduler.step()
- Call the compression algorithm scheduler epoch_step() after each training epoch: compression_ctrl.scheduler.epoch_step()
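Putting the pieces above together, the skeleton of a modified training pipeline might look as follows. This is only a sketch: build_model, train_loader, optimizer, criterion and num_epochs stand in for components of the existing pipeline and are not part of the NNCF API, and the checkpoint path is illustrative.

import torch
import nncf  # import NNCF so that its PyTorch extensions are registered
from nncf import Config, create_compressed_model, load_state

nncf_config = Config.from_json("/path/to/nncf_config.json")

model = build_model()  # placeholder: the existing FP32 model from your pipeline
fp32_state_dict = torch.load("/path/to/fp32_checkpoint.pth")  # assumed to hold the state dict directly
load_state(model, fp32_state_dict)  # weights that compression-aware training starts from

compression_ctrl, model = create_compressed_model(model, nncf_config)

# In a distributed multi-GPU setting, wrap the model first and then notify the controller:
# model = torch.nn.parallel.DistributedDataParallel(model)
# compression_ctrl.distributed()

compression_ctrl.initialize(train_loader)  # placeholder data loader; initializes e.g. FakeQuantize parameters

for epoch in range(num_epochs):
    for images, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, targets) + compression_ctrl.loss()  # add the compression loss
        loss.backward()
        optimizer.step()
        compression_ctrl.scheduler.step()  # per-iteration scheduler step
    compression_ctrl.scheduler.epoch_step()  # per-epoch scheduler step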
Step 3: Run the training pipeline.
At this point, NNCF is fully integrated into your training pipeline. You can run it as usual, monitor your original model's metrics and/or the compression algorithm's metrics, and balance model quality against the level of compression.
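If you want to log the compression side explicitly, one possible way, assuming the compression_ctrl.statistics() call used in the NNCF training samples, is:

# Print compression algorithm statistics (e.g. achieved sparsity level or quantizer configuration)
# alongside the usual accuracy/loss logging, for example once per epoch:
print(compression_ctrl.statistics())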
Training samples
Training samples provided with NNCF fall into three general categories according to the computer vision task they target: image classification, object detection, and semantic segmentation.
The image classification sample contains training pipelines on standard classification benchmark datasets (ImageNet, CIFAR-100 & CIFAR-10) using the zoo of classification nets from the torchvision package. Configuration files provided within the sample include examples of INT8 quantization, sparsification, combined quantization+sparsification and binarization for some common classification nets. The sample training pipeline supports multi-GPU training and allows exporting the compressed models into ONNX files that are supported by the OpenVINO toolkit.
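For reference, in the sample code the ONNX export boils down to a single controller call; the file name below is illustrative, and compression_ctrl.export_model() is assumed to behave as in the NNCF samples:

# Export the compressed model to an ONNX file that the OpenVINO toolkit can consume:
compression_ctrl.export_model("compressed_model.onnx")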
The object detection sample contains an analogous training pipeline for the Pascal VOC benchmark dataset and SSD300 and SSD512 models (with VGG backbones).
The semantic segmentation sample contains the training pipelines for UNet and ICNet models on several benchmark datasets for segmentation tasks (CamVid, Cityscapes, Mapillary Vistas, Pascal VOC).
Integration into third-party code
If you wish to use a more complex training pipeline than those provided as training samples, or to apply NNCF algorithms within larger training code bases or model collections, you can install NNCF, import it into the corresponding code base as a regular Python package, and use the full extent of neural network compression with only minor adjustments to the rest of the training pipeline.
NNCF ships with code patches for the well-known mmdetection and transformers GitHub repositories. When applied, these patches enable model compression during training without disrupting the regular training process in the corresponding repositories. They illustrate the general process of integrating NNCF into third-party model training code, and they also let the user instantly enable NNCF-based compression fine-tuning for the state-of-the-art object detection and NLP models found in mmdetection and transformers with <1% accuracy drop. The list of patches will be extended to other prominent repositories, but the patches mostly serve to showcase results and to guide the user, by example, through integrating NNCF into any PyTorch-based training pipeline.