The data we routinely collect contains only a small amount of anomalous data. This is a very pleasing fact of everyday life 🙂 Only a few defective products come off a factory production line, and medical data on rare cases are presented at medical conferences as new discoveries. In other words, collecting anomalous data is a very costly task.
It is clearly more reasonable to train on normal data alone to detect anomalous cases than to spend a lot of money collecting every possible anomalous pattern. Training on a dataset of normal cases only is called one-class classification, since the goal is to model the one normal class and flag anything that falls outside it as anomalous.
In this story, Learning and Evaluating Representations for Deep One-class Classification, by Google Cloud AI, is presented. It was published as a paper at ICLR 2021. The paper proposes a two-stage framework for deep one-class classification, composed of state-of-the-art self-supervised representation learning followed by generative or discriminative one-class classifiers. The major contribution is a novel distribution-augmented contrastive learning method. The framework not only learns a better representation, but also permits building one-class classifiers that are more faithful to the target task.
They even made the code available for everyone on their GitHub!
Let’s see how they achieved that. I will explain only the essence of DROC, so those who want to know more should click through to the DROC paper.
The paper proposes an anomaly detection approach with a two-stage framework, Deep Representation One-class Classification (DROC). In the first stage, a deep neural network is trained with self-supervised learning so that a mapping f from the data to a generic, high-level latent representation is obtained. In the second stage, the mapping f from the first stage is used to embed the data into that latent space, and a traditional one-class classifier such as OC-SVM or KDE is applied on top of the representations.
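To make the two-stage pipeline concrete, here is a minimal sketch of my own (not the authors’ released code), assuming a pretrained `encoder` from the first stage that maps images to feature vectors; scikit-learn’s OneClassSVM and KernelDensity stand in for the second-stage classifiers:

```python
# Minimal sketch of the two-stage pipeline (illustrative only; `encoder` is a
# hypothetical stage-1 network that maps images to feature vectors).
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import KernelDensity

def extract_features(encoder, images):
    """Stage 1 at inference time: embed images and L2-normalize the features."""
    z = encoder(images)                                   # (N, d) array
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def fit_one_class(train_feats, method="ocsvm"):
    """Stage 2: fit a shallow one-class classifier on the frozen representations."""
    if method == "ocsvm":
        return OneClassSVM(kernel="rbf", nu=0.1).fit(train_feats)
    return KernelDensity(kernel="gaussian", bandwidth=0.5).fit(train_feats)

def anomaly_score(model, test_feats):
    """Higher score = more anomalous."""
    if isinstance(model, KernelDensity):
        return -model.score_samples(test_feats)           # low density -> anomalous
    return -model.decision_function(test_feats)           # outside boundary -> anomalous
```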
・To adapt contrastive learning [Chen et al., 2020] to one-class classification, the authors propose a distribution-augmented contrastive learning method. Specifically, geometric transformations of the image [Gidaris et al., 2018], namely horizontal flips and rotations (0°, 90°, 180°, 270°), are used so that the model effectively learns to tell apart the type of augmentation applied to the data; rotated copies of an image are treated as instances from different distributions rather than being pulled together, which also helps when outliers look like rotated versions of normal images. The self-supervised loss minimizes the distance between samples from the same image under different augmentation functions and maximizes the distance between samples from different images under the same augmentation function. This reduces the uniformity of the representation across the hypersphere and leaves room to separate outliers (a rough sketch of this idea is given after this list).
・However, the idea that “the less uniformity, the better for One-class Classification’’ is wrong!! The authors identify a fundamental trade-off between the amount of information in the representation and its uniformity: DistAug effectively shows that simply lowering uniformity is not, by itself, what makes one-class classification work.
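As a rough sketch of the distribution-augmentation idea (my own illustration, assuming rotations by multiples of 90° are the augmenting transformations and `instance_augment` is a hypothetical SimCLR-style augmentation such as cropping and color jitter):

```python
# Sketch of distribution augmentation (DistAug) for contrastive training.
# Each rotated copy of an image is treated as its OWN instance, so the loss
# does NOT pull rotated versions of the same image together.
import torch

def distaug_views(images, instance_augment):
    """images: (B, C, H, W). Returns two views per (image, rotation) pair."""
    views_a, views_b = [], []
    for k in range(4):                                  # rotations 0°, 90°, 180°, 270°
        rotated = torch.rot90(images, k, dims=(2, 3))
        views_a.append(instance_augment(rotated))       # first view of this instance
        views_b.append(instance_augment(rotated))       # second view of this instance
    # Effective batch is 4x larger: (4B, C, H, W) per view; positives only pair
    # views that share both the source image and the rotation.
    return torch.cat(views_a), torch.cat(views_b)
```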
Contrastive learning [Chen et al., 2020, Le-Khac et al., 2020] formulates a pretext task in which a model learns what makes examples similar or dissimilar. It first learns a general representation of images on an unlabeled dataset, and can then be fine-tuned on a small set of labeled images for a specific classification task. With this approach, a machine learning model learns to place similar images close together and dissimilar images far apart in representation space.
The SimCLR framework [Chen et al., 2020] is a powerful approach that learns representations by maximizing the agreement between differently augmented views of the same data example via a contrastive loss in the latent space. For more details, I refer you to the excellent descriptions by Aakash Nain and Thalles Silva.
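For illustration, here is a minimal sketch of a SimCLR-style NT-Xent loss (a simplified stand-in, not the paper’s implementation), assuming `z_a` and `z_b` are L2-normalized embeddings of two augmented views of the same batch:

```python
# Sketch of the normalized-temperature cross-entropy (NT-Xent) loss used by SimCLR.
import torch
import torch.nn.functional as F

def nt_xent(z_a, z_b, temperature=0.5):
    """z_a, z_b: (N, d) L2-normalized embeddings of two views of the same N images."""
    z = torch.cat([z_a, z_b], dim=0)                 # (2N, d)
    sim = z @ z.t() / temperature                    # cosine similarities (inputs normalized)
    sim.fill_diagonal_(float("-inf"))                # never treat a sample as its own pair
    n = z_a.shape[0]
    # The positive for row i is its counterpart in the other view.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```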
In some cases, training a deep one-class classifier results in a degenerate solution that maps all data to a single point in representation space, which is called hypersphere collapse [Ruff et al., 2018]. The authors instead propose distribution-augmented contrastive learning, motivated by reducing the uniformity of the representation across the hypersphere so that outliers can be separated.
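For intuition, the one-class Deep SVDD objective of Ruff et al. [2018] pulls every normal sample toward a fixed center; the sketch below (my own simplification) shows why a degenerate constant mapping drives this loss to zero:

```python
# Simplified one-class Deep SVDD objective [Ruff et al., 2018] (illustrative sketch).
import torch

def deep_svdd_loss(encoder, images, center):
    """Mean squared distance of the embeddings to a fixed center c."""
    z = encoder(images)                              # hypothetical encoder, (N, d)
    return ((z - center) ** 2).sum(dim=1).mean()

# Hypersphere collapse: if the network can realize a constant function (e.g. via
# bias terms), mapping every input exactly to `center` makes the loss zero while
# the representation carries no information about the data.
```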
As shown in Figure 3, DistAug enlarges the training set with augmented copies of the images: the model not only learns to identify different instances from the original distribution, but also identifies the type of augmentation applied, such as the rotation, so that instances from different (augmented) distributions are kept apart.
Distribution augmentation (DistAug) training is a distribution-augmentation approach for one-class contrastive learning, inspired by rotation-prediction self-supervision (RotNet) [Gidaris et al., 2018] and geometric-transformation-based anomaly detection [Golan et al., 2018]. Instead of modeling only the training data distribution, it models the union of the training data and its augmented copies, created by augmentations such as rotations and horizontal flips. As shown in Figure 4, to isolate outliers it is more effective to augment the distribution as in (c) than to decrease uniformity as in (b); the authors make it clear that their argument is not “less uniformity is better for OCC.’’
From the table, we can see that the distribution-augmented contrastive learning method improves on previous studies in the detection and localization experiments, in both the object and texture categories.
This experiment shows that methods that rely on geometric transformations are particularly effective in detecting anomalies in the “object” category, since they learn to represent visual objects.
Figures 5 and 6 visualize the localization of defects in industrial products from the MVTec dataset. Each figure shows, from left to right, a defective input image from the test set, the ground-truth mask, and the heatmap visualization of the predicted localization.
References
[Chen et al., 2020] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. “A simple framework for contrastive learning of visual representations,’’ arXiv, abs/2002.05709, 2020.
[Sohn et al., 2020] Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, and Tomas Pfister. “Learning and Evaluating Representations for Deep One-class Classification,’’ arXiv, abs/2011.02578, 2020.
[Gidaris et al., 2018] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. “Unsupervised representation learning by predicting image rotations,’’ In Sixth International Conference on Learning Representations, 2018.
[Ruff et al., 2018] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. “Deep one-class classification,’’ In International Conference on Machine Learning, pages 4393–4402, 2018.
[Golan et al., 2018] Izhak Golan and Ran El-Yaniv. “Deep anomaly detection using geometric transformations,’’ In Advances in Neural Information Processing Systems, pages 9758–9769, 2018.
Biomedical Image Segmentation
2018: [UOLO]
Image Clustering
2020: [DTC]
Data Uncertainty Learning
2020: [DUL]
One-Class Classification
2019: [DOC]