Novelty detection is the task of classifying test data that differ in some respect from the data that are available during training. This may be seen as “one-class classification”, in which a model is constructed to describe “normal” training data. The novelty detection approach is typically used when the quantity of available “abnormal” data is insufficient to construct explicit models for non-normal classes. An application includes inference in datasets from critical systems, where the quantity of available normal data is huge, such that “normality” may be accurately modeled.
These include the detection of medical diagnostic problems, faults and failure detection in complex industrial systems, structural damage, intrusions in electronic security systems, such as credit card or mobile phone fraud detection, video surveillance, mobile robotics, sensor networks, astronomy catalogs, and text mining.
In this tutorial series, I will show you how to implement a generative adversarial network for novelty detection with Keras framework. Part 1 covers the how the model works in general while part 2 gets into the Keras implementation.
The model we are going to build named Adversarially Learned One-Class Classifier or ALOCC for short as shown in my source code. The image below shows the model contains an auto-encoder connected to a CNN classifier acts as a discriminator as you might hear of in GAN. Before going any further, let's have a quick review of those two separate network architectures.
Autoencoder is a data compression algorithm where there are two major parts, encoder, and decoder.
The encoder's job is to compress the input data to lower dimensional features. For example, one sample of the 28x28 MNIST image has 784 pixels in total, the encoder we built can compress it to an array with only ten floating point numbers also known as the features of an image. The decoder part, on the other hand, takes the compressed features as input and reconstruct an image as close to the original image as possible. Autoencoder is unsupervised learning algorithm in nature since during training it takes only the images themselves and not need labels. Since it is an unsupervised learning algorithm, it can be used for clustering of unlabeled data as seen in my previous post - How to do Unsupervised Clustering with Keras.
Previously, the reconstruction error of an auto-encoder, trained on samples from the target class, has been proven a useful measure for novelty sample detection.
Though GANs are quite advanced and might look daunting to beginners, I find the novice painter(Generator) VS novice connoisseur(Discriminator) analogy quite easy to understand.
Two ambitious individuals, a painter, and a connoisseur are just making their first step to the art industry. The painter wants to trick the connoisseur into grading his painting as high as possible while the connoisseur wants to discern good from bad paintings as accurately as possible. So they play an infinite game over and over again repeating those three steps.
After hundreds even thousands of rounds competing against each other, the novice painter becomes a master printer whose paintings are almost indistinguishable to Van Gogh's, and the new connoisseur becomes a specialist who can recognize real authentic paintings out of tons of counterparts with unbeatable precision.
What this metaphor tells is there are two individual networks in GAN, generator, and discriminator trained separately to minimize their losses while adversarially improving each other.
The proposed architecture, similar to GANs, comprises two modules, which compete to learn while collaborating with each other for the detection task.
The first module denoted as R or Reconstruct module looks similar to an auto-encoder while been adversarially trained to minimize two loss functions jointly,
The second module denoted as D or Discriminator learns to distinguish original normal (class label 1) samples from the reconstructed ones(class label 0).
R learns to efficiently reconstruct the positive normal samples, while for negative or novelty samples it is unable to reconstruct the input accurately, and hence, for negative samples it acts as a decimator or informally a distorter.
In the testing phase, D operates as the actual novelty detector, while R improves the performance of the detector by adequately reconstructing the positive or target samples and decimating (or distorting) any given negative or novelty samples.
In this post, we introduced Adversarially Learned One-Class Classifier with starts from a quick review of two closed related model structures, auto-encoder and GAN followed by the introduction to the main model architecture we are going to implement with Keras in the next post.
My Keras implementation is largely influenced by the author's paper https://arxiv.org/abs/1802.09088