Abstract: |
The growth of data-dependent systems in real-world applications such as smart buildings, factories and data centres is generating large quantities of data, and this volume will only increase as Internet of Things (IoT) systems become more prevalent. Many of these systems are critical to businesses maintaining their competitive advantage, and identifying significant anomalies in their data is a key enabler of that advantage.
Anomaly detection can be defined as the identification of items or events that do not conform to an expected pattern. Such anomalies can usually be explained as defects, errors or fraud. By their nature, anomalies are rare and difficult to identify. The goal is to differentiate normal items from abnormal items, but because anomalies are uncommon, the datasets used are highly imbalanced towards the normal class. Anomaly detection is therefore typically an imbalanced classification problem. Dealing with this imbalance is a fundamental challenge of supervised anomaly detection, which relies on obtaining both positive and anomalous data for training.
To overcome these challenges, we propose a hybrid framework consisting of three stages: (1) pre-training, (2) model training and (3) anomaly detection. Pre-training is approached as a Positive and Unlabelled (PU) learning problem, in which only a minimal quantity of labelled positive data and a set of unlabelled data samples (a mixture of normal and anomalous) are available. In the model training stage, Generative Adversarial Networks (GANs) are used to capture the distribution of the normal (non-anomalous) data. Finally, the anomaly detector uses the trained GAN discriminator from stage two to detect anomalies.
We evaluated our framework on both synthetic and real-world data to test its robustness in imbalanced anomaly detection problems. Preliminary experimental results show that our method performs well when compared to other anomaly detection techniques. |