Semi-Supervised Anomaly Detection Algorithm Using Probabilistic Labeling (SAD-PL)
Lee, K., Lee, C. H., & Lee, J. (2021). Semi-supervised anomaly detection algorithm using probabilistic labeling (SAD-PL). IEEE Access, 9, 142972-142981.
To detect abnormal data via semi-supervised learning, unlabeled data are generally assumed to be normal. This assumption, however, inevitably degrades performance when the unlabeled dataset contains even a small fraction of abnormal data. To overcome this degradation and maintain stable detection performance, we propose a semi-supervised anomaly detection algorithm using probabilistic labeling (SAD-PL) for unlabeled data. The proposed SAD-PL consists of two steps: (1) estimating local outlier factor (LOF) scores of latent vectors from both labeled and unlabeled data and (2) estimating the labeling probability of the unlabeled data by using the prior missing probability of the labeled data via the Neyman-Pearson (NP) criterion. SAD-PL runs iteratively, using the proposed complementary learning functions, until the rate of label changes falls below a predefined threshold. Experimental results show that SAD-PL achieves a higher detection probability than existing algorithms and that its performance remains stable regardless of the normal-to-abnormal data ratio in the unlabeled data and of how much the statistics of the unlabeled data vary relative to those of the labeled data.
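
The two steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: raw 2-D points stand in for the latent vectors, the NP criterion is approximated by capping the empirical false-alarm rate on the labeled normal scores at a hypothetical level `alpha`, the labeling is a hard decision rather than a probability, and the iterative refinement with complementary learning functions is left out. The names `lof_scores` and `np_threshold` are illustrative only.

```python
import math

def lof_scores(points, k=3):
    """Local outlier factor of every point (plain O(n^2) version)."""
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        # Distances from point i to all other points, nearest first.
        d = sorted((math.dist(points[i], points[j]), j)
                   for j in range(n) if j != i)
        neighbors.append(d[:k])      # k nearest neighbors as (distance, index)
        k_dist.append(d[k - 1][0])   # distance to the k-th neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist_ij) for dist_ij, j in neighbors[i]]
        lrd.append(k / max(sum(reach), 1e-12))  # guard against duplicates
    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]

def np_threshold(normal_scores, alpha=0.05):
    """Score threshold exceeded by at most a fraction alpha of the
    labeled normal scores (assumed stand-in for the NP criterion)."""
    s = sorted(normal_scores)
    allowed = min(int(alpha * len(s)), len(s) - 1)
    return s[len(s) - 1 - allowed]

# Labeled normal data: a 3x3 grid. Unlabeled data: two points inside
# the grid and one far away from it.
labeled_normal = [(float(x), float(y)) for x in range(3) for y in range(3)]
unlabeled = [(0.5, 0.5), (1.5, 1.5), (8.0, 8.0)]

# Step 1: LOF scores over labeled and unlabeled data together.
scores = lof_scores(labeled_normal + unlabeled, k=3)

# Step 2: threshold from the labeled scores, then label the unlabeled data.
threshold = np_threshold(scores[:len(labeled_normal)], alpha=0.05)
flags = [s > threshold for s in scores[len(labeled_normal):]]
print(flags)
```

In this toy setting only the distant point (8, 8) is flagged as abnormal. A fuller version would presumably repeat both steps with the updated labels until the fraction of changed labels drops below the stopping threshold, as the abstract describes.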