One of the most popular applications of machine learning is anomaly detection. Identifying outliers helps stop fraud, adversarial attacks, and network intrusions that could jeopardize the future of your business.
This article will discuss how anomaly detection functions, the machine learning techniques that can be used, and the advantages of using machine learning for anomaly detection.
Anomaly detection in machine learning
An anomaly, also known as a variation or an exception, is typically something that deviates from the norm. In the context of software engineering, an anomaly is an unusual occurrence or event that deviates from the norm and raises suspicion.
A software program is expected to function smoothly and predictably, so any anomaly poses a potential risk to its robustness and security. Normally, you want to catch them all. The process of detecting anomalies is called anomaly (or outlier) detection.
Anomaly detection (also known as outlier detection) is the identification of uncommon items, events, or observations that raise suspicion by deviating noticeably from the rest of the data. Atypical data is usually associated with some problem or rare occurrence, such as financial fraud, health issues, structural defects, or broken equipment. Because recognizing these events is frequently very valuable from a business standpoint, it is correspondingly valuable to be able to identify which data points can be labeled anomalies.
Why do we need anomaly detection in machine learning?
Whether the deviation is positive or negative, anomaly detection is crucial because it helps you gain a deeper understanding of changes in business performance. Even if the causes of anomalous data are unfavorable, it is still worthwhile to investigate them. For example, business users can identify unauthorized transactions or security breaches by putting intrusion detection systems in place.
What does anomaly detection do in machine learning?
Anomaly detection is frequently a significant component of deployed machine learning. Whether identifying fraudulent behavior in the financial sector or monitoring product quality, anomaly detection is a crucial part of machine learning systems in many industries. Anomaly detection with machine learning typically covers a much wider variety of data than is achievable manually. Models can perform anomaly detection that takes complex features and behaviors into account, and they can be trained to look for unusual behavior or trends.
Depending on the type of data, there are many model construction methodologies for anomaly detection in machine learning. Models are trained on either labeled data or, more frequently, unlabeled raw data. Models trained on labeled data watch for outliers that exceed the specified threshold for normal data. A model trained on unlabeled data will group the raw data into clusters and identify outliers that fall outside those clusters. In both situations, the model learns what falls inside a range of acceptable behavior and flags unusual behavior or data.
Machine learning and anomaly detection: Types of outliers
Let’s explore the different types of anomalies in machine learning. These are the anomaly detection types:
- Global outliers
- Contextual outliers
- Collective outliers
Global outliers

A data point can be deemed a global anomaly if its value falls outside the bounds of all the other data points in the collection. In other words, it is a rare occurrence.

The analytics staff at the bank would be alarmed if, for instance, you consistently deposit an average American salary into your bank account but one month receive a million dollars.
Contextual outliers

A contextual outlier is a data point whose value doesn’t match what we would expect for a comparable data point in the same context. Contexts are often temporal, so the same value can be normal in one context and anomalous in another.

For instance, it’s common for shops to see an uptick in customers around the holidays. A rapid uptick that occurs outside of holidays or sales, on the other hand, can be viewed as a contextual outlier.
Collective outliers

Collective outliers are a subset of data points that, taken together, deviate from the norm.

Tech firms, for example, tend to keep growing. Some individual businesses may decline, but that is not a widespread trend. However, if a number of businesses simultaneously show declining sales over the same time period, we can spot a collective outlier.
What are the characteristics of anomaly detection?
Let’s quickly go through some of the characteristics of the anomaly detection issue:
- Processing type
- Data type
- Modes by data labels
- Application domain
Processing type

Both offline and online processing methods exist. The offline type can yield the best answer because it operates on a complete collection of data. The online type is chosen when data points arrive in batches (a subset of points) or one at a time (in real time), and anomaly onsets (changepoints) must be located as soon as they happen.
Data type

Even though data is frequently divided into structured, semi-structured, and unstructured kinds, it is more convenient here to think of it as having been pre-processed into a machine-learning-ready form. In this situation, classifying data by modality is more useful, since anomaly detection methods for different data modalities often differ greatly.
- Tabular data: This data is organized into rows, each of which holds details on a distinct item. Each row has the same number of columns (some values may be missing), and each column represents a property of the object the row describes.
- Image data: It’s common for this to be a tensor or multidimensional array, where two dimensions (rows and columns) stand in for the x and y axes of space, and a third represents the brightness or grayscale of a pixel.
- Video data: This typically combines audio with a time series of images, each of which is an instance of the image type.
- Time series data: This is a sequential observation of univariate or multivariate data through time. In a common special case, data is observed at predetermined, evenly spaced intervals of time (such as yearly, monthly, quarterly, or hourly). Time series data is often a specific type of tabular data that has an index in timestamp format.
- Text data: This is broken down into words, phrases, sentences, and whole texts, or combined from them.
- Audio data: When the sound is gathered sequentially, this is a unique case of time series data.
Modes by data labels
Modes can be classified as supervised, semi-supervised, and unsupervised according to the labels assigned to the data. Data labels assign each data point a normal or anomalous class (or one of several anomaly classes). Supervised training mode requires labeled data points for both the normal and anomalous classes.
Semi-supervised tasks can be solved when markup exists for the normal (anomaly-free) class only. The most popular techniques are unsupervised ones because they don’t need training labels. These techniques frequently start from the premise that abnormal events are considerably rarer than typical ones.
Anomaly detection algorithm outputs

There are mainly two types of anomaly detection algorithm outputs:

- Scores: The AD algorithm returns a level of abnormality for each data instance. This enables a flexible definition of the abnormality boundary at the post-processing step.
- Labels: The AD algorithm assigns a label or class (normal/anomalous) to each data instance.
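As an illustrative sketch of the two output types, scikit-learn's IsolationForest (chosen here only as an example detector, an assumption rather than something prescribed above) exposes both: a continuous abnormality score and a hard normal/anomalous label.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# Mostly "normal" points near the origin, plus one obvious anomaly.
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)), [[8.0, 8.0]]])

model = IsolationForest(random_state=0).fit(X)
scores = model.decision_function(X)  # score output: lower = more anomalous
labels = model.predict(X)            # label output: -1 = anomalous, +1 = normal

print(labels[-1])  # the planted point is labeled -1
print(scores[-1])  # and receives the lowest score in the set
```

With scores, the abnormality threshold can still be tuned after the fact; with labels, the decision is already baked in.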
Application domain

Anomalies can be categorized into many sorts depending on the particular sector or application. Different categories typically allude to different sorts of anomalous occurrences and imply that different AD approaches and domain heuristics should be used. Mathematically, however, anomalies from different domains can be handled quite similarly. For instance, when the data is of the time series type, the same AD techniques apply to sensor-network anomalies, cyber-intrusions, and industrial defects.
What are the difficulties in anomaly detection?
We are aware that accurate anomaly identification requires a combination of ongoing statistical analysis and historical data. Importantly, the quality of the data and sample sizes employed in these models have a significant impact on the alerting as a whole. These are the biggest difficulties in anomaly detection:
- Data quality
- Training sample sizes
- False alerting
- Imbalanced distributions
Data quality

One important question you may have is, “Which algorithm should I use when constructing an anomaly detection model?” The type of problem you’re trying to address obviously has a big impact, but so does the underlying data.

The most important factor in developing an accurate, useful model is data quality, that is, the caliber of the underlying dataset.
Training sample sizes
For various reasons, having a big training set is crucial. If the training set is too small, the algorithm won’t have enough exposure to prior examples to create an accurate model of the expected value at a particular moment. Anomalies will then distort the baseline, which affects the model’s overall accuracy.
Another frequent issue with limited sample sets is seasonality. Having a big enough sample dataset is crucial because no day or week is the same. Depending on the industry, customer traffic volumes over the Christmas season may increase or dramatically decrease. For the model to effectively generate and monitor the baseline throughout common holidays, it is crucial to observe data samples from numerous years.
False alerting

Anomaly detection is a great tool in a dynamic context because it can use historical data to distinguish expected behavior from unusual occurrences. What happens, though, if your model is frequently wrong and produces false alarms?
It can be challenging to win over hesitant users’ trust and just as simple to lose it, so it’s crucial to strike a balance.
Imbalanced distributions

An additional approach to developing an anomaly detection model is to create a supervised model using a classification technique. To understand what is good or bad, this supervised model needs labeled data.
A prevalent issue is the imbalanced distribution of the labeled data. Since the good condition is the norm, perhaps 99% of the labeled data will be biased toward the good class. As a result of this natural imbalance, the training set may not have enough examples of the bad condition to learn from.
Anomaly detection approaches in machine learning
This section gives a general overview of anomaly detection methods based on the types of data that are accessible, how to assess an anomaly detection model, how each method creates a model of typical behavior, and the advantages of deep learning models. We finish by discussing potential difficulties when using these models.
Based on the kind of data required to train the model, anomaly detection techniques can be divided into different categories. In the majority of use cases, a very tiny portion of the whole dataset is anticipated to be made up of anomalous samples. Therefore, normal data samples are easier to find than aberrant ones, even when labeled data is available. For the majority of applications today, this presumption is crucial. These are the anomaly detection approaches in machine learning:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Local outlier factor (LOF)
- K-nearest neighbors (kNN)
- DBSCAN
- Autoencoders
- Bayesian networks
- Support vector machines (SVMs)
Supervised learning

In supervised learning, machines learn a function that maps input features to outputs from sample input-output pairs. The aim of supervised anomaly detection algorithms is to bring application-specific knowledge into the anomaly detection process.
The challenge of anomaly detection can be reframed as a classification task with enough normal and anomalous instances so that computers can learn to correctly anticipate whether a particular example is an abnormality or not. However, for many anomaly detection use cases, the ratio of normal to abnormal instances is severely skewed; even while there may be several classes of anomalies, each one may be significantly underrepresented.
This method implies that the user can accurately classify all possible anomalies and has labeled examples for each kind. As abnormalities can manifest in a variety of ways and new anomalies can arise during testing, this is typically not the case in practice. Therefore, methods that generalize well and are better at spotting abnormalities that haven’t been seen before are preferred.
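To make the classification framing concrete, here is a minimal sketch using scikit-learn's LogisticRegression on synthetic, heavily imbalanced labeled data (the dataset and the class_weight choice are assumptions for illustration, not a prescribed recipe):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Severely skewed labeled data: 500 normal examples, only 10 anomalies.
X = np.vstack([rng.normal(0, 1, size=(500, 2)),
               rng.normal(6, 1, size=(10, 2))])
y = np.array([0] * 500 + [1] * 10)  # 1 marks an anomaly

# class_weight="balanced" reweights the rare anomaly class so it is not drowned out.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

print(clf.predict([[6.5, 6.0]])[0])  # predicted anomalous (1)
print(clf.predict([[0.0, 0.0]])[0])  # predicted normal (0)
```

Note that such a classifier can only recognize anomalies resembling the labeled examples it was trained on, which is exactly the limitation described above.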
Unsupervised learning

In unsupervised machine learning, machines lack examples of input-output pairs and so cannot learn a function that maps input features to outputs. Instead, they learn by discovering structure within the input features. Unsupervised methods are more widely used for anomaly detection than supervised ones because, as noted, labeled anomalous data is comparatively rare. However, the type of anomaly one expects to find is frequently very particular, so many of the anomalies discovered by an entirely unsupervised approach may simply be noise that is irrelevant to the task at hand.
Semi-supervised learning

Semi-supervised machine learning strategies act as a middle ground: they can benefit from both large volumes of unlabeled data and small amounts of labeled data. Many real-world anomaly detection use cases suit semi-supervised learning well, given the abundance of normal instances to learn from and the scarcity of examples of the rarer, anomalous classes of interest. On the presumption that the majority of data points in an unlabeled dataset are normal, one can train a reliable model on the unlabeled dataset and assess its performance using a small quantity of labeled data.
Applications like network intrusion detection, where there may be several examples of the normal class and a few examples of intrusion classes, but new types of intrusions may develop over time, are ideally suited for this hybrid technique.
Consider X-ray screening for border or airport security as another illustration. Unusual products that pose a security danger are uncommon and can take many different shapes. Additionally, any anomaly that poses a potential hazard may change in nature as a result of a variety of outside events. Therefore, it may be challenging to get sufficient quantities of useful examples of anomaly data.
These circumstances might call for the identification of novel classes as well as anomalous classes, for which there may be few or no labeled data. A semi-supervised classification strategy that permits the detection of both known and previously unidentified abnormalities is the optimal response in these circumstances.
Local outlier factor (LOF)
The local outlier factor is likely the most popular method for anomaly detection. It is founded on the concept of local density: it compares an object’s local density with the densities of its neighboring data points. A data point is deemed an outlier if its density is significantly lower than that of its neighbors.
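A minimal sketch of LOF using scikit-learn's LocalOutlierFactor on synthetic data (the dataset and the n_neighbors value are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
# A dense cluster plus one point far from any neighborhood.
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)), [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

print(labels[-1])                        # the isolated point is labeled -1
print(lof.negative_outlier_factor_[-1])  # far more negative than the inliers'
```

The negative_outlier_factor_ attribute holds the underlying density-ratio scores, so thresholds can be inspected and adjusted rather than taken as fixed.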
K-nearest neighbors (kNN)

kNN is a popular supervised machine learning approach for classification. Applied to anomaly detection problems, kNN is a helpful tool because it makes it simple to visualize the data points on a scatterplot, which makes anomaly identification much more interpretable. An additional advantage is that kNN performs well on both small and large datasets.

Used this way, kNN doesn’t actually learn ‘normal’ and ‘abnormal’ values to tackle a classification problem; it functions as an unsupervised machine learning method for anomaly detection. A machine learning expert explicitly defines the range of normal and abnormal values (for example, a threshold on neighbor distances), and the algorithm sorts data points accordingly.
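Here is a dependency-light sketch of that distance-to-neighbors idea, using the mean distance to the k nearest neighbors as an anomaly score (the scoring rule, data, and k value are assumptions for illustration):

```python
import numpy as np

def knn_anomaly_scores(X, k=5):
    """Mean Euclidean distance to the k nearest neighbors; larger = more anomalous."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nearest = np.sort(d, axis=1)[:, :k]  # the k smallest distances per point
    return nearest.mean(axis=1)

rng = np.random.RandomState(1)
# 40 clustered points (indices 0-39) plus one distant point (index 40).
X = np.vstack([rng.normal(0, 0.5, size=(40, 2)), [[6.0, 6.0]]])
scores = knn_anomaly_scores(X, k=5)

print(scores.argmax())  # 40: the planted outlier gets the largest score
```

An expert-chosen cutoff on these scores then separates “normal” from “abnormal”, as described above.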
DBSCAN

This unsupervised machine learning approach is based on the density principle. By examining the local density of data points, DBSCAN can find clusters in sizable spatial datasets, and it generally produces good results when used for anomaly detection. The points that are not part of any cluster are given their own class, -1, making them simple to spot. This technique handles outliers successfully when the data forms dense, well-separated regions.
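A small sketch with scikit-learn's DBSCAN on synthetic clusters (the eps and min_samples values are assumptions tuned to this toy data):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
# Two dense clusters plus one isolated point.
X = np.vstack([rng.normal(0, 0.2, size=(30, 2)),
               rng.normal(4, 0.2, size=(30, 2)),
               [[10.0, 10.0]]])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print(labels[-1])  # -1: the isolated point belongs to no cluster
```

Points with label -1 are exactly the noise points given “their own class: -1” mentioned above, while cluster members receive non-negative labels.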
Autoencoders

This approach uses artificial neural networks to encode the data by compressing it into a lower-dimensional representation, then decodes it to reconstruct the original input. Because the essential regularities are preserved in the compressed representation, lowering the dimensionality does not lose the necessary information, and points that reconstruct poorly stand out as anomalies.
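The compress-then-reconstruct idea can be sketched without training a neural network by letting PCA stand in as a linear “encoder/decoder” (an assumption for brevity, not a full autoencoder): points that the low-dimensional code cannot reconstruct get a large reconstruction error.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Normal data lies near a 1-D line embedded in 3-D; one point sits far off it.
t = rng.uniform(-1, 1, size=100)
X = np.vstack([np.column_stack([t, 2 * t, -t]) + rng.normal(0, 0.05, size=(100, 3)),
               [[1.0, -3.0, 4.0]]])

# "Encode" down to 1 dimension, "decode" back, and measure reconstruction error.
pca = PCA(n_components=1).fit(X)
errors = np.linalg.norm(X - pca.inverse_transform(pca.transform(X)), axis=1)

print(errors.argmax())  # 100: the off-manifold point reconstructs worst
```

A neural autoencoder applies the same scoring logic, but with nonlinear encode/decode functions learned from the data.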
Bayesian networks

Bayesian networks let machine learning engineers find anomalies even in high-dimensional data. We employ this strategy when the anomalies we’re seeking are more subtle and challenging to spot, and visualizing them on a plot might not yield the expected results.
Support vector machines (SVMs)
The support vector machine (SVM) is another supervised machine learning approach frequently used for classification. SVMs separate data points using hyperplanes in multidimensional space. For anomaly detection, the hyperparameter nu sets the expected fraction of outliers and must be selected manually.
When there are multiple classes involved in the issue, SVM is typically used. However, it is also applied to single-class issues in anomaly detection. The model can determine whether unfamiliar data belongs to this class or is an anomaly because it has been trained to understand the “norm”.
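A minimal sketch of that one-class setting with scikit-learn's OneClassSVM, where nu is the manually chosen outlier fraction (the synthetic data and the nu value are assumptions):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
# Train only on data assumed to be "normal".
X_train = rng.normal(0, 1, size=(200, 2))

# nu bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)

print(ocsvm.predict([[0.1, -0.2]])[0])  # 1: consistent with the learned "norm"
print(ocsvm.predict([[6.0, 6.0]])[0])   # -1: flagged as anomalous
```

Because the model only ever sees the normal class, any point falling outside the learned region is reported as an anomaly, matching the description above.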
Is anomaly detection supervised or unsupervised?
The most frequently used kind of anomaly detection is the unsupervised approach. There, we fit a machine learning model to the typical behavior using an unlabeled dataset, under the crucial assumption that the bulk of the training data are typical examples. Among them there can be a small proportion of odd data points. Any data point that deviates considerably from the learned behavior is then marked as an anomaly. In supervised anomaly detection, by contrast, a classifier is trained on a dataset labeled “normal” and “abnormal”.
When a new data point arrives, it becomes a typical classification task. Both approaches have advantages and disadvantages. The supervised anomaly detection procedure needs a vast number of both positive and negative examples, and due to the rarity of anomalous occurrences it is quite challenging to obtain such a dataset. Even if you acquired one, you would only model the anomalous patterns present in that dataset.
But every field has many varieties of anomalies, and future anomalies might not resemble the ones we’ve already observed at all. It is very difficult for any algorithm to learn what anomalies look like from anomalous samples alone. The unsupervised method is popular for this reason: it is far simpler to model regular behavior than to model the numerous varieties of anomalies.
Which algorithm will you use for anomaly detection?
One of the best algorithms for detecting anomalies is a support vector machine. A supervised machine learning method called SVM is frequently applied to classification issues.
Is SVM used for anomaly detection?
Yes. SVMs use hyperplanes in multidimensional spaces to distinguish between different classes of observations. Naturally, SVM is used to address issues with multi-class classification.
SVM is, however, also being used more frequently in one-class problems when all of the data are from a single class. In this instance, the algorithm is trained to understand what is “normal” so that it can determine whether fresh data should be included in the group or not when it is presented. If not, the new data is classified as anomalous or out of the ordinary.
What is the difference between an anomaly and an outlier?
Observations that deviate significantly from the mean or center of a distribution are known as outliers. They may or may not signify abnormal behavior brought on by an alternative process. Anomalies, on the other hand, are data patterns that are produced by a different underlying process.
How do you identify outliers?
Extreme data points can be transformed into z-scores that indicate how many standard deviations they are from the mean.

A value can be categorized as an outlier if its z-score is sufficiently high or low. Generally speaking, values with a z-score greater than 3 or less than -3 are regarded as outliers.
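A quick sketch of the z-score rule in NumPy (the synthetic data is an assumption for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
# 100 ordinary values around 10, plus one extreme value.
data = np.append(rng.normal(10, 1, size=100), 25.0)

z = (data - data.mean()) / data.std()  # standard deviations from the mean
outliers = data[np.abs(z) > 3]

print(outliers)  # only the extreme value exceeds |z| = 3
```

Note that on very small samples an extreme value inflates the standard deviation and can mask itself, so the |z| > 3 rule works best with a reasonable amount of data.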
Which algorithm is best for outliers?
The interquartile range (IQR) can be a good approach because it captures the range of the middle half of your dataset. Any values that fall outside the “fences” drawn around your data using the IQR (commonly 1.5 × IQR beyond the first and third quartiles) are treated as outliers.
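A quick sketch of the IQR fences in NumPy (the sample data and the conventional 1.5 × IQR multiplier are illustrative):

```python
import numpy as np

data = np.array([52, 56, 53, 57, 51, 59, 54, 55, 58, 120])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences

outliers = data[(data < lower) | (data > upper)]
print(outliers)  # only 120 falls outside the fences
```

Unlike the z-score, the IQR is based on quartiles, so it is not distorted by the extreme values it is trying to detect.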
Is anomaly detection a classification problem?
By now, it should be clear that classification and supervised anomaly detection are two different machine learning problems. Whether you have labeled classes and whether the dataset is heavily imbalanced are two main ways to distinguish between them.
Can we use a classification algorithm to detect outliers?
Outliers in a classification or regression dataset can lead to a poor fit and worse predictive performance. Given the enormous number of input variables in most machine learning datasets, it is difficult to detect and eliminate outliers using straightforward statistical methods.
Is anomaly detection classification or regression?
In some application-specific situations, when a subset of the anomalous patterns is known, the outlier detection (OD) problem can be transformed into a supervised one and treated as a classification problem; this condition is typically referred to as supervised OD. But even in this scenario, OD is frequently the first stage of a data modeling procedure that ends with a supervised classifier or regressor.
The speed of anomaly detection can be increased by using machine learning to learn a system’s properties from observed data. In addition to learning from the data, machine learning algorithms are also able to forecast the future based on that data. These algorithms can then refine their initial predictions by “learning” from how the events actually turn out.
Machine learning for anomaly detection includes techniques that let you efficiently find and categorize anomalies in large and intricate data sets. Sequential hypothesis tests, such as cumulative sum (CUSUM) charts and sequential probability ratio tests, are examples of anomaly detection methods. These tests can identify changes in the distributions of real-time data and can be used to set alarm parameters.