What is an anomaly?
An anomaly is something that deviates from the norm, something unusual or unexpected. At Kollective, we refer to an anomaly as anything "weird" or out of the ordinary. The specific nature of an anomaly can vary depending on the metric in question, and this ambiguity is where things get interesting.
To help our customers identify anomalies, we focus on three key metrics: status codes, latency, and bytes received. For instance, an anomaly could be a surge in 404 errors (status codes), unusually high transfer times (latency), or a location receiving more bytes than expected for its network (bytes received).
There are various methods for detecting anomalies in data. Below, we’ll explore some of the most common approaches and how they can be applied to our metrics.
How Can We Detect an Anomaly?
We employ both statistical methods and machine learning techniques to identify anomalies in data.
Statistical Methods
One of the simplest and most common methods for anomaly detection is the use of thresholds. A threshold is a predefined value, and anything that exceeds this value is considered an anomaly. For example, when monitoring status codes for an event where no 403 errors are expected, the threshold for 403s is effectively zero. Even a small number of them exceeds it and triggers an alert, indicating a potential permissions (authorization) issue.
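As a rough illustration of the threshold idea, the sketch below counts status codes for one event and reports any code whose count exceeds its limit. The limits in THRESHOLDS and the function name find_threshold_breaches are hypothetical values chosen for this example, not Kollective's actual configuration.

```python
from collections import Counter

# Hypothetical per-status-code limits for a single event.
# A limit of 0 means "none are expected", as in the 403 example above.
THRESHOLDS = {403: 0, 404: 50, 500: 5}


def find_threshold_breaches(status_codes, thresholds=THRESHOLDS):
    """Return {status_code: count} for every code whose count exceeds its limit."""
    counts = Counter(status_codes)  # missing codes count as 0
    return {
        code: counts[code]
        for code, limit in thresholds.items()
        if counts[code] > limit
    }


# Made-up sample: mostly successful requests, a few 404s and three 403s.
observed = [200] * 950 + [404] * 12 + [403] * 3
breaches = find_threshold_breaches(observed)
print(breaches)  # {403: 3}
```

Only the 403 count is reported here: twelve 404s stay under their limit of 50, while three 403s exceed a limit of zero.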
Determining the appropriate threshold is crucial. One common approach is to use a Z-Score, which measures how many standard deviations a value lies from the mean. Values whose Z-Score exceeds a chosen cutoff (3 is a common choice) are flagged as anomalous. For a metric that ranges over several orders of magnitude, we analyze the Z-Score of the logarithm of that metric instead. Because the mean of the logarithms is the logarithm of the geometric mean, a value in this case is considered anomalous if it deviates significantly from the geometric mean.
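The Z-Score calculation can be sketched in a few lines. This is a minimal example under stated assumptions: the baseline values are made up, the cutoff of 3 is a common convention rather than a documented Kollective setting, and the function names z_score and is_anomalous are hypothetical. A new value is scored against a baseline of earlier observations, with an optional log transform for metrics that span several orders of magnitude.

```python
import math
import statistics


def z_score(value, baseline, use_log=False):
    """Return how many standard deviations `value` lies from the baseline mean.

    With use_log=True the score is computed on log-transformed data, so the
    centre of the baseline is effectively its geometric mean. All values must
    be positive in that case.
    """
    if use_log:
        value = math.log(value)
        baseline = [math.log(x) for x in baseline]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation
    if stdev == 0:
        raise ValueError("baseline has no variation; the Z-Score is undefined")
    return (value - mean) / stdev


def is_anomalous(value, baseline, cutoff=3.0, use_log=False):
    """Flag a value whose absolute Z-Score exceeds the cutoff."""
    return abs(z_score(value, baseline, use_log)) > cutoff


# Made-up transfer latencies in milliseconds.
latency_baseline = [120, 130, 125, 118, 122, 127, 131, 119]
print(is_anomalous(400, latency_baseline))  # True
print(is_anomalous(126, latency_baseline))  # False

# Made-up byte counts spanning several orders of magnitude.
bytes_baseline = [1e3, 1e4, 1e5, 1e6, 1e4, 1e5]
centre = statistics.geometric_mean(bytes_baseline)
print(abs(z_score(centre, bytes_baseline, use_log=True)) < 1e-9)  # True
```

The last line checks the point made above: on the log scale, a value equal to the geometric mean of the baseline has a Z-Score of zero.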
Machine Learning Techniques
Machine learning (ML) offers a more sophisticated approach to anomaly detection by analyzing large datasets to identify patterns and establish what constitutes normal behavior for a given metric. ML is particularly effective in handling complex, noisy data—such as bytes received—where simpler methods might struggle.
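By learning from the data, ML models can detect deviations from expected patterns, flagging these as anomalies. This approach allows for more accurate detection in scenarios where traditional statistical methods may fall short, such as metrics with strong daily or weekly cycles, where a single fixed threshold would be too strict at some times and too loose at others.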
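To make "learning what normal looks like" concrete, here is a deliberately simple stand-in for a trained model: it learns a separate baseline of bytes received for each hour of the day (median and median absolute deviation) and scores new values against the baseline for their hour. This is not the model Kollective uses. The data, the cutoff of 5, and the names fit_hourly_baseline and is_unusual are all assumptions made for this sketch.

```python
import statistics
from collections import defaultdict


def fit_hourly_baseline(history):
    """Learn (median, MAD) of a metric for each hour of the day.

    `history` is an iterable of (hour_of_day, value) pairs.
    MAD is the median absolute deviation from the median.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    model = {}
    for hour, values in by_hour.items():
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        model[hour] = (median, mad)
    return model


def is_unusual(model, hour, value, cutoff=5.0):
    """Flag a value that lies more than `cutoff` MADs from its hour's median."""
    median, mad = model[hour]
    if mad == 0:
        return value != median
    return abs(value - median) / mad > cutoff


# Made-up history of megabytes received: busy at 09:00, quiet at 02:00.
history = (
    [(9, v) for v in [500, 520, 480, 510, 490]]
    + [(2, v) for v in [50, 55, 45, 52, 48]]
)
model = fit_hourly_baseline(history)

print(is_unusual(model, 9, 505))  # False: normal for a busy hour
print(is_unusual(model, 2, 505))  # True: far above the overnight baseline
```

The same value is normal at one hour and anomalous at another, which is the kind of context a single fixed threshold cannot capture. A production model would typically learn from far more history and more features than this sketch does.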
How Will I Know About Anomalies in My Data?
This is where notifications come into play. We understand that you’re busy and may not have time to continuously monitor all your locations for anomalies. That’s why we handle it for you and send notifications when we detect an anomaly. Here’s how you can sign up for notifications and customize them to your needs.
To start receiving notifications, you’ll need to opt in via the Admin Portal. By default, notifications are turned off.
Once you’ve opted in, you can customize which notifications you want to receive in the Customer Portal. Here, the default is to receive all notifications, but you can toggle off any that are not relevant to you.
I’ve Opted Into Notifications! Now What?
Once you receive a notification, you can start troubleshooting to determine if any action is needed.
A good first step is to use the Observability Dashboard to identify the source of the anomaly—whether it’s linked to a specific app, location, or event. For long-term issues, like persistently high transfer latency in a specific location, you may notice multiple high-value data points on the latency visualization. This could suggest a network issue in that location, possibly requiring a configuration adjustment or the addition of EdgeCache.
If the anomaly is event-specific, it may appear as a single spike. From here, you can drill down into the Event History table, which highlights events within the anomaly's timeframe.
This allows you to pinpoint the event and explore further using Network Analytics or Location Performance. Location Performance is particularly useful if the anomaly is tied to a specific location and you need more detailed insights.
Conclusion
In summary, anomalies are deviations from the norm that can signal something unusual or unexpected within your data. At Kollective, we help our customers identify these "weird" occurrences by focusing on key metrics such as status codes, latency, and bytes received. By employing both statistical methods and machine learning techniques, we can accurately detect anomalies, even in complex and noisy datasets. Our notification system ensures that you are promptly alerted to any anomalies, allowing you to take appropriate action. Whether you're dealing with long-term trends or isolated incidents, our tools and insights will help you maintain optimal network performance.
Discover How Kollective Observability Can Benefit Your Business
Schedule a demo of Kollective's Video Assurance Platform today to experience firsthand how you can improve the experience of every user in your organization.