tldr: This post discusses how to measure CDN request locality without indexing a single log.
The function of a CDN is to bring cacheable data, like pictures and videos, physically closer to your customers. This reduces latency and load on your backend servers. When measuring CDNs, there are some very common metrics, like cache hit ratio and request latency.
These are important metrics, but they don't test one fundamental property of CDNs, and indeed of edge computing in general: is the edge really all that close to the customer? In short, how do you measure request locality?
CDNs help organizations manage scale. High-volume traffic, however, drives high-volume telemetry. This complicates matters: while CDNs offer dashboards within their own UIs, that data cannot easily be correlated with other data, like RUM or application telemetry.
Coralogix has a set of features that tackle the problem of deep CDN monitoring directly, so let's walk through the steps I took to build a suite of detailed metrics, including request locality (the share of requests made from a user to a CDN node within the same country).
Our goal is to avoid indexing any data. In Coralogix, this means using the TCO Optimizer to change the data's use case. We intend to analyze this data and archive it in S3, which makes it a perfect fit for the Monitoring use case. Even when the data is archived, it can be queried directly, at will, using Coralogix Remote Query.
We're interested in the location of the client and the edge node. This information is not natively supplied by Akamai, so instead we can use Coralogix geo-location enrichment to add these fields to our logs. The two IP fields we want to enrich are cliIP and edgeIP. So how do we enrich our logs?
It's as simple as declaring the fields and letting Coralogix do the rest. This adds a new object to our Akamai logs that contains a wealth of location information:
"edgeIP_geoip": {
  "ip": "180.94.11.246",
  "ip_ipaddr": "180.94.11.246",
  "location_geopoint": {
    "lat": -6.1728,
    "lon": 106.8272
  },
  "continent_name": "Asia",
  "country_name": "Indonesia",
  "city_name": "Jakarta",
  "postal_code": null
}
Note: Coralogix bills only by GB volume, so the enrichment service costs nothing extra to use.
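Under the hood, geo enrichment is essentially a lookup keyed on the IP value. Here is a minimal Python sketch of the idea, using a hypothetical in-memory table in place of Coralogix's GeoIP database (the table, function name, and log shape are assumptions for illustration, not the actual implementation):

```python
# Hypothetical stand-in for a GeoIP database: maps an IP to location fields.
GEOIP_TABLE = {
    "180.94.11.246": {
        "continent_name": "Asia",
        "country_name": "Indonesia",
        "city_name": "Jakarta",
        "location_geopoint": {"lat": -6.1728, "lon": 106.8272},
    },
}

def enrich(log, fields):
    """Attach a '<field>_geoip' object for each configured IP field found in the table."""
    enriched = dict(log)
    for field in fields:
        geo = GEOIP_TABLE.get(log.get(field))
        if geo is not None:
            enriched[field + "_geoip"] = {"ip": log[field], **geo}
    return enriched

# cliIP is deliberately absent from the table, so only edgeIP gets enriched here.
log = {"cliIP": "203.0.113.7", "edgeIP": "180.94.11.246"}
result = enrich(log, ["cliIP", "edgeIP"])
```

The enriched object mirrors the `edgeIP_geoip` structure shown above; in Coralogix you only declare the fields, and the lookup happens inside the platform.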
Logs processed in the Monitoring use case can be queried from dashboards, used to train machine learning models, trigger alerts, and much more. They can also be used to generate metrics. Converting logs into metrics is an incredibly effective cost optimization: metrics can be held for a very long time while maintaining high-performance queries. This is a key step in observability without indexing.
We'll create an Events2Metrics group called Akamai_Edge_Logs_Locality and query for the appropriate logs. Creating Events2Metrics in Coralogix is easy: simply specify the logs you want and the fields you wish to track.
This generates a metric that we can query quickly and as often as we like, using PromQL to perform in-depth analysis.
Metric labels can be derived directly from values in the logs. In this case, we want two values, both of which came from our geo enrichment: edgeIP_geoip.country_name and cliIP_geoip.country_name. We will add these as the labels edge_country and origin_country respectively.
Configuring this is completely straightforward, and requires no code at all.
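Conceptually, what Events2Metrics does here is aggregate log events into counters keyed by the chosen label values. A rough Python sketch of that idea, with hypothetical enriched log records (the records and counts are illustrative, not real traffic):

```python
from collections import Counter

def logs_to_metric(logs):
    """Count log events per (edge_country, origin_country) label pair,
    mimicking what an Events2Metrics counter would track."""
    counter = Counter()
    for log in logs:
        labels = (
            log["edgeIP_geoip"]["country_name"],   # edge_country label
            log["cliIP_geoip"]["country_name"],    # origin_country label
        )
        counter[labels] += 1
    return counter

# Three made-up enriched Akamai log events.
logs = [
    {"edgeIP_geoip": {"country_name": "Indonesia"}, "cliIP_geoip": {"country_name": "Indonesia"}},
    {"edgeIP_geoip": {"country_name": "Indonesia"}, "cliIP_geoip": {"country_name": "Australia"}},
    {"edgeIP_geoip": {"country_name": "Germany"}, "cliIP_geoip": {"country_name": "Germany"}},
]
metric = logs_to_metric(logs)
```

Each distinct label pair becomes its own metric series, which is exactly what lets us compare edge and origin countries later in PromQL.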
We’ve now got everything we need to perform our analysis, which is simple.
There is a small challenge: PromQL does not natively support comparing two labels like this, so we've got to do something interesting with label rewriting. Without too much preamble, the query looks like this:
sum(
akamai_edge_logs_locality_cx_docs_total
and
label_replace(akamai_edge_logs_locality_cx_docs_total, "origin_country", "$1", "edge_country", "(.*)")
) / sum(akamai_edge_logs_locality_cx_docs_total)
Let's break down each step in this query:
- label_replace() takes every series of akamai_edge_logs_locality_cx_docs_total and copies the value of the edge_country label into the origin_country label.
- The and operator keeps only the series on the left whose full label set matches a series on the right. After the rewrite, the only series that can match are those whose origin_country already equals their edge_country.
- Dividing the sum of the matched series by the sum of all series gives us the fraction of in-country requests.
This gives us a value between 0 and 1: the fraction of requests that were served by an edge node in the same country as the requester. So what can we do now?
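To see why the label trick works, the same calculation can be done by hand in Python over a few made-up series (the countries and counts below are illustrative only):

```python
# Metric series keyed by (edge_country, origin_country), value = request count.
series = {
    ("Indonesia", "Indonesia"): 30,   # in-country: labels match
    ("Indonesia", "Australia"): 120,  # cross-country: labels differ
    ("Germany", "Germany"): 20,       # in-country
    ("Germany", "France"): 30,        # cross-country
}

# The 'and' + label_replace combination in PromQL keeps exactly the series
# where edge_country == origin_country; summing and dividing does the rest.
matched = sum(v for (edge, origin), v in series.items() if edge == origin)
total = sum(series.values())
locality = matched / total  # fraction of requests served in-country
```

With these sample numbers, 50 of 200 requests are in-country, a locality of 0.25.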
We can choose to render this information in a dozen different ways, but the easiest is to select a gauge widget. This widget type displays a single value. We will put a few key configurations in place for this widget.
The result is a wonderful, color coded tile!
Oh dear, 19.04%! That's terrible. Less than a fifth of our requests are being handled within the same country. This may have serious latency implications for our users. We'd better investigate! But how do we make sure we know if this happens again?
Our line graph has a three-dot menu on it. Selecting this menu reveals some hidden features.
One of them is the ability to create an alert directly from this graph. By selecting it, a UI appears over your graph with a threshold line.
We can now easily define an alert, set priorities, set more complex alerting conditions, and more, so that we find out when the edge of our CDN isn't as sharp as we'd like it to be!
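The alert itself is configured in the UI, but the condition it evaluates boils down to a threshold on the locality ratio. A minimal sketch of that check (the 50% threshold and message format are assumptions, not Coralogix defaults):

```python
from typing import Optional

def check_locality_alert(locality: float, threshold: float = 0.5) -> Optional[str]:
    """Return an alert message when the in-country request ratio
    drops below the threshold; otherwise return None."""
    if locality < threshold:
        return f"ALERT: edge locality {locality:.2%} is below the {threshold:.0%} threshold"
    return None
```

For example, the 19.04% figure from our gauge would fire this alert, while a healthy 90% locality would not.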
Without indexing, logs are typically only available once they're rehydrated, but Coralogix does not depend on indexing, even for queries. By switching to Logs Explore and selecting All Logs mode, customers can directly query the logs, held in cloud storage in their own account, at no additional cost per query. This allows for deeper exploration of the data, and insight generation with no cost implications.
Throughout this tutorial, we've used a number of Coralogix features to solve a complex and nuanced observability problem. We have:
- routed our Akamai logs to the Monitoring use case with the TCO Optimizer, avoiding indexing entirely
- enriched the client and edge IPs with geo-location data
- converted those logs into a long-lived metric with Events2Metrics
- compared labels in PromQL to compute request locality
- visualized the result in a gauge widget and alerted on it
- queried the archived logs directly with Remote Query
Now if you don’t mind, my edge locality just dropped below 20% and I have to know why!