Since the beginning of the Internet, the speed of delivering content has been an issue. While processor enhancements, network acceleration, and web frameworks have brought drastic improvements to performance, the goalposts have continued to shift further away; devices operate on wireless connections with limited bandwidth, and the Internet is accessed from every corner of the globe.
As user tolerance for slow page loads and refresh rates continues to diminish, poor website performance can directly impact visitor counts, click-throughs, and conversions. This is where CDNs come in.
Content Delivery Networks (also known as Content Distribution Networks) play a vital role in providing fast, responsive web pages, making it possible to live stream events around the globe and enabling real-time interactivity for remote, multi-player online games. Using a CDN to speed up content delivery to your users can help your business remain competitive and gain access to markets around the world.
A CDN consists of a network of edge servers in various geographic locations, each of which caches content supplied by the server that hosts your website (known as the origin web server). When a user’s device requests your website, the request is first routed to the nearest CDN server, which tries to serve it from cached content.
If the requested content is unavailable or out of date, the CDN server proxies the request to the origin web server as the source of truth. A CDN can significantly reduce latency and improve performance by reducing the physical distance that most requests and responses need to travel.
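To make the hit-or-miss flow concrete, here is a minimal Python sketch of the decision an edge server makes. The `EdgeCache` class, the `fetch_from_origin` callback, and the fixed time-to-live are illustrative assumptions, not any provider's implementation; real CDNs decide freshness from cache headers and their own configuration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of an edge server: serve from cache, else proxy to origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (cached body, time at which the entry expires)
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, path: str) -> Tuple[str, str]:
        """Return (body, "HIT" or "MISS") for the requested path."""
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[1] > now:
            return cached[0], "HIT"
        # Missing or out of date: the origin is the source of truth.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"


# Hypothetical origin that records every request it receives.
origin_calls = []


def origin(path: str) -> str:
    origin_calls.append(path)
    return f"<html>{path}</html>"


fake_now = [0.0]  # controllable clock so the example is deterministic
edge = EdgeCache(origin, ttl_seconds=60, clock=lambda: fake_now[0])

first = edge.get("/index.html")   # not cached yet, goes to origin
second = edge.get("/index.html")  # served from the edge cache
fake_now[0] = 61.0                # the cached copy has now expired
third = edge.get("/index.html")   # goes to origin again
```

Only two of the three requests reach the origin in this sketch, which is the effect a CDN relies on to cut latency and origin load.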
CDN providers such as Akamai, Fastly, Cloudflare, and AWS (with Amazon CloudFront) maintain networks of edge servers around the globe, which you can use to cache your web content and improve the experience for worldwide audiences. Storing content across a network of edge servers also improves your site’s resilience: the copies provide redundancy, and the distributed capacity helps absorb DDoS attacks. Because fewer requests reach your origin server, you can also reduce bandwidth costs while handling more traffic.
Your web server’s access logs hold a wealth of information about the traffic hitting your site. They reveal user journeys and web crawler patterns, alert you to errors, and help you diagnose issues. Collating and analyzing web access log data in real time is essential for proactively monitoring your website’s health and addressing issues.
When you augment your services with a CDN, many of the requests to your site are served directly by the local caches and are never seen by your origin web server. To maintain visibility of your web traffic, it’s essential to extend your web log analysis to include CDN server logs.
Most CDN providers offer a way to forward logs for storage and analysis, but not all of them retain logs for later retrieval if you have not set up forwarding. Let’s look at the options from the most popular CDN providers:
● Akamai – With Akamai DataStream, you can send raw logs from multiple endpoints to the destination of your choice every 30 or 60 seconds. If you don’t want to collect all logs, you can configure DataStream to sample the data for particular delivery properties. Akamai also provides a Log Delivery Service API that can be used to deliver edge server logs to a given destination on a schedule.
● Fastly – Fastly offers several protocols, including syslog and HTTPS, for streaming CDN logs in real time to multiple destinations. You can also change the log format and encrypt logs before sending them, which lets you match the format your analysis tooling expects.
● Cloudflare – Cloudflare allows you to push logs to specified cloud storage locations, including Amazon S3, or pull logs to any destination using their REST API every few minutes.
● Amazon CloudFront – Two options are available for log streaming with CloudFront. With standard logs, all log entries are forwarded to an S3 bucket from which you can analyze or export them. Alternatively, you can use real-time logs and specify a sampling rate to send logs to Amazon Kinesis Data Streams within seconds of generation.
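Whichever provider you use, the first step in analysis is turning raw log lines into structured records. As a rough sketch, the Python below parses tab-separated text that declares its columns in a `#Fields:` header, the style used by CloudFront standard logs. The sample lines are made up and heavily abbreviated (real files carry many more fields), and `parse_fields_log` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Iterator, List


def parse_fields_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the names in the #Fields header."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines such as "#Version:" are skipped; "#Fields:"
            # names the columns for the entries that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if not fields:
            continue  # ignore data that appears before any header
        yield dict(zip(fields, line.split("\t")))


# Illustrative, abbreviated sample; not a real log file.
SAMPLE = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location c-ip cs-method cs-uri-stem sc-status x-edge-result-type",
    "2024-01-01\t00:00:01\tLHR50\t203.0.113.7\tGET\t/index.html\t200\tHit",
    "2024-01-01\t00:00:02\tJFK51\t198.51.100.9\tGET\t/cart\t502\tError",
]

entries = list(parse_fields_log(SAMPLE))
for entry in entries:
    print(entry["x-edge-location"], entry["cs-uri-stem"], entry["sc-status"])
```

Reading the column names from the header rather than hard-coding positions keeps the parser working if the set of logged fields changes.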
By collating log entries from your CDN’s many edge servers and combining them with your origin web server logs, you regain a complete view of your traffic. That combined view helps you monitor performance, understand who is visiting your site, and troubleshoot failures.
If you’re using a CDN, you’ll want to ensure it’s delivering the performance you require – both generally and when you’re expecting a peak in demand due to live streaming an event, launching a new product, or hosting an online game.
Using the data available in your web access logs, you can monitor the number of requests each page receives and break those requests down by edge server location or by user region (derived from the requesting IP address). By looking at response times, you can compare how users experience your site worldwide and identify pages or locations where performance has dropped.
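A breakdown like this takes only a few lines once the logs are parsed. The sketch below groups entries by a chosen key and reports request counts and response times. The field names (`edge_location`, `path`, `response_seconds`) and the sample values are assumptions for illustration; your CDN's log schema will use its own names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical parsed log entries.
entries = [
    {"edge_location": "LHR", "path": "/", "response_seconds": 0.25},
    {"edge_location": "LHR", "path": "/", "response_seconds": 0.75},
    {"edge_location": "SYD", "path": "/", "response_seconds": 1.5},
    {"edge_location": "SYD", "path": "/pricing", "response_seconds": 2.5},
]


def summarize_by(entries: List[dict], key: str) -> Dict[str, dict]:
    """Group entries by `key` and report request count and response times."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for entry in entries:
        groups[entry[key]].append(entry["response_seconds"])
    return {
        name: {
            "requests": len(times),
            "mean_seconds": mean(times),
            "slowest_seconds": max(times),
        }
        for name, times in groups.items()
    }


by_location = summarize_by(entries, "edge_location")
by_path = summarize_by(entries, "path")

for location, stats in sorted(by_location.items()):
    print(location, stats)
```

Calling the same function with `"path"` instead of `"edge_location"` gives the per-page view, so one helper covers both comparisons.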
Monitoring the number of requests that CDN servers proxy to your origin web server lets you spot any unusual increase in cache misses, which leads to slower load times and higher bandwidth demand. The cause might be an issue with the page itself or a misconfigured cache setting. When you need to change your CDN configuration, you can use real-time log analysis to validate that the new settings are working as expected and to alert you to any unintended consequences.
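One simple way to watch for this is to compute the cache miss ratio per minute and flag any minute that crosses a threshold. The sketch below assumes each parsed entry carries an ISO-style `timestamp` and a `cache_status` of `"HIT"` or `"MISS"`. Both field names and the 0.5 threshold are illustrative choices, not provider defaults.

```python
from collections import Counter
from typing import Dict, List


def miss_ratio_by_minute(entries: List[dict]) -> Dict[str, float]:
    """Return the fraction of requests that missed the cache, per minute."""
    totals: Counter = Counter()
    misses: Counter = Counter()
    for entry in entries:
        minute = entry["timestamp"][:16]  # "YYYY-MM-DDTHH:MM"
        totals[minute] += 1
        if entry["cache_status"] == "MISS":
            misses[minute] += 1
    return {minute: misses[minute] / totals[minute] for minute in sorted(totals)}


def minutes_over(ratios: Dict[str, float], threshold: float) -> List[str]:
    """List the minutes whose miss ratio exceeds the threshold."""
    return [minute for minute, ratio in ratios.items() if ratio > threshold]


# Hypothetical entries: one miss in four at 10:00, three in four at 10:01.
entries = (
    [{"timestamp": "2024-01-01T10:00:05", "cache_status": "MISS"}]
    + [{"timestamp": "2024-01-01T10:00:30", "cache_status": "HIT"}] * 3
    + [{"timestamp": "2024-01-01T10:01:10", "cache_status": "MISS"}] * 3
    + [{"timestamp": "2024-01-01T10:01:40", "cache_status": "HIT"}]
)

ratios = miss_ratio_by_minute(entries)
alerts = minutes_over(ratios, threshold=0.5)
print(ratios)
print(alerts)
```

Running the same check before and after a configuration change gives a quick signal of whether the new settings helped or hurt cache efficiency.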
Just as with your origin web server logs, CDN logs can give you insights into who is visiting your site, the journey they’re taking through it, and the pages from which they exit. With more users blocking website tracking code, web server logs can help you better understand your users. Armed with that information, you can work out which pages you should invest your time in and which markets may require more targeted efforts.
However, users are not the only ones making requests to your site; bots will crawl your web pages, triggering requests to their nearest CDN server. By collating and analyzing requests from both edge and origin servers, you can understand where your search engine crawl budget is being spent and identify opportunities to optimize your site for SEO.
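As a starting point for crawl budget analysis, you can tally crawler requests per page from the user agent field. The sketch below matches on the `Googlebot` and `bingbot` tokens that those crawlers include in their user agent strings. The entry field names are assumptions, and because user agents can be spoofed, treat the result as an estimate unless you also verify the source of the requests.

```python
from collections import Counter
from typing import List, Optional

# Lower-cased user agent tokens mapped to a display name.
CRAWLER_TOKENS = {"googlebot": "Googlebot", "bingbot": "Bingbot"}


def crawler_name(user_agent: str) -> Optional[str]:
    """Return the crawler's name if the user agent matches a known token."""
    lowered = user_agent.lower()
    for token, name in CRAWLER_TOKENS.items():
        if token in lowered:
            return name
    return None


def crawl_counts(entries: List[dict]) -> Counter:
    """Count requests per (crawler, path) across edge and origin entries."""
    counts: Counter = Counter()
    for entry in entries:
        name = crawler_name(entry["user_agent"])
        if name is not None:
            counts[(name, entry["path"])] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Hypothetical entries collated from edge and origin logs.
entries = [
    {"path": "/old-campaign", "user_agent": GOOGLEBOT_UA},
    {"path": "/old-campaign", "user_agent": GOOGLEBOT_UA},
    {"path": "/products", "user_agent": BINGBOT_UA},
    {"path": "/products", "user_agent": BROWSER_UA},
]

counts = crawl_counts(entries)
for (crawler, path), hits in counts.most_common():
    print(crawler, path, hits)
```

Pages that attract many crawler hits but matter little to your business are candidates for redirects or exclusion rules.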
Log data can also help you work out if you’re being crawled by content scrapers or attacked by malicious form fillers, in which case you might want to investigate options for blocking such traffic.
Web log analysis plays a vital role in detecting and investigating failures. If a CDN serves most of your website traffic, you will need to collate and monitor edge server requests and responses to spot problems early.
While you should expect to serve a certain number of 4xx or 5xx error codes – due to mistyped URLs, broken referral links, or unauthenticated users – any significant increase in error responses warrants further investigation. Being able to immediately drill into the individual log entries will allow you to get to the bottom of the issue – whether that’s simply a bad link or something more serious like an upstream service failing – and fix it much more quickly than if you first have to retrieve the log files from the different edge servers.
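What counts as a "significant increase" depends on your normal background error rate. A minimal sketch of one approach is to compare the error rate in the current window against a baseline window and flag it when it exceeds a chosen multiple. The `factor`, `min_requests`, and `floor` values below are arbitrary illustrative defaults, not recommendations.

```python
from typing import Sequence


def error_rate(statuses: Sequence[int]) -> float:
    """Fraction of responses with a 4xx or 5xx status code."""
    if not statuses:
        return 0.0
    errors = sum(1 for status in statuses if status >= 400)
    return errors / len(statuses)


def is_error_spike(
    baseline: Sequence[int],
    current: Sequence[int],
    factor: float = 2.0,
    min_requests: int = 20,
    floor: float = 0.01,
) -> bool:
    """Flag the current window if its error rate exceeds factor x baseline.

    `floor` stops a near-zero baseline from turning one stray error into an
    alert, and `min_requests` ignores windows too small to judge.
    """
    if len(current) < min_requests:
        return False
    expected = max(error_rate(baseline), floor)
    return error_rate(current) > factor * expected


# Hypothetical status codes collected from edge server logs.
baseline = [200] * 95 + [404] * 5      # 5% background errors
quiet_window = [200] * 96 + [404] * 4  # 4%: within the normal range
bad_window = [200] * 80 + [502] * 20   # 20%: e.g. an upstream service failing

print(is_error_spike(baseline, quiet_window))
print(is_error_spike(baseline, bad_window))
```

Once a window is flagged, the individual entries in it (status code, path, edge location) are what you would drill into to tell a bad link from an upstream failure.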
Likewise, log data can track usage trends and key business metrics, such as transaction completion rates. Any deviation from the average could be a sign of an issue with your site, at which point tapping into the log files to see where users have dropped off will help you zero in on the cause of the problem fast.
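One common way to define "deviation from the average" is a z-score: how many standard deviations today's value sits from the historical mean. The sketch below applies that to a daily transaction completion rate. The history values and the threshold of 3 are assumptions for illustration, and a production system would likely also account for seasonality.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], value: float, threshold: float = 3.0) -> bool:
    """Return True if `value` is more than `threshold` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return value != history[0]  # any change from a flat history stands out
    z_score = (value - mean(history)) / spread
    return abs(z_score) > threshold


# Hypothetical daily completion rates derived from log data.
history = [0.50, 0.52, 0.48, 0.50, 0.50]

print(is_anomalous(history, 0.51))  # a typical day
print(is_anomalous(history, 0.30))  # a sharp drop worth investigating
```

When a value is flagged, the log entries for that period show where in the journey users dropped off.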
CDN logs play a vital role in helping you understand how your site is being used, identify issues, and improve performance. When working with large volumes of log data from multiple sources, having a central platform to collate, parse, and analyze log entries will save considerable time and effort and enable you to derive insights fast to address issues as they emerge.
With Coralogix, you can collate log data from multiple sources, parse entries programmatically, and conduct real-time analysis to identify trends and detect anomalies automatically. Coralogix provides integrations with both Akamai DataStream and Fastly to enable real-time CDN log analysis.