Quick Start Observability for AWS CloudFront
Coralogix Extension For AWS CloudFront Includes:
Dashboards - 1
Gain instantaneous visualization of all your AWS CloudFront data.
Alerts - 7
Stay on top of AWS CloudFront key performance metrics. Keep everyone in the know with integration with Slack, PagerDuty and more.
High Origin Latency Detected
This alert monitors the origin latency of Amazon CloudFront distributions to ensure that the time taken to fetch content from the origin server remains within acceptable limits. It is triggered when origin latency exceeds 500 milliseconds (ms) over a 10-minute period. High origin latency means that CloudFront is taking longer than expected to retrieve content from the origin server, which slows content delivery and degrades the user experience. Prompt detection and resolution of high latency are crucial for maintaining the performance and reliability of your web applications and services.
Customization Guidelines:
Threshold: The default threshold triggers the alert when origin latency exceeds 500 ms. Adjust it based on the performance requirements of your application and acceptable latency levels.
Monitoring Period: Define a monitoring period that suits your operational needs. A shorter period (e.g., 5-10 minutes) provides near real-time insights, while a longer period (e.g., 30-60 minutes) can help identify persistent latency issues.
Region Specificity: Specify which regions to monitor if your services have regional dependencies or if certain regions are more critical to your operations. This ensures that alerts are relevant and actionable for those regions.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Action: If this alert triggers, review CloudFront metrics to confirm the latency issue and identify any patterns or anomalies. Verify the health and performance of your origin servers, and ensure that the network configuration and routing between CloudFront and the origin server are optimized for low latency. If the origin server is overloaded, consider scaling up resources or adding more servers to distribute the load. Implement effective caching strategies to reduce the frequency of requests to the origin server: use longer cache lifetimes for static content and ensure that frequently accessed dynamic content is efficiently cached. If the origin server relies on database interactions, optimize database queries and indexes to reduce response times. Consider AWS Global Accelerator or other network optimization services to improve latency. Distribute content across multiple origin servers in different geographic locations to reduce latency for users in various regions. Implement auto-scaling policies for your origin servers so that resources adjust automatically to demand and latency stays low during peak periods.
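The threshold-and-window rule for this alert can be illustrated with a short sketch. This is not how Coralogix evaluates the alert internally; the function name `origin_latency_breached` and the `(timestamp, latency_ms)` sample format are hypothetical, and the sketch assumes the rule means "mean origin latency over the window exceeds the threshold".

```python
from datetime import datetime, timedelta
from statistics import mean


def origin_latency_breached(samples, now, threshold_ms=500.0,
                            window=timedelta(minutes=10)):
    """Return True if mean latency inside the window exceeds the threshold.

    samples: iterable of (timestamp, latency_ms) pairs (hypothetical format).
    """
    # Keep only the samples that fall inside the monitoring period.
    recent = [latency for ts, latency in samples if now - window <= ts <= now]
    if not recent:
        return False  # no data in the window, so nothing to evaluate
    return mean(recent) > threshold_ms
```

Changing `threshold_ms` or `window` corresponds to the Threshold and Monitoring Period customizations described for this alert.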
Amazon CloudFront - Less Than Usual Number of Requests
This alert aims to ensure that the number of requests through Amazon CloudFront remains consistent across different regions. It is triggered when the request volume in any specified region drops below the trend established by the anomaly-based algorithm over the past 10 minutes. Monitoring request volume is crucial for maintaining the health and operational effectiveness of services delivered through AWS CloudFront. A significant drop in requests may indicate service disruptions, configuration errors, or network problems, all of which can negatively impact user experience.
Customization Guidelines:
Threshold: The anomaly-based algorithm dynamically adjusts the threshold based on historical data and trends.
Monitoring Period: The default monitoring period is 10 minutes, providing near real-time insights into request volumes. Adjust this period based on operational needs and expected traffic patterns to capture immediate issues or analyze longer-term trends.
Region Specificity: Specify which regions to monitor closely based on the location of your key consumer bases or critical operations.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Action: When this alert triggers, investigate the cause of the reduced request volume immediately. Check for reported outages, configuration errors, or unusual traffic patterns. Coordinating with AWS support may be necessary if the issue is infrastructure-related.
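To give a rough sense of what "dropping below the usual trend" can mean, here is a minimal sketch of a dynamic lower bound. It is not the anomaly-based algorithm that Coralogix uses, which the page does not describe; the function name `is_below_usual` and the "mean minus k standard deviations" rule are assumptions chosen only for illustration.

```python
from statistics import mean, stdev


def is_below_usual(history, current, k=3.0):
    """Flag `current` if it is more than k standard deviations below the mean.

    history: past request counts for comparable windows (hypothetical input).
    """
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    lower_bound = mean(history) - k * stdev(history)
    return current < lower_bound
```

Because the bound is derived from recent history, it moves with traffic levels instead of relying on a fixed number of requests.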
High 4xx Error Rate
This alert monitors the error rates of Amazon CloudFront distributions to ensure that the percentage of client-side errors (4xx errors) remains within acceptable limits. The alert is triggered when the 4xx error rate exceeds 5% over a 10-minute period. High 4xx error rates indicate issues such as incorrect URLs, client-side misconfigurations, unauthorized access attempts, or other client-related problems. These errors can degrade the user experience and signal potential issues with your web applications or APIs. Prompt detection and resolution of these errors are crucial for maintaining service reliability and user satisfaction.
Customization Guidelines:
Threshold: The default threshold is set to trigger the alert when the 4xx error rate exceeds 5%. This can be adjusted based on your tolerance for errors and the criticality of your services.
Monitoring Period: The standard monitoring period is 10 minutes. Adjust this period to be shorter for more immediate detection or longer for trend analysis, depending on your operational requirements.
Region Specificity: Define specific regions to monitor if your services have regional dependencies or if certain regions are more critical to your operations. This ensures that alerts are relevant and actionable.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Action: If this alert triggers, examine CloudFront logs to identify patterns or specific requests causing the errors. Investigate application logs to determine if the errors are being caused by issues within your application logic. Assess whether the client requests are properly formed and if they are being made to valid endpoints. Ensure that all URLs and endpoints are correctly configured and that there are no broken links or misconfigured routes. Verify that permissions and access controls are set correctly to prevent unauthorized access attempts that result in 403 errors. Implement proper request validation and error handling within your application to manage and respond to bad requests more effectively. If the high 4xx error rate is due to abusive or excessive requests, consider implementing rate limiting to protect your resources.
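As an illustration of how an error-rate percentage like the one in this alert can be derived from response status codes, here is a minimal sketch. The function names and the plain list of status codes are hypothetical; in practice the rate comes from CloudFront metrics or logs rather than from code like this.

```python
def error_rate_percent(status_codes, family=4):
    """Percentage of responses in the given status family (4 means 4xx)."""
    codes = list(status_codes)
    if not codes:
        return 0.0  # no traffic, so no error rate to report
    errors = sum(1 for code in codes if code // 100 == family)
    return 100.0 * errors / len(codes)


def breaches_threshold(status_codes, family=4, threshold_percent=5.0):
    """Return True if the error rate for the family exceeds the threshold."""
    return error_rate_percent(status_codes, family) > threshold_percent
```

Passing `family=5` gives the equivalent server-side (5xx) calculation.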
High 5xx Error Rate
This alert monitors the error rates of Amazon CloudFront distributions to ensure that the percentage of server-side errors (5xx errors) remains within acceptable limits. The alert is triggered when the 5xx error rate exceeds 5% over a 10-minute period. High 5xx error rates indicate issues with the origin server, CloudFront itself, or the network. These errors can significantly impact the performance and availability of your web applications, leading to a poor user experience. Prompt detection and resolution of these errors are crucial for maintaining service reliability.
Customization Guidelines:
Threshold: The default threshold is set to trigger the alert when the 5xx error rate exceeds 5%. Adjust this threshold based on the criticality of your CDN uptime and the historical error rates observed. Lower thresholds may be suitable for high-availability environments.
Monitoring Period: The standard monitoring period is 10 minutes. Adjust this period to be shorter for more immediate detection or longer for trend analysis, depending on your operational requirements.
Region Specificity: Define specific regions to monitor if your services have regional dependencies or if certain regions are more critical to your operations. This ensures that alerts are relevant and actionable.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Action: If this alert triggers, review CloudFront logs to identify patterns or specific requests causing the errors. Verify the health and performance of your origin servers. Ensure that the CloudFront distribution settings are correctly configured, including the connection settings and caching behaviors. If the origin server is overloaded, consider scaling up the resources or distributing the load across additional servers. If configuration issues are identified, deploy the necessary fixes to resolve the errors. Ensure that all software components, including web servers and application servers, are up to date and free from known issues.
Low Cache Hit Rate Detected
This alert monitors the cache hit rate of Amazon CloudFront distributions to ensure that a high percentage of requests are being served from CloudFront’s cache rather than being forwarded to the origin server. The alert is triggered when the cache hit rate falls below 80% over a 10-minute period, indicating that a significant portion of requests are bypassing the cache. A low cache hit rate suggests that a high number of requests are being sent to the origin server, which can lead to increased latency, higher origin server load, and potentially higher costs. Maintaining a high cache hit rate is crucial for optimizing content delivery performance, reducing origin load, and improving user experience.
Customization Guidelines:
Threshold: Set an appropriate threshold for the cache hit rate based on your application’s performance requirements. For example, you might set the alert to trigger when the cache hit rate falls below 80%.
Monitoring Period: Define a monitoring period that suits your operational needs. A shorter period (e.g., 5-10 minutes) provides near real-time insights, while a longer period (e.g., 30-60 minutes) can help identify persistent cache hit rate issues.
Region Specificity: Specify which regions to monitor if your services have regional dependencies or if certain regions are more critical to your operations. This ensures that alerts are relevant and actionable for those regions.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Actions: If this alert triggers, examine CloudFront cache statistics to confirm the low cache hit rate and identify any patterns or anomalies. Look for unusual traffic patterns that may be causing cache misses, such as a high number of unique requests or cache-busting techniques. Verify the cache behavior settings in your CloudFront distribution, including cache expiration times (TTL), query string handling, and headers. Increase the TTL (time-to-live) settings for frequently accessed content to keep it in the cache longer. Ensure that cache-control headers are properly configured to align with your caching strategy. Configure CloudFront to ignore or whitelist query string parameters that do not affect the content’s response to increase cache hits. Minimize the number of headers used to vary the cache to increase the likelihood of cache hits. Use only necessary headers for content differentiation. Ensure that all static content (images, CSS, JavaScript, etc.) is properly cached. Use versioning for static files to manage updates without affecting cacheability. Where possible, cache dynamic content using techniques like signed URLs, cookies, or personalized cache policies to improve cache efficiency. Automate cache invalidation processes for content updates to ensure that only the necessary content is refreshed, maintaining a high cache hit rate for the rest of the content.
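The effect of ignoring query string parameters that do not change the response can be shown with a small sketch. It illustrates the idea only and does not reproduce how CloudFront actually builds its cache key; the function name `cache_key` and the allowlisted parameter names `id` and `lang` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, allowed_params=frozenset({"id", "lang"})):
    """Build a key from the path plus only the allowlisted query parameters."""
    parts = urlsplit(url)
    # Drop parameters that do not affect the response (e.g. tracking tags)
    # and sort the rest so that parameter order does not split the cache.
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed_params
    )
    query = urlencode(kept)
    return f"{parts.path}?{query}" if query else parts.path
```

Under this scheme, requests that differ only in tracking or cache-busting parameters map to the same key and can be served from a single cached object.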
High Function Compute Utilisation
This alert monitors the utilization of compute resources allocated for AWS Lambda@Edge functions associated with Amazon CloudFront distributions. The alert is triggered when the compute utilization for these functions exceeds 90% over a 10-minute period. High compute utilization indicates that the Lambda@Edge functions are consuming a significant portion of the allocated compute resources. This can lead to performance degradation, increased latency, and potential throttling of requests. Prompt detection and resolution of high compute utilization are crucial for maintaining the performance and reliability of your edge applications and services.
Customization Guidelines:
Threshold: The default threshold is set to trigger the alert when compute utilization exceeds 90%. Adjust this threshold based on your application’s performance requirements and acceptable utilization levels.
Monitoring Period: Define a monitoring period that suits your operational needs. A shorter period (e.g., 5-10 minutes) provides near real-time insights, while a longer period (e.g., 30-60 minutes) can help identify persistent high utilization issues.
Region Specificity: Specify which regions to monitor if your services have regional dependencies or if certain regions are more critical to your operations. This ensures that alerts are relevant and actionable for those regions.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Actions: If this alert triggers, examine Lambda@Edge function metrics to confirm the high compute utilization and identify any patterns or anomalies. Review the function code to identify inefficient code paths, resource-intensive operations, or potential memory leaks. Verify the configuration settings for the Lambda@Edge functions, including memory allocation, timeout settings, and concurrency limits. Refactor the function code to optimize performance. This includes optimizing loops, reducing the complexity of algorithms, and minimizing external API calls. Ensure that resources (e.g., memory, file handles) are efficiently managed and properly released after use. Minimize the size of the payload being processed by the function to reduce the amount of compute resources required. Increase the memory allocation for the Lambda@Edge functions if they are constrained by memory. More memory can improve performance but also increase costs. Adjust the timeout settings to ensure that functions complete within a reasonable time frame. Avoid setting excessively long timeouts. Set appropriate concurrency limits to control the number of function instances running simultaneously and prevent overloading the compute resources. Implement auto-scaling policies for your Lambda@Edge functions to automatically adjust resources based on demand and reduce the risk of high utilization during peak periods.
High Execution Errors Count
This alert monitors the number of execution errors occurring in AWS Lambda@Edge functions associated with Amazon CloudFront distributions. The alert is triggered when the execution error count exceeds 10 within a 10-minute period. High execution error counts indicate that there are issues with the Lambda@Edge functions, such as code errors, configuration problems, or unexpected input. These errors can lead to degraded performance, increased latency, and poor user experience. Prompt detection and resolution of these errors are crucial for maintaining the performance and reliability of your edge applications and services.
Customization Guidelines:
Threshold: The default threshold is set to trigger the alert when the execution error count exceeds 10. Adjust this threshold based on your application’s tolerance for errors and acceptable performance levels.
Monitoring Period: Define a monitoring period that suits your operational needs. A shorter period (e.g., 5-10 minutes) provides near real-time insights, while a longer period (e.g., 30-60 minutes) can help identify persistent error issues.
Region Specificity: Specify which regions to monitor if your services have regional dependencies or if certain regions are more critical to your operations. This ensures that alerts are relevant and actionable for those regions.
Notification Frequency: Balance the alert frequency to optimize responsiveness while minimizing noise. Adjust according to the criticality of uninterrupted operation.
Actions: If this alert triggers, examine the logs for the Lambda@Edge functions to identify the specific errors occurring. Look for common error messages and patterns. Review the function code to identify the source of the errors. Pay special attention to error-prone sections of the code, such as input validation, external API calls, and resource handling. Verify the configuration settings for the Lambda@Edge functions, including IAM roles and permissions. Implement comprehensive error handling in the function code to manage expected errors gracefully and log unexpected errors for further investigation. Ensure that all inputs are properly validated to prevent errors caused by unexpected or malformed data. Implement retry logic for transient errors, such as network timeouts or temporary service unavailability, to improve resilience. Lambda@Edge does not support user-defined environment variables, so ensure that any configuration values the functions depend on are bundled with the code or retrieved at runtime. Verify that the IAM roles and permissions assigned to the Lambda@Edge functions are sufficient for their operations and do not inadvertently cause execution errors. Develop and run comprehensive unit and integration tests to identify potential issues before deployment. Use canary deployments to roll out changes gradually and monitor for errors before fully deploying to production. Implement automated rollback mechanisms to revert changes if a significant increase in errors is detected after a deployment.
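The retry logic for transient errors mentioned above can be sketched as follows. This is a generic illustration with exponential backoff, not code taken from Coralogix or AWS; `TransientError` and `call_with_retry` are hypothetical names, and a real function would decide for itself which exceptions count as transient.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a network timeout)."""


def call_with_retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

At the edge, every retry adds to response time, so the attempt count and delays are best kept small and well within the function timeout.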
Integration
Learn more about Coralogix's out-of-the-box integration with AWS CloudFront in our documentation.