Quick Start Observability for Amazon Bedrock

Coralogix Extension For Amazon Bedrock Includes:

Dashboards - 1

Gain instantaneous visualization of all your Amazon Bedrock data.

Amazon Bedrock

Alerts - 4

Stay on top of key Amazon Bedrock performance metrics. Keep everyone informed through integrations with Slack, PagerDuty, and more.

High Invocation Count

This alert detects significant increases in the invocation count or throughput of machine learning models in Amazon Bedrock. Monitoring these values helps ensure that models are handling expected loads and that backend systems are scaled to meet demand. The alert triggers when the model invocation count or throughput exceeds a predefined threshold over a 10-minute monitoring period, which indicates unusually high usage that may strain system resources or signal an unexpected traffic spike. High invocation counts may result from increased user activity, sudden scaling demands, or potential misuse, and can lead to performance degradation, resource exhaustion, or higher operational costs if not managed.

Customization Guidance:

Threshold: The default threshold is set based on expected peak traffic and model invocation rates. Adjust it based on your application's tolerance for load spikes and historical usage patterns. A higher threshold may be appropriate for highly scalable environments, while resource-constrained systems might need tighter limits.

Monitoring Period: The standard monitoring window is 10 minutes, which balances timely detection against meaningful data collection. Adjust it according to how sensitive your model's performance is to traffic surges: shorter periods for high-demand services, longer periods for more stable usage patterns.

Notification Frequency: Set the notification frequency according to urgency. Frequent alerts may be necessary for critical services experiencing high throughput, while less frequent notifications may be sufficient for workloads that can absorb temporary spikes without significant impact.

Action: When this alert triggers, immediately assess the reason for the high throughput or invocation count. Review model performance, scaling configurations, and system resource usage to determine whether adjustments are needed. You may need to implement auto-scaling, optimize infrastructure, or investigate potential misuse. If necessary, engage technical teams or support to ensure that the system can handle the increased load without affecting performance.
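The windowed threshold check described for this alert can be sketched in a few lines of Python. This is only an illustration of the logic, not how Coralogix evaluates the alert; the function names, the list-of-timestamps input, and the threshold values are all hypothetical.

```python
from datetime import datetime, timedelta

def invocations_in_window(timestamps, now, window=timedelta(minutes=10)):
    """Count invocation timestamps in the half-open window (now - window, now]."""
    start = now - window
    return sum(1 for ts in timestamps if start < ts <= now)

def high_invocation_alert(timestamps, now, threshold, window=timedelta(minutes=10)):
    """Return True when the windowed count is strictly greater than the threshold.

    `threshold` is an assumed, user-chosen value; pick it from your own
    historical peak traffic.
    """
    return invocations_in_window(timestamps, now, window) > threshold
```

A shorter `window` reacts faster to bursts but is noisier; a longer one smooths spikes at the cost of later detection.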

Low Invocation Count

This alert detects significant drops in the invocation count or throughput of machine learning models in Amazon Bedrock. Monitoring these values helps confirm that models are being invoked at expected rates and that your applications are functioning as intended. The alert triggers when the model invocation count or throughput falls below a predefined threshold over a 10-minute period, signaling potential issues with application functionality, user engagement, or backend performance. Low invocation counts can indicate reduced traffic, application errors, scaling issues, or connectivity disruptions. They may also point to problems in the systems that rely on the model outputs, which could impact critical business operations.

Customization Guidance:

Threshold: The default threshold is set based on expected throughput or invocation levels. Customize it based on the specific workload of your models and their importance to your application. For critical services, you may want to set a higher threshold so that issues are detected earlier.

Monitoring Period: The default monitoring window is 10 minutes, which gives timely insight while smoothing over transient drops. Adjust it based on your application's traffic patterns, with shorter periods for real-time systems and longer periods for more sporadic use cases.

Notification Frequency: Balance alert responsiveness against noise by adjusting the notification frequency to the criticality of your workloads. Models that support vital services may warrant more frequent notifications, while less critical models can tolerate less frequent alerts.

Action: When this alert triggers, investigate potential causes of the reduced throughput or invocation count. Examine API traffic patterns, model request logs, and backend services to determine whether network issues, scaling problems, or other factors are limiting invocations. Work with your technical teams to troubleshoot and restore expected throughput levels, and engage support if infrastructure problems are suspected.
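One detail worth illustrating for a low-count check is that minutes with no data must count as zero, otherwise a complete outage would never trigger the alert. The sketch below assumes a hypothetical dictionary of per-minute invocation counts keyed by an integer minute index; the names and input shape are illustrative and not part of the Coralogix extension.

```python
def low_invocation_alert(per_minute_counts, current_minute, threshold, window_minutes=10):
    """Return True when total invocations in the last `window_minutes` fall below `threshold`.

    `per_minute_counts` maps an integer minute index to an invocation count.
    Minutes missing from the mapping are treated as zero invocations.
    """
    first_minute = current_minute - window_minutes + 1
    total = sum(
        per_minute_counts.get(minute, 0)
        for minute in range(first_minute, current_minute + 1)
    )
    return total < threshold
```

Raising `threshold` makes the check fire earlier as traffic declines, which matches the guidance for critical services.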

High Latency For Bedrock Model Invocations

This alert detects significant increases in latency during invocations of machine learning models in Amazon Bedrock. Monitoring invocation latency is critical for maintaining performance and keeping user experiences smooth and responsive. The alert triggers when model invocation latency (for example, at the 95th percentile) exceeds a specified threshold over a 10-minute monitoring window. High latency can result from increased load, suboptimal model scaling, backend resource contention, or networking issues. Prolonged latency can degrade user experience, slow API response times, and hurt the performance of applications that rely on the model outputs.

Customization Guidance:

Threshold: The default threshold is set based on the 95th percentile of model invocation times. Adjust it to reflect your application's tolerance for latency and your historical performance data. Lower thresholds may be appropriate for latency-sensitive environments, such as real-time decision-making systems.

Monitoring Period: The default monitoring period is 10 minutes, which balances responsiveness against gathering meaningful data. Adjust it based on traffic patterns and model usage. Shorter intervals may be beneficial during peak usage, while longer periods can smooth out sporadic latency spikes.

Notification Frequency: Tune the alert frequency to avoid excessive notifications while still ensuring timely responses to critical issues. Depending on the sensitivity of your operations, you may increase or decrease the frequency of notifications.

Action: When this alert triggers, immediately investigate the cause of the high latency. Review performance metrics, logs, and infrastructure configurations for Amazon Bedrock and related backend systems. Consider auto-scaling adjustments or optimizing network configurations to reduce latency. Engage technical support if the latency persists because of infrastructure issues beyond your control.
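To make the 95th-percentile idea concrete, here is a minimal sketch that uses the nearest-rank percentile definition. That definition is an assumption chosen for simplicity; monitoring backends may interpolate or approximate percentiles differently. The function names and the millisecond unit are also hypothetical.

```python
def percentile_nearest_rank(values, p):
    """Return the p-th percentile (integer p from 1 to 100) using the nearest-rank method."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    # Ceiling of p * n / 100 computed with integer arithmetic to avoid float rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

def high_latency_alert(latencies_ms, threshold_ms, p=95):
    """Return True when the p-th percentile latency in the window exceeds threshold_ms."""
    return percentile_nearest_rank(latencies_ms, p) > threshold_ms
```

With 20 samples, a single slow invocation does not move the 95th percentile, but two do, which is why a percentile is less sensitive to isolated outliers than a maximum.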

Amazon Bedrock - High Number of Invocation Errors

This alert detects a significant increase in invocation errors in Amazon Bedrock's machine learning model services. Monitoring invocation errors is critical to the reliability of model operations, which directly affects the applications that rely on these models. The alert triggers when the number of model invocation errors exceeds a predefined threshold over a 10-minute monitoring period. Invocation errors can signal invalid requests, model configuration problems, resource limitations, or infrastructure issues that could compromise the availability and reliability of your machine learning workflows.

Customization Guidance:

Threshold: The default threshold is based on the percentage or count of invocation errors relative to total model invocations. Adjust it according to your system's tolerance for errors and the criticality of your workloads. In mission-critical environments, a lower error threshold might be necessary to detect issues promptly.

Monitoring Period: The standard monitoring window is 10 minutes, which balances timely detection against false positives. Depending on model traffic patterns, shorten or lengthen it according to how quickly errors need to be addressed.

Notification Frequency: Balance responsiveness against notification overload by adjusting the frequency of this alert. More frequent alerts may be necessary for critical services, while less frequent notifications could suit non-urgent use cases.

Action: When this alert triggers, investigate the root causes of the invocation errors. Review logs, API error messages, and backend systems to identify whether the errors come from client-side issues (e.g., invalid requests) or server-side problems (e.g., resource constraints, model configuration errors). Collaborate with your technical teams to fix the underlying issues, and escalate to technical support if infrastructure-level problems are identified.
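The difference between a count-based and a percentage-based error threshold can be shown with a short sketch. The function names, parameters, and example limits below are hypothetical and only illustrate the two styles of check; they do not describe the extension's actual alert definition.

```python
def error_rate(errors, invocations):
    """Return errors as a fraction of invocations, or 0.0 when there were no invocations."""
    return 0.0 if invocations == 0 else errors / invocations

def invocation_error_alert(errors, invocations, max_errors=None, max_error_rate=None):
    """Return True when either configured limit is exceeded within the window.

    max_errors:     absolute error count allowed in the window (assumed value).
    max_error_rate: allowed fraction of failed invocations, e.g. 0.05 for 5%.
    A limit left as None is not checked.
    """
    count_breach = max_errors is not None and errors > max_errors
    rate_breach = (
        max_error_rate is not None
        and error_rate(errors, invocations) > max_error_rate
    )
    return count_breach or rate_breach
```

A rate limit adapts to traffic volume, while an absolute count catches bursts of failures even when overall traffic is high enough to keep the percentage low.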

Integration

Learn more about Coralogix's out-of-the-box integration with Amazon Bedrock in our documentation.
