Quick Start Observability for AWS Step Functions

AWS Step Functions

Coralogix Extension For AWS Step Functions Includes:

Dashboards - 1

Gain instantaneous visualization of all your AWS Step Functions data.

AWS Step Functions

Alerts - 4

Stay on top of key AWS Step Functions performance metrics. Keep everyone informed through integrations with Slack, PagerDuty, and more.

Executions timed out

This alert triggers when an AWS Step Functions execution times out. Timed-out executions can indicate issues such as long-running tasks, network delays, or misconfigured timeouts that could affect the workflow's overall performance.

Customization Guidance:
Threshold: The alert is triggered if at least one execution times out. You may adjust this based on your environment's sensitivity to timeouts.
Execution Type: Customize alerts for specific workflows or states if some are more critical than others.
Notification Frequency: Adjust notification settings to avoid alert fatigue while ensuring timely detection of issues.

Action: Investigate the timed-out execution to identify the root cause, such as task delays or misconfigured timeouts. Depending on the findings, consider optimizing task performance, adjusting timeouts, or implementing retry logic.
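The Threshold and Execution Type guidance can be combined as in the sketch below. It is a minimal illustration, not Coralogix's alert engine. It assumes you already have per-period Sum values of the CloudWatch metric `ExecutionsTimedOut` (namespace `AWS/States`) grouped by state machine. The workflow names and their thresholds are hypothetical.

```python
from typing import Mapping, Sequence

DEFAULT_THRESHOLD = 1.0  # default: alert on the first timeout

# Hypothetical per-workflow overrides: strict for critical flows, looser for noisy ones.
THRESHOLDS: dict[str, float] = {
    "order-processing": 1.0,  # critical: alert on the first timeout
    "nightly-report": 3.0,    # tolerant: alert only after several timeouts
}


def workflows_to_alert(
    timed_out: Mapping[str, Sequence[float]],
    thresholds: Mapping[str, float] = THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> list[str]:
    """Return workflows whose timed-out count in any period reaches their threshold."""
    firing = []
    for workflow, sums in timed_out.items():
        limit = thresholds.get(workflow, default)  # fall back to the default threshold
        if any(value >= limit for value in sums):
            firing.append(workflow)
    return sorted(firing)


if __name__ == "__main__":
    sample = {
        "order-processing": [0, 1, 0],  # one timeout meets its limit of 1
        "nightly-report": [1, 2, 0],    # never reaches its limit of 3
        "image-resize": [0, 0, 0],      # no timeouts at all
    }
    print(workflows_to_alert(sample))  # → ['order-processing']
```

Lowering a workflow's entry in `THRESHOLDS` makes its alert more sensitive. Workflows without an entry use the default of one timeout.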

Executions aborted

This alert triggers when an AWS Step Functions execution is aborted. An execution can be aborted for various reasons, including manual intervention, system errors, or misconfiguration, and an abort can disrupt workflow continuity.

Customization Guidance:
Threshold: The alert is triggered if at least one execution is aborted. Adjust this threshold based on your operational tolerance for aborted executions.
Workflow Specificity: Tailor alerts to critical workflows where continuity is essential, reducing the risk of missing important alerts.
Notification Frequency: Set notification intervals so that significant events are detected promptly without overwhelming the operations team.

Action: When you receive this alert, determine why the execution was aborted and whether the abort was manual or automatic. Take corrective action to prevent recurrence, such as fixing workflow errors, refining the conditions that lead to aborts, or updating retry logic.
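The Notification Frequency guidance that appears in each alert comes down to suppressing repeat notifications. The following is a minimal per-alert cooldown sketch. It assumes a hypothetical alert key such as `"executions-aborted:order-processing"` and a caller-supplied timestamp. It does not describe how Coralogix implements notification settings.

```python
from dataclasses import dataclass, field


@dataclass
class CooldownNotifier:
    """Send at most one notification per alert key within each cooldown window."""

    cooldown_seconds: float
    # Last time (in the caller's clock) each key produced a notification.
    _last_sent: dict[str, float] = field(default_factory=dict, init=False)

    def should_notify(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # inside the window: suppress, and don't extend it
        self._last_sent[key] = now  # record when this key last notified
        return True


if __name__ == "__main__":
    notifier = CooldownNotifier(cooldown_seconds=600)
    key = "executions-aborted:order-processing"  # hypothetical alert key
    print(notifier.should_notify(key, now=0))    # → True
    print(notifier.should_notify(key, now=300))  # → False
    print(notifier.should_notify(key, now=600))  # → True
```

Passing `now` explicitly keeps the sketch deterministic. In a long-running process you might pass `time.monotonic()`. Suppressed events do not extend the window, so a steady stream of aborts still produces one notification per cooldown period.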

Throttled events

This alert triggers when AWS Step Functions events are throttled, indicating that requests are exceeding the service limits. Throttling can slow down workflow execution, leading to delays and potential timeouts.

Customization Guidance:
Threshold: The alert is triggered if at least one event is throttled. You can adjust this threshold based on how critical immediate execution is for your workflows.
Service Limits: Consider customizing alerts around the specific service limits that are most relevant to your workflows.
Notification Frequency: Balance notification frequency to prevent alert fatigue while ensuring that critical throttling events are promptly identified.

Action: Investigate the throttled events to determine the cause, such as exceeding concurrency limits or API request rates. Consider optimizing the workflow to reduce throttling, or request a service limit increase from AWS if necessary.
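If your own code starts executions and gets throttled, retrying with exponential backoff and jitter spreads the retries out instead of hitting the limit again all at once. The sketch below shows this generic pattern with some stated assumptions. `ThrottledError` is a hypothetical stand-in for whatever throttling error your client raises. The delay parameters are illustrative defaults, not AWS recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error your client raises."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ThrottledError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the throttling error
            # Upper bound doubles each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, cap] so clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable, so the backoff can be tested without actually waiting. In production, keep the default `time.sleep`.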

Executions failed

This alert triggers when an AWS Step Functions execution fails. Failed executions may indicate issues within the workflow, such as unhandled errors, failed tasks, or misconfigurations, which need to be addressed to ensure workflow reliability.

Customization Guidance:
Threshold: The alert is triggered if at least one execution fails. Adjust the threshold to match your workflow's tolerance for failure.
Failure Type: Customize alerts for different failure types, focusing on critical tasks or states within the workflow.
Notification Frequency: Configure notification settings to ensure prompt detection of failed executions without causing unnecessary distractions.

Action: Investigate the failed execution to identify the cause, such as task errors, unhandled exceptions, or configuration issues. Implement appropriate error handling, retry strategies, or workflow adjustments to mitigate the risk of future failures.
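One way to act on failed executions is to add error handling inside the state machine itself. The sketch below builds an Amazon States Language definition as a Python dict. The Task retries `States.Timeout` and `States.TaskFailed` with exponential backoff. Any remaining error is caught, published to SNS, and then ends in a named Fail state. The Lambda and SNS ARNs, state names, and timeout values are hypothetical placeholders, so treat this as a starting point rather than a recommended configuration.

```python
import json

# Hypothetical placeholder ARNs: substitute your own resources.
TASK_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-order"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:workflow-failures"

definition = {
    "Comment": "Sketch: retry transient task errors, notify on anything else",
    "StartAt": "ProcessOrder",
    "TimeoutSeconds": 300,  # execution-level limit; exceeding it ends the run as TIMED_OUT
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": TASK_ARN,
            "TimeoutSeconds": 60,  # task-level limit; exceeding it raises States.Timeout
            "Retry": [
                {
                    # Retry timeouts and task failures, waiting 2 s, then 4 s, then 8 s.
                    "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [
                {
                    # Anything still failing after retries goes to the handler,
                    # with the error details stored under $.error.
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",
                    "Next": "NotifyFailure",
                }
            ],
            "End": True,
        },
        "NotifyFailure": {
            # Publish the caught error's Cause string to an SNS topic.
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {"TopicArn": TOPIC_ARN, "Message.$": "$.error.Cause"},
            "Next": "Failed",
        },
        "Failed": {
            # Still fail the execution so it remains visible as a failure.
            "Type": "Fail",
            "Error": "OrderProcessingFailed",
            "Cause": "ProcessOrder failed after retries",
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

Ending in a Fail state rather than a success state keeps the failure visible to monitoring instead of masking it. The SNS notification only adds context for whoever investigates.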

Integration

Learn more about Coralogix's out-of-the-box integration with AWS Step Functions in our documentation.

Enterprise-Grade Solution