Troubleshoot alerting
Diagnose and resolve common alerting issues. This guide covers the most frequent problems reported by customers, with steps to identify and fix each one.
Alert not triggering
An alert definition exists and is active, but it never fires.
Check these first:
- Verify the alert status is Active in Alerts, then Alert Management. Disabled or snoozed alerts do not evaluate.
- Open the alert and review the Query step. Run the same query in Explore to confirm it returns matching data in the expected time range.
- Check the Conditions step. Verify the threshold, evaluation window, and aggregation match what you expect. A common mistake is setting "more than 10 occurrences in 1 minute" when the actual volume is lower.
- Check Application and Subsystem filters. If the alert filters by a specific application or subsystem, confirm data is flowing with those exact values.
- For anomaly detection alerts, verify the data requirements are met. See Anomaly detection not working below.
If the query matches but the alert still doesn't fire:
- Check whether a suppression rule is active that matches this alert.
- Check whether the alert was recently edited. Some edits (such as changing the query) restart the evaluation cycle.
- Use the Alert drill-down to review the evaluation history and confirm whether the condition was met.
Notification delayed or not received
The alert triggers but the notification arrives late or not at all.
Check these first:
- Open the alert definition and go to the Notification step. Verify that Enable notifications is toggled on.
- Check the Notify on setting. If set to Alerts — Signal-Based, notifications are sent for Triggered and Resolved events only. If set to Cases — Incident Management, notifications follow Case lifecycle events.
- In Routing labels, verify the labels match an existing router in Notification Center. Select the share icon next to the labels to see Matching routers. If no routers match, notifications are not delivered.
- Check the router's routing rules. Each rule has triggers, conditions, and destinations. If the condition evaluates to false, the notification is skipped.
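The label-to-router matching described above can be sketched as follows. This is a minimal illustration only; the router names, label keys, and the all-labels-must-match semantics are assumptions for the example, not the product's exact matching algorithm.

```python
# Sketch of routing-label matching: a notification is delivered only if
# at least one router's selector labels are all present on the alert.
# Router names and label keys below are hypothetical examples.

def matching_routers(alert_labels, routers):
    """Return names of routers whose selector labels all match the alert."""
    return [
        name for name, selector in routers.items()
        if all(alert_labels.get(key) == value for key, value in selector.items())
    ]

routers = {
    "payments-router": {"team": "payments"},
    "prod-router": {"env": "prod"},
}

# An alert whose labels match no router produces no notification.
print(matching_routers({"team": "search"}, routers))                  # []
print(matching_routers({"team": "payments", "env": "prod"}, routers))
```

The empty first result is the failure mode described above: the alert fires, but with no matching router the notification is silently dropped.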
Common causes of delayed notifications:
- Retriggering period: If the alert has a retriggering period (for example, 24 hours), notifications are not sent again until the period elapses, even if the alert condition remains active.
- Notification Center processing: There can be a delay of up to 30 seconds between alert evaluation and notification delivery. This is expected behavior.
- Webhook destination errors: Check the notification.deliveries dataset for delivery failures. Go to Explore, select the system/notification.deliveries dataset, and filter by the alert name.
Terraform and API configuration pitfall:
If the alert was created via Terraform or the API, the notifyOn field may be set to triggered_only, which means resolved notifications are never sent. This setting is not always visible in the UI. Check your Terraform definition or API payload to verify.
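A quick way to catch this pitfall is to check the exported definition before applying it. The sketch below uses the notifyOn field name from this guide; the value strings are assumptions and may differ in your API or provider version.

```python
# Hypothetical check of an alert payload exported from Terraform or the
# API. The field name notifyOn comes from this guide; the value strings
# ("triggered_only", "triggered_and_resolved") are assumptions.

def missing_resolved_notifications(alert_payload):
    """True if the alert will never send resolved notifications."""
    return alert_payload.get("notifyOn") == "triggered_only"

payload = {"name": "high-error-rate", "notifyOn": "triggered_only"}
if missing_resolved_notifications(payload):
    print(f"{payload['name']}: resolved notifications are disabled")
```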
False positive alerts
The alert triggers but the condition does not appear to be met when you investigate.
Common causes:
- Data drift: Metric values can change between when the alert evaluated and when you check. Coralogix alerts evaluate on the streaming pipeline, so the data at evaluation time may differ from what appears in Explore minutes later. Use custom evaluation delay to account for late-arriving data.
- "Less than" conditions with no data: An alert with a "less than 1 log in the last 2 hours" condition triggers when there is no data at all, because 0 is less than 1. If this is not the desired behavior, add an explicit no-data handling rule. See No-data handling.
- Lookback window mismatch: The alert evaluation window and the time range you use in Explore may not align. Verify you are checking the exact window the alert evaluated.
- Metric aggregation differences: PromQL alerts in Coralogix may produce different values than the same query in Grafana due to step interval, aggregation method, or data freshness differences. When debugging, use the View query action on the alert to see the exact PromQL expression and compare it in both systems.
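The "less than" pitfall above comes down to simple arithmetic, sketched here. The fire_on_no_data flag is a hypothetical stand-in for the explicit no-data handling rule, not an actual setting name.

```python
# Why a "less than 1 log in the last 2 hours" condition fires when there
# is no data: an empty evaluation window counts as 0 matches, and 0 < 1.
# The fire_on_no_data flag models an explicit no-data handling rule.

def should_fire(match_count, threshold, fire_on_no_data=True):
    if match_count == 0 and not fire_on_no_data:
        return False  # no-data handling suppresses the trigger
    return match_count < threshold

print(should_fire(0, 1))                         # True: no data still fires
print(should_fire(0, 1, fire_on_no_data=False))  # False: suppressed
```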
Anomaly detection not working
The anomaly detection alert is not evaluating, not triggering, or showing unexpected behavior.
Data requirements:
- The model requires 7 days of metric history with at least 90% data coverage in that window.
- If the metric already has 7+ days of history when you create the alert, the alert becomes active after the next daily model build, typically within 24 hours.
- If the metric has less than 7 days of history, the alert remains inactive until enough history accumulates.
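As a rough illustration of the requirement above: over a 7-day window, coverage is the fraction of expected samples that are actually present. The one-minute sample interval in this sketch is an assumption for illustration; only the 7-day and 90% figures come from this guide.

```python
# Sketch of the anomaly-detection data requirement: 7 days of history
# with at least 90% data coverage. The one-minute sample interval is an
# assumed value for illustration.

SAMPLE_INTERVAL_SECONDS = 60
WINDOW_SECONDS = 7 * 24 * 3600

def meets_data_requirement(samples_present):
    expected = WINDOW_SECONDS // SAMPLE_INTERVAL_SECONDS  # 10,080 samples
    return samples_present / expected >= 0.9

print(meets_data_requirement(10_080))  # True: full coverage
print(meets_data_requirement(9_000))   # False: roughly 89% coverage
```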
Changes that restart the 7-day learning period:
- Creating a new anomaly detection alert
- Changing the metric query, filter, or PromQL expression
- Changing core condition logic that defines the time series being modeled
Changes that do not restart the learning period:
- Changing deviation percentage or sensitivity
- Changing notification settings, labels, or suppression rules
- Changing the alert name or priority
Other common issues:
- 500-permutation limit: The system supports a maximum of 500 permutations per metric for anomaly detection. If your group-by produces more than 500 unique combinations, some are not evaluated.
- Evaluation history not available: The evaluation history UI does not currently support anomaly alert types. Use the Alert drill-down to review alert state changes.
For more details, see Anomaly detection alerts.
Suppression rules not working
A suppression rule is configured but alerts are still sending notifications.
Check these first:
- Verify the suppression rule is enabled in Alerts, then Suppression Rules.
- Check the time window. Suppression rules only apply during the configured schedule. If the current time is outside the window, the rule is not active.
- Check the matching criteria. The rule must match the alert definition by name, labels, or other attributes. A mismatch means the rule does not apply.
- Check whether the alert uses Notification Center routing. Suppression rules apply to the alert evaluation, not to the notification delivery. If the alert is suppressed, no notification is generated.
Important behavior:
- Suppression disables notifications, not alert evaluation. The alert still evaluates, but no notification is sent while suppressed.
- Suppressed alerts do not create Cases. If you see a Case for a suppressed alert, the Case was created before suppression was applied.
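The behavior above can be sketched in a few lines: evaluation always runs, and suppression only gates what happens afterward. This is an illustrative model, not the product's internal logic.

```python
# Suppression as described above: the alert still evaluates, but no
# notification (and no Case) is produced while suppression is active.

def process_evaluation(condition_met, suppressed):
    """Return (alert_state, notification_sent)."""
    state = "triggered" if condition_met else "ok"
    notify = condition_met and not suppressed
    return state, notify

print(process_evaluation(True, suppressed=True))   # ('triggered', False)
print(process_evaluation(True, suppressed=False))  # ('triggered', True)
```

Note that the state is "triggered" in both calls: suppressing the rule changes delivery, not evaluation.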
For more details, see Alert suppression rules.
Retriggering period confusion
Alerts are not re-notifying or incidents are not appearing in the expected time range.
How retriggering works:
The retriggering period controls how often notifications are sent for an ongoing alert condition. If set to 24 hours, the alert fires once and does not send another notification until 24 hours have passed, even if the condition remains active.
Common issues:
- Incident not visible in the Incidents page: The Incidents page filters based on the retriggering period. If the alert has a 24-hour retriggering period and the last event was 12 hours ago, the incident may not appear in the current time range. This is expected behavior. Cases do not have this limitation and show all active issues regardless of retriggering period.
- Alert not re-notifying: If the retriggering period has not elapsed, no new notification is sent. Shorten the retriggering period if you need more frequent notifications.
- Debounce behavior: The retriggering period acts as a debounce mechanism to reduce notification noise. If the alert condition clears and re-triggers within the period, no new notification is sent.
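The debounce behavior above can be sketched as a timestamp check, using the 24-hour example from this guide. This is a simplified model of the mechanism, not the actual implementation.

```python
# Retriggering period as a debounce: once a notification is sent, further
# notifications are skipped until the period elapses, even if the
# condition clears and re-triggers in between.

RETRIGGER_SECONDS = 24 * 3600

def maybe_notify(now, last_notified):
    """Return (send, new_last_notified) for an active alert condition."""
    if last_notified is None or now - last_notified >= RETRIGGER_SECONDS:
        return True, now
    return False, last_notified

send, last = maybe_notify(0, None)          # first trigger: notify
print(send)                                 # True
send, last = maybe_notify(12 * 3600, last)  # 12 hours later: debounced
print(send)                                 # False
send, last = maybe_notify(24 * 3600, last)  # period elapsed: notify again
print(send)                                 # True
```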
Group-by permutation issues
Alerts with group-by fields produce unexpected results or miss some groups.
Common causes:
- Too many permutations: Each unique combination of group-by values creates a separate evaluation. If the number of permutations exceeds system limits, some groups are not evaluated.
- Ownership tag propagation delay: If your group-by uses ownership tags (service, team, environment), these tags take 2 to 5 minutes to propagate from parent resources. During this window, alerts may not match the expected groups.
- Missing group-by values: If a resource does not have a value for a group-by field, it is excluded from the evaluation.
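The permutation count and the missing-value exclusion can be estimated from sample data, as sketched below. The 500 cap here reuses the anomaly-detection limit from earlier in this guide as an example; other alert types may have different limits.

```python
# Counting unique group-by permutations from sample rows. Rows missing a
# group-by field are excluded, mirroring the behavior described above.
# The 500 cap is the anomaly-detection limit, used here as an example.

PERMUTATION_LIMIT = 500

def groupby_permutations(rows, group_by_fields):
    """Return the set of unique group-by value combinations."""
    combos = set()
    for row in rows:
        if all(field in row for field in group_by_fields):
            combos.add(tuple(row[field] for field in group_by_fields))
    return combos

rows = [
    {"service": "api", "env": "prod"},
    {"service": "api", "env": "staging"},
    {"service": "worker"},  # missing env: excluded from evaluation
]
combos = groupby_permutations(rows, ["service", "env"])
print(len(combos), len(combos) > PERMUTATION_LIMIT)  # 2 False
```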
Webhook and notification destination issues
Notifications are configured but not arriving at the external destination (Slack, PagerDuty, email, webhook).
Check these first:
- Verify the connector is configured and active in Notification Center, then Connectors.
- Send a test notification from the connector configuration page to verify connectivity.
- Check the notification.deliveries dataset for delivery errors. Go to Explore, set the dataset to system/notification.deliveries, and filter by connector type or alert name.
- For webhook destinations, verify the endpoint URL is correct, authentication credentials are valid, and the endpoint is accepting requests.
- For Slack, verify the workspace integration is connected and the channel name is correct.
- For PagerDuty, verify the service key (Integration Key) is correct.
Common causes:
- Expired credentials: API tokens or integration keys may have expired or been rotated.
- Rate limiting: The external destination may be rate-limiting requests from Coralogix.
- Payload format mismatch: The preset message format may not match what the destination expects. Check the preset configuration and the destination's API documentation.
Alert created via API or Terraform does not behave as expected
Alerts created programmatically may have settings that differ from UI-created alerts.
Common issues:
- notifyOn set to triggered_only: Resolved notifications are never sent. Update the Terraform definition or API payload to include resolved events.
- Missing routing labels: Alerts created via API may not have routing labels set, so no router matches and no notification is delivered. Add routing labels to the alert definition.
- Data sources not set: If data_sources is omitted in the API payload, the alert evaluates against default/logs. If your data is in a different dataset, set the data_sources field explicitly.
- Alert type mismatch: The API uses internal type identifiers that may not match the UI labels. Verify the alert type in the API response matches your intent.
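A pre-flight check over the payload can catch the routing-label and data-source pitfalls before the alert is created. The routingLabels field name below is a hypothetical stand-in; only notifyOn, data_sources, and default/logs come from this guide, and the exact API schema may differ.

```python
# Pre-flight check of a programmatically created alert definition for
# the routing-label and data-source pitfalls above. routingLabels is an
# assumed field name; verify against your actual API schema.

def preflight_warnings(alert):
    warnings = []
    if not alert.get("routingLabels"):
        warnings.append("no routing labels: no router will match")
    if "data_sources" not in alert:
        warnings.append("data_sources omitted: evaluates against default/logs")
    return warnings

alert = {"name": "latency-alert"}
for warning in preflight_warnings(alert):
    print(warning)
```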
For API details, see Alerts API v3.