Troubleshoot alerting
Diagnose and resolve common alerting issues. This guide covers the most frequent problems reported by customers, with steps to identify and fix each one.
Alert not triggering
An alert definition exists and is active, but it never fires.
Check these first:
- Verify the alert status is Active in Alerts, then Alert Management. Disabled or snoozed alerts do not evaluate.
- Open the alert and review the Query step. Run the same query in Explore to confirm it returns matching data in the expected time range.
- Check the Conditions step. Verify the threshold, evaluation window, and aggregation match what you expect. A common mistake is setting "more than 10 occurrences in 1 minute" when the actual volume is lower.
- Check Application and Subsystem filters. If the alert filters by a specific application or subsystem, confirm data is flowing with those exact values.
- For anomaly detection alerts, verify the data requirements are met. See Anomaly detection not working below.
If the query matches but the alert still doesn't fire:
- Check whether a suppression rule is active that matches this alert.
- Check whether the alert was recently edited. Some edits (such as changing the query) restart the evaluation cycle.
- Use the Alert drill-down to review the evaluation history and confirm whether the condition was met.
Notification delayed or not received
The alert triggers but the notification arrives late or not at all.
Check these first:
- Open the alert definition and go to the Notification step. Verify that Enable notifications is toggled on.
- Check the Notify on setting. If set to Alerts — Signal-Based, notifications are sent for Triggered and Resolved events only. If set to Cases — Incident Management, notifications follow Case lifecycle events.
- In Routing labels, verify the labels match an existing router in Notification Center. Select the share icon next to the labels to see Matching routers. If no routers match, notifications are not delivered.
- Check the router's routing rules. Each rule has triggers, conditions, and destinations. If the condition evaluates to false, the notification is skipped.
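The label-to-router matching described above can be sketched as follows. This is a minimal illustration only; the router names, label keys, and the all-labels-must-match semantics are assumptions for the example, not the product's exact matching algorithm.

```python
# Sketch of routing-label matching: a notification is delivered only if
# at least one router's selector labels are all present on the alert.
# Router names and label keys below are hypothetical examples.

def matching_routers(alert_labels, routers):
    """Return names of routers whose selector labels all match the alert."""
    return [
        name for name, selector in routers.items()
        if all(alert_labels.get(key) == value for key, value in selector.items())
    ]

routers = {
    "payments-router": {"team": "payments"},
    "prod-router": {"env": "prod"},
}

# An alert whose labels match no router produces no notification.
print(matching_routers({"team": "search"}, routers))                  # []
print(matching_routers({"team": "payments", "env": "prod"}, routers))
```

The empty first result is the failure mode described above: the alert fires, but with no matching router the notification is silently dropped.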
Common causes of delayed notifications:
- Retriggering period: If the alert has a retriggering period (for example, 24 hours), notifications are not sent again until the period elapses, even if the alert condition remains active.
- Notification Center processing: There can be a delay of up to 30 seconds between alert evaluation and notification delivery. This is expected behavior.
- Webhook destination errors: Check the notification.deliveries dataset for delivery failures. Go to Explore, select the system/notification.deliveries dataset, and filter by the alert name.
Terraform and API configuration pitfall:
If the alert was created via Terraform or the API, the notifyOn field may be set to triggered_only, which means resolved notifications are never sent. This setting is not always visible in the UI. Check your Terraform definition or API payload to verify.
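A quick way to catch this pitfall is to check the exported definition before applying it. The sketch below uses the notifyOn field name from this guide; the value strings are assumptions and may differ in your API or provider version.

```python
# Hypothetical check of an alert payload exported from Terraform or the
# API. The field name notifyOn comes from this guide; the value strings
# ("triggered_only", "triggered_and_resolved") are assumptions.

def missing_resolved_notifications(alert_payload):
    """True if the alert will never send resolved notifications."""
    return alert_payload.get("notifyOn") == "triggered_only"

payload = {"name": "high-error-rate", "notifyOn": "triggered_only"}
if missing_resolved_notifications(payload):
    print(f"{payload['name']}: resolved notifications are disabled")
```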
False positive alerts
The alert triggers but the condition does not appear to be met when you investigate.
Common causes:
- Data drift: Metric values can change between when the alert evaluated and when you check. Coralogix alerts evaluate on the streaming pipeline, so the data at evaluation time may differ from what appears in Explore minutes later. Use custom evaluation delay to account for late-arriving data.
- "Less than" conditions with no data: An alert with a "less than 1 log in the last 2 hours" condition triggers when there is no data at all, because 0 is less than 1. If this is not the desired behavior, add an explicit no-data handling rule. See No-data handling.
- Lookback window mismatch: The alert evaluation window and the time range you use in Explore may not align. Verify you are checking the exact window the alert evaluated.
- Metric aggregation differences: PromQL alerts in Coralogix may produce different values than the same query in Grafana due to step interval, aggregation method, or data freshness differences. When debugging, use the View query action on the alert to see the exact PromQL expression and compare it in both systems.
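The "less than" pitfall above comes down to simple arithmetic, sketched here. The fire_on_no_data flag is a hypothetical stand-in for the explicit no-data handling rule, not an actual setting name.

```python
# Why a "less than 1 log in the last 2 hours" condition fires when there
# is no data: an empty evaluation window counts as 0 matches, and 0 < 1.
# The fire_on_no_data flag models an explicit no-data handling rule.

def should_fire(match_count, threshold, fire_on_no_data=True):
    if match_count == 0 and not fire_on_no_data:
        return False  # no-data handling suppresses the trigger
    return match_count < threshold

print(should_fire(0, 1))                         # True: no data still fires
print(should_fire(0, 1, fire_on_no_data=False))  # False: suppressed
```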
Anomaly detection not working
The anomaly detection alert is not evaluating, not triggering, or showing unexpected behavior.
Data requirements:
- The model requires 7 days of metric history with at least 90% data coverage in that window.
- If the metric already has 7+ days of history when you create the alert, the alert becomes active after the next daily model build, typically within 24 hours.
- If the metric has less than 7 days of history, the alert remains inactive until enough history accumulates.
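As a rough illustration of the requirement above: over a 7-day window, coverage is the fraction of expected samples that are actually present. The one-minute sample interval in this sketch is an assumption for illustration; only the 7-day and 90% figures come from this guide.

```python
# Sketch of the anomaly-detection data requirement: 7 days of history
# with at least 90% data coverage. The one-minute sample interval is an
# assumed value for illustration.

SAMPLE_INTERVAL_SECONDS = 60
WINDOW_SECONDS = 7 * 24 * 3600

def meets_data_requirement(samples_present):
    expected = WINDOW_SECONDS // SAMPLE_INTERVAL_SECONDS  # 10,080 samples
    return samples_present / expected >= 0.9

print(meets_data_requirement(10_080))  # True: full coverage
print(meets_data_requirement(9_000))   # False: roughly 89% coverage
```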
Changes that restart the 7-day learning period:
- Creating a new anomaly detection alert
- Changing the metric query, filter, or PromQL expression
- Changing core condition logic that defines the time series being modeled
Changes that do not restart the learning period:
- Changing deviation percentage or sensitivity
- Changing notification settings, labels, or suppression rules
- Changing the alert name or priority
Other common issues:
- 500-permutation limit: The system supports a maximum of 500 permutations per metric for anomaly detection. If your group-by produces more than 500 unique combinations, some are not evaluated.
- Evaluation history not available: The evaluation history UI does not currently support anomaly alert types. Use the Alert drill-down to review alert state changes.
For more details, see Anomaly detection alerts.
Suppression rules not working
A suppression rule is configured but alerts are still sending notifications.
Check these first:
- Verify the suppression rule is enabled in Alerts, then Suppression Rules.
- Check the time window. Suppression rules only apply during the configured schedule. If the current time is outside the window, the rule is not active.
- Check the matching criteria. The rule must match the alert definition by name, labels, or other attributes. A mismatch means the rule does not apply.
- Check whether the alert uses Notification Center routing. Suppression rules apply to the alert evaluation, not to the notification delivery. If the alert is suppressed, no notification is generated.
Important behavior:
- Suppression disables notifications, not alert evaluation. The alert still evaluates, but no notification is sent while suppressed.
- Suppressed alerts do not create Cases. If you see a Case for a suppressed alert, the Case was created before suppression was applied.
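The behavior above can be sketched in a few lines: evaluation always runs, and suppression only gates what happens afterward. This is an illustrative model, not the product's internal logic.

```python
# Suppression as described above: the alert still evaluates, but no
# notification (and no Case) is produced while suppression is active.

def process_evaluation(condition_met, suppressed):
    """Return (alert_state, notification_sent)."""
    state = "triggered" if condition_met else "ok"
    notify = condition_met and not suppressed
    return state, notify

print(process_evaluation(True, suppressed=True))   # ('triggered', False)
print(process_evaluation(True, suppressed=False))  # ('triggered', True)
```

Note that the state is "triggered" in both calls: suppressing the rule changes delivery, not evaluation.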
For more details, see Alert suppression rules.
Retriggering period confusion
Alerts are not re-notifying or incidents are not appearing in the expected time range.
How retriggering works:
The retriggering period controls how often notifications are sent for an ongoing alert condition. If set to 24 hours, the alert fires once and does not send another notification until 24 hours have passed, even if the condition remains active.
Common issues:
- Incident not visible in the Incidents page: The Incidents page filters based on the retriggering period. If the alert has a 24-hour retriggering period and the last event was 12 hours ago, the incident may not appear in the current time range. This is expected behavior. Cases do not have this limitation and show all active issues regardless of retriggering period.
- Alert not re-notifying: If the retriggering period has not elapsed, no new notification is sent. Shorten the retriggering period if you need more frequent notifications.
- Debounce behavior: The retriggering period acts as a debounce mechanism to reduce notification noise. If the alert condition clears and re-triggers within the period, no new notification is sent.
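The debounce behavior above can be sketched as a timestamp check, using the 24-hour example from this guide. This is a simplified model of the mechanism, not the actual implementation.

```python
# Retriggering period as a debounce: once a notification is sent, further
# notifications are skipped until the period elapses, even if the
# condition clears and re-triggers in between.

RETRIGGER_SECONDS = 24 * 3600

def maybe_notify(now, last_notified):
    """Return (send, new_last_notified) for an active alert condition."""
    if last_notified is None or now - last_notified >= RETRIGGER_SECONDS:
        return True, now
    return False, last_notified

send, last = maybe_notify(0, None)          # first trigger: notify
print(send)                                 # True
send, last = maybe_notify(12 * 3600, last)  # 12 hours later: debounced
print(send)                                 # False
send, last = maybe_notify(24 * 3600, last)  # period elapsed: notify again
print(send)                                 # True
```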
Group-by permutation issues
Alerts with group-by fields produce unexpected results or miss some groups.
Common causes:
- Too many permutations: Each unique combination of group-by values creates a separate evaluation. If the number of permutations exceeds system limits, some groups are not evaluated.
- Ownership tag propagation delay: If your group-by uses ownership tags (service, team, environment), these tags take 2 to 5 minutes to propagate from parent resources. During this window, alerts may not match the expected groups.
- Missing group-by values: If a resource does not have a value for a group-by field, it is excluded from the evaluation.
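The permutation count and the missing-value exclusion can be estimated from sample data, as sketched below. The 500 cap here reuses the anomaly-detection limit from earlier in this guide as an example; other alert types may have different limits.

```python
# Counting unique group-by permutations from sample rows. Rows missing a
# group-by field are excluded, mirroring the behavior described above.
# The 500 cap is the anomaly-detection limit, used here as an example.

PERMUTATION_LIMIT = 500

def groupby_permutations(rows, group_by_fields):
    """Return the set of unique group-by value combinations."""
    combos = set()
    for row in rows:
        if all(field in row for field in group_by_fields):
            combos.add(tuple(row[field] for field in group_by_fields))
    return combos

rows = [
    {"service": "api", "env": "prod"},
    {"service": "api", "env": "staging"},
    {"service": "worker"},  # missing env: excluded from evaluation
]
combos = groupby_permutations(rows, ["service", "env"])
print(len(combos), len(combos) > PERMUTATION_LIMIT)  # 2 False
```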
Webhook and notification destination issues
Notifications are configured but not arriving at the external destination (Slack, PagerDuty, email, webhook).
Check these first:
- Verify the connector is configured and active in Notification Center, then Connectors.
- Send a test notification from the connector configuration page to verify connectivity.
- Check the notification.deliveries dataset for delivery errors. Go to Explore, set the dataset to system/notification.deliveries, and filter by connector type or alert name.
- For webhook destinations, verify the endpoint URL is correct, authentication credentials are valid, and the endpoint is accepting requests.
- For Slack, verify the workspace integration is connected and the channel name is correct.
- For PagerDuty, verify the service key (Integration Key) is correct.
Common causes:
- Expired credentials: API tokens or integration keys may have expired or been rotated.
- Rate limiting: The external destination may be rate-limiting requests from Coralogix.
- Payload format mismatch: The preset message format may not match what the destination expects. Check the preset configuration and the destination's API documentation.
Alert created via API or Terraform does not behave as expected
Alerts created programmatically may have settings that differ from UI-created alerts.
Common issues:
- notifyOn set to triggered_only: Resolved notifications are never sent. Update the Terraform definition or API payload to include resolved events.
- Missing routing labels: Alerts created via API may not have routing labels set, so no router matches and no notification is delivered. Add routing labels to the alert definition.
- Data sources not set: If data_sources is omitted in the API payload, the alert evaluates against default/logs. If your data is in a different dataset, set the data_sources field explicitly.
- Alert type mismatch: The API uses internal type identifiers that may not match the UI labels. Verify the alert type in the API response matches your intent.
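A pre-flight check over the payload can catch the routing-label and data-source pitfalls before the alert is created. The routingLabels field name below is a hypothetical stand-in; only notifyOn, data_sources, and default/logs come from this guide, and the exact API schema may differ.

```python
# Pre-flight check of a programmatically created alert definition for
# the routing-label and data-source pitfalls above. routingLabels is an
# assumed field name; verify against your actual API schema.

def preflight_warnings(alert):
    warnings = []
    if not alert.get("routingLabels"):
        warnings.append("no routing labels: no router will match")
    if "data_sources" not in alert:
        warnings.append("data_sources omitted: evaluates against default/logs")
    return warnings

alert = {"name": "latency-alert"}
for warning in preflight_warnings(alert):
    print(warning)
```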
For API details, see Alerts API v3.