
AI Center alerts and metrics

Deploy a prebuilt observability pack for AI applications. The AI Center Alerts extension installs two components together:

  • Five Events2Metrics rules that convert AI spans into Prometheus-style metrics for latency, errors, cost, security issues, and quality issues.
  • Five metric threshold alerts built on those metrics that fire on high error rates, slow responses, cost spikes, security violations, and quality issues.

Use the alerts out of the box, customize them, or query the metrics directly to build your own alerts and dashboards.

What you need

  • AI spans flowing into Coralogix from one of the supported AI integrations. Span attributes follow the OpenTelemetry gen_ai.* semantic conventions.
  • An existing application and subsystem in your account to scope the alerts. Each deployed alert is grouped by application_name and subsystem.
  • Custom guardrails or evaluations configured under the Security or Quality category, if you want those to count toward the corresponding issue alerts.

Set up

  1. In Coralogix, navigate to Integrations, then Extensions.
  2. Search for AI Center Alerts, or filter by the Observability extension type.
  3. Select the AI Center Alerts card to open the extension detail page.

[Screenshot: the AI Center Alerts card on the Extensions catalog page, showing the Observability tag and item count.]

  4. From the Version dropdown, select the version you want to deploy.
  5. From the Application dropdown, select All.
  6. From the Subsystem dropdown, select All.
  7. Under Alerts, the five alerts are selected by default. Deselect any alert you do not want to deploy. The five underlying Events2Metrics rules are required for the alerts to function and are installed with the extension.
  8. Select Deploy.

[Screenshot: the AI Center Alerts extension detail page during deployment, showing the version, application, and subsystem selectors and the five alert items.]

The extension creates the five alerts and the five Events2Metrics rules in your account.

  • To view the alerts, navigate to Alerting, then Alert definition management. Alert names start with AI Center |.
  • To view the Events2Metrics rules, navigate to Data Flow, then Events2Metrics. Names start with ai_center_.

Use it

Customize an alert

After deployment, the alerts behave like any other metric threshold alert. You can edit thresholds, schedules, notifications, group-by keys, and priority.

  1. In Coralogix, navigate to Alerting, then Alert definition management.
  2. Find the alert you deployed. Alert names start with AI Center |.
  3. Select the alert to open it for editing.
  4. Adjust the threshold, time window, evaluation window, group-by keys, or notification settings.
  5. Save your changes.

Build custom alerts and dashboards on the AI Center metrics

The five prebuilt alerts group by application_name and subsystem. The underlying Events2Metrics rules also expose a user label and the full set of aggregations (cx_count, cx_max, cx_avg, cx_sum), so you can query them directly for custom alerts and Custom Dashboards.

Note

When writing PromQL against the AI Center metrics, use range functions like sum_over_time and avg_over_time. Do not use rate() or increase() — they are incompatible with Events2Metrics.

To build a per-user alert:

  1. Navigate to Alerting, then Create alert.
  2. Select Metric, then Threshold.
  3. Enter a PromQL query against one of the AI Center metrics, grouped by user. For example, count security issues per user over the last minute:

    sum by (user, application_name, subsystem)(sum_over_time(ai_center_security_issue_cx_count{}[1m]))
    
  4. Configure the threshold and notifications, then save.

Choose the sum_over_time window and the alert time window together

AI Center metrics are aggregated on a one-minute basis. When the alert evaluates, it looks at the most recent points within its time window — and each point already represents the sum_over_time window you set in the PromQL. Setting both to 5 minutes, for example, means the alert reads five overlapping 5-minute sums, so the practical lookback into source data is wider than 5 minutes. Pick the two windows together rather than independently.
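
As a minimal sketch, the query below pairs a 5-minute sum_over_time window with a 5-minute alert time window, using the error metric from the reference tables further down (the threshold and schedule are configured in the alert form, not in the query):

    # Each point is already a 5-minute sum; with a 5-minute alert time
    # window, the effective lookback into source data approaches 10 minutes.
    sum by (application_name, subsystem)(sum_over_time(ai_center_error_cx_count{}[5m]))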

To build a dashboard, create a metric widget in Custom Dashboards with a PromQL query against any of the ai_center_latency_cx_*, ai_center_error_cx_*, ai_center_prompt_price_cx_*, ai_center_response_price_cx_*, ai_center_security_issue_cx_*, or ai_center_quality_issue_cx_* metrics.
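
For example, a hypothetical widget query that charts average AI span latency per application in seconds (latency values are recorded in microseconds, consistent with the latency alert threshold):

    # Average latency = total duration / span count, converted to seconds.
    sum by (application_name)(sum_over_time(ai_center_latency_cx_sum{}[5m]))
    / sum by (application_name)(sum_over_time(ai_center_latency_cx_count{}[5m]))
    / 1e6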

Update or remove the extension

When a new version of the extension is published, the Extensions page shows an UPDATE AVAILABLE indicator on the card.

To install the new version:

  1. Navigate to Integrations, then Extensions.
  2. Select the AI Center Alerts card.
  3. Select Update.

To uninstall the extension, including its alerts and Events2Metrics rules, select Remove on the same page.

How it works

The extension installs two layers:

  1. Five Events2Metrics rules convert AI spans into Prometheus-style metrics. Each Events2Metrics rule filters spans with a Lucene query, extracts a numeric field, and emits four aggregations on a one-minute basis: count (cx_count), max (cx_max), average (cx_avg), and sum (cx_sum). Every metric is labeled with application_name, subsystem, and user. The error, security-issue, and quality-issue rules count matching spans by aggregating their duration field — duration is the numeric source, but cx_count is the value the alerts read.
  2. Five metric threshold alerts evaluate the resulting metrics with PromQL and fire when thresholds are breached.

Because the alerts run on metrics rather than spans directly, they use the standard Coralogix alerting engine and integrate with your existing notification channels, schedules, and incident workflows.
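
As an illustration, the cost spike alert's evaluation plausibly reduces to a query like the following sketch (the deployed definition may differ in detail): combined prompt and response spend per application and subsystem over the last hour, with the alert firing when the result exceeds 10:

    # Total spend over the last hour; prompt and response prices are
    # summed separately, then added.
    sum by (application_name, subsystem)(sum_over_time(ai_center_prompt_price_cx_sum{}[1h]))
    + sum by (application_name, subsystem)(sum_over_time(ai_center_response_price_cx_sum{}[1h]))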

Reference

Alerts

All five alerts are P1 metric threshold alerts. Each one is grouped by application_name and subsystem and notifies on triggered and resolved.
| Alert | What it detects | Default threshold | Time window |
| --- | --- | --- | --- |
| AI Center \| High error rate | Too many AI spans ending with errors | More than 5 errors per 5 minutes | 5 minutes |
| AI Center \| Cost spike | Total inference spend higher than expected | Combined tags.gen_ai.prompt_price and tags.gen_ai.response_price greater than 10 in the last hour | 60 minutes |
| AI Center \| High latency | AI span response time stays above target | Max latency greater than 60,000,000 microseconds (1 minute) | 5 minutes |
| AI Center \| Security issues spike | Security issues firing too often | More than 10 issues per 10 minutes | 10 minutes |
| AI Center \| Quality issues spike | Quality issues firing too often | More than 10 issues per 10 minutes | 10 minutes |

Events2Metrics

| Events2Metrics rule | Source field | Output metrics | Spans matched |
| --- | --- | --- | --- |
| ai_center_latency_cx | duration | ai_center_latency_cx_count, ai_center_latency_cx_max, ai_center_latency_cx_avg, ai_center_latency_cx_sum | AI spans matching the OpenTelemetry GenAI semantic conventions (tags.gen_ai.system or tags.gen_ai.provider.name) |
| ai_center_errors_cx | duration (counts matching spans) | ai_center_error_cx_count, ai_center_error_cx_max, ai_center_error_cx_avg, ai_center_error_cx_sum | Error spans (tags.otel.status_code:ERROR) from AI calls (tags.gen_ai.system or tags.gen_ai.provider.name) |
| ai_center_cost | tags.gen_ai.prompt_price, tags.gen_ai.response_price | ai_center_prompt_price_cx_* and ai_center_response_price_cx_* (count, max, avg, sum) | Spans with a prompt or response price |
| ai_center_security_issues_cx | duration (counts matching spans) | ai_center_security_issue_cx_count, ai_center_security_issue_cx_max, ai_center_security_issue_cx_avg, ai_center_security_issue_cx_sum | Spans with security guardrails or evaluations triggered, built-in or custom |
| ai_center_quality_issues_cx | duration (counts matching spans) | ai_center_quality_issue_cx_count, ai_center_quality_issue_cx_max, ai_center_quality_issue_cx_avg, ai_center_quality_issue_cx_sum | Spans with quality guardrails or evaluations triggered, built-in or custom |

Every metric is labeled with application_name (from the span's applicationname), subsystem (from subsystemname), and user (from tags.gen_ai.request.user).

In addition to the metrics in the table, each Events2Metrics rule auto-creates a counter metric named <rule>_cx_docs_total that counts every span matching the rule's Lucene query. The pack therefore also produces ai_center_latency_cx_cx_docs_total, ai_center_errors_cx_cx_docs_total, ai_center_cost_cx_docs_total, ai_center_security_issues_cx_cx_docs_total, and ai_center_quality_issues_cx_cx_docs_total. None of the prebuilt alerts use these — they are available for custom alerts and dashboards (for example, total AI call volume or issue ratios).
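
For example, assuming the docs_total counters behave like the other Events2Metrics outputs (per-minute values read with range functions, not rate()), a security issue ratio could be sketched as follows. The latency rule matches all AI spans, so its docs_total serves as the total-call denominator:

    # Fraction of AI calls that triggered a security issue over 10 minutes.
    sum by (application_name, subsystem)(sum_over_time(ai_center_security_issues_cx_cx_docs_total{}[10m]))
    / sum by (application_name, subsystem)(sum_over_time(ai_center_latency_cx_cx_docs_total{}[10m]))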

Issue detection sources

The ai_center_security_issues_cx and ai_center_quality_issues_cx Events2Metrics rules count the following sources. All sources apply to both the prompt and response sides of a span.

Security issues:

  • Built-in guardrails: pii, prompt_injection
  • Built-in evaluations with label:p1: pii, prompt_injection, sql_allowed_tables, sql_read_only, sql_restricted_tables
  • Custom guardrails 0 to 9 with category:security and triggered:"true"
  • Custom evaluations 0 to 9 with category:security and triggered:"true"

Quality issues:

  • Built-in guardrails: toxicity
  • Built-in evaluations with label:p1: allowed_topics, competition, hallucination_completeness, hallucination_context_adherence, hallucination_context_relevance, hallucination_correctness, hallucination_task_adherence, language_mismatch, restricted_topics, sexism, sql_hallucination, toxicity
  • Custom guardrails 0 to 9 with category:quality and triggered:"true"
  • Custom evaluations 0 to 9 with category:quality and triggered:"true"

Limitations

  • Custom guardrails and evaluations are aggregated only for indexes 0 to 9 in each category. If you have more than 10 custom guardrails or 10 custom evaluations under the same category, the additional ones are not counted by the default issue alerts. To monitor them, build a custom alert with an extended Lucene query.
  • The high latency alert threshold is in microseconds. The default value, 60000000, equals 1 minute. To set a 30-second threshold, use 30000000. To set a 5-minute threshold, use 300000000. A conversion sketch follows this list.
  • The cost spike alert reads price values from the tags.gen_ai.prompt_price and tags.gen_ai.response_price span attributes.
  • The five alerts default to grouping by application_name and subsystem. To alert per user, build a custom alert against the Events2Metrics rules with user in the group-by clause.
  • Each Events2Metrics rule in the extension is configured with a 30,000-permutation-per-day limit. High-cardinality combinations of application_name, subsystem, and user can reach this limit. Permutations beyond it are dropped, so alerts may miss issues from those slices. Track usage in Data Flow, then Events2Metrics.
  • The five Events2Metrics rules count against your organization's daily quota of 10M metric permutations. Metric rules are blocked when either the per-rule limit or the org-wide quota is reached.
  • Metrics are collected from the moment the extension is deployed. Alerts do not fire retroactively on AI spans ingested before deployment.
  • For custom alerts and dashboards against the AI Center metrics, use range functions like sum_over_time and avg_over_time. Do not use rate() or increase() — they are incompatible with Events2Metrics. The five prebuilt alerts already follow this.
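
The conversion sketch referenced above is a hypothetical dashboard query that reports the worst AI span latency in seconds rather than microseconds, which makes threshold values easier to sanity-check:

    # Worst-case latency over 5 minutes, converted from microseconds to seconds.
    max by (application_name, subsystem)(max_over_time(ai_center_latency_cx_max{}[5m]))
    / 1e6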

Troubleshoot

No alerts are firing. Cause: AI spans are not being captured, or no application or subsystem matches the deployment scope. Fix: confirm AI spans with tags.gen_ai.system are flowing in by querying _exists_:tags.gen_ai.system in Logs Explorer, then check that the alert's deployed applications and subsystems match your data.

A custom guardrail or evaluation is not counted. Cause: the custom item is at index 10 or higher, or the triggered field is not "true". Fix: move the item to indexes 0 to 9 and confirm the triggered field is set on a triggered span.

The latency alert never fires even with slow spans. Cause: the threshold value is in microseconds. A value of 60000 is 60 milliseconds, not 60 seconds. Fix: set the threshold in microseconds. Common values: 30000000 for 30 seconds, 60000000 for 1 minute, 300000000 for 5 minutes.

Learn more