
Evaluations

Use evaluators to measure the quality of your LLM applications against policies covering risks such as toxicity or sensitive-data exposure. Apply them to surface quality and security issues during development and in production.

Through the Policy Catalog, you can select and configure policies to monitor specific behaviors and issues for each application.

How evals and policies work together

Policies are configurable evaluation and guardrail mechanisms that help you detect and control security, safety, and quality risks in LLM-based applications.

In the Policy Catalog, choose the policies that match the behaviors you want to monitor for each application, then configure the evaluators that will score those policies. Once configured, evals are applied automatically to all spans streamed into the platform, without adding latency to the ingestion process or interfering with live requests.
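The claim that evals add no latency follows from running them out of band: ingestion enqueues spans and returns immediately, while evaluators score them asynchronously. The following is a minimal sketch of that pattern, not the Coralogix implementation; the evaluator and span fields are illustrative.

```python
import queue
import threading

# Illustrative sketch only: evaluations run out of band on ingested spans,
# so the live request path is never blocked by scoring.

span_queue = queue.Queue()
scores = []

def toxicity_evaluator(span):
    # Placeholder scorer; a real evaluator would call a model or classifier.
    return 1.0 if "insult" in span["output"] else 0.0

def eval_worker():
    # Background worker: pulls spans off the queue and scores them.
    while True:
        span = span_queue.get()
        if span is None:
            break
        scores.append((span["id"], toxicity_evaluator(span)))
        span_queue.task_done()

worker = threading.Thread(target=eval_worker, daemon=True)
worker.start()

def ingest(span):
    # Ingestion just enqueues the span and returns immediately;
    # scoring happens asynchronously in the worker thread.
    span_queue.put(span)

ingest({"id": "span-1", "output": "a polite answer"})
ingest({"id": "span-2", "output": "an insult"})
span_queue.join()  # wait for the worker to drain the queue
print(scores)
```

Because scoring is decoupled from ingestion, adding or reconfiguring policies changes only the background workload, never the latency of the request being traced.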

[Screenshot: Policy Catalog in AI Center, showing prebuilt policies (Allowed Topics, Competition Discussion, Completeness, Context Adherence, Context Relevance, Correctness, PII, Prompt Injection, Restricted Topics, Sexism) tagged by category, plus a Custom policies section with a Request for financial advice card and a Create a custom policy tile]

The Policy Catalog organizes prebuilt policies by category (Topics, Hallucinations, Security, Toxicity) and lets you add your own under Custom policies.

Coralogix flags high scores as issues in the relevant Application dashboards. Both high and low scores also appear as labels on each AI span in AI Explorer.

Prebuilt policies

Apply ready-to-use evaluations for security, hallucination detection, toxicity, topics, user experience, and compliance. See Prebuilt Policies for the full list and setup instructions.

Custom policies

Alongside the predefined policies, you can create your own based on user-defined criteria and use cases. Custom evaluation criteria and prompt templates let you measure what actually matters for your application, such as regulatory compliance, tone consistency, or task-completion accuracy. See Custom Policies.
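To make the criteria-plus-template idea concrete, here is a minimal sketch of a custom policy definition and how its prompt template might be rendered for an LLM judge. The field names and the `render_eval_prompt` helper are hypothetical, not the Coralogix schema; only the policy name ("Request for financial advice") comes from the catalog shown above.

```python
# Hypothetical custom-policy definition (illustrative field names, not the
# Coralogix schema): user-defined criteria paired with a prompt template
# that an LLM judge fills in for each evaluated response.
FINANCIAL_ADVICE_POLICY = {
    "name": "Request for financial advice",
    "category": "Custom",
    "criteria": "Flag responses that give specific investment recommendations.",
    "prompt_template": (
        "You are an evaluator. Score the assistant response below from\n"
        "0 (no financial advice) to 1 (specific investment advice).\n"
        "Criteria: {criteria}\n"
        "Response: {response}\n"
        "Return only the numeric score."
    ),
}

def render_eval_prompt(policy, response):
    # Fill the template with the policy's criteria and the span's response text.
    return policy["prompt_template"].format(
        criteria=policy["criteria"], response=response
    )

prompt = render_eval_prompt(FINANCIAL_ADVICE_POLICY, "Buy 100 shares of ACME.")
print(prompt)
```

The point of the template is separation of concerns: the criteria capture *what* to measure, while the template standardizes *how* the judge is asked, so the same scoring pipeline can serve every custom policy.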

Billing

Customers are billed based on the number of active evaluations and the volume of LLM interactions (tokens). See Pricing for details.

Next steps

Apply ready-to-use evaluations for security, hallucination detection, and more with Prebuilt evaluation policies.