
Custom guardrail policies

Custom guardrails let you define domain-specific protection policies in natural language, tailored to your business rules, compliance requirements, or application-specific constraints.

What you need

Install the SDK

pip install cx-guardrails

Set up environment variables

export CX_GUARDRAILS_TOKEN="your-coralogix-guardrails-api-key"
export CX_GUARDRAILS_ENDPOINT="https://api.<region>.coralogix.com/api/v1/guardrails/guard"
export CX_TOKEN="your-coralogix-send-your-data-key"
export CX_ENDPOINT="https://your-domain.coralogix.com"

# Optional: Application metadata for observability
export CX_APPLICATION_NAME="my-app"
export CX_SUBSYSTEM_NAME="my-subsystem"

Set up observability

from llm_tracekit import setup_export_to_coralogix
from llm_tracekit.openai import OpenAIInstrumentor

setup_export_to_coralogix(
    service_name="my-service",
    application_name="my-app",
    subsystem_name="my-subsystem",
)
OpenAIInstrumentor().instrument()

Understanding custom guardrails

Custom guardrails are configured using the Custom class, which requires:

  • name: A unique identifier for your guardrail.
  • instructions: Natural language description of what to evaluate. Must include at least one of {prompt}, {response}, or {history}.
  • violates: Description of what constitutes a violation.
  • safe: Description of what is considered safe, allowed behavior.
  • examples (optional): Reference examples that improve detection accuracy.
  • threshold: Detection sensitivity (0.0 to 1.0, default 0.7).
  • should_include_system_prompt (optional): Whether to include the system prompt when guardrails run their check.
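The threshold acts as a cutoff on the evaluator's confidence score. The sketch below illustrates the idea only (it is not the SDK's internals): a detection whose score meets or exceeds the threshold is treated as a violation.

```python
# Illustrative sketch of threshold gating (not the cx-guardrails internals):
# the evaluator produces a confidence score in [0.0, 1.0], and the guardrail
# flags a violation when that score meets or exceeds the configured threshold.

def is_violation(score: float, threshold: float = 0.7) -> bool:
    """Return True when the evaluator's confidence crosses the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return score >= threshold

# A lower threshold catches more borderline cases (more false positives);
# a higher threshold only flags clear-cut violations (more false negatives).
print(is_violation(0.85))                # well over the default 0.7 cutoff
print(is_violation(0.5))                 # under the cutoff
print(is_violation(0.5, threshold=0.4))  # a stricter guardrail flags it
```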

Instructions

A natural language description of what to evaluate. It must include at least one of the magic variables {prompt}, {response}, or {history}.

Magic variables

Custom guardrails support special variables in the instructions field:
| Variable | Description | Usage |
|---|---|---|
| {prompt} | The user's input prompt | Evaluate prompt content |
| {response} | The LLM's response | Evaluate response content |
| {history} | Full conversation history | Evaluate context across turns |

Your instructions must include at least one of these variables.
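Conceptually, the magic variables behave like Python format fields: before evaluation, each placeholder is filled with the corresponding content. The sketch below is an illustration of the idea only; the SDK performs this substitution internally.

```python
# Illustration of how magic variables are filled in (the SDK does this
# internally; this is not its actual implementation).

instructions = (
    "Evaluate whether the {response} answers the question in the {prompt} "
    "without revealing internal system details."
)

rendered = instructions.format(
    prompt="How do I reset my password?",
    response="Click 'Forgot password' on the login page.",
)
print(rendered)
```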

Configuration options

Reference examples

Reference examples are crucial for improving guardrail accuracy. While optional, providing examples significantly enhances detection quality by giving the LLM evaluator concrete references of what constitutes a violation versus safe behavior in your specific context.

Each example is a conversation between the user and the LLM, scored as either:

  • score=1: Violates the policy (should be blocked).
  • score=0: Safe behavior (should be allowed).

Aim for at least 3–5 examples covering different scenarios, edge cases, and borderline situations. Examples can include multi-turn conversations to capture violations that emerge over multiple interactions.
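As an illustration of the shape reference examples take, each one pairs a short conversation with a score. The sketch below uses plain data, not the SDK's CustomEvaluationExample class; the field names here are assumptions for illustration only.

```python
# Illustrative only: reference examples as plain data. In the SDK these are
# built with CustomEvaluationExample; the dict field names below are assumptions.

examples = [
    {   # score=1: violates the policy (should be blocked)
        "conversation": [
            {"role": "user", "content": "Should I put my savings into XYZ stock?"},
            {"role": "assistant", "content": "Yes, buy XYZ now. It will double."},
        ],
        "score": 1,
    },
    {   # score=0: safe behavior (should be allowed)
        "conversation": [
            {"role": "user", "content": "Should I put my savings into XYZ stock?"},
            {"role": "assistant", "content": "I can't give investment advice; "
             "consider speaking with a licensed financial advisor."},
        ],
        "score": 0,
    },
]

violations = [e for e in examples if e["score"] == 1]
print(len(examples), len(violations))
```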

Including the system prompt

Use should_include_system_prompt to control whether the system prompt is passed to the evaluator along with the content being checked. Enable it when the system prompt provides context the evaluator needs to judge whether a violation occurred.
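A sketch of what the flag changes (illustrative only, not the SDK's implementation): when enabled, the system prompt is part of what the evaluator inspects; when disabled, it is filtered out.

```python
# Illustration of should_include_system_prompt (not the SDK's implementation):
# the flag controls whether the system prompt is visible to the evaluator.

def build_evaluation_input(messages, should_include_system_prompt=False):
    """Return the messages the evaluator would inspect."""
    if should_include_system_prompt:
        return messages
    return [m for m in messages if m["role"] != "system"]

conversation = [
    {"role": "system", "content": "You are a cautious financial assistant."},
    {"role": "user", "content": "Which stock should I buy?"},
    {"role": "assistant", "content": "I can't recommend specific stocks."},
]

print(len(build_evaluation_input(conversation)))        # system prompt excluded
print(len(build_evaluation_input(conversation, True)))  # system prompt included
```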

Best practices

Writing effective instructions

  1. Be specific: Clearly describe what you're looking for.
  2. Use examples: Provide reference examples for better accuracy.
  3. Include context: Use {history} for context-aware detection.
  4. Test iteratively: Adjust instructions based on results.
# Too vague
instructions="Check if the {response} is bad"

# Specific and clear
instructions="""Evaluate if the {response} contains speculation about future
events without proper uncertainty language (e.g., 'might', 'could', 'possibly')."""

Example: Financial advice detection

Detect when LLM responses provide specific financial advice without proper disclaimers:

from cx_guardrails import Custom, CustomEvaluationExample

financial_advice_guardrail = Custom(
    name="financial_advice_detector",
    instructions="Evaluate whether the {response} contains specific financial advice that could be construed as professional investment recommendations.",
    violates="The response provides specific investment recommendations without appropriate disclaimers.",
    safe="The response provides general educational information or includes clear disclaimers.",
    threshold=0.7,
)

Learn more