Skip to content

Guardrails prebuilt policies

Coralogix Guardrails includes prebuilt policies that you can apply immediately to protect your LLM applications. Each policy performs a real-time, deterministic check on model inputs or outputs and throws an exception when a violation is detected.

Available prebuilt policies

PolicyWhat it detectsWhen to use
Prompt InjectionAttempts to manipulate model behavior by injecting malicious instructions into user promptsAll production AI applications that accept user input
PII DetectionPersonally identifiable information such as email addresses, phone numbers, credit card numbers, and Social Security NumbersData privacy compliance (GDPR, HIPAA) and sensitive data leakage prevention
ToxicityToxic, harmful, or offensive content including hate speech, threats, harassment, and abusive languageCustomer-facing AI and brand safety

For domain-specific protection beyond the prebuilt options, see Custom Policies.

Understanding thresholds

All prebuilt policies return a score between 0 and 1.
Score rangeMeaningAction taken?
Closer to 1Violation detected with high confidenceYes — exception thrown
Closer to 0Low severity — probably not a violationNo

Guardrail policies use configurable thresholds in the code. When the score meets or exceeds the threshold you set, the policy throws an exception and blocks the interaction.

Threshold: Defines the value from which a guardrail action is triggered. When the threshold is met or exceeded, the guardrail action is executed, returned through the API, and the system marks the event as an issue.
ThresholdUse case
0.4–0.6Low-risk applications
0.7 (default)General applications
0.8–0.9High-security applications

Prompt Injection

The prompt injection detection policy identifies and blocks attempts to manipulate model behavior through malicious instructions injected into user prompts — such as attempts to ignore system instructions, leak system prompts, or perform unintended actions.

Configuration

Custom threshold

Adjust detection sensitivity (0.0 to 1.0, default 0.7):

# Lower threshold — more sensitive
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PromptInjection(threshold=0.5)],
)

# Higher threshold — less sensitive
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PromptInjection(threshold=0.9)],
)

For full setup including SDK installation and environment variables, see Prompt Injection Detection.

PII Detection

The PII detection policy identifies personally identifiable information in prompts or responses, preventing sensitive personal data from being processed or exposed by your LLM applications.

Available PII categories

CategoryEnum valueDescription
Email AddressPIICategory.EMAIL_ADDRESSEmail addresses
Phone NumberPIICategory.PHONE_NUMBERPhone numbers
Credit CardPIICategory.CREDIT_CARDCredit/debit card numbers
US SSNPIICategory.US_SSNUS Social Security Numbers

Configuration

Specific categories

Detect only specific PII types:

await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PII(categories=[PIICategory.EMAIL_ADDRESS, PIICategory.CREDIT_CARD])],
)

Custom threshold

Adjust detection sensitivity (0.0 to 1.0, default 0.7):

await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PII(threshold=0.8)],
)

For full setup including SDK installation and environment variables, see PII Detection.

Toxicity

The toxicity detection policy identifies harmful, offensive, or inappropriate content — including hate speech, threats, harassment, and abusive language — in prompts or responses before they reach users.

Configuration

Custom threshold

Adjust detection sensitivity (0.0 to 1.0, default 0.7):

await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[Toxicity(threshold=0.8)],
)

For full setup including SDK installation and environment variables, see Toxicity Detection.