Guardrails prebuilt policies
Coralogix Guardrails includes prebuilt policies that you can apply immediately to protect your LLM applications. Each policy performs a real-time, deterministic check on model inputs or outputs and throws an exception when a violation is detected.
Available prebuilt policies
| Policy | What it detects | When to use |
|---|---|---|
| Prompt Injection | Attempts to manipulate model behavior by injecting malicious instructions into user prompts | All production AI applications that accept user input |
| PII Detection | Personally identifiable information such as email addresses, phone numbers, credit card numbers, and Social Security Numbers | Data privacy compliance (GDPR, HIPAA) and sensitive data leakage prevention |
| Toxicity | Toxic, harmful, or offensive content including hate speech, threats, harassment, and abusive language | Customer-facing AI and brand safety |
For domain-specific protection beyond the prebuilt options, see Custom Policies.
Understanding thresholds
All prebuilt policies return a score between 0 and 1.
| Score range | Meaning | Action taken? |
|---|---|---|
| Closer to 1 | High confidence that a violation is present | Yes: an exception is thrown once the score meets the threshold |
| Closer to 0 | Low confidence; likely not a violation | No |
Guardrail policies use configurable thresholds in code. When a policy's score meets or exceeds the threshold you set, the policy throws an exception and blocks the interaction.
Threshold: the value at or above which a guardrail action is triggered. When the threshold is met or exceeded, the guardrail action is executed, returned through the API, and the system marks the event as an issue.
| Threshold | Use case |
|---|---|
| 0.4–0.6 | Low-risk applications |
| 0.7 (default) | General applications |
| 0.8–0.9 | High-security applications |
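The threshold rule above can be sketched in plain Python. This is an illustrative model of the semantics only; the class and function names here are hypothetical, not the SDK's API:

```python
class GuardrailViolation(Exception):
    """Raised when a policy score meets or exceeds its threshold."""

def check_score(score: float, threshold: float = 0.7) -> None:
    # Scores closer to 1 indicate a high-confidence violation.
    # The interaction is blocked only when the score reaches the threshold.
    if score >= threshold:
        raise GuardrailViolation(
            f"score {score:.2f} >= threshold {threshold:.2f}"
        )

check_score(0.4)    # below the default threshold: passes silently
```

Raising the threshold therefore means fewer blocks (higher precision, lower sensitivity); lowering it means more blocks (higher sensitivity, more false positives).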
Prompt Injection
The prompt injection detection policy identifies and blocks attempts to manipulate model behavior through malicious instructions injected into user prompts — such as attempts to ignore system instructions, leak system prompts, or perform unintended actions.
Configuration
Custom threshold
Adjust detection sensitivity (0.0 to 1.0, default 0.7):
```python
# Lower threshold — more sensitive
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PromptInjection(threshold=0.5)],
)

# Higher threshold — less sensitive
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PromptInjection(threshold=0.9)],
)
```
For full setup including SDK installation and environment variables, see Prompt Injection Detection.
PII Detection
The PII detection policy identifies personally identifiable information in prompts or responses, preventing sensitive personal data from being processed or exposed by your LLM applications.
Available PII categories
| Category | Enum value | Description |
|---|---|---|
| Email Address | PIICategory.EMAIL_ADDRESS | Email addresses |
| Phone Number | PIICategory.PHONE_NUMBER | Phone numbers |
| Credit Card | PIICategory.CREDIT_CARD | Credit/debit card numbers |
| US SSN | PIICategory.US_SSN | US Social Security Numbers |
Configuration
Specific categories
Detect only specific PII types:
```python
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PII(categories=[PIICategory.EMAIL_ADDRESS, PIICategory.CREDIT_CARD])],
)
```
Custom threshold
Adjust detection sensitivity (0.0 to 1.0, default 0.7):
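By analogy with the Prompt Injection example above, tuning likely follows the same pattern; the `threshold` parameter on `PII` is assumed here to mirror `PromptInjection`, so confirm against the linked setup guide:

```python
# Lower threshold (more sensitive); `threshold` on PII is assumed
# to work like the PromptInjection parameter shown earlier.
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[PII(threshold=0.5)],
)
```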
For full setup including SDK installation and environment variables, see PII Detection.
Toxicity
The toxicity detection policy identifies harmful, offensive, or inappropriate content — including hate speech, threats, harassment, and abusive language — in prompts or responses before they reach users.
Configuration
Custom threshold
Adjust detection sensitivity (0.0 to 1.0, default 0.7):
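Following the pattern of the Prompt Injection example above, the sketch below assumes a `Toxicity` guardrail class with the same `threshold` parameter; the exact class name and signature should be confirmed against the linked setup guide:

```python
# Higher threshold (less sensitive); class name and `threshold`
# parameter are assumed by analogy with PromptInjection.
await guardrails.guard_prompt(
    prompt=user_input,
    guardrails=[Toxicity(threshold=0.9)],
)
```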
For full setup including SDK installation and environment variables, see Toxicity Detection.