Toxicity detection
The toxicity detection guardrail protects your LLM applications from generating or processing toxic, harmful, or offensive content such as hate speech, threats, harassment, and abusive language.
What you need
- Python 3.10 or higher.
- `cx-guardrails` installed. See Getting Started with Guardrails.
- Environment variables configured: `CX_GUARDRAILS_TOKEN`, `CX_GUARDRAILS_ENDPOINT`.
Install the SDK
Set up environment variables
```bash
# Guardrails API key and endpoint
export CX_GUARDRAILS_TOKEN="your-coralogix-guardrails-api-key"
export CX_GUARDRAILS_ENDPOINT="https://api.<domain>.coralogix.com/api/v1/guardrails/guard"

# Send-Your-Data key and endpoint (for exporting traces to Coralogix)
export CX_TOKEN="your-coralogix-send-your-data-key"
export CX_ENDPOINT="https://your-domain.coralogix.com"

# Optional: Application metadata for observability
export CX_APPLICATION_NAME="my-app"
export CX_SUBSYSTEM_NAME="my-subsystem"
```
Set up observability
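```python
from llm_tracekit import setup_export_to_coralogix
from llm_tracekit.openai import OpenAIInstrumentor

# Export traces to Coralogix under the given service, application, and subsystem
setup_export_to_coralogix(
    service_name="my-service",
    application_name="my-app",
    subsystem_name="my-subsystem",
    capture_content=True,
)

# Instrument OpenAI client calls so they are traced
OpenAIInstrumentor().instrument()
```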
Usage
```python
import asyncio

from llm_tracekit import setup_export_to_coralogix
from cx_guardrails import Guardrails, Toxicity, GuardrailsTriggered

setup_export_to_coralogix(
    service_name="my-service",
    application_name="my-app",
    subsystem_name="my-subsystem",
    capture_content=True,
)


async def main():
    guardrails = Guardrails()
    async with guardrails.guarded_session():
        try:
            # Check the prompt for toxic content
            await guardrails.guard_prompt(
                prompt="Hello, how can I help you today?",
                guardrails=[Toxicity()],
            )
            print("✓ No toxicity detected")
        except GuardrailsTriggered as e:
            # Raised when the toxicity guardrail is triggered
            print(f"✗ Toxicity detected: {e}")


asyncio.run(main())
```
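One way to act on a triggered guardrail is to keep the prompt from reaching the model and return a fixed reply instead. The sketch below shows that pattern. To keep it runnable without the SDK, `GuardrailsTriggeredStub` and `fake_guard_prompt` are hypothetical local stand-ins, and the stub uses a toy word list rather than the real toxicity model. In real code you would call `guardrails.guard_prompt(...)` inside `guarded_session()` and catch `GuardrailsTriggered`, as in the example above.

```python
import asyncio


class GuardrailsTriggeredStub(Exception):
    """Hypothetical stand-in for cx_guardrails.GuardrailsTriggered."""


async def fake_guard_prompt(prompt: str) -> None:
    """Hypothetical stub: raises if the prompt contains a word from a toy block list."""
    blocked_words = {"stupid"}
    if any(word in prompt.lower() for word in blocked_words):
        raise GuardrailsTriggeredStub("toxicity")


FALLBACK_REPLY = "Sorry, I can't help with that request."


async def answer(prompt: str) -> str:
    try:
        await fake_guard_prompt(prompt)
    except GuardrailsTriggeredStub:
        # Do not forward the prompt to the model; return a safe reply instead.
        return FALLBACK_REPLY
    # Placeholder for the real LLM call.
    return f"LLM reply to: {prompt}"


if __name__ == "__main__":
    print(asyncio.run(answer("Hello there")))
```

Returning a fixed reply keeps the refusal wording under your control instead of exposing the exception text to end users.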
Configuration options
Custom threshold
Adjust detection sensitivity with a threshold between 0.0 and 1.0 (default 0.7).
Threshold: the value at or above which the guardrail is triggered. When the threshold is met or exceeded, the guardrail action is executed and returned through the API, and the system marks the event as an issue.
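The triggering rule above amounts to a single comparison. This minimal sketch assumes the detector produces a toxicity score on the same 0.0 to 1.0 scale as the threshold. `is_triggered` is a hypothetical helper, not an SDK function, and it models only the comparison, not the detector.

```python
DEFAULT_THRESHOLD = 0.7  # default stated above


def is_triggered(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when a (hypothetical) toxicity score meets or exceeds the threshold."""
    for name, value in (("score", score), ("threshold", threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return score >= threshold


if __name__ == "__main__":
    print(is_triggered(0.75))                 # at the default threshold of 0.7
    print(is_triggered(0.75, threshold=0.8))  # with a stricter cutoff value
```

Because the comparison is "greater than or equal", a score exactly at the threshold triggers the guardrail. Lowering the threshold flags more content, and raising it flags less.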
Threshold guidelines
| Threshold | Use case |
|---|---|
| 0.4–0.6 | Low-risk applications |
| 0.7 (default) | General applications |
| 0.8–0.9 | High-security applications |