LiteLLM
Coralogix's AI Observability integrations are designed to provide deep insight into applications leveraging large language models. Through a dedicated integration with the LiteLLM SDK, Coralogix delivers a unified view of calls across various LLM providers, enabling teams to track performance, costs, and errors in a single place. This helps teams standardize monitoring, compare model performance, and optimize the entire system for efficiency and accuracy.
Overview
This library offers customized instrumentation for the LiteLLM SDK, optimized to support the development of production-ready applications. It provides streamlined integration and offers detailed visibility into LLM calls through LiteLLM's unified interface. This enables effective debugging, performance analysis, and a clear understanding of all LLM interactions, regardless of the underlying provider (OpenAI, Azure, Anthropic, Cohere, etc.).
Note
Instrumentation of async completion (`acompletion`) calls is not possible due to technical issues within the LiteLLM SDK.
Requirements
- Python 3.9 or later.
- Coralogix API keys.
Note
Installation using `uv` on Windows is not supported due to technical reasons.
Installation
Run the following command (the package is assumed here to be published on PyPI as llm-tracekit, matching the `llm_tracekit` import path used below):
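```bash
# Package name assumed from the llm_tracekit import path
pip install llm-tracekit
```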
Authentication
Authentication details are provided when the instrumentor object is created. You can do this in one of two ways:
Passing arguments to the constructor
You can pass the token, endpoint, and other parameters directly when initializing the `LiteLLMInstrumentor`:
```python
from llm_tracekit import LiteLLMInstrumentor

instrumentor = LiteLLMInstrumentor(
    coralogix_token="<your_coralogix_token>",
    coralogix_endpoint="<your_coralogix_endpoint>",
    application_name="<ai-application>",
    subsystem_name="<ai-subsystem>",
)
```
Using environment variables
If arguments are not passed to the constructor, the instrumentor will automatically use the following environment variables:
- `CX_TOKEN`: Your Coralogix API key
- `CX_ENDPOINT`: The endpoint associated with your Coralogix domain
- `CX_APPLICATION_NAME`: Your application's name
- `CX_SUBSYSTEM_NAME`: Your subsystem's name
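For example, these can be set in the shell before starting the application (placeholder values shown):

```bash
export CX_TOKEN="<your_coralogix_token>"
export CX_ENDPOINT="<your_coralogix_endpoint>"
export CX_APPLICATION_NAME="<ai-application>"
export CX_SUBSYSTEM_NAME="<ai-subsystem>"
```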
Usage
This section describes how to set up instrumentation for the LiteLLM SDK.
Set up tracing
Instrument
To instrument all clients, create an instance of `LiteLLMInstrumentor` and call its `instrument` method:
```python
from llm_tracekit import LiteLLMInstrumentor

# Arguments can be passed here or set as environment variables
instrumentor = LiteLLMInstrumentor()
instrumentor.instrument()
```
Uninstrument
To uninstrument clients, call the `uninstrument` method:
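For example, using the instrumentor created above:

```python
instrumentor.uninstrument()
```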
Full example
```python
import litellm
from llm_tracekit import LiteLLMInstrumentor

# Activate instrumentation
# Coralogix connection details are read from environment variables:
# - CX_TOKEN
# - CX_ENDPOINT
# - CX_APPLICATION_NAME
# - CX_SUBSYSTEM_NAME
instrumentor = LiteLLMInstrumentor()
instrumentor.instrument()

# LiteLLM SDK usage example
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"content": "What is the capital of Italy?", "role": "user"}],
)
print(response)
```
Enable message content capture
By default, message content (such as prompt and completion contents, function arguments, and return values) is not captured.
To capture message content as span attributes, set the environment variable `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` to `true`.
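For example, in the shell:

```bash
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
```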
Most Coralogix AI evaluations require message contents to function properly, so enabling message capture is strongly recommended.
Key differences from OpenTelemetry
User prompts and model responses are captured as span attributes instead of log events, as detailed below.
Semantic conventions
| Attribute | Type | Description | Example |
|---|---|---|---|
| `gen_ai.operation.name` | string | The specific name of the operation being performed | chat |
| `gen_ai.system` | string | The provider or framework responsible for the operation | openai, anthropic, cohere |
| `gen_ai.request.model` | string | The name of the model requested by the user or application | gpt-4o-mini |
| `gen_ai.request.temperature` | float | The `temperature` parameter passed in the request; controls the randomness of the output: higher values (e.g., 1.0) make the output more random, while lower values make it more deterministic | 1.0 |
| `gen_ai.request.top_p` | float | The `top_p` parameter used for nucleus sampling; the model considers only the tokens comprising the top `p` probability mass, serving as an alternative to temperature for controlling randomness | 1.0 |
| `gen_ai.prompt.<message_number>.role` | string | Role of the message author for a prompt message | system, user, assistant, tool |
| `gen_ai.prompt.<message_number>.content` | string | Contents of a prompt message | What's the weather in Paris? |
| `gen_ai.prompt.<message_number>.tool_calls.<tool_call_number>.id` | string | ID of a tool call in a prompt message | call_yPIxaozNPCSp1tJ34Hsbdtzg |
| `gen_ai.prompt.<message_number>.tool_calls.<tool_call_number>.type` | string | Type of a tool call in a prompt message | function |
| `gen_ai.prompt.<message_number>.tool_calls.<tool_call_number>.function.name` | string | Name of the function used in a tool call within a prompt message | get_current_weather |
| `gen_ai.prompt.<message_number>.tool_calls.<tool_call_number>.function.arguments` | string | Arguments passed to the function used in a tool call within a prompt message | {"location": "Paris"} |
| `gen_ai.prompt.<message_number>.tool_call_id` | string | Tool call ID in a prompt message | call_mszuSIzqtI65i1wAUOE8w5H4 |
| `gen_ai.completion.<choice_number>.role` | string | Role of the message author for a choice in the model response | assistant |
| `gen_ai.completion.<choice_number>.finish_reason` | string | Finish reason for a choice in the model response | stop, tool_calls, error |
| `gen_ai.completion.<choice_number>.content` | string | Contents of a choice in the model response | The weather in Paris is rainy and overcast, with temperatures around 57°F |
| `gen_ai.completion.<choice_number>.tool_calls.<tool_call_number>.id` | string | ID of a tool call in a choice | call_O8NOz8VlxosSASEsOY7LDUcP |
| `gen_ai.completion.<choice_number>.tool_calls.<tool_call_number>.type` | string | Type of a tool call in a choice | function |
| `gen_ai.completion.<choice_number>.tool_calls.<tool_call_number>.function.name` | string | Name of the function used in a tool call within a choice | get_current_weather |
| `gen_ai.completion.<choice_number>.tool_calls.<tool_call_number>.function.arguments` | string | Arguments passed to the function used in a tool call within a choice | {"location": "Paris"} |
| `gen_ai.response.model` | string | The exact name of the model that produced the response | gpt-4o-mini-2024-07-18 |
| `gen_ai.response.id` | string | A unique identifier assigned to the specific completion | chatcmpl-CEaLMZn6bfTEOKFumw5IdMFiZ657a |
| `gen_ai.usage.input_tokens` | int | The number of tokens consumed by the prompt sent to the model | 66 |
| `gen_ai.usage.output_tokens` | int | The number of tokens generated in the model response | 44 |
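As an illustration of the indexed attribute scheme (a hypothetical sketch, not output captured from the instrumentation, and assuming zero-based numbering for messages, choices, and tool calls), a simple chat request would map onto these attributes roughly as follows:

```python
# Hypothetical mapping of a chat request onto the indexed span attributes;
# exact indices and values depend on the actual request and response.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
]
# Expected prompt attributes (assuming zero-based message numbering):
#   gen_ai.prompt.0.role    = "system"
#   gen_ai.prompt.0.content = "You are a helpful assistant."
#   gen_ai.prompt.1.role    = "user"
#   gen_ai.prompt.1.content = "What's the weather in Paris?"
# If the model answered with a tool call, the first choice would produce, e.g.:
#   gen_ai.completion.0.role = "assistant"
#   gen_ai.completion.0.finish_reason = "tool_calls"
#   gen_ai.completion.0.tool_calls.0.function.name = "get_current_weather"
#   gen_ai.completion.0.tool_calls.0.function.arguments = '{"location": "Paris"}'
```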