Concepts and Terms
This document provides an overview of key concepts in observability for Large Language Model (LLM) applications, including spans, traces, and evaluations.
Span
A span is a unit of work that includes an input, an output, a start time, and an end time. It tracks a specific operation, such as a retrieval, a tool call, or an LLM step.
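To make this concrete, here is a minimal, illustrative sketch of a span as a plain data structure; the class and field names are hypothetical and do not correspond to any particular SDK:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Span:
    """Illustrative span record: an input, an output, and start/end times."""
    name: str                      # e.g. "retrieval", "tool_call", "llm_step"
    input: str
    output: str = ""
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    end_time: datetime | None = None

    def finish(self, output: str) -> None:
        """Record the result and mark the span as complete."""
        self.output = output
        self.end_time = datetime.now(timezone.utc)

span = Span(name="llm_step", input="Summarize this document.")
span.finish(output="Here is a short summary...")
print(span.name, span.end_time - span.start_time)
```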
Attributes
Span attributes are key pieces of metadata that provide detailed information about a specific span within a trace in an LLM-based application. These attributes help track and analyze the performance and behavior of operations. Some common span attributes are listed below, followed by a short illustrative sketch:
- Cost – The cost associated with an LLM call span, calculated from the input and output token counts and the model's estimated per-token pricing. It helps assess the financial impact of specific operations.
- Prompt – A piece of input text provided to a model to generate a specific response. The prompt types are:
    - User – An input or query provided by the user to initiate a response from the model.
    - System – A specialized prompt used to establish the overall context, behavior, or persona of the AI's responses. It serves as a foundational instruction that guides the model's actions throughout an interaction, often remaining invisible to the end user.
    - Tool – The output of a tool call, passed back to the model as part of the conversation.
    - Assistant – A previous response from the LLM that forms part of the ongoing chat.
- Response – The result generated by the span, which could be an LLM's output, a tool's result, or the processed data sent back to the user or another system.
- Token – The smallest unit of text processed by NLP models. Tokens can represent various linguistic elements, such as words, word pieces, characters, or punctuation marks.
- Duration – The time taken to complete the span. It indicates the performance of the operation and helps identify bottlenecks or slowdowns.
- Model – The specific LLM that was used to process the span. This attribute helps distinguish between different models and analyze their performance.
- Trace ID – A unique identifier assigned to a trace that encompasses one or more spans. The trace ID links all spans that are part of a single request or workflow, allowing for end-to-end tracking and debugging of operations.
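As a rough illustration of these attributes, the sketch below attaches a few of them to a span as a plain dictionary and derives the cost from token counts; the attribute keys and prices are made up for the example and are not a real SDK's conventions:

```python
import uuid

# Hypothetical per-token prices (USD); real pricing depends on the model.
PRICE_PER_INPUT_TOKEN = 0.000005
PRICE_PER_OUTPUT_TOKEN = 0.000015

input_tokens, output_tokens = 420, 180

# Illustrative attribute keys; a real SDK defines its own conventions.
span_attributes = {
    "trace_id": str(uuid.uuid4()),        # links this span to its trace
    "model": "example-model-v1",          # which LLM handled the span
    "prompt.user": "What is the weather in Paris?",
    "prompt.system": "You are a helpful assistant.",
    "response": "It is currently 18 degrees and cloudy.",
    "tokens.input": input_tokens,
    "tokens.output": output_tokens,
    "duration_ms": 950,
    # Cost derived from token counts and estimated per-token pricing.
    "cost_usd": input_tokens * PRICE_PER_INPUT_TOKEN
                + output_tokens * PRICE_PER_OUTPUT_TOKEN,
}

print(f"Estimated cost: ${span_attributes['cost_usd']:.6f}")
```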
Tool call
The AI agent's ability to invoke external tools, APIs, or functions to access information, perform computations, or execute actions beyond its built-in capabilities. For example, using API calls to retrieve real-time data (e.g., weather, stock prices).
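As a minimal sketch of this idea (not any specific framework's API), an agent can map a tool name requested by the model to a local function and execute it; the get_weather tool and its return values below are hypothetical:

```python
from typing import Any, Callable

def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical tool: a real agent would call an external weather API here."""
    return {"city": city, "temperature_c": 18, "conditions": "cloudy"}

# Registry of tools the agent is allowed to invoke.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def execute_tool_call(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a tool call requested by the model to the matching function."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**arguments)

# Example: the model asked to call get_weather with these arguments.
result = execute_tool_call("get_weather", {"city": "Paris"})
print(result)  # the tool output is sent back to the model as a "tool" message
```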
Trace
A trace represents the work required to process a request in your LLM application and is made up of one or more spans.
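For illustration, a trace can be viewed as the set of spans that share a trace ID. The sketch below groups hypothetical span records by trace ID and sums their durations; the data and field names are invented for the example:

```python
from collections import defaultdict

# Hypothetical span records, each carrying the trace ID of the request
# it belongs to and its duration in milliseconds.
spans = [
    {"trace_id": "req-123", "name": "retrieval", "duration_ms": 120},
    {"trace_id": "req-123", "name": "llm_step", "duration_ms": 950},
    {"trace_id": "req-123", "name": "tool_call", "duration_ms": 80},
    {"trace_id": "req-456", "name": "llm_step", "duration_ms": 640},
]

# Group spans into traces by their shared trace ID.
traces: dict[str, list[dict]] = defaultdict(list)
for span in spans:
    traces[span["trace_id"]].append(span)

for trace_id, trace_spans in traces.items():
    total_ms = sum(s["duration_ms"] for s in trace_spans)
    print(f"{trace_id}: {len(trace_spans)} spans, {total_ms} ms total")
```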
Evaluations
Evaluations (evals) are security and quality checks with factory-set metrics for assessing the quality and effectiveness of your LLM conversations against specific criteria, such as toxicity, topic relevance, and completeness. Coralogix AI observability offers a comprehensive selection of predefined security evaluations (identifying potential security vulnerabilities or risks) and quality evaluations (covering all non-security aspects). A complete list of all evaluation categories is available in the Eval Catalog.
Issues
Eval issues are based on datasets with factory-set thresholds. The system runs the applied evals on the spans that pass through them and assigns a score to each result:
- A high score indicates that the threshold has been exceeded and flags the result as an issue.
- A low score (below the threshold) is not flagged as an issue.
Additionally, issue scores are displayed as high or low labels for each LLM call that has eval metrics.
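As a rough sketch of this scoring logic (the eval names and thresholds below are illustrative, not Coralogix's factory-set values):

```python
# Hypothetical factory-set thresholds per eval; real values differ.
THRESHOLDS = {"toxicity": 0.7, "topic_relevance": 0.5}

def label_eval(eval_name: str, score: float) -> dict:
    """Flag a score as an issue when it exceeds the eval's threshold."""
    threshold = THRESHOLDS[eval_name]
    is_issue = score > threshold
    return {
        "eval": eval_name,
        "score": score,
        "label": "high" if is_issue else "low",
        "is_issue": is_issue,
    }

# Example scores for a single LLM call.
print(label_eval("toxicity", 0.82))         # exceeds threshold -> issue, "high"
print(label_eval("topic_relevance", 0.30))  # below threshold -> not an issue, "low"
```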