Concepts and Terms
This document provides an overview of key concepts in observability for Large Language Model (LLM) applications, including spans, traces, and evaluations.
Span
A span is a unit of work that includes an input, an output, a start time, and an end time. It tracks a specific operation, such as a retrieval, a tool call, or an LLM step.
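To make this concrete, here is a minimal, illustrative sketch of a span as a plain data structure; the class and field names are hypothetical and do not correspond to any particular SDK:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Span:
    """Illustrative span record: an input, an output, and start/end times."""
    name: str                      # e.g. "retrieval", "tool_call", "llm_step"
    input: str
    output: str = ""
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    end_time: datetime | None = None

    def finish(self, output: str) -> None:
        """Record the result and mark the span as complete."""
        self.output = output
        self.end_time = datetime.now(timezone.utc)

span = Span(name="llm_step", input="Summarize this document.")
span.finish(output="Here is a short summary...")
print(span.name, span.end_time - span.start_time)
```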
Attributes
Span attributes are key pieces of metadata that provide detailed information about a specific span within a trace in an LLM-based application. These attributes help track and analyze the performance and behavior of operations. Some common span attributes are listed below, followed by a short illustrative sketch:
- Cost – The cost associated with an LLM call span, calculated from the input and output token counts and the model's estimated per-token pricing. It helps assess the financial impact of specific operations.
- Prompt – A piece of input text provided to a model to generate a specific response. The prompt types are:
    - User – An input or query provided by the user to initiate a response from the model.
    - System – A specialized prompt used to establish the overall context, behavior, or persona of the AI's responses. It serves as a foundational instruction that guides the model's actions throughout an interaction, often remaining invisible to the end user.
    - Tool – The output of a tool call, passed back to the model as part of the conversation.
    - Assistant – A previous response from the LLM that forms part of the ongoing chat.
- Response – The result generated by the span, which could be an LLM's output, a tool's result, or the processed data sent back to the user or another system.
- Token – The smallest unit of text processed by NLP models. Tokens can represent various linguistic elements, such as words, word pieces, characters, or punctuation marks.
- Duration – The time taken to complete the span. It indicates the performance of the operation and helps identify bottlenecks or slowdowns.
- Model – The specific LLM that was used to process the span. This attribute helps distinguish between different models and analyze their performance.
- Trace ID – A unique identifier assigned to a trace that encompasses one or more spans. The trace ID links all spans that are part of a single request or workflow, allowing for end-to-end tracking and debugging of operations.
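As a rough illustration of these attributes, the sketch below attaches a few of them to a span as a plain dictionary and derives the cost from token counts; the attribute keys and prices are made up for the example and are not a real SDK's conventions:

```python
import uuid

# Hypothetical per-token prices (USD); real pricing depends on the model.
PRICE_PER_INPUT_TOKEN = 0.000005
PRICE_PER_OUTPUT_TOKEN = 0.000015

input_tokens, output_tokens = 420, 180

# Illustrative attribute keys; a real SDK defines its own conventions.
span_attributes = {
    "trace_id": str(uuid.uuid4()),        # links this span to its trace
    "model": "example-model-v1",          # which LLM handled the span
    "prompt.user": "What is the weather in Paris?",
    "prompt.system": "You are a helpful assistant.",
    "response": "It is currently 18 degrees and cloudy.",
    "tokens.input": input_tokens,
    "tokens.output": output_tokens,
    "duration_ms": 950,
    # Cost derived from token counts and estimated per-token pricing.
    "cost_usd": input_tokens * PRICE_PER_INPUT_TOKEN
                + output_tokens * PRICE_PER_OUTPUT_TOKEN,
}

print(f"Estimated cost: ${span_attributes['cost_usd']:.6f}")
```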
Tool call
The AI agent's ability to invoke external tools, APIs, or functions to access information, perform computations, or execute actions beyond its built-in capabilities. For example, using API calls to retrieve real-time data (e.g., weather, stock prices).
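As a minimal sketch of this idea (not any specific framework's API), an agent can map a tool name requested by the model to a local function and execute it; the get_weather tool and its return values below are hypothetical:

```python
from typing import Any, Callable

def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical tool: a real agent would call an external weather API here."""
    return {"city": city, "temperature_c": 18, "conditions": "cloudy"}

# Registry of tools the agent is allowed to invoke.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def execute_tool_call(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a tool call requested by the model to the matching function."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**arguments)

# Example: the model asked to call get_weather with these arguments.
result = execute_tool_call("get_weather", {"city": "Paris"})
print(result)  # the tool output is sent back to the model as a "tool" message
```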
Trace
A trace represents the work required to process a request in your LLM application and is made up of one or more spans.
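For illustration, a trace can be viewed as the set of spans that share a trace ID. The sketch below groups hypothetical span records by trace ID and sums their durations; the data and field names are invented for the example:

```python
from collections import defaultdict

# Hypothetical span records, each carrying the trace ID of the request
# it belongs to and its duration in milliseconds.
spans = [
    {"trace_id": "req-123", "name": "retrieval", "duration_ms": 120},
    {"trace_id": "req-123", "name": "llm_step", "duration_ms": 950},
    {"trace_id": "req-123", "name": "tool_call", "duration_ms": 80},
    {"trace_id": "req-456", "name": "llm_step", "duration_ms": 640},
]

# Group spans into traces by their shared trace ID.
traces: dict[str, list[dict]] = defaultdict(list)
for span in spans:
    traces[span["trace_id"]].append(span)

for trace_id, trace_spans in traces.items():
    total_ms = sum(s["duration_ms"] for s in trace_spans)
    print(f"{trace_id}: {len(trace_spans)} spans, {total_ms} ms total")
```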
Evaluations
Evaluations (evals) are security and quality checks with factory-set metrics for assessing the quality and effectiveness of your LLM conversations against specific criteria, such as toxicity, topic relevance, and completeness. Coralogix AI observability offers a comprehensive selection of predefined security evaluations (identifying potential security vulnerabilities or risks) and quality evaluations (covering all non-security aspects). A complete list of all evaluation categories is available in the Eval Catalog.
Issues
Eval issues are based on datasets with factory-set thresholds. The system runs the applied evals on the spans that pass through them and assigns a score to each result:
- A high score indicates that the threshold has been exceeded and flags the result as an issue.
- A low score (below the threshold) is not flagged as an issue.
Additionally, issue scores are displayed as high or low labels for each LLM call that has eval metrics.
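As a rough sketch of this scoring logic (the eval names and thresholds below are illustrative, not Coralogix's factory-set values):

```python
# Hypothetical factory-set thresholds per eval; real values differ.
THRESHOLDS = {"toxicity": 0.7, "topic_relevance": 0.5}

def label_eval(eval_name: str, score: float) -> dict:
    """Flag a score as an issue when it exceeds the eval's threshold."""
    threshold = THRESHOLDS[eval_name]
    is_issue = score > threshold
    return {
        "eval": eval_name,
        "score": score,
        "label": "high" if is_issue else "low",
        "is_issue": is_issue,
    }

# Example scores for a single LLM call.
print(label_eval("toxicity", 0.82))         # exceeds threshold -> issue, "high"
print(label_eval("topic_relevance", 0.30))  # below threshold -> not an issue, "low"
```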