Concepts and Terms

This document provides an overview of key concepts used when observing Large Language Model (LLM) applications, including spans, traces, and evaluations.

Span

A span is a unit of work that includes an input, output, start time, and end time. It tracks specific operations, such as a retriever, tool call, or LLM step.
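As a rough illustration, a span can be modeled as a small record that captures its input, output, and timing. This is a minimal sketch of the concept, not the schema of any particular observability SDK; the field and method names are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One unit of work: a retriever call, tool call, or LLM step."""
    name: str
    input: str
    output: str = ""
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None

    def finish(self, output: str) -> None:
        # Record the result and close the span.
        self.output = output
        self.end_time = time.time()

    @property
    def duration(self) -> float:
        # Elapsed time; falls back to "now" if the span is still open.
        return (self.end_time or time.time()) - self.start_time

span = Span(name="llm_call", input="What is the capital of France?")
span.finish("Paris")
```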

Attributes

Span attributes are key pieces of metadata that provide detailed information about a specific span (a unit of work) within a trace in an LLM-based application. These attributes help track and analyze the performance and behavior of operations. Below are some common span attributes:

  • Cost – The cost of the LLM call, calculated from the input and output token counts and the model's estimated per-token pricing. It helps in assessing the financial impact of specific operations.
  • Prompt – A piece of input text provided to a model to generate a specific response. The prompt types are:
    • User – An input or query provided by the user to initiate a response from the model.
    • System – A specialized prompt used to establish the overall context, behavior, or persona of the AI's responses. It serves as a foundational instruction that guides the model’s actions throughout an interaction, often remaining invisible to the end user.
    • Tool – The output of a tool call, returned to the model as part of the conversation.
    • Assistant – A previous response from the LLM that forms part of the ongoing chat.
  • Response – The result or response generated by the span, which could be the output of an LLM's response, a tool’s result, or the processed data sent back to the user or another system.
  • Token – The smallest unit of text an NLP model processes. Tokens can represent words, subword pieces, characters, or punctuation marks.
  • Duration – The time taken to complete the span. It indicates the performance of the operation and helps in identifying bottlenecks or slowdowns.
  • Model – The specific LLM that was used to process the span. This attribute helps in distinguishing between different models and analyzing their performance.
  • Trace ID – A unique identifier assigned to a trace that encompasses one or more spans. The trace ID links all spans that are part of a single request or workflow, allowing for end-to-end tracking and debugging of operations.
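To make the attributes above concrete, here is a sketch of what a single LLM-call span's attributes might look like, including the cost derivation from token counts. The attribute names, model name, and per-token prices are illustrative assumptions, not an actual vendor schema.

```python
# Assumed per-1K-token prices; real pricing depends on the model.
INPUT_PRICE_PER_1K = 0.0005
OUTPUT_PRICE_PER_1K = 0.0015

span_attributes = {
    "trace_id": "7f3c9a12",        # links this span to its parent trace
    "model": "example-model-v1",   # which LLM processed the span
    "duration_ms": 842,            # time taken to complete the span
    "tokens": {"input": 1200, "output": 300},
}

# Cost = input and output token counts times the model's estimated pricing.
tokens = span_attributes["tokens"]
span_attributes["cost"] = (
    tokens["input"] / 1000 * INPUT_PRICE_PER_1K
    + tokens["output"] / 1000 * OUTPUT_PRICE_PER_1K
)
print(round(span_attributes["cost"], 6))  # 0.00105
```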

Tool call

A tool call is the AI agent's invocation of an external tool, API, or function to access information, perform computations, or execute actions beyond its built-in capabilities. For example, an API call that retrieves real-time data (e.g., weather or stock prices).
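The mechanics can be sketched as follows: the model emits a tool name plus arguments, and the application dispatches them to a registered function. The weather lookup here is a stand-in; a real agent would call an external API.

```python
def get_weather(city: str) -> str:
    # Placeholder for a real-time weather API call.
    return f"Sunny in {city}"

# Registry mapping tool names to callables the agent may invoke.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    # Look up the requested tool and execute it with the model's arguments.
    func = TOOLS[tool_call["name"]]
    return func(**tool_call["arguments"])

result = dispatch({"name": "get_weather", "arguments": {"city": "Paris"}})
print(result)  # Sunny in Paris
```

The tool's return value would then be fed back to the model as a Tool prompt, closing the loop described under Attributes.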

Trace

A trace represents the work required to process a request in your LLM application and is made up of one or more spans.
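The span-to-trace relationship can be sketched as a container that stamps every span with a shared trace ID, enabling end-to-end tracking of one request. The structure is illustrative, not a specific SDK's API.

```python
import uuid

class Trace:
    """Groups the spans produced while serving a single request."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: list[dict] = []

    def add_span(self, name: str) -> dict:
        # Every span carries the trace ID that links it to this request.
        span = {"name": name, "trace_id": self.trace_id}
        self.spans.append(span)
        return span

trace = Trace()
trace.add_span("retriever")
trace.add_span("llm_call")
```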

Evaluations

Evaluations (evals) are security and quality checks with factory-set metrics that assess the quality and effectiveness of your LLM conversations against specific criteria, such as toxicity, topic relevance, and completeness. Coralogix AI observability offers a comprehensive selection of predefined security evaluations (identifying potential security vulnerabilities or risks) and quality evaluations (covering all non-security aspects). A complete list of all evaluation categories is available in the Eval Catalog.

Issues

Eval issues are based on datasets with factory-set thresholds. The system runs the applied evals on the spans passing through it, identifies issues, and assigns a score to each:

  • A high score indicates that a threshold has been exceeded, marking it as an issue.
  • A low score (below the threshold) is not flagged as an issue.

Additionally, issue scores are displayed as high or low labels on each LLM call that has eval metrics.
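The threshold logic above can be sketched as a simple comparison per eval: a score above the factory-set threshold is flagged as an issue and labeled high, anything below is labeled low. The eval names and threshold values here are illustrative assumptions.

```python
# Assumed factory-set thresholds per eval; real values vary by eval.
THRESHOLDS = {"toxicity": 0.7, "prompt_injection": 0.5}

def flag_issues(scores: dict) -> dict:
    """Label each eval score high (issue) or low based on its threshold."""
    return {
        name: {
            "score": score,
            "label": "high" if score > THRESHOLDS[name] else "low",
        }
        for name, score in scores.items()
    }

result = flag_issues({"toxicity": 0.9, "prompt_injection": 0.3})
# toxicity exceeds its threshold, so it is labeled "high" (an issue);
# prompt_injection stays below its threshold and is labeled "low".
```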