
Fleet Management architecture

Fleet Manager lets you centrally manage the configuration of all OpenTelemetry Collectors in your environment. The system works across Collector types—such as cluster-collectors and agents—and across Kubernetes and VM-based deployment methods (DaemonSet, StatefulSet, Deployment, ECS EC2, and more). This architecture ensures consistent, remote configuration at scale without requiring direct access to individual hosts.

Key components

The Fleet Manager architecture consists of three components that work together to monitor agent state and deliver, apply, and verify configuration changes: the Fleet Manager, the Supervisor, and the OpenTelemetry Collector.

flowchart LR
    subgraph S[Supervisor]
        direction TB
        C[OpAMP Client]
    end

    S -->|OpAMP| FM[Fleet Manager]
    S --> OC[OpenTelemetry Collector]

Fleet Manager (control plane)

The Fleet Manager is the backend control plane. It implements the Open Agent Management Protocol (OpAMP) and manages the full lifecycle of remote configuration.

It provides the ability to:

  • Store remote configuration definitions and metadata.
  • Identify which agents should receive each configuration.
  • Deliver configuration updates.
  • Track configuration status to maintain rollout history.
  • Keep agents aligned with the desired configuration state.
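
The kind of state involved can be pictured with a short sketch. The types and field names below are illustrative only and are not the actual Fleet Manager schema; they simply show how a configuration definition, its targeting metadata, and its rollout status relate.

// Illustrative sketch of the kind of state a control plane such as
// Fleet Manager keeps per remote configuration. Names are hypothetical.
package fleet

import "time"

// AgentSelector describes which agents should receive a configuration.
type AgentSelector struct {
	CompanyID   string            // account that owns the agents
	ClusterName string            // e.g. Kubernetes cluster name
	AgentType   string            // e.g. "agent" or "cluster-collector"
	Attributes  map[string]string // any additional agent attributes
}

// RemoteConfig is one stored configuration definition plus its metadata.
type RemoteConfig struct {
	ID        string
	CreatedAt time.Time     // "most recently created" wins when several match
	Selector  AgentSelector // identifies which agents should receive it
	Body      []byte        // the Collector configuration (YAML)
}

// RolloutStatus records what each agent reported back, so the control
// plane can keep agents aligned with the desired state.
type RolloutStatus struct {
	AgentID    string
	ConfigID   string
	ConfigHash string // SHA256 of the applied configuration
	State      string // "applying", "applied", or "failed"
	ReportedAt time.Time
}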

Supervisor

The Supervisor is a standalone binary that connects the Collector to the Fleet Manager using OpAMP. It provides controlled, reliable configuration updates.

The Supervisor enables you to:

  • Apply new configurations safely by writing the updated Collector configuration, stopping the current Collector, restarting it with the new file, and reporting the application status.
  • Recover automatically by rolling back to the last working configuration if an update fails (for example, due to invalid YAML).

Each Collector instance runs its own Supervisor process.
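
The apply-and-recover behavior can be sketched as follows. This is a simplified illustration of the flow described above, not the actual Supervisor implementation; the binary name, file paths, and the crude health check are assumptions made for brevity.

// Simplified sketch of a Supervisor-style "apply new config" flow:
// write the new file, stop the running Collector, start it with the
// new file, and fall back to the previous file if startup fails.
package supervisor

import (
	"os"
	"os/exec"
	"syscall"
	"time"
)

type collector struct {
	cmd    *exec.Cmd
	exited chan struct{} // closed once the Collector process exits
}

func startCollector(configPath string) (*collector, bool) {
	cmd := exec.Command("otelcol", "--config", configPath) // binary name is illustrative
	if err := cmd.Start(); err != nil {
		return nil, false
	}
	c := &collector{cmd: cmd, exited: make(chan struct{})}
	go func() { _ = cmd.Wait(); close(c.exited) }()

	// Crude health check: if the process exits almost immediately,
	// treat the new configuration as failed.
	select {
	case <-c.exited:
		return c, false
	case <-time.After(2 * time.Second):
		return c, true
	}
}

func (c *collector) stop(grace time.Duration) {
	_ = c.cmd.Process.Signal(syscall.SIGINT) // graceful interrupt
	select {
	case <-c.exited:
	case <-time.After(grace):
		_ = c.cmd.Process.Kill() // force stop after the grace period
		<-c.exited
	}
}

func applyConfig(running *collector, newConfig []byte) (*collector, bool) {
	const newPath, lastGoodPath = "config.new.yaml", "config.last-good.yaml"

	if err := os.WriteFile(newPath, newConfig, 0o600); err != nil {
		return running, false // could not write the file; keep the current Collector
	}
	running.stop(10 * time.Second)

	if c, ok := startCollector(newPath); ok {
		_ = os.Rename(newPath, lastGoodPath) // remember the last working config
		return c, true
	}

	// Roll back: restart with the last known working configuration.
	c, _ := startCollector(lastGoodPath)
	return c, false
}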

OpenTelemetry Collector

The OpenTelemetry Collector receives, processes, and exports telemetry data. The Collector cannot safely reload a configuration on its own, which is why the Supervisor is required. The Supervisor handles restart logic, merging, and rollback, while the Collector focuses on data processing.

End-to-end architecture flow

Supervisor heartbeat to Fleet Manager

The remote configuration process starts when the Supervisor initiates a heartbeat request to the Fleet Manager using an AgentToServer message. This periodic heartbeat is how the Supervisor polls the server to check for pending configuration updates.
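
The polling behavior can be sketched as a simple loop. In practice the Supervisor exchanges OpAMP Protobuf messages; the payload construction and endpoint below are placeholders that only show the shape of the loop.

// Sketch of a Supervisor-style heartbeat loop. The real Supervisor sends
// OpAMP AgentToServer Protobuf messages; the payload and endpoint here
// are placeholders.
package supervisor

import (
	"bytes"
	"io"
	"net/http"
	"time"
)

func heartbeatLoop(serverURL string, interval time.Duration, onResponse func(body []byte)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		// Stand-in for a serialized AgentToServer message describing the
		// agent's identity, health, and current config hash.
		payload := buildAgentToServer()

		resp, err := http.Post(serverURL, "application/x-protobuf", bytes.NewReader(payload))
		if err != nil {
			continue // transient network error; try again on the next tick
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()

		// The ServerToAgent response may carry a pending remote configuration.
		onResponse(body)
	}
}

// buildAgentToServer is a stand-in for real OpAMP message construction.
func buildAgentToServer() []byte { return []byte("agent-to-server") }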

Remote configuration delivery pipeline

  1. Fleet Manager checks compatibility.

    Fleet Manager verifies that the agent supports remote configuration.

  2. Fleet Manager selects the correct configuration.

    Fleet Manager finds the most recently created configuration matching:
    • Company ID
    • Cluster name
    • Agent type
    • Any other agent attributes specified in the configuration’s Agent selector

  3. Fleet Manager sends the configuration to the Supervisor.

    If a configuration is pending, Fleet Manager includes an AgentRemoteConfig patch in the ServerToAgent response.

  4. Supervisor applies the configuration.

    It merges configuration inputs (sketched after the diagram below), writes the new configuration file, and executes the restart logic.

  5. Supervisor restarts the Collector.

    • Sends an INTERRUPT signal (graceful stop) and waits up to 10 seconds.
    • If the Collector has not stopped, sends a KILL signal.
    • Starts the Collector with the new configuration.

sequenceDiagram
    participant C as Collector
    participant S as Supervisor
    participant FM as Fleet Manager

    S->>FM: Heartbeat
    FM-->>S: Remote config

    S-->>S: Merge and write config
    S->>C: INTERRUPT
    S-->>S: Wait 10s

    alt Not stopped
        S->>C: KILL
    end

    S->>C: Start Collector

    alt Started
        S->>FM: Status applied
    else Failed
        S->>FM: Status failed
    end
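
Step 4 of the pipeline above notes that the Supervisor merges configuration inputs before writing the final file. A minimal sketch of such a deep merge over decoded configuration maps is shown below; the precedence rule (the remote layer overriding the local base) is an assumption for illustration.

// Minimal sketch of merging two decoded configuration documents
// (for example, a local base config and a remote config) into one map.
// The precedence rule (override wins over base) is illustrative only.
package supervisor

// mergeConfig returns base with values from override layered on top.
// Nested maps are merged recursively; any other value is replaced.
func mergeConfig(base, override map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		if bm, ok := out[k].(map[string]any); ok {
			if om, ok := v.(map[string]any); ok {
				out[k] = mergeConfig(bm, om)
				continue
			}
		}
		out[k] = v
	}
	return out
}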

State reporting

After receiving a remote configuration, the Supervisor reports the configuration status back to the Fleet Manager.

  • Applying: Reported immediately after receiving the configuration
  • Succeeded (Applied): Collector started successfully; includes SHA256 config hash
  • Failed: Collector failed to start; Supervisor rolls back to the previous working configuration.

If Fleet Manager detects that it missed a status update, it requests the Supervisor to resend its full state.
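
The status lifecycle can be sketched as follows. The status names mirror the list above, while the report helper and message shape are hypothetical stand-ins for the corresponding OpAMP status fields.

// Sketch of configuration status reporting. The statuses mirror the
// lifecycle described above; the report helper and its message shape
// are hypothetical stand-ins.
package supervisor

import (
	"crypto/sha256"
	"encoding/hex"
)

type ConfigStatus string

const (
	StatusApplying ConfigStatus = "applying" // sent right after the config is received
	StatusApplied  ConfigStatus = "applied"  // Collector started with the new config
	StatusFailed   ConfigStatus = "failed"   // startup failed; previous config restored
)

type statusReport struct {
	Status     ConfigStatus
	ConfigHash string // SHA256 of the configuration this status refers to
}

// configHash identifies exactly which configuration a status refers to.
func configHash(config []byte) string {
	sum := sha256.Sum256(config)
	return hex.EncodeToString(sum[:])
}

func reportStatus(send func(statusReport), status ConfigStatus, config []byte) {
	send(statusReport{Status: status, ConfigHash: configHash(config)})
}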

Deployment architecture

Supported platforms

Fleet Manager currently supports:

  • Kubernetes (Agents view and remote configuration support)
  • AWS ECS EC2 (Agents view only)
  • Linux and Windows hosts (coming soon)

Network architecture

The Supervisor communicates with the Fleet Manager using the OpAMP protocol. In this model, the Supervisor acts as the OpAMP client and initiates all communication, while the Fleet Manager acts as the OpAMP server. Communication uses either HTTP or WebSocket transport, and all messages are exchanged as binary Protobuf payloads.

The Supervisor sends AgentToServer messages with status and health information. Fleet Manager responds with ServerToAgent messages to deliver remote configuration updates.

OpAMP defines only the communication between Supervisor and Fleet Manager—it does not define how the Supervisor controls the Collector itself.
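
Because the Supervisor initiates every exchange, the server side reduces to answering each AgentToServer request with a ServerToAgent response. The sketch below shows this shape for the plain HTTP transport; the endpoint path, port, and helper function are illustrative and not the actual Fleet Manager service.

// Rough sketch of the server side of the HTTP transport: the OpAMP client
// (the Supervisor) always initiates, so the server only has to answer each
// AgentToServer request with a ServerToAgent response. The path, port, and
// helper below are illustrative.
package fleet

import (
	"io"
	"net/http"
)

func opampHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body) // binary Protobuf AgentToServer payload
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	// Decode the agent's status, decide whether a remote configuration is
	// pending for it, and build the ServerToAgent reply (all elided here).
	reply := buildServerToAgent(body)

	w.Header().Set("Content-Type", "application/x-protobuf")
	_, _ = w.Write(reply)
}

// buildServerToAgent is a stand-in for real OpAMP message handling.
func buildServerToAgent(agentToServer []byte) []byte { return agentToServer }

func serve() {
	http.HandleFunc("/v1/opamp", opampHandler) // path is illustrative
	_ = http.ListenAndServe(":4320", nil)      // port is illustrative
}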

Multi-environment behavior

Fleet Manager targets Collectors based on cluster name and agent type, and can include additional selectors.

This targeting enables:

  • Consistent configuration across clusters
  • Per-environment or per-role configuration segmentation
  • Automatic handling of new agents

If a new agent matches an existing configuration’s selectors, Fleet Manager automatically sends the most recently created remote configuration that matches that agent’s attributes.
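
Conceptually, targeting reduces to two steps: keep every configuration whose selector is satisfied by the agent's attributes, then pick the most recently created one. A minimal sketch, using a flattened attribute map for brevity:

// Sketch of the targeting rule: among the stored configurations, pick the
// most recently created one whose selector is satisfied by the agent's
// attributes. The data shapes here are illustrative.
package fleet

import "time"

type storedConfig struct {
	Selector  map[string]string // e.g. company ID, cluster name, agent type, extra attributes
	CreatedAt time.Time
	Body      []byte
}

// matches reports whether every selector key/value is present on the agent.
func matches(selector, agentAttrs map[string]string) bool {
	for k, v := range selector {
		if agentAttrs[k] != v {
			return false
		}
	}
	return true
}

// pickConfig returns the most recently created configuration whose selector
// matches the agent, or nil if no configuration targets it.
func pickConfig(agentAttrs map[string]string, configs []storedConfig) *storedConfig {
	var best *storedConfig
	for i := range configs {
		c := &configs[i]
		if matches(c.Selector, agentAttrs) && (best == nil || c.CreatedAt.After(best.CreatedAt)) {
			best = c
		}
	}
	return best
}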

Best practices

Follow these best practices to ensure safe, predictable operation:

  • Use the Supervisor preset for any remote-managed deployment.

    Without it, agents operate in local-only or read-only mode.

  • Keep minimal Collector configuration enabled when using the Supervisor.

    This ensures a consistent, remote-only configuration flow and prevents local configuration from interfering with Supervisor management.

  • Keep Collector versions aligned across the fleet.

    Configuration compatibility depends on the Collector version, so mixing versions can introduce unexpected behavior during rollouts. In mild cases, a configuration update has no effect on some agents and they continue running with their existing setup. In more severe cases, the Collector fails to load the new configuration entirely; the remote configuration is then discarded and the Supervisor restarts the Collector with the last known working configuration.

How Fleet Manager fits into your environment

Fleet Manager is designed for flexibility and works well alongside existing deployment and security practices. Keep the following considerations in mind when planning your setup:

  • GitOps and Operator workflows

    Fleet Manager operates independently of GitOps tools (such as ArgoCD or Flux) and the OpenTelemetry Operator. It does not modify any Collector ConfigMaps in your cluster, which prevents interference with GitOps workflows and avoids conflicts with declarative configuration sources.

  • Secret management

    Fleet Manager expects secrets to be managed through your preferred secure mechanism—such as Kubernetes Secrets or environment variables—so your existing secret-management workflows remain unchanged.

  • Configuration validation

    Fleet Manager checks YAML validity and leaves semantic validation to the Collector. This ensures maximum compatibility across Collector versions and allows advanced users to apply custom or experimental pipelines without restrictions.
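
The split between syntactic and semantic validation can be illustrated with a short sketch: a check like the one below only confirms that the document parses as YAML, while whether the pipelines inside it are meaningful is decided by the Collector when it loads the configuration. The helper name is illustrative.

// Sketch of purely syntactic validation: confirm the payload parses as
// YAML, without judging whether the pipelines inside make sense (that
// semantic check happens when the Collector loads the configuration).
package fleet

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// validateSyntax returns an error only if the document is not valid YAML.
func validateSyntax(config []byte) error {
	var doc any
	if err := yaml.Unmarshal(config, &doc); err != nil {
		return fmt.Errorf("remote configuration is not valid YAML: %w", err)
	}
	return nil
}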

Support

Need help?

Our world-class customer success team is available 24/7 to walk you through your setup and answer any questions that may come up.

Feel free to reach out to us via our in-app chat or by sending us an email at support@coralogix.com.
