
Set up configuration fallback for Fleet Management

Keep your OpenTelemetry Collectors running with a known-good configuration, even when the Fleet Manager is temporarily unreachable. Configuration fallback syncs every active remote configuration to an Amazon S3 bucket that you own, giving agents a durable, centralized backup they can retrieve independently of the Fleet Manager control plane.

Setting up S3 storage is optional, but recommended for safe recovery. Without fallback, an agent that loses contact with Fleet Manager can only rely on the last configuration it applied locally. With S3 fallback enabled, agents gain access to the most recent active configuration stored in your bucket, reducing the blast radius of control-plane outages and providing consistent recovery across clusters.

What you need

  • An AWS S3 bucket dedicated to configuration fallback storage
  • An IAM role that grants Fleet Manager s3:PutObject and s3:ListBucket on the bucket, and grants the Supervisor s3:GetObject on the bucket
  • OpenTelemetry Collectors deployed with the Supervisor preset enabled (see Enable Fleet Management for Kubernetes)
  • Fleet Management enabled for your Coralogix organization

How it works

Fleet Management uses a three-layer resilience model to ensure agents always have a valid configuration:

  1. Remote configuration (primary) -- The Supervisor polls Fleet Manager for the latest active configuration over OpAMP. This is the standard delivery path described in the Fleet Management architecture.
  2. S3 fallback (secondary) -- When Fleet Manager is unreachable, the Supervisor retrieves the most recent configuration from your S3 bucket. This is especially important for brand-new agents that have never received a remote configuration — without S3 fallback, they have no local last-known-good to fall back to.
  3. Local last-known-good (tertiary) -- If both Fleet Manager and S3 are unavailable, the Supervisor falls back to the last configuration it successfully applied and stored locally. This layer is only available for agents that have previously received at least one remote configuration.
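The priority order above can be sketched as a small resolution loop. This is a simplified illustration of the layering, not the Supervisor's actual implementation; the three fetch callables are hypothetical stand-ins for the real sources:

```python
def resolve_config(fetch_remote, fetch_s3, read_local_last_known_good):
    """Return (source_name, config) from the first layer that succeeds.

    Each argument is a hypothetical callable that returns configuration
    bytes, or raises (or returns None) if its source is unavailable.
    """
    sources = [
        ("remote (Fleet Manager)", fetch_remote),
        ("S3 fallback", fetch_s3),
        ("local last-known-good", read_local_last_known_good),
    ]
    for name, fetch in sources:
        try:
            config = fetch()
            if config is not None:
                return name, config
        except Exception:
            continue  # this layer is unavailable; try the next one
    raise RuntimeError("no configuration available from any layer")
```

Note that a brand-new agent has no local last-known-good, which is why the middle (S3) layer matters most for first-time provisioning.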

How Fleet Manager syncs to S3

Fleet Manager automatically writes the active configuration to your S3 bucket each time a configuration version is activated. This ensures the fallback copy stays current with the latest configuration state.

Configurations are stored in the bucket using the following path structure:

<COMPANY_ID>/<GROUP_NAME>/<FAMILY_VERSION>/<REMOTE_CONFIG_NAME>/

Each remote configuration folder contains two files:

  • config.yaml -- the Collector configuration.
  • meta.yaml -- metadata about the configuration, including name, agent selectors, identifiers, and timestamps.
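As a concrete illustration, an object key under this layout can be composed like this (all placeholder values are hypothetical):

```python
def fallback_object_key(company_id, group_name, family_version,
                        remote_config_name, filename):
    # Follows the documented layout:
    # <COMPANY_ID>/<GROUP_NAME>/<FAMILY_VERSION>/<REMOTE_CONFIG_NAME>/<filename>
    return "/".join([company_id, group_name, family_version,
                     remote_config_name, filename])

print(fallback_object_key("12345", "production", "v2",
                          "otel-gateway", "config.yaml"))
# 12345/production/v2/otel-gateway/config.yaml
```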

When you change your S3 bucket, Fleet Manager copies all active configurations from the previous bucket to the new one. Configurations continue to apply normally during migration. Data in the old bucket is not deleted.

The fallback flow is summarized in the diagram below (Mermaid source):

flowchart TD
    FM[Fleet Manager]
    S3[(S3 bucket)]
    LC[Local config]
    SUP[Supervisor]

    FM -->|Syncs active config on each activation| S3

    SUP -->|"Primary: poll for config"| FM
    SUP -.->|"If Fleet Manager unreachable: fetch from S3"| S3
    SUP -.->|"If both unreachable: use local last-known-good"| LC

Set up configuration fallback

Step 1: Create an S3 bucket

Create a dedicated S3 bucket in your AWS account for storing fallback configurations. Coralogix recommends using a bucket name that clearly identifies its purpose, such as coralogix-fleet-fallback-<environment>.

Enable S3 versioning and server-side encryption (SSE-S3 or SSE-KMS) on the bucket. Use the Standard storage class.
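If you manage infrastructure as code, the bucket from this step can be sketched as a CloudFormation fragment. This is an illustrative sketch: the bucket name is a placeholder, and SSE-S3 is shown (use aws:kms with a key ARN for SSE-KMS):

```yaml
Resources:
  FleetFallbackBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: coralogix-fleet-fallback-production   # illustrative name
      VersioningConfiguration:
        Status: Enabled
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256   # SSE-S3; use aws:kms for SSE-KMS
```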

Step 2: Configure IAM permissions

Create an IAM role that allows Fleet Manager to write configurations and list bucket contents, and allows the Supervisor to read objects. Fleet Management uses the same IAM identity as S3 archive; you can set up the role with CloudFormation, Terraform, or manually.

Fleet Manager policy -- allows Fleet Manager to write configurations and list bucket contents:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    }
  ]
}

Supervisor policy -- allows the Supervisor to read fallback configurations:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}

Trust relationship -- if Fleet Manager accesses the bucket cross-account, add an External ID condition:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::FLEET_MANAGER_ACCOUNT_ID:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "YOUR_EXTERNAL_ID"
        }
      }
    }
  ]
}

The setup supports an optional External ID for cross-account IAM role assumption, adding an extra layer of security when Fleet Manager accesses your bucket.

Step 3: Connect the S3 bucket in Fleet Management

  1. In Coralogix, navigate to Integrations, then Fleet Management. Open the Configurations tab and select Settings.
  2. In Region, select the AWS region where your bucket is located.
  3. In Bucket name, enter the name of your S3 bucket.
  4. (Optional) In External ID, enter the external ID configured in your IAM trust policy.
  5. Select Validate to verify that Fleet Manager can connect to the bucket. If the connection fails, Fleet Management displays the specific error returned by S3 so you can diagnose the issue. Settings are not saved until validation succeeds.
  6. Select Save. Saving re-runs the connection check against the bucket; no files are uploaded during validation.

    Fleet Management settings panel with S3 Configuration Storage fields

Note

You can apply remote configurations without an S3 bucket — agents fall back to local last-known-good only. Connecting a bucket adds the S3 fallback layer for safer recovery across clusters.

Step 4: Configure the Supervisor for S3 fallback

The Supervisor must be pre-configured with the S3 fallback URI — Fleet Manager does not push this automatically. Each Supervisor needs to know where to fetch fallback configurations and have AWS credentials to access the bucket.

You can specify multiple fallback paths. The Supervisor tries them in order and uses the first one that succeeds.

For Kubernetes deployments, add the fallback URI to your Helm values under the presets.fleetManagement.supervisor section:

opentelemetry-agent:
  presets:
    fleetManagement:
      enabled: true
      supervisor:
        initialFallbackConfigs:
          - "s3://YOUR_BUCKET_NAME.s3.YOUR_REGION.amazonaws.com/"
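Because the Supervisor tries fallback paths in order and uses the first that succeeds, the initialFallbackConfigs list can name more than one location. For example (the secondary bucket is purely illustrative):

```yaml
      supervisor:
        initialFallbackConfigs:
          - "s3://YOUR_BUCKET_NAME.s3.YOUR_REGION.amazonaws.com/"
          - "s3://YOUR_SECONDARY_BUCKET.s3.YOUR_REGION.amazonaws.com/"
```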

For AWS credentials, use IAM Roles for Service Accounts (IRSA) on EKS, or provide credentials as environment variables for other Kubernetes platforms.
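With IRSA, the Supervisor pod's service account is annotated with the IAM role to assume. A minimal sketch, assuming your chart exposes the service account under the opentelemetry-agent key (the exact values path depends on your chart version, and the role ARN is a placeholder):

```yaml
opentelemetry-agent:
  serviceAccount:
    create: true
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_SUPERVISOR_ROLE
```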

For standalone Linux hosts, add the initial_fallback_configs field to your Supervisor configuration file (/etc/otelcol-coralogix/supervisor.yaml):

agent:
  executable: /usr/bin/cdot
  initial_fallback_configs:
    - "s3://YOUR_BUCKET_NAME.s3.YOUR_REGION.amazonaws.com/"

Provide AWS credentials through an EC2 instance profile or environment variables in the systemd unit file:

[Service]
Environment="AWS_ACCESS_KEY_ID=..."
Environment="AWS_SECRET_ACCESS_KEY=..."
Environment="AWS_REGION=YOUR_REGION"

Step 5: Verify the connection

After saving, confirm that a green Connected indicator appears next to the bucket name in the Fleet Settings panel.

Apply a test configuration to verify that Fleet Manager syncs it to S3. Check the bucket contents to confirm the configuration object appears.

Change the S3 bucket

To switch to a different S3 bucket:

  1. In Coralogix, navigate to Integrations, then Fleet Management. Open the Configurations tab and select Settings.
  2. Update the bucket details and select Save.

Fleet Manager copies all active configurations from the previous bucket to the new one. Configurations continue to apply normally during migration. Data in the old bucket is not deleted.

Best practices

  • Dedicate a separate S3 bucket for fallback configurations. Do not share it with other workloads to avoid accidental deletion or policy conflicts.
  • Enable S3 versioning on the bucket to maintain a history of configuration changes and allow recovery from unintended overwrites.
  • Apply a lifecycle policy to expire old object versions after a retention period that aligns with your compliance requirements.
  • Restrict bucket access using least-privilege IAM policies. Grant Fleet Manager write access and the Supervisor read-only access.
  • Monitor S3 access using AWS CloudTrail to track who reads and writes fallback configurations.
  • Use the same AWS region as your primary Coralogix deployment to minimize sync latency.

Limitations

  • Amazon S3 is the only supported fallback storage backend. S3-compatible object stores are not currently supported.
  • Sync latency between Fleet Manager and S3 depends on your network and bucket region. Configurations are synced on each activation, not continuously.
  • Fallback retrieval relies on the Supervisor having network access to the S3 bucket. Agents in isolated networks that cannot reach S3 fall back to local last-known-good only.
  • When switching buckets, data in the old bucket is not automatically deleted.

Troubleshoot

S3 access denied when validating the bucket

  • Cause: The IAM role does not have the required permissions on the bucket, or the trust policy does not include the correct External ID.
  • Fix: Verify the IAM policy grants s3:PutObject and s3:ListBucket permissions on the bucket ARN for Fleet Manager, and s3:GetObject for the Supervisor. If you use an External ID, confirm it matches the value in Fleet Settings.

Configuration not synced to S3 after activation

  • Cause: The sync from Fleet Manager to S3 failed silently, or the bucket connection was interrupted.
  • Fix: Fleet Management displays a warning banner in the Configurations tab if a sync to S3 fails. Check the bucket connection in Fleet Settings to diagnose the issue. Re-validate the connection and reactivate the configuration.

Agent using stale fallback configuration

  • Cause: The agent retrieved an outdated configuration from S3 because the latest sync had not completed before Fleet Manager became unreachable.
  • Fix: Once Fleet Manager is available again, reactivate the configuration to trigger a fresh sync. The Supervisor automatically switches back to the remote configuration path when Fleet Manager recovers.

Supervisor not falling back to S3

  • Cause: The Supervisor cannot reach the S3 bucket due to network restrictions or missing IAM permissions.
  • Fix: Verify that the Supervisor's network allows outbound HTTPS access to the S3 endpoint. Confirm that the Supervisor's IAM credentials grant s3:GetObject on the bucket; depending on your deployment platform, those credentials come from an EC2 instance profile, EKS IAM Roles for Service Accounts (IRSA), or environment variables.