Handling PII and Sensitive Data

The ever-growing volume of telemetry data often contains sensitive information that can identify, contact, or locate an individual, either on its own or when combined with other data. To comply with Coralogix's service terms and data privacy regulations, use the tools described here to anonymize personal data before sending it to the Coralogix platform. With the exception of essential business information, such as names and email addresses used for user identification and authentication, personally identifiable information (PII) should be removed from your logs, traces, and metrics before they leave your environment.

Guidelines for managing sensitive data in Coralogix

Except for a few specific types of data, most of the information you send to Coralogix does not need to include private or personal data to unlock the full value of the service. Coralogix offers detailed instructions, tools, and best practices to help you scrub, obfuscate, filter, and minimize the inclusion of sensitive or personal data in the data you choose to share.

You can filter the data either at the OpenTelemetry Collector level or during ingestion by applying Coralogix's parsing rules. These rules allow you to block, mask, or modify the data before it is stored, ensuring sensitive information is handled appropriately.

In general, handling PII at the shipper level is considered best practice because it prevents sensitive data from ever leaving the source system unprotected and saves on costs.

Data filtering in the OpenTelemetry Collector

Customize your telemetry data to meet specific requirements by defining relevant rules in the OpenTelemetry Collector configuration file. The OpenTelemetry Collector provides four processors that can be used for data filtering.

  • Attributes: Modifies, inserts, or deletes individual attributes on spans, metrics, and logs.
  • Redaction: Filters sensitive attributes or masks their values.
  • Transform: Utilizes the OpenTelemetry Transform Language to perform extensive transformations on telemetry data efficiently.
  • Filter: Controls the flow of telemetry data based on specific conditions or attributes.

Add these processors to the OpenTelemetry Collector’s configuration file and enable the processor functionality by updating the relevant service or pipelines.
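
Processors take effect only once they are referenced in a pipeline. Below is a minimal sketch, assuming an OTLP receiver and the Coralogix exporter; the processor name redaction/update refers to the redaction example shown later on this page:

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the order listed, before data reaches the exporter
      processors: [redaction/update]
      exporters: [coralogix]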

Note

The documentation for OpenTelemetry Collector's processors is subject to change and should be verified directly at the GitHub repository for the most up-to-date information.

Attributes processor

The attributes processor is used to modify, insert, or delete attributes in spans, logs, or metrics before they are exported.

For example, this configuration:

  • Hashes the user.email attribute. The attributes processor's hash action applies SHA-1, replacing the original email with a hashed value that anonymizes it while still allowing you to track the same user across multiple traces.
  • Deletes the user.ssn attribute, which may contain a Social Security Number. SSNs are sensitive and should never be logged or exposed.
  • Overwrites the user.credit_card_number attribute with the fixed mask ****-****-****-**** (via the update action), protecting the card number while keeping the field present in a recognizable format.

processors:
  attributes/update:
    actions:
      # Anonymize the email while keeping it consistent across traces
      - key: "user.email"
        action: "hash"
      # Delete the SSN attribute entirely
      - key: "user.ssn"
        action: "delete"
      # Overwrite the card number with a fixed mask
      - key: "user.credit_card_number"
        action: "update"
        value: "****-****-****-****"

Redaction processor

The redaction processor removes sensitive data while retaining useful, non-sensitive information for debugging, monitoring, and performance analysis.

Remove

For example, this configuration removes all attributes except id, region, and timezone.

processors:
  redaction/update:
    allow_all_keys: false
    allowed_keys:
      - id
      - region
      - timezone

Block

For example, this configuration allows all attribute keys but masks the values of user.phone_number and user.name.

processors:
  redaction/update:
    allow_all_keys: true
    # Values of keys matching these regex patterns are masked
    blocked_key_patterns:
      - "user.phone_number"
      - "user.name"

Transform processor

The transform processor is used to modify or add new attributes to your data before it is exported. The transform processor uses the OpenTelemetry Transformation Language (OTTL) to rename attributes, add or remove tags, alter the data structure and much more.

General structure

The main elements of the transform processor are illustrated below.

transform:
  error_mode: ignore
  <trace|metric|log>_statements:
    - string
    - string
    - string

For example, this code applies a series of transformations to traces, metrics, and logs. It ignores errors encountered during the transformation process. The following transformations are applied:

Trace data (spans)

  • keep_keys(span.attributes, ["service.name", "service.namespace", "cloud.region", "process.command_line"]). This transformation keeps only the specified attributes in the span’s attributes (metadata associated with the span). The attributes service.name, service.namespace, cloud.region, and process.command_line are retained, while all others are discarded.
  • replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***"). This operation uses a regex to find any pattern matching password=<value> in the process.command_line attribute and replaces the password value with password=***. This is commonly done to redact sensitive information like passwords.

Metrics

  • keep_keys(resource.attributes, ["host.name"]). It retains only the host.name attribute in the resource's attributes, discarding all other attributes.

Logs

  • replace_all_matches(log.attributes, "/user/*/list/*", "/user/{userId}/list/{listId}"). This operation replaces parts of log attributes matching the pattern /user/*/list/* with the more generalized pattern /user/{userId}/list/{listId}. This is often used to anonymize URLs or sensitive paths that contain user-specific information.
  • replace_all_patterns(log.attributes, "value", "/account/\\d{4}", "/account/{accountId}"). This transformation replaces all occurrences of the pattern /account/<4-digit-number> in the log.attributes with /account/{accountId}, essentially redacting specific account numbers.
  • set(log.body, log.attributes["http.route"]). This sets the log.body to the value of the http.route attribute in the log's attributes. It replaces the original log body with the HTTP route, which is useful for understanding the endpoint involved in the logged event.

transform:
  error_mode: ignore
  trace_statements:
    - keep_keys(span.attributes, ["service.name", "service.namespace", "cloud.region", "process.command_line"])
    - replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")
  metric_statements:
    - keep_keys(resource.attributes, ["host.name"])
  log_statements:
    - replace_all_matches(log.attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")
    - replace_all_patterns(log.attributes, "value", "/account/\\d{4}", "/account/{accountId}")
    - set(log.body, log.attributes["http.route"])

Filter processor

The filter processor uses the OpenTelemetry Transformation Language to create conditions that determine when telemetry should be dropped. If any condition is met, the telemetry is dropped (each condition is ORed together). Each configuration option corresponds to a specific type of telemetry and its associated OTTL context.

For example, this filter processor will look at each trace span and check if the http.request.method attribute is missing or null.

  • The error_mode: ignore setting ensures that if there is an issue evaluating this condition for any span (e.g., the attribute doesn’t exist or is malformed), the processor will ignore the error and continue processing other spans in the pipeline.
  • If the span contains an http.request.method attribute with a non-null value (e.g., "GET", "POST"), the span will not be filtered out and will pass through the pipeline.
  • If the span has no http.request.method attribute or if it is explicitly null, then the span matches the condition and will be excluded from further processing.

processors:
  filter:
    error_mode: ignore
    traces:
      span:
        - attributes["http.request.method"] == nil

Post-ingestion data filtering

Once telemetry data is ingested into Coralogix, parsing rules can be applied to identify, redact or transform PII prior to its storage. While handling PII at the shipper level is recommended, Coralogix’s parsing engine enables centralized processing for consistent management of sensitive data across all ingested telemetry, regardless of its source or type.

Use the following parsing rules to handle PII:

  • Block. Use regex to identify sensitive data matching specific patterns. Logs containing data that matches these patterns will be blocked at ingress, preventing them from proceeding further through the Coralogix analytics pipeline.

Note

The block rule prevents access to the entire log, rather than only blocking the sensitive field or data.

  • Replace. Like block rules, replace rules use regex to find sensitive data patterns, but instead of dropping the log, they mask the matched data, replacing it with a string or value of your choice. For example, recurring "X"s (XXXX-XXXX-XXXX-XXXX) are commonly used to mask sensitive data.
  • Remove Fields. If you expect sensitive information to be shipped in specific fields with known names, you can remove those fields from the logs when their names are detected.

Coralogix’s parsing pipelines allow you to combine these rules in creative ways or use them individually. The scope of these rules can be restricted to application and subsystem metadata fields, which are added to the data by the shipper when it is sent to Coralogix.

For example, to mask the apiKey value in the uri string using the replace rule:

  1. Define the source field as text.uri and set the destination field to the same value (text.uri).
  2. Use the regex apiKey=[^&]+ to match the apiKey pattern.
  3. Replace the apiKey value with apiKey=REDACTED.
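
Assume an incoming log entry like the following (the apiKey value shown is hypothetical):

{
  "timestamp": "2025-03-20T12:34:56Z",
  "level": "INFO",
  "message": "Incoming request",
  "uri": "https://api.example.com/resource?apiKey=3f9c2b7a&userId=42&filter=active",
  "status_code": 200,
  "response_time_ms": 123
}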

The resulting log entry will look as follows:

{
  "timestamp": "2025-03-20T12:34:56Z",
  "level": "INFO",
  "message": "Incoming request",
  "uri": "https://api.example.com/resource?apiKey=REDACTED&userId=42&filter=active",
  "status_code": 200,
  "response_time_ms": 123
}

Additional resources

Introduction to Parsing Rules