Handling PII and Sensitive Data
Telemetry data often contains sensitive information that can identify, contact, or locate an individual, either on its own or when combined with other data. To comply with Coralogix's service terms and with data privacy regulations, use the tools described here to anonymize personal data before sending it to the Coralogix platform. With the exception of essential business information, such as names and email addresses used for user identification and authentication, personally identifiable information (PII) should be removed from your logs, traces, and metrics before it leaves your environment.
Guidelines for managing sensitive data in Coralogix
With few exceptions, the data you send to Coralogix does not need to include private or personal information for you to get full value from the service. Coralogix offers detailed instructions, tools, and best practices to help you scrub, obfuscate, filter, and minimize sensitive or personal data in the telemetry you choose to share.
You can filter the data either at the OpenTelemetry Collector level or during ingestion by applying Coralogix's parsing rules. These rules allow you to block, mask, or modify the data before it is stored, ensuring sensitive information is handled appropriately.
In general, handling PII at the shipper level is considered best practice because it prevents sensitive data from ever leaving the source system unprotected and saves on costs.
Data filtering in the OpenTelemetry Collector
Customize your telemetry data to meet specific requirements by defining relevant rules in the OpenTelemetry Collector configuration file. The OpenTelemetry Collector provides four processors that can be used for data filtering.
- Attributes: Modifies, inserts, or deletes individual attributes on spans, logs, and metrics.
- Redaction: Filters sensitive attributes or masks their values.
- Transform: Uses the OpenTelemetry Transformation Language (OTTL) to perform extensive transformations on telemetry data efficiently.
- Filter: Controls the flow of telemetry data based on specific conditions or attributes.
Add these processors to the OpenTelemetry Collector's configuration file and enable them by listing them in the relevant pipelines under the service section, as sketched below.
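For orientation, a minimal sketch of that wiring, assuming an OTLP receiver and the Coralogix exporter; the processor names must match the definitions under `processors:` in your own configuration:

```
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the order listed here.
      processors: [attributes/update, redaction, transform, filter]
      exporters: [coralogix]
```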
Note
The documentation for the OpenTelemetry Collector's processors is subject to change; consult the opentelemetry-collector-contrib GitHub repository for the most up-to-date information.
Attributes processor
The `attributes` processor is used to modify, insert, or delete attributes in spans, logs, or metrics before they are exported.
For example, this configuration:

- Hashes the `user.email` attribute. The original email is replaced with a hashed value, anonymizing it while still allowing you to track the same user across multiple traces.
- Deletes the `user.ssn` attribute, which may contain a Social Security Number. SSNs are sensitive and should never be logged or exposed.
- Masks the `user.credit_card_number` attribute (via the `update` action) to avoid exposing the full card number. The value is replaced with `****-****-****-****`, protecting sensitive financial data while keeping the number of digits visible.
```
processors:
  attributes/update:
    actions:
      # Replace the email with its hash.
      - key: user.email
        action: hash
      # Drop the SSN attribute entirely.
      - key: user.ssn
        action: delete
      # Overwrite the card number with a fixed mask.
      - key: user.credit_card_number
        action: update
        value: "****-****-****-****"
```
Redaction processor
The `redaction` processor removes sensitive data while retaining useful, non-sensitive information for debugging, monitoring, and performance analysis.
Remove
For example, the `redaction` processor can remove all attributes except `id`, `region`, and `timezone`, as sketched below.
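A minimal sketch of such a configuration, using the processor's `allowed_keys` option:

```
processors:
  redaction/remove:
    # With allow_all_keys set to false, only the attributes listed
    # under allowed_keys are kept; every other attribute is removed.
    allow_all_keys: false
    allowed_keys:
      - id
      - region
      - timezone
```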
Block
For example, the `redaction` processor can allow all attributes while blocking (removing from the telemetry data) `user.phone_number` and `user.name`, as sketched below.
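A sketch of that setup, assuming a collector version that supports `blocked_key_patterns`, which redacts the values of keys matching the given regexes:

```
processors:
  redaction/block:
    # Let every attribute key through by default...
    allow_all_keys: true
    # ...but redact the values of keys matching these patterns.
    blocked_key_patterns:
      - 'user\.phone_number'
      - 'user\.name'
```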
Transform processor
The `transform` processor is used to modify or add attributes to your data before it is exported. It uses the OpenTelemetry Transformation Language (OTTL) to rename attributes, add or remove tags, alter the data structure, and much more.
General structure
The main elements of the `transform` processor are outlined below.
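In outline, a `transform` configuration takes roughly this shape (the `<statement>` placeholders are illustrative, not literal values):

```
transform:
  error_mode: ignore     # or propagate: how OTTL errors are handled
  trace_statements:      # OTTL statements applied to spans
    - <statement>
  metric_statements:     # OTTL statements applied to metrics
    - <statement>
  log_statements:        # OTTL statements applied to logs
    - <statement>
```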
For example, this code applies a series of transformations to traces, metrics, and logs. It ignores errors encountered during the transformation process. The following transformations are applied:
Trace data (spans)
- `keep_keys(span.attributes, ["service.name", "service.namespace", "cloud.region", "process.command_line"])`. This keeps only the specified keys in the span's attributes (metadata associated with the span). The attributes `service.name`, `service.namespace`, `cloud.region`, and `process.command_line` are retained, while all others are discarded.
- `replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")`. This uses a regex to find any pattern matching `password=<value>` in the `process.command_line` attribute and replaces the password value with `password=***`, a common way to redact sensitive information such as passwords.
Metrics
- `keep_keys(resource.attributes, ["host.name"])`. This retains only the `host.name` attribute in the resource's attributes, discarding all others.
Logs
- `replace_all_matches(log.attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")`. This replaces log attribute values matching the pattern `/user/*/list/*` with the generalized pattern `/user/{userId}/list/{listId}`, which is often used to anonymize URLs or paths containing user-specific information.
- `replace_all_patterns(log.attributes, "value", "/account/\\d{4}", "/account/{accountId}")`. This replaces all occurrences of the pattern `/account/<4-digit-number>` in the log attributes with `/account/{accountId}`, redacting specific account numbers.
- `set(log.body, log.attributes["http.route"])`. This sets `log.body` to the value of the `http.route` attribute, replacing the original log body with the HTTP route, which is useful for understanding the endpoint involved in the logged event.
```
processors:
  transform:
    error_mode: ignore
    trace_statements:
      - keep_keys(span.attributes, ["service.name", "service.namespace", "cloud.region", "process.command_line"])
      - replace_pattern(span.attributes["process.command_line"], "password\\=[^\\s]*(\\s?)", "password=***")
    metric_statements:
      - keep_keys(resource.attributes, ["host.name"])
    log_statements:
      - replace_all_matches(log.attributes, "/user/*/list/*", "/user/{userId}/list/{listId}")
      - replace_all_patterns(log.attributes, "value", "/account/\\d{4}", "/account/{accountId}")
      - set(log.body, log.attributes["http.route"])
```
Filter processor
The `filter` processor uses the OpenTelemetry Transformation Language (OTTL) to create conditions that determine when telemetry should be dropped. If any condition is met, the telemetry is dropped (conditions are ORed together). Each configuration option corresponds to a specific type of telemetry and its associated OTTL context.
For example, a `filter` processor can check each trace span for a missing or null `http.request.method` attribute and drop the span when the condition matches (see the configuration sketch after this list).
- The `error_mode: ignore` setting ensures that if there is an issue evaluating this condition for any span (e.g., the attribute doesn't exist or is malformed), the processor ignores the error and continues processing other spans in the pipeline.
- If the span contains an `http.request.method` attribute with a non-null value (e.g., "GET", "POST"), the span is not filtered out and passes through the pipeline.
- If the span has no `http.request.method` attribute, or if it is explicitly null, the span matches the condition and is excluded from further processing.
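A minimal sketch of that configuration (the instance name after the slash is arbitrary):

```
processors:
  filter/missing_method:
    error_mode: ignore
    traces:
      span:
        # Drop any span whose http.request.method attribute is nil.
        - 'attributes["http.request.method"] == nil'
```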
Post-ingestion data filtering
Once telemetry data is ingested into Coralogix, parsing rules can be applied to identify, redact, or transform PII before it is stored. While handling PII at the shipper level is recommended, Coralogix's parsing engine enables centralized processing for consistent management of sensitive data across all ingested telemetry, regardless of its source or type.
Use the following parsing rules to handle PII:
- Block. Use regex to identify sensitive data matching specific patterns. Logs containing data that matches these patterns will be blocked at ingress, preventing them from proceeding further through the Coralogix analytics pipeline.
Note
The `block` rule prevents access to the entire log, rather than blocking only the sensitive field or data.
- Replace. Similar to block rules, replace rules use regex to search for sensitive data patterns. Like shipper-level filters, a replace rule masks matching data, replacing it with a string or value of your choice. For example, recurring "X"s (XXXX-XXXX-XXXX-XXXX) are commonly used to mask sensitive data.
- Remove Fields. If you expect sensitive information to be shipped in specific fields with known names, you can remove those fields from the logs when their names are detected.
Coralogix’s parsing pipelines allow you to combine these rules in creative ways or use them individually. The scope of these rules can be restricted to application and subsystem metadata fields, which are added to the data by the shipper when it is sent to Coralogix.
For example, to mask the `apiKey` value in the `uri` string using the `replace` rule:

- Define the source field as `text.uri` and set the destination field to the same value (`text.uri`).
- Use the regex `apiKey=[^&]+` to match the `apiKey` pattern.
- Replace the matched value with `apiKey=REDACTED`.
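Assuming an incoming entry such as the following (the `apiKey` value here is hypothetical):

```
{
  "timestamp": "2025-03-20T12:34:56Z",
  "level": "INFO",
  "message": "Incoming request",
  "uri": "https://api.example.com/resource?apiKey=a1b2c3d4&userId=42&filter=active",
  "status_code": 200,
  "response_time_ms": 123
}
```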
The resulting log entry will look as follows:
```
{
  "timestamp": "2025-03-20T12:34:56Z",
  "level": "INFO",
  "message": "Incoming request",
  "uri": "https://api.example.com/resource?apiKey=REDACTED&userId=42&filter=active",
  "status_code": 200,
  "response_time_ms": 123
}
```