Log Parsing Rules

What is log parsing?

Parsing Rules converts unstructured log data into structured key-value pairs based on user-defined rules. Log parsing enhances data usability, making it ideal for querying, analysis, and generating insights. Additionally, it can optimize costs—for example, by parsing an unnecessarily complex JSON document as a simple string.

In Coralogix, log parsing is performed automatically using predefined parsing rules in Extension Packages, but it can also be customized and applied by the user to meet specific needs.

For detailed behavior, required fields, and examples for each rule type, see Rule Types.

In this guide, you'll learn the mechanics of log parsing and how to create your own custom rules. It covers:

  • How log parsing works
  • What a rule group is
  • What a rule is and the logical AND/OR relationship between rules
  • Why rule group and rule order matter
  • Steps to take before getting started with Parsing Rules
  • Creating and managing rule groups and rules
  • Parsing Rules optimization
  • Parsing Rules limits

How log parsing works

Parsing Rules in Coralogix uses rules to process, transform, and organize log data for effective monitoring and analysis. These rules enable you to:

  • Extract critical information from raw logs
  • Convert unstructured text into structured formats
  • Filter out irrelevant content
  • Mask sensitive fields for compliance
  • Fix formatting issues
  • Block logs with unwanted content

Ingested logs flow through a pipeline system, where rule groups are applied sequentially as logs move through the pipeline.

Scope

Rules are applied to all system logs, including high-, medium-, and low-tier logs, as well as blocked logs. To learn more about how logs flow through the system, see TCO Optimizer.

Timing

Rules apply only to logs ingested after the rule is created; previously ingested logs are not affected.

Parsing example

This is a standard Heroku L/H-type error log, which contains unstructured text. In its unparsed form, you'd need to perform a full-text search to answer most questions: while it's useful for searching, it doesn't offer much structure for deeper analysis.

sock=client at=warning code=H27 desc="Client Request Interrupted" method=POST path="/submit/" host=myapp.herokuapp.com fwd=17.17.17.17 dyno=web.1 connect=1ms service=0ms status=499 bytes=0

RegEx:

^(sock=)?(?P<sock>(\S*))\s*at=(?P<severity>\S*)\s*code=(?P<error_code>\S*)\s*desc="(?P<desc>[^"]*)"\s*method=(?P<method>\S*)\s*path="(?P<path>[^"]*)" host=(?P<host>\S*)\s* (request_id=)?(?P<request_id>\S*)\s*fwd="?(?P<fwd>[^"\s]*)"?\s*dyno=(?P<dyno>\S*)\s*connect=(?P<connect>\d*)(ms)?\s*service=(?P<service>\d*)(ms)?\s*status=(?P<status>\d*)\s* bytes=(?P<bytes>\S*)\s*(protocol=)?(?P<protocol>[^"\s]*)$

After parsing with a Regular Expression (RegEx), the resulting log is converted to JSON and organized into attributes, like severity and error_code:

{
  "sock": "client",
  "severity": "warning",
  "error_code": "H27",
  "desc": "Client Request Interrupted",
  "method": "POST",
  "path": "/submit/",
  "host": "myapp.herokuapp.com",
  "request_id": "",
  "fwd": "17.17.17.17",
  "dyno": "web.1",
  "connect": "1",
  "service": "0",
  "status": "499",
  "bytes": "0",
  "protocol": ""
}

The value of this parsing process is that it transforms unstructured log data into a machine-readable format, enabling more efficient querying, reporting, and automation.
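
If you want to experiment with this transformation locally, the following Python sketch reproduces it with a simplified version of the RegEx above (the optional request_id and protocol parts are omitted). It is an illustration of the mechanism, not the Coralogix implementation:

import json
import re

# Simplified version of the Heroku RegEx above (request_id and protocol omitted).
HEROKU_PATTERN = re.compile(
    r'^(sock=)?(?P<sock>\S*)\s*at=(?P<severity>\S*)\s*code=(?P<error_code>\S*)\s*'
    r'desc="(?P<desc>[^"]*)"\s*method=(?P<method>\S*)\s*path="(?P<path>[^"]*)"\s*'
    r'host=(?P<host>\S*)\s*fwd=(?P<fwd>\S*)\s*dyno=(?P<dyno>\S*)\s*'
    r'connect=(?P<connect>\d*)(?:ms)?\s*service=(?P<service>\d*)(?:ms)?\s*'
    r'status=(?P<status>\d*)\s*bytes=(?P<bytes>\S*)$'
)

raw = ('sock=client at=warning code=H27 desc="Client Request Interrupted" '
       'method=POST path="/submit/" host=myapp.herokuapp.com fwd=17.17.17.17 '
       'dyno=web.1 connect=1ms service=0ms status=499 bytes=0')

match = HEROKU_PATTERN.match(raw)
if match:
    # groupdict() holds the structured key-value pairs, ready to serialize as JSON.
    print(json.dumps(match.groupdict(), indent=2))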

Rule groups

Rule groups organize parsing rules into collections designed for specific log types, enabling sequential transformations. They include the following rule types:

  • BLOCK: Prevent specific logs from being processed further. Optionally, mark them as low-priority for S3 archive queries, with no downstream parsing applied.
  • EXTRACT: Pull specific details from log content.
  • EXTRACT JSON: Extract fields from JSON-formatted logs.
  • PARSE: Parse fields using RegEx and replace the data in the log with the extracted fields.
  • PARSE JSON FIELD: Parse inner JSON strings into fields.
  • REMOVE FIELDS: Delete unwanted fields from logs.
  • REPLACE: Replace parts of the log content.
  • STRINGIFY JSON FIELD: Convert JSON fields into strings.
  • TIMESTAMP EXTRACT: Extract or modify log timestamps.

For a comprehensive explanation of each rule type, including detailed examples and use cases, see Rule Types.

Order of rule group execution

Logs are processed in ascending order of rule groups (from top to bottom), following the sequence in which they were created. Users can adjust this order by dragging and dropping rule groups within the list, allowing for customized log processing flow.


Rule group components

Each rule group contains the following components.
  • Details: A name and optional description for the rule group.
  • Rule Matcher [Optional]: Fields that filter logs based on application, subsystem, and severity. Only logs matching the defined conditions will be processed, improving performance.
  • Rules: A sequence of rules that apply transformations to the logs.

Rules

Rules are defined instructions that allow you to process, transform, and categorize log data in a structured way by using Regular Expressions (RegEx) to match specific patterns in the log text. Each rule operates within a rule group and can be connected with other rules using logical AND/OR relationships. Rules can be used to extract, modify, or classify log data to make it more useful for monitoring, filtering, and analysis.

PARSE example

This example demonstrates the configuration of a PARSE rule. The PARSE rule converts unstructured log text into structured data, typically in JSON format.

  1. Source Field:
    • The rule will examine a specific field from the incoming log (e.g., log.message). This is where the unstructured data resides before parsing. The rule will only be applied if the source field is present in the log data.
  2. RegEx:
    • The RegEx pattern is applied to the content of the source field. It is used to match specific parts of the log text. In this case, the RegEx will extract important details like method, status, path, and more. This enables you to parse the log and structure it into a more usable format.
  3. Destination Field:
    • After applying the RegEx, the parsed data is stored in a destination field. This ensures that the parsed information is separated from the original log text and can be used for further analysis.

By applying this PARSE rule, unstructured logs are transformed into structured data, making it easier to perform specific queries or analysis tasks. For instance, the parsed JSON data could include fields like status, method, host, and bytes, which allow for more precise querying and filtering.
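
As a rough mental model (not the actual Coralogix implementation), a PARSE rule behaves like the following Python sketch; the field names and the sample pattern are illustrative only:

import re

def apply_parse_rule(log: dict, source_field: str, regex: str, destination_field: str) -> dict:
    """Rough approximation of a PARSE rule: extract named groups from the
    source field and store them under the destination field."""
    source_value = log.get(source_field)
    if source_value is None:
        return log  # the rule only applies when the source field is present
    match = re.search(regex, source_value)
    if not match:
        return log  # no match: the log continues through the pipeline unchanged
    parsed = dict(log)
    parsed[destination_field] = match.groupdict()
    return parsed

# Hypothetical access-style log line and pattern.
log = {"message": 'method=GET path="/health" status=200 bytes=512'}
rule_regex = r'method=(?P<method>\S+)\s+path="(?P<path>[^"]*)"\s+status=(?P<status>\d+)\s+bytes=(?P<bytes>\d+)'
print(apply_parse_rule(log, "message", rule_regex, "http"))
# {'message': '...', 'http': {'method': 'GET', 'path': '/health', 'status': '200', 'bytes': '512'}}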

REPLACE example

This example demonstrates how to use a REPLACE rule to standardize the classification of log severity by replacing specific log messages with a standardized severity level, without altering the original source field.


  1. Source Field:
    • The rule is applied to a specific field in the log data (e.g., log.message). The rule checks the content of this field to decide if a replacement is needed.
  2. Regular Expression (RegEx):
    • A RegEx pattern is used to identify parts of the log message. In this case, the pattern is designed to match any content, effectively applying the rule universally to all log messages.
  3. Replacement String:
    • Once the RegEx finds a match, it replaces the content of the source field with the specified replacement string (ERROR). This ensures that all matched log entries will display the standardized severity level.
  4. Destination Field:
    • The new, standardized severity value (ERROR) is stored in a destination field, such as severity. The original field (log.message) remains unchanged, preserving the original log message.
    • Important Note: If the source field and destination field were the same, the original message would be overwritten with the replacement value. By using a different destination field, the source field’s content remains intact while the severity is tagged.

This REPLACE rule enables you to enforce consistent log classification by standardizing severity levels across various logs, making it easier to track and filter logs based on severity.
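
A minimal Python sketch of this behavior, with illustrative field names, shows why using a separate destination field preserves the original message:

import re

def apply_replace_rule(log: dict, source_field: str, regex: str,
                       replacement: str, destination_field: str) -> dict:
    """Rough approximation of the REPLACE rule above: when the RegEx matches
    the source field, write the replacement string to the destination field."""
    value = log.get(source_field)
    if value is None or not re.search(regex, value):
        return log
    tagged = dict(log)
    tagged[destination_field] = replacement  # the source field stays untouched
    return tagged

log = {"message": "connection refused while calling upstream service"}
print(apply_replace_rule(log, "message", r".+", "ERROR", "severity"))
# {'message': 'connection refused while calling upstream service', 'severity': 'ERROR'}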

How a log passes through a rule sequence

Logs matching the Rule Matcher proceed through the group’s rules based on their AND/OR relationships:

  • AND: Logs pass through all connected rules sequentially.
  • OR: Logs are processed by the first matching rule, skipping subsequent rules.

In more complex sequences, the log moves through the rules in order: it passes through every rule connected to another rule with AND, and only through the first applicable rule among rules connected by OR.

For example, in this rule group containing rules EXTRACT, REPLACE, and JSON:


  1. RED subgroup: Rules (Rule-1 through Rule-5) are connected by OR. Logs pass through only the first applicable rule.
  2. YELLOW subgroup: Connected to RED by AND, so every log also reaches this subgroup. Within YELLOW, only the first matching rule applies (since YELLOW's rules are connected by OR).

Note

Within a subgroup of rules connected by OR, the rules are numbered in ascending order. A subsequent rule attached with AND restarts the numbering.
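
For illustration only, the AND/OR evaluation described above can be sketched in Python as follows; the rule structure and helper names are hypothetical, not an API:

def run_or_subgroup(log, rules):
    """OR: only the first applicable rule in the subgroup is applied."""
    for applies, transform in rules:
        if applies(log):
            return transform(log)
    return log

def run_sequence(log, subgroups):
    """AND between subgroups: the log passes through every subgroup in order."""
    for subgroup in subgroups:
        log = run_or_subgroup(log, subgroup)
    return log

red = [
    (lambda l: "error" in l["message"], lambda l: {**l, "severity": "ERROR"}),
    (lambda l: True,                    lambda l: {**l, "severity": "INFO"}),
]
yellow = [
    (lambda l: True, lambda l: {**l, "tagged": True}),
]
print(run_sequence({"message": "error: disk full"}, [red, yellow]))
# {'message': 'error: disk full', 'severity': 'ERROR', 'tagged': True}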

Why rule order matters

The order of rules in a rule group is crucial because each rule processes and transforms the log as it moves through the pipeline. When one rule modifies a log, the next rule applies to the transformed version of the log, not the original input. This sequential transformation can significantly impact the final result.

As the log progresses through the pipeline:

  • Each rule applies to the latest version of the log.
  • Rules rely on the transformations applied by earlier rules.

This means that placing rules in the wrong order can cause some transformations to fail, resulting in incomplete or incorrect logs.

Example:

  1. Rule-1

    • Log Input:

      {"message": "test-message", "key2": "98a35"}
      
    • Action: Use RegEx to substitute the "message" field with "message_str".

    • RegEx: Replace "message" with "message_str".

      ^message$
      
    • Log Output:

      {"message_str": "test", "key2": "98a35"}
      
  2. Rule-2

    • Log Input (from Rule-1's output):

      {"message_str": "test", "key2": "98a35"}
      
    • Action: Add a new field based on whether "message_str" exists.

    • RegEx: If "message_str" exists, add a new field: "mess_str_exists": "true".

      "message_str":\\s*".+"
      
    • Log Output:

      {"message_str": "test", "key2": "98a35", "mess_str_exists": "true"}
      

If the rules are out of order, for example, if Rule-2 runs before Rule-1, the transformation will fail because "message_str" wouldn’t exist in the log when Rule-2 is executed. The result will look like this:

  • Rule-2 runs first (and does nothing):

    {"message": "test-message", "key2": "98a35"}
    
  • Rule-1 runs afterward:

    {"message_str": "test", "key2": "98a35"}
    

Since Rule-2 was executed without the "message_str" field, the "mess_str_exists": "true" field is never added.
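
The same order dependency can be reproduced with a short Python sketch (an illustration of the two rules above, not how Coralogix executes them):

import json
import re

def rule_1(log: dict) -> dict:
    """Rename the "message" key to "message_str" (mirrors Rule-1 above)."""
    return {re.sub(r"^message$", "message_str", k): v for k, v in log.items()}

def rule_2(log: dict) -> dict:
    """Add a flag when a populated "message_str" field exists (mirrors Rule-2)."""
    if re.search(r'"message_str":\s*".+"', json.dumps(log)):
        return {**log, "mess_str_exists": "true"}
    return log

log = {"message": "test-message", "key2": "98a35"}
print(rule_2(rule_1(log)))  # correct order: the flag is added
print(rule_1(rule_2(log)))  # reversed order: Rule-2 finds nothing, so no flag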

Note

Special care should be taken when ordering BLOCK rules, as they stop the processing of any log that matches the set conditions. To learn more about BLOCK rules, see Rule Types.

Supported RegEx options

When defining rules, you can utilize various RegEx options to help you define patterns and extract specific data from logs. Below is a list of the supported RegEx options that you can use to tailor your rules to your needs.
  • . (dot): Matches any single character except for line breaks. Useful for matching various characters within a string.
  • ^: Anchors the match to the start of a string. Can be used to ensure the pattern starts at the beginning of the log line.
  • $: Anchors the match to the end of a string. Useful when you want to match the end of a log line.
  • []: Matches any one of the characters inside the brackets. For example, [a-z] matches any lowercase letter.
  • [^...]: Matches any character except the ones inside the brackets. For example, [^0-9] matches any non-digit character.
  • *: Matches zero or more of the preceding element. For example, a* matches any number of 'a' characters, including none.
  • +: Matches one or more of the preceding element. For example, a+ matches one or more 'a' characters.
  • ?: Matches zero or one of the preceding element. For example, a? matches an optional 'a'.
  • {n}: Matches exactly n occurrences of the preceding element. For example, a{3} matches exactly three 'a' characters.
  • {n,}: Matches n or more occurrences of the preceding element. For example, a{2,} matches two or more 'a' characters.

You may create your RegEx patterns in tools like REGEX101, which will help you efficiently parse logs based on actual data samples.
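
If it helps to see these options in action before building a full rule, here is a small, non-authoritative Python session exercising a few of them against a made-up log line:

import re

line = "2024-05-01 12:00:03 WARN payment-svc retry attempt 3 for order_id=ab12"

print(bool(re.search(r"^2024", line)))        # ^  anchors to the start of the line -> True
print(bool(re.search(r"ab12$", line)))        # $  anchors to the end of the line -> True
print(re.findall(r"[0-9]+", line))            # [] character class: all digit runs
print(re.search(r"order_id=[^ ]+", line)[0])  # [^ ] any character except a space
print(bool(re.search(r"retr?y", line)))       # ?  optional preceding element -> True
print(re.search(r"\d{4}", line)[0])           # {n} exactly n occurrences -> "2024"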

Rule types

Parsing Rules offers a variety of rule types, each designed to perform specific actions when processing log data. For a comprehensive explanation of each rule type, including detailed examples and use cases, see Rule Types.

Before you begin

Before using Parsing Rules, you should be able to identify the distinct patterns within your logs that require parsing. This groundwork will allow you to confidently apply rules for each identified pattern, ensuring that logs are parsed effectively and consistently.

Set a query time frame

Start by spending time in Explore examining your logs so that you can grasp their structure and content. The goal is to capture enough data to understand the log structure before parsing.

To collect a representative sample of logs, set a fixed query time frame that represents the entire dataset sent to Coralogix.

  • If the account has a high log volume: use a 1H or 2H time frame.
  • If the account has a low log volume: use a 24H time frame.

Log exploration

The first step is to explore a representative set of logs and determine which fields contain the actual log message. Log messages may appear in fields such as log, message, or msg. In some cases, especially with OpenTelemetry integrations, the message might be nested in a field like logRecord.body.

To identify these fields efficiently:

  • Use the filters on the left side of the exploration screen to locate field names like log, message, msg, or similar.
  • Run a query like _exists_:log to identify how many logs in your selected timeframe contain the log field.
  • Add the identified field (e.g., log) as a column in the grid view to quickly focus on the log messages.
  • Avoid querying based on combinations of applications and subsystems. Many logs in environments like Kubernetes clusters share common patterns, regardless of the specific application or subsystem.

By the end of your exploration, you should be able to identify the distinct patterns within logs that require parsing rules.

Note

This groundwork allows you to confidently create REPLACE rules for each identified pattern, ensuring that logs are parsed effectively and consistently.

Text logs without JSON formatting will require PARSE rules instead.

Handling multiline logs

Multiline issues are common and require special attention. The query _exists_:log retrieves all logs with the log field, including single-line valid logs and individual lines from multiline logs. To differentiate:

  • A valid log (or the first line of a multiline log) typically includes a timestamp with a year, like 2024.
  • Use the query _exists_:log NOT log:2024 to filter out logs without timestamps and identify potential multiline issues.
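
For a rough local version of this timestamp heuristic (the year pattern and sample lines are illustrative, not a Coralogix feature), a Python sketch might look like this:

import re

# Heuristic from above: a valid log (or the first line of a multiline log)
# usually carries a timestamp containing the year, e.g. "2024-...".
TIMESTAMP_HINT = re.compile(r"\b20\d{2}[-/]\d{2}[-/]\d{2}\b")

lines = [
    "2024-06-12 09:15:02 ERROR worker-3 failed to refresh token",
    "    at com.example.auth.TokenRefresher.run(TokenRefresher.java:88)",
    "    at java.base/java.lang.Thread.run(Thread.java:833)",
    "2024-06-12 09:15:05 INFO worker-3 retrying in 30s",
]

for line in lines:
    kind = "log start" if TIMESTAMP_HINT.search(line) else "possible multiline fragment"
    print(f"{kind:28} | {line}")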

If multiline issues are detected:

  • Determine which subsystems are affected and resolve the issue at its source to ensure proper data integration.
  • Be aware that valid logs without timestamps are rare but possible. Always review the results of this query to account for exceptions.

Creating a rule group

To create a rule group in the Coralogix UI:

  1. In the navbar, hover over Data Flow, and select Parsing Rules.
  2. Open the New rule group drawer by selecting New rule group or a specific rule type.
  3. Define the rule group.

Details

Write a name and description [optional] for the rule group.

Rule matcher

Create a query to filter logs for this group [optional]. Only logs that match the defined query will be processed, enhancing performance by focusing only on relevant logs. The query includes optional selections for applications, subsystems, and severities. Leaving fields blank or not setting a RegEx means that the rule group will apply to all logs.
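
Conceptually, the rule matcher acts as a pre-filter before any rules run. The following Python sketch is a hypothetical approximation of that check; the metadata field names are illustrative, not the Coralogix schema:

def matches_rule_group(log: dict, applications=None, subsystems=None, severities=None) -> bool:
    """Return True if the log's metadata matches the rule group's (optional) filters."""
    def ok(value, allowed):
        return allowed is None or value in allowed  # a blank selection matches everything
    return (ok(log.get("applicationName"), applications)
            and ok(log.get("subsystemName"), subsystems)
            and ok(log.get("severity"), severities))

log = {"applicationName": "prod", "subsystemName": "checkout", "severity": "ERROR"}
print(matches_rule_group(log, applications={"prod"}, severities={"ERROR", "CRITICAL"}))  # True
print(matches_rule_group(log, subsystems={"billing"}))                                   # False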

Helpful hint

To save Coralogix computation resources:

  • Filter out irrelevant logs from entering the rules logic, especially for high-volume services.
  • For applications or subsystems with a large number of logs that follow specific patterns, create dedicated Rule Groups. This keeps them separate from general parsing rules and ensures efficient processing.

Rules

Define a sequence of one or more rules in the rule group.

Defining rules in a rule group

After creating a rule group, you define one or more rules that determine how matching logs are processed. Each rule represents a single parsing action—such as parsing, extracting, replacing, or blocking—and is evaluated in sequence within the rule group.

For an explanation of how rule groups scope and order log processing, see Rule groups.

To define rules as part of your rule group setup:

  1. Select a rule type.
  2. Define the rule fields:
    • Name: Specify a unique name for the rule to identify it in your rule set.
    • Description: Provide an optional description of the rule's purpose or function. This helps others understand what the rule is designed to do.
    • Source Field: The log field or data source that the rule will analyze. For a PARSE rule, this is typically a raw, unstructured log entry that the rule will convert into structured data using regular expressions.
    • Destination Field: The field where the parsed data will be stored. This enables you to store the processed, structured data in a field that is easier to query and analyze.
    • RegEx: The regular expression (RegEx) used to match and extract data from the source field. This is critical for defining how the unstructured log data will be parsed.
  3. Insert a log sample that lets you preview how the rule will process actual log entries. This is useful for testing the rule to ensure it works as expected before applying it to the full log data.

Note

The log sample must be a raw log. To view a log in its raw form, select Logs in the navbar to open Explore, select the log’s more actions menu, and then select View Raw Log.

  4. To add additional rules to the rule group, select Add Rule. Toggle between AND/OR to select the logical relationship between the previous rule and the additional rule.
  5. Select Save when you finish adding all the rules necessary for the rule group.

The new rule group will appear in your Parsing Rules Management Screen.

Managing rule groups

Adding a rule to an existing rule group

To add rules to a group:

  1. Select a rule type from the ADD RULE dropdown.
  2. Set the logical relationship (AND/OR) between rules.

Searching for rule groups

Use the search function to find Rule Groups quickly. You can search by rule or group names.

Editing rules and groups

To edit a rule group or its rules:

  1. Select the rule group.
  2. Make changes.
  3. Select Save changes.

Optimizing Parsing Rules

To ensure efficient parsing and optimize the processing of logs, it's important to follow best practices when creating and arranging your parsing rules. Below are some key tips for improving the speed and effectiveness of your parsing setup.

Order your rules for efficiency

When creating RegEx rules for different types of logs, the order in which they are applied is crucial. Follow these guidelines to optimize rule execution:

  • Place more specific and strict RegEx rules at the top: These rules capture logs with more defined structures, so they tend to be faster to parse and more accurate. Where possible, design them to capture the largest share of your log volume.
  • Place more generic rules towards the bottom: These rules typically match broader patterns, which means they might require more computation to apply and could end up parsing logs that are already processed by the more specific rules above.

This arrangement ensures that logs that should be parsed by stricter rules are processed first, saving computation time and reducing the chances of a log being wrongly matched by a more general rule.

Avoid parsing nested objects

In this example, the log is sent via the OTEL integration, which includes a nested structure. The actual log message is located under the field logRecord.body, which is deeply nested within the JSON object.

{
    "resource": {
        "attributes": {
            "cx_application_name": "kubernetes",
            "cx_subsystem_name": "edge-device",
            "k8s_container_name": "edge-device",
        },
        "droppedAttributesCount": 0
    },
    "scope": {
        "name": "",
        "version": ""
    },
    "logRecord": {
        "timeUnixNano": 1675263581389385880,
        "observedTimeUnixNano": 1675263581389828092,
        "body": "2023-02-01 14:59:41.389 ERROR [CamelJettyServer(0x3bec2275)-481] [EdgeComponent.java:130] 0e387b70-a241-11ed-af17-f941f150d303:null:_do_ route failed with [EdgeServiceException=Device not found], [headers={CamelHttpResponseCode=404, X-Request-ID=0e387b70-a241-11ed-af17-f941f150d303, X-Original-Forwarded-For=52.144.56.254, X-Runtime=0.036989, X-RateLimit-Details=, X-Connection=X-value, X-Route-Redirect=main, X-Scheme=https}]\\n",
        "severityNumber": 0,
        "attributes": {
            "log_file_path": "/var/log/pods/default_edge-device-8447547587-wqs6z_dd5e4d9b-3052-4959-bc8a-995edcfc1c82/edge-device/0.log",
            "log_iostream": "stdout",
            "time": "2023-02-01T14:59:41.38938588Z"
        }
    }
}

In this case, the message itself, which needs parsing, is nested within logRecord.body. While it is technically possible to parse nested fields, it’s generally best practice to avoid parsing deeply nested objects. This is because such nested structures can complicate parsing and lead to inefficient rule execution.
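
One common alternative is to first lift logRecord.body into a flat, top-level field (for example with an EXTRACT JSON rule) and then parse that plain text. The following Python sketch illustrates the idea under those assumptions; the pattern and field names are examples only:

import re

# A trimmed version of the nested OTEL record shown above.
otel_log = {
    "logRecord": {
        "body": "2023-02-01 14:59:41.389 ERROR [CamelJettyServer(0x3bec2275)-481] "
                "[EdgeComponent.java:130] route failed with [EdgeServiceException=Device not found]"
    }
}

# Step 1: lift the nested message into a flat, top-level field first
# (comparable to extracting logRecord.body before any RegEx parsing).
flat = {"message": otel_log["logRecord"]["body"]}

# Step 2: parse the now-flat message with a simple pattern instead of
# writing one RegEx that has to reach into the nested JSON.
pattern = r"^(?P<timestamp>\S+ \S+)\s+(?P<severity>\S+)\s+\[(?P<thread>[^\]]+)\]"
match = re.search(pattern, flat["message"])
if match:
    flat.update(match.groupdict())

print(flat["timestamp"], flat["severity"], flat["thread"])
# 2023-02-01 14:59:41.389 ERROR CamelJettyServer(0x3bec2275)-481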

Swiftly create RegEx patterns

To speed up the process of creating RegEx patterns, use the Explore screen to gather log samples. Here's an efficient approach for working with a large batch of logs:

  • Instead of manually picking log examples from the RAW log view, export a JSON file containing logs from the first 5-10 pages (roughly 500-1000 logs). This provides a broader and more representative sample.
  • Convert the exported JSON to raw, unformatted logs using the following jq command to flatten the JSON structure into a .log file:

    cat file.json | jq -r '.[]' | while read -r repo; do echo "$repo"; done | jq -r '.text' | jq -r tostring > file.log
    
  • You can then use these flattened raw logs to create your RegEx patterns in tools like REGEX101, which will help you efficiently parse logs based on actual data samples.

Create a message field for your logs

Create a message field for all of your logs, regardless of whether the original message was in a nested object or in a field containing the log message. Once your logs are parsed, you should have a new field called message that holds the newly parsed messages. This allows you to take advantage of Coralogix's Loggregation feature and generate meaningful templates in your account.

Note

nginx/access-type logs naturally won't have a message field after parsing, since there is no textual message in such logs.

Extract additional values from newly parsed logs

After parsing logs and generating the new, structured messages, consider going the extra step and extracting additional valuable data from the newly created fields. This will allow you to enrich your logs for deeper analysis and more comprehensive monitoring. Here's how you can do it:

  • Explore the new message field: Revisit the newly parsed message fields and look for patterns or valuable pieces of information that you can extract.
  • Use Loggregation: With structured log messages, you can leverage Loggregation to find common templates and parameters across your logs. This will help you identify trends or insights that could be useful for building more detailed Level 2 Monitoring (L2M), setting up alerts, or creating dashboards.

By going through this process, you can further optimize your log parsing setup and deliver more actionable insights to your users.

Parse key metadata

Parse key elements such as timestamp, severity level, thread, and other metadata.

Timestamp extraction

  • When parsing timestamps, consider extracting timestamps as a separate field to make it easier to filter and analyze logs by time. You can use RegEx groups to extract timestamp information from the log entries.
  • This is not mandatory, but extracting and storing the timestamp in a structured field can significantly improve the ability to query logs over time.
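
For example, a minimal Python sketch of extracting a timestamp into its own named group (the timestamp format here is an assumption about the log's layout):

import re
from datetime import datetime, timezone

line = "2023-02-01 14:59:41.389 ERROR [worker-1] request timed out"

# Capture the timestamp in its own named group so it can become a dedicated field.
match = re.search(r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})", line)
if match:
    ts = datetime.strptime(match["timestamp"], "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    print(ts.isoformat())  # 2023-02-01T14:59:41.389000+00:00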

Severity rules group

  • After parsing your logs, create severity classification rules to tag logs based on their severity (e.g., ERROR, INFO, WARN). Ensure that you include a default rule at the end to capture any logs that were not matched by the more specific severity rules.
  • Based on the parsing process you've already completed, apply JSON extraction rules before the severity rules. This ensures that logs are appropriately parsed and structured before applying severity labels.

These structured fields will make it easier to filter and prioritize logs based on their severity.
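
As an illustration of this pattern (not a Coralogix API), the following Python sketch applies ordered severity rules with a default fallback; the patterns and the message field name are assumptions:

import re

# Ordered severity rules with a catch-all default at the end, mirroring the
# recommendation above.
SEVERITY_RULES = [
    (re.compile(r"\b(ERROR|FATAL|EXCEPTION)\b", re.IGNORECASE), "ERROR"),
    (re.compile(r"\bWARN(ING)?\b", re.IGNORECASE), "WARNING"),
    (re.compile(r"\bDEBUG\b", re.IGNORECASE), "DEBUG"),
]
DEFAULT_SEVERITY = "INFO"  # default rule: catches anything the rules above missed

def classify(log: dict) -> dict:
    for pattern, severity in SEVERITY_RULES:
        if pattern.search(log.get("message", "")):
            return {**log, "severity": severity}
    return {**log, "severity": DEFAULT_SEVERITY}

print(classify({"message": "warn: queue depth above threshold"}))
# {'message': 'warn: queue depth above threshold', 'severity': 'WARNING'}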

Validation

Validate your parsed logs in Pipeline Analyzer.

Limits

  • Users may create up to 35 parsing rules per rule group.

API

To learn about programmatic access to parsing rules, see the Parsing Rules API documentation.