RegEx 101: Guide for Managing Log Data in Coralogix

Coralogix Team Dec 18, 2019

You may be wondering: what are regular expressions (RegEx), and how do they help when managing log data in Coralogix? We’ve got you covered.

Regular expressions can be crucial for wrangling log data efficiently. You may want to extract specific data from your logs to make it easier to analyze and visualize. Sometimes you might want to capture an email when a particular message is logged. Other times, you may need to hide sensitive data in logs before they are saved.

And more often than not, you need to match using a RegEx pattern rather than an exact text search.

This RegEx guide is split into three parts. We start with a brief intro to RegEx, the second part shows RegEx methods and how you can use them in a logging context, and we end with a complete RegEx 101 table for quick reference.

What is RegEx and how does it work?

Regular Expressions, also known as RegEx or RegExp, is a domain-specific language (DSL) used for pattern searches and replacements.

There are times when you may not need a regular expression; you can just search for a specific text. For example, if you just want to find log lines with the text “user logged in”, you can simply enter this text into your log search tool. But if your log lines look like this: “user_32 logged in” you can’t search for the exact text anymore since the user id is a variable that changes.

Fortunately, there is a RegEx pattern for this case, which uses the “one or more digits” sequence \d+ in place of the user id:

user_\d+ logged in
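To see the pattern at work outside of a log tool, here is a minimal sketch in Python. Python's `re` module is only a stand-in for the search tool, and the sample lines are made up for illustration.

```python
import re

# Pattern from the text: "user_", one or more digits, then " logged in".
pattern = re.compile(r"user_\d+ logged in")

# Hypothetical sample log lines, for illustration only.
lines = [
    "user_32 logged in",
    "user_7 logged in",
    "user_ logged in",     # no digits after the underscore
    "user_32 logged out",  # different message
]

# Keep only the lines in which the pattern is found.
matching = [line for line in lines if pattern.search(line)]
print(matching)
```

Only the first two lines are kept, because \d+ requires at least one digit and the rest of the text must match literally.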

How to use RegEx to manage log data

In this section, we list all places where RegEx may be used in Coralogix and give you some ready-made examples which you can start using immediately.

Extracting text from logs

One of the most common needs is extracting data from logs, for example to convert unstructured logs into structured ones. For this, you’ll need to use the Extract Rule.

Extracting text into Custom JSON Fields

Suppose you have an unstructured log of the following form:

`${logLevel}: World-${worldName}: ${logText}`

and you would like to convert all entries of this format into a JSON object of the following form:

{
    "level": `${logLevel}`,
    "tag": `World-${worldName}`,
    "text": `${logText}`
}

The Extract Rule will extract all text matched by capturing groups and convert it into a JSON object. Names of the capturing groups will become JSON keys.

For example:

^(?P<level>.*?):\s*(?P<tag>.*?):

Some explanations:

^ is the start-of-line symbol: the match must begin at the start of a line
(?P<name>X), where X is the matcher and name is the group name, is the named capturing group syntax. We have two capturing groups here, "level" and "tag"; the "text" key is filled in separately, as explained below.
\s means “any whitespace character”
. means “any one character”
* means “0 or more matches of the previous symbol/character”
.* means “any characters, any number of times”
.*? means “any characters any number of times, with the fewest tokens necessary”

Thus, the RegEx above means:

match any text starting with the start of a line and until the first : symbol as capturing group "level"
then, after any number of whitespaces, match any text until the next : symbol as capturing group "tag"

The whole log text is automatically set to the "text" field by Coralogix, so we will have the "text" field automatically.

The RegEx above will convert the following log:

"info: World-w-8: generate: new world"

into this JSON object:

{
   "level": "info",
   "tag": "World-w-8",
   "text": "info: World-w-8: generate: new world"
}
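The behavior described above can be approximated locally with a short Python sketch. This is not Coralogix's implementation: the `extract` helper is hypothetical, and it simply mimics the rule by turning named groups into keys and adding the full line as "text".

```python
import json
import re

# The Extract Rule pattern from the text, with named groups "level" and "tag".
extract_rule = re.compile(r"^(?P<level>.*?):\s*(?P<tag>.*?):")

def extract(log_line: str) -> dict:
    """Hypothetical helper: named groups become keys, the full line becomes "text"."""
    match = extract_rule.search(log_line)
    fields = match.groupdict() if match else {}
    fields["text"] = log_line
    return fields

result = extract("info: World-w-8: generate: new world")
print(json.dumps(result, indent=2))
```

Because both groups are lazy (.*?), each one stops at the first : it meets, which is why "tag" ends at "World-w-8" instead of swallowing "generate" as well.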

Extracting text into Predefined Coralogix Fields

But you don’t always want to extract text into custom JSON fields. Sometimes you need to assign matched text to predefined Coralogix fields. In the example above:

"info: World-w-8: generate: new world"

You might want to extract the "info" text into the Severity column, and the "World" text into the Class column.
You only need to name the capturing groups accordingly:

^(?P<severity>.*?):\s*(?P<className>.*?)-(?P<worldName>.*?):

With this RegEx, "info" is shown in the Severity column and "World" in the Class column.

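You can set other predefined columns like this, such as Category and Method, by using the keys "category" and "methodName". Here is a RegEx mapping these predefined fields:

^(?P<severity>.*?):\s*(?P<className>.*?)-(?P<category>.*?):\s*(?P<methodName>.*?)[$:]

For the sample line above, this captures "info" as the severity, "World" as the class, "w-8" as the category and "generate" as the method.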

Extracting specific data from Logs

We already saw how you can extract data from unstructured logs into custom JSON fields. The same trick can be used for structured logs too. The following example shows how to extract data from a specific JSON field.

Suppose you have a structured log line like this:

{
    "type": `${text}`,
    "log": `${text}`,
    "region": "rg-europe-2"
}

Now, knowing that the "region" field has the following form:

`rg-${"europe"|"asia"|"na"}-${number}`

We want to extract the part which tells us whether the region is "europe", "asia" or "na". The RegEx should be like this:

"region"\s*:\s*"rg-(?P<regionName>.*?)-

Note the named capturing group "regionName" here: this is what actually extracts the text. We place it after the key name "region" and the characters "rg-", right where we expect it to be according to our format. The purpose of the \s* sequences is to keep the RegEx working if there is any whitespace before or after the : symbol.

Applying this rule to the log above adds a "regionName" field with the value "europe".
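As a rough local check of this rule, the sketch below runs the same pattern through Python's `re` module. The `extract_region_name` helper and the sample log strings are assumptions made for illustration.

```python
import re
from typing import Optional

# Pattern from the text: tolerate whitespace around ":" and capture the
# part between "rg-" and the next "-".
region_rule = re.compile(r'"region"\s*:\s*"rg-(?P<regionName>.*?)-')

def extract_region_name(raw_log: str) -> Optional[str]:
    """Hypothetical helper: return the captured region name, or None."""
    match = region_rule.search(raw_log)
    return match.group("regionName") if match else None

# Sample structured logs with different spacing around the colon.
print(extract_region_name('{"type": "a", "log": "b", "region": "rg-europe-2"}'))
print(extract_region_name('{"region"  :  "rg-asia-14"}'))
print(extract_region_name('{"region":"rg-na-1"}'))
```

All three spacing variants are handled by the same pattern, which is the point of the \s* sequences.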

Replacing and removing values

One of the most common examples where we need to replace or remove values is hiding personal data. Suppose you log phone numbers somewhere and don’t want those to be saved in Coralogix.

Start by creating a new Rule, as described above, and select the Replace rule in the rule type selection section.

Suppose we have an unstructured log line like this:

"info: Sender: sendSms: sending sms to phone number +12345678910 to user Andrew"

You want to remove the phone number and name from this line. This RegEx will match the line above, starting with "sending sms":

sending sms to phone number \+*\d+ to user .*

Note that we need to escape the + symbol with \ because it has a special meaning in the RegEx syntax: “1 or more of the previous character”. The \d symbol matches any single digit. Remember that the * symbol means “0 or more of the previous character”. Thus, \+*\d+ matches one or more digits which may or may not be preceded by a + symbol.

The following replacement text swaps the matched text for the same text without the phone number and name:

sending sms to phone number * to user *

After the rule is applied, the log line above becomes:

"info: Sender: sendSms: sending sms to phone number * to user *"
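The same search-and-replace pair can be tried out with Python's `re.sub`. This is only a sketch of the idea, not the Replace rule itself, and the `mask` helper name is made up for this example.

```python
import re

# Search pattern from the text: optional "+" signs, then one or more digits,
# then everything after "to user ".
search = re.compile(r"sending sms to phone number \+*\d+ to user .*")

# Replacement text from the text: the same sentence with both values masked.
replacement = "sending sms to phone number * to user *"

def mask(line: str) -> str:
    """Hypothetical helper: hide the phone number and the user name."""
    return search.sub(replacement, line)

original = "info: Sender: sendSms: sending sms to phone number +12345678910 to user Andrew"
print(mask(original))
```

Text before "sending sms" is left untouched because the pattern is not anchored to the start of the line.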

Replacing JSON Values in structured logs

Replacing JSON values in structured logs is similar. The whole JSON string is used as the input for the Replace rule.

Let’s remember our structured log line from above:

{
   "type": `${text}`,
   "log": `${text}`,
   "region": "rg-europe-2"
}

Suppose you need to replace the value of the "type" field with any other value you want. Here is how you can match the value of the "type" field:

"type"\s*:\s*".*?"

Remember that we need the \s* sequences to make sure the RegEx works if there is whitespace before or after the : character.

And here is the replacement text that sets any value of the "type" field to "newType":

"type":"newType"

Using backreferences

Suppose we need to replace “europe” with “eu” in strings of the format “west-europe-2”, and not only in the “region” field but anywhere else in the log where it appears. Matching this pattern is rather easy:

.+?-europe-\d+

But replacing it using the methods we used before is harder, because the text before and after “europe” varies and has to be carried over into the replacement. To do this, we first need to capture this text into capturing groups:

(.+?)-europe-(\d+)

Remember that the \d symbol means “any digit” and together with + it means “any digit one or more times”. In a similar way, .+? means any character one or more times, but with the fewest tokens needed to make the match.

This will capture the text before “europe” as capturing group 1, and the text after “europe” as capturing group 2. The following replacement text uses backreferences to insert the matched content of those groups:

$1-eu-$2

With this rule applied, “west-europe-2” becomes “west-eu-2”.
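Here is a small Python sketch of the same substitution. One difference to keep in mind: Python's `re.sub` writes group references in the replacement as \1 and \2 rather than $1 and $2, so the replacement text below is the Python spelling of the one shown above. The `shorten_region` helper and the sample strings are illustrative assumptions.

```python
import re

# Capture the text before "-europe-" as group 1 and the digits after it as group 2.
search = re.compile(r"(.+?)-europe-(\d+)")

# Python equivalent of the "$1-eu-$2" replacement text.
replacement = r"\1-eu-\2"

def shorten_region(text: str) -> str:
    """Hypothetical helper: rewrite every "<x>-europe-<n>" as "<x>-eu-<n>"."""
    return search.sub(replacement, text)

print(shorten_region("west-europe-2"))
print(shorten_region("deployed to west-europe-2 and north-europe-14"))
```

Both groups are copied back unchanged, so only the “europe” part of each occurrence is rewritten.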

Searching for logs

Another place where you can use RegEx is the Logs search bar.

You can enter the exact text you want to find there, without any RegEx. But if you look for a particular pattern, you need to use RegEx. RegEx queries have their own form, which is:

/${fieldName}.keyword:/REGEX//

Suppose we have many JSON-structured logs of the following form:

{    
    "log":  `${text}` ,
    "regionName":  `${text}`,
    "region":  `${text}`, 
    "type"  :  `ltest-w-${number}`
}

And we want to match only those entries where "type" is equal to “ltest-w-1”, “ltest-w-2” or “ltest-w-3”. The following search query will do this:

/type.keyword:/ltest-w-[1-3]//

Note: the text between square brackets [ and ], is called a character class. It matches any one character listed between square brackets. Dash may be used as a shorthand to list several characters: [1-5] is the same as [12345]

You can use the same syntax to match data in unstructured logs with RegEx. You only need to set the field name to text.keyword. For example:

 /text.keyword:/.*ltest-w-[1-3].*//

The RegEx above will search for the text “ltest-w-1”, “ltest-w-2” or “ltest-w-3” in the main log body.
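The leading and trailing .* in the second query suggest that the search regex has to cover the whole field value rather than just a part of it. Under that assumption, the difference between the two queries can be sketched in Python with `re.fullmatch`; the `value_matches` helper and the sample values are hypothetical.

```python
import re

def value_matches(regex: str, field_value: str) -> bool:
    """Hypothetical helper: True only if the regex covers the entire value."""
    return re.fullmatch(regex, field_value) is not None

# A short "type" value is covered completely by the bare pattern.
print(value_matches(r"ltest-w-[1-3]", "ltest-w-2"))

# A longer log body is not, unless .* absorbs the text on both sides.
body = "starting ltest-w-2 now"
print(value_matches(r"ltest-w-[1-3]", body))
print(value_matches(r".*ltest-w-[1-3].*", body))
```

This is why the "type" query works without any padding, while the query on the main log body wraps the pattern in .* on both sides.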

Similar RegEx syntax is used in the Kibana search bar. Click Kibana and then Discover to reach this search input.

The only difference here is that you don’t need the / symbols in the beginning and at the end. Thus, our search query above will turn into:

type.keyword:/ltest-w-[1-3]/

Note: sometimes you don’t see any logs after clicking Discover. It’s possible that there are no logs for the time interval set by default, which is 15 minutes. Try clicking the time interval selector and increasing it.

Triggering alerts with RegEx

Another popular use case for RegEx in Coralogix is Alerts. Alerts syntax is the same as logs query syntax. Let’s start with creating an alert. First, click the Alerts tab, and then click the Plus button on the right.

Let’s say we want to be alerted on a line of the following form:

`App: init: World-${name}: generation error: ${err}`

And suppose we want to get alerts only for worlds "w-1", "w-2", "w-3" or "w-4". Our alert RegEx will look like this:

/text.keyword:/.*World-w-[1-4]: generation error.*//

Remember that [1-4] matches any single character from 1 to 4, and .* matches any characters any number of times.

In the “Rules definitions” section select “Notify immediately” to get an alert on the first match. Or you can select More and then choose the number of matches needed to trigger the alert.
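To get a feel for which lines would count towards such an alert, here is a minimal Python sketch. It is not how Coralogix evaluates alerts: the `count_matches` and `should_alert` helpers, the `threshold` parameter and the sample lines are all assumptions made for illustration, and the regex is applied to the whole line.

```python
import re

# The alert regex from the text, applied to the full log line.
ALERT_REGEX = re.compile(r".*World-w-[1-4]: generation error.*")

def count_matches(lines: list) -> int:
    """Count the lines that the alert regex covers completely."""
    return sum(1 for line in lines if ALERT_REGEX.fullmatch(line))

def should_alert(lines: list, threshold: int = 1) -> bool:
    """Hypothetical rule: alert once at least `threshold` lines have matched."""
    return count_matches(lines) >= threshold

sample_lines = [
    "App: init: World-w-1: generation error: out of memory",
    "App: init: World-w-4: generation error: timeout",
    "App: init: World-w-7: generation error: timeout",   # 7 is outside [1-4]
    "App: init: World-w-12: generation error: timeout",  # "12" is two characters
    "App: init: World-w-2: generated",                   # not an error line
]

print(count_matches(sample_lines))
print(should_alert(sample_lines))
print(should_alert(sample_lines, threshold=3))
```

Note that "w-12" does not count: [1-4] matches exactly one character, and it must be followed directly by the : symbol.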

In the Notifications definition section, enter email addresses where a message should be sent on alert.

When an alert is triggered, you will:

Get an email
See it in the dashboard
See it in the alerts section

Summary

I hope you found this tutorial interesting and helpful. You can start with Regular Expressions by copy-pasting the examples above and substituting parts of them as needed. You will soon find that knowing RegEx makes your life easier not just with Coralogix, but with many other products.

Characters
Symbol(s)	Explanation	Example	Match	No Match
`.`	Matches any single character	`m.n`	`man` `men`	`moon` `non`
`\d`	Matches any single digit	`user_\d\d`	`user_12` `user_30`	`user_123` (partial match) `user_A2`
`\D`	Not a digit. Matches any character except a digit.	`user_\D\d`	`user_A2` `user_B0`	`user_22` `user_2A`
`\w`	Word character. Matches any letter, digit or underscore.	`user\w\w`	`user_3` `user21` `userP2`	`usr_21` `user_324` (partial match)
`\W`	Not a word character. Matches any character which is not a letter, digit or underscore.	`user\W\w`	`user:3` `user-2` `user:A`	`user_2` `user32` `userA2`
`\s`	Matches a whitespace character: space, tab, newline, carriage return	`{\s*"userName"\s*:\s*"user1"\s*}`	`{"userName":"user1"}` `{ "userName": "user1"}` `{ "userName": "user1" }` and the same object with a line break after `{` and before `}`	`{"userName":"user3"}`
`\S`	Not a whitespace character. Matches any character except for whitespace characters: space, tab, newline or carriage return	`"userName"\S"user1"`	`"userName":"user1"` `"userName"A"user1"` `"userName","user1"`	`"userName" "user1"` `"userName": "user1"`

Quantifiers
Symbol(s)	Explanation	Example	Match	No Match
`*`	Zero or more matches of the previous character or RegEx sequence	`user_1*`	`user_` `user_1111` `user_1`	`user1` `user_21` (partial match) `user_3` (partial match)
`+`	One or more matches of the previous character or RegEx sequence	`user_1+`	`user_1` `user_1111`	`user_` `user_21` `user_3`
`?`	One or zero matches of the previous character or RegEx sequence. Also see Greedy vs. Lazy Quantifiers below	`user_1?`	`user_` `user_1`	`user_2` (partial match) `user_11` (partial match)
`{X}`	Exactly X matches of the previous character	`user_1{2}`	`user_11`	`user_` `user_21` `user_1` `user_111` (partial match)
`{X,Y}`	Between X and Y matches of the previous character (inclusive)	`user_1{2,3}`	`user_11` `user_111`	`user_21` `user_1` `user_1111` (partial match)
`{X,}`	X or more matches of the previous character	`user_1{2,}`	`user_11` `user_111` `user_1111`	`user_` `user_21` `user_1`
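The quantifier rows above can be checked with a few lines of Python. This sketch uses `re.fullmatch`, so strings marked as partial matches in the table are reported as non-matching; the sample strings are chosen for illustration.

```python
import re

samples = ["user_", "user_1", "user_11", "user_111", "user_1111", "user_21"]
patterns = [
    r"user_1*",      # zero or more "1"
    r"user_1+",      # one or more "1"
    r"user_1?",      # zero or one "1"
    r"user_1{2}",    # exactly two "1"
    r"user_1{2,3}",  # two or three "1"
    r"user_1{2,}",   # two or more "1"
]

def full_matches(pattern: str) -> list:
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

results = {p: full_matches(p) for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern:14} {matched}")
```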

Character Classes
Symbol(s)	Explanation	Example	Match	No Match
`[abcd...]`	Matches any one character of those listed inside the square brackets	`user_[a-z]+`	`user_aikei` `user_roma` `user_hokfer`	`user_10` `user_a23b` (partial match)
`[X-Y]`	Matches any one character in the range between X and Y.	`user_[2-4]`	`user_2` `user_3` `user_4`	`user_` `user_1` `user_5`
`[X-Yabcd...]`	You can specify multiple ranges and sets in the same character class	`user_[2-4a-cgj]`	`user_2` `user_a` `user_b`	`user_` `user_1` `user_d`
`[^...]`	Matches any one character which is NOT in the class. The ^ symbol negates the character class.	`user_[^0-9]`	`user_A` `user_BC` `user_hello`	`user_1` `user_12` `user_500`
`[\xN]`	Matches the character at a specific Unicode hexadecimal code point N	`user_[\x48]` (\x48 is the character “H”)	`user_H`	`user_a` `user_1` `user_2`

Character Classes may be used with any Quantifiers:
Symbol(s)	Explanation	Example	Match	No Match
`[...]+`	Matches one or more characters of those listed inside the square brackets	`user_[4-9]+`	`user_4` `user_56` `user_678`	`user_` `user_1` `user_40` (partial match)
`[...]?`	Matches one or zero characters of those listed inside the square brackets	`user_[4-9]?`	`user_4` `user_`	`user_3` (partial match) `user_44` (partial match)

Anchors and Boundaries
Symbol(s)	Explanation	Example	Match	No Match
`^`	Matches start of a string by default or start of a line if multiline mode is on. Note that this symbol does not work when searching for logs using Logs Query or Kibana	`^info: user.*logged in`	`info: user_20 logged in`	`_info: user_20 logged in`
`$`	Matches end of a string by default or end of a line if multiline mode is on. Note that this symbol does not work when searching for logs using Logs Query or Kibana	`^info: user.*logged in$`	`info: user_20 logged in`	`info: user_20 logged in now`
`\b`	Word boundary. Matches a point between two characters, one of which is a word character, and the other one is not a word character	`user_.*\bin`	`user_20 logged in`	`user_20 loggedin` `user_20 logged_in`
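The anchor and word-boundary rows can be reproduced with Python's `re.search`, which looks for the pattern anywhere in the string. The `found` helper is a made-up name for this sketch.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

# ^ and $ pin the match to the start and end of the string.
anchored = r"^info: user.*logged in$"
print(found(anchored, "info: user_20 logged in"))
print(found(anchored, "_info: user_20 logged in"))
print(found(anchored, "info: user_20 logged in now"))

# \b requires "in" to start at a word boundary; "d" and "_" are word
# characters, so "loggedin" and "logged_in" do not qualify.
boundary = r"user_.*\bin"
print(found(boundary, "user_20 logged in"))
print(found(boundary, "user_20 loggedin"))
print(found(boundary, "user_20 logged_in"))
```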

Groups and Backreferences
Symbol(s)	Explanation	Example	Match	No Match
`(...)`	Matches any RegEx specified inside the parentheses and captures the match as a capturing group	`(user_)(.+?)\b`	`user_28` The text “user_” will be captured as capturing group 1 and the text “28” will be captured as capturing group 2. Most useful for substitutions	`user_`
`(...|...)`	Matches either the RegEx on the left or the one on the right of the | symbol and captures the match as a capturing group.	`user_(11|22)`	`user_11` `user_22` “11” or “22” will be captured as capturing group 1.	`user_12` `user_23`
`(?:...)`	Matches any RegEx specified inside the parentheses, but does not capture it as a capturing group.	`user_(?:11|22)`	`user_11` `user_22` Neither “11” nor “22” will be captured as a capturing group.	`user_12` `user_23`
`(?P<NAME>...)`	Captures a portion of a match as a capturing group with name NAME	`(user_)(?P<userId>.+?)\b`	`user_28` The text “user_” will be captured as capturing group 1 and the text “28” will be captured as the capturing group named “userId”. This is useful for extracting values from logs	`user_`
`\NUMBER`	Inserts the contents of matched group NUMBER into the search input. Note that you need to use $ instead of \ if you want to insert the group contents into the replace input. See the next item and also “Replacing and removing values” above	`(user_.*?) logged in, \1 name is .*`	`user_15 logged in, user_15 name is John`	`user_15 logged in, user_14 logged in`
`$NUMBER`	Inserts the contents of matched group NUMBER into the replace input. Note that you need to use \ instead of $ if you want to insert the group contents into the search input. See the previous item and also “Replacing and removing values” above	Search input: `(user_.*?) name is .+?\b` Replace input: `$1 name is *`	Match: `user_15 name is John` After replace: `user_15 name is *`	
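The group rows can be explored with a short Python sketch. The patterns come from the table; the sample strings are assumptions, and Python's `re` module is used only as a convenient test bench.

```python
import re

# Numbered and named capturing groups.
match = re.search(r"(user_)(?P<userId>.+?)\b", "user_28 logged in")
prefix = match.group(1)
user_id = match.group("userId")
print(prefix, user_id)

# A non-capturing group with alternation: nothing is stored as group 1.
alt = re.fullmatch(r"user_(?:11|22)", "user_22")
print(alt.groups())

# A backreference in the search pattern: \1 must repeat what group 1 matched.
same_user = r"(user_.*?) logged in, \1 name is .*"
print(re.search(same_user, "user_15 logged in, user_15 name is John") is not None)
print(re.search(same_user, "user_15 logged in, user_14 name is John") is not None)
```

The last check fails because "user_14" is not the text that group 1 captured, even though it would match the same sub-pattern on its own.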

Greedy vs. Lazy Quantifiers
Symbol(s)	Explanation	Example	Match	No Match
`*` `+` `{X,}` `{X,Y}`	All these quantifiers are “greedy” by default, meaning that they will match as many characters as possible	`user_.*\b`	`user_15 logged in, user_16 logged out` Everything until the last word boundary is matched, which is at the end of the string	`usr_15 logged in, usr_16 logged out`
`*?` `+?` `{X,}?` `{X,Y}?`	The ? symbol after the quantifier makes it lazy, meaning that it will match as few characters as possible.	`user_.*?\b`	`user_15 logged in, user_16 logged out` Everything until the first word boundary is matched, that is, only `user_15`.	`usr_15 logged in, usr_16 logged out`
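The difference between the two rows is easy to see by running both patterns against the same string. A minimal Python sketch:

```python
import re

text = "user_15 logged in, user_16 logged out"

# Greedy: .* runs to the end of the string, which is also a word boundary.
greedy = re.search(r"user_.*\b", text).group()

# Lazy: .*? stops at the first word boundary after "user_".
lazy = re.search(r"user_.*?\b", text).group()

print(repr(greedy))
print(repr(lazy))
```

The greedy version returns the whole string, while the lazy version returns only "user_15".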

Inline Modifiers
Note that inline modifiers only work for Coralogix Rules and are not supported when searching for logs using Logs Query or Kibana
Symbol(s)	Explanation	Example	Match	No Match
`(?i)`	Ignore case.	`(?i)user_.*`	`user_15` `User_15` `USER_15`	`usr_15` `USR_16`
`(?s)`	Dotall mode. Makes the . symbol match newline characters.	`(?s)user_.*logged out`	Suppose you have a single log entry made up of three lines: `user_15 logged in`, `user_15 created a document new_document` and `user_15 logged out`. The above RegEx will match all three lines together as a single match.	Without `(?s)`, the RegEx `user_.*logged out` cannot cross the line breaks, so it matches only the last line instead of the whole entry.
`(?m)`	Multiline mode. Makes the ^ and $ symbols match start and end of line respectively, instead of start and end of string.	`(?m)^user_.*$`	Suppose you have a single log entry made up of three lines: `user_15 logged in`, `user_15 created a document new_document` and `user_15 logged out`. The above RegEx will match each of these lines as a separate match.	This RegEx will not match all of these lines together as a single match. But the following RegEx will: `(?s)^user_.*$`
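The three modifiers can be compared side by side in Python, whose `re` module accepts the same inline syntax. Whether every detail carries over to the engine behind Coralogix Rules is an assumption; the three-line entry below mirrors the one used in the table.

```python
import re

entry = (
    "user_15 logged in\n"
    "user_15 created a document new_document\n"
    "user_15 logged out"
)

# (?i): letter case is ignored.
ignore_case = re.fullmatch(r"(?i)user_.*", "USER_15") is not None

# (?s): "." also matches newlines, so one match spans the whole entry.
dotall = re.search(r"(?s)user_.*logged out", entry).group()

# Without (?s), "." stops at line breaks and only the last line matches.
no_dotall = re.search(r"user_.*logged out", entry).group()

# (?m): ^ and $ work per line, giving one match per line.
per_line = re.findall(r"(?m)^user_.*$", entry)

# Without any modifier, ^...$ cannot cover a multi-line entry at all.
no_flags = re.findall(r"^user_.*$", entry)

print(ignore_case, dotall == entry, no_dotall, len(per_line), no_flags)
```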

About the author
Andrei Chernikov is a Full-Stack Developer and also writes helpful guides on Medium.

(This article was updated in August 2023)
