Guide: RegEx 101 for Managing Log Data


Regular expressions can be crucial for wrangling log data efficiently. You may want to extract specific data from your logs to make it easier to analyze and visualize. Sometimes you might want to capture an email when a particular message is logged. Other times, you may find yourself needing to hide sensitive data in logs before they are saved.

And more often than not, you need to match using a RegEx pattern rather than an exact text search.

This guide is split into three parts. We’ll start with a brief intro to RegEx, the second part shows how you can use it in a logging context, and we end with a complete RegEx 101 table for quick reference.

 

I. Introduction to RegEx

What Is RegEx

Regular expressions, also known as RegEx or RegExp, are a domain-specific language (DSL) used for pattern searches and replacements.

There are times when you may not need a regular expression; you can just search for a specific text. For example, if you just want to find log lines with the text “user logged in”, you can simply enter this text into your log search tool. But if your log lines look like this: “user_32 logged in” you can’t search for the exact text anymore, since the user id is a variable that changes.

Fortunately, there is a RegEx pattern for this case. It uses \d+, which matches one or more digits, in place of the variable user id:

user_\d+ logged in
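To try this pattern outside of any log tool, here is a minimal Python sketch. The sample lines are made up for illustration, and Python's `re` module is only one of many RegEx engines, so treat it as a local approximation rather than a description of how a particular log product evaluates the pattern.

```python
import re

# Pattern from the text: "user_", one or more digits, then " logged in".
LOGIN_PATTERN = re.compile(r"user_\d+ logged in")

# Hypothetical sample lines, invented for this illustration.
sample_lines = [
    "user_32 logged in",
    "user_7 logged in",
    "user_abc logged in",  # letters instead of digits
    "user logged in",      # no id at all
]

# Keep only the lines that contain the pattern somewhere.
matching_lines = [line for line in sample_lines if LOGIN_PATTERN.search(line)]
print(matching_lines)
```

Only the first two lines are kept, because \d+ requires at least one digit after the underscore.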

 

II. Using RegEx to manage log data

In this section, we list all places where RegEx may be used in Coralogix, and give you some ready-made examples which you can start using immediately.

Extracting Text from Logs

Extracting data from logs is one of the most common needs. For example, it lets you convert unstructured logs into structured logs. For this, you’ll need to use the Extract Rule.

Extracting Text into Custom JSON Fields

Suppose you have an unstructured log of the following form:

`${logLevel}: World-${worldName}: ${logText}`

and you would like to convert all entries of this format into a JSON object of the following form:

{
    "level": `${logLevel}`,
    "tag": `World-${worldName}`,
    "text": `${logText}`
}

The Extract Rule will extract all text matched by capturing groups and convert it into a JSON object. Names of the capturing groups will become JSON keys.

For example:

^(?P<level>.*?):\s*(?P<tag>.*?):

The RegEx above means:

  • match any text starting with the start of a line and until the first : symbol as capturing group "level"
  • then, after any number of whitespaces, match any text until the next : symbol as capturing group "tag"

The whole log text is automatically set to the "text" field by Coralogix, so we will have the "text" field automatically.

Extract JSON Editor

The RegEx above will convert the following log:

"info: World-w-8: generate: new world"

into this JSON object:

{
   "level": "info",
   "tag": "World-w-8",
   "text": "info: World-w-8: generate: new world"
}
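The same extraction can be sketched in Python, whose `re` module also uses the `(?P<name>...)` syntax for named groups. The function name `extract_fields` is hypothetical, and adding the "text" key by hand only imitates the automatic behavior described above. This is an approximation, not the Extract Rule itself.

```python
import re

# Named groups "level" and "tag", as described in the bullet points above.
EXTRACT_PATTERN = re.compile(r"^(?P<level>.*?):\s*(?P<tag>.*?):")

def extract_fields(log_line):
    """Return the named groups as a dict plus the full text, or None if no match."""
    match = EXTRACT_PATTERN.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()   # group names become the keys
    fields["text"] = log_line    # imitate the automatically filled "text" field
    return fields

result = extract_fields("info: World-w-8: generate: new world")
print(result)
```

The lazy quantifier `.*?` is what stops each group at the first `:` instead of the last one.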

Extracting Text into Predefined Coralogix Fields

But you don’t always want to extract text into custom JSON fields. Sometimes you need to assign matched text to predefined Coralogix fields. In the example above:

"info: World-w-8: generate: new world"

You might want to extract the "info" text into the Severity column, and the "World" text into the Class column.
You only need to set correct names of the capturing groups for that:

^(?P.*?):\s*(?P.*?)-(?P.*?):

This RegEx will format the above message in the following way:

Message format result

You can set other predefined columns like this, such as Category and Method. You need to set the keys "category" and "methodName" for this. Here is a RegEx mapping these predefined fields:

^(?P<severity>.*?):\s*(?P<className>.*?)-(?P<category>.*?):\s*(?P<methodName>.*?)[$:]

Extract JSON Editor

And here is the result:

Message format result

 

 

Extracting Specific Data from Logs

We already saw how you can extract data from unstructured logs into custom JSON fields. The same trick can be used for structured logs too. The following example shows how to extract data from a specific JSON field.

Suppose you have a structured log line like this:

{
    "type": `${text}`,
    "log": `${text}`,
    "region": "rg-europe-2"
}

Now, knowing that the "region" field has the following form:

`rg-${"europe"|"asia"|"na"}-${number}`

We want to extract the part which tells us whether the region is "europe", "asia" or "na". The RegEx should look like this:

"region"\s*:\s*"rg-(?P<regionName>.*?)-

Note the named capturing group "regionName" here – this is what actually extracts text. We place it after the key name "region" and the characters "rg-": right where we expect it to be according to our format. The purpose of the \s* symbols is to make RegEx still work if there are any whitespaces before or after the : symbol.

Here is the result of applying this rule:

Message format result 3
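A small Python sketch of this extraction, applied to the raw JSON text. The helper name `extract_region_name` and the sample strings are assumptions made for the example.

```python
import re

# The named group "regionName" sits between "rg-" and the next dash.
REGION_PATTERN = re.compile(r'"region"\s*:\s*"rg-(?P<regionName>.*?)-')

def extract_region_name(raw_json_text):
    """Return the region name found in the raw JSON text, or None."""
    match = REGION_PATTERN.search(raw_json_text)
    return match.group("regionName") if match else None

# Hypothetical log lines with different spacing around the colon.
samples = [
    '{"type": "a", "log": "b", "region": "rg-europe-2"}',
    '{"region" : "rg-asia-11"}',
    '{"region":"rg-na-3"}',
]
region_names = [extract_region_name(sample) for sample in samples]
print(region_names)
```

All three spacing variants are handled by the two `\s*` parts of the pattern.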

 

Replacing and Removing Values

One of the most common examples where we need to replace or remove values is hiding personal data. Suppose you log phone numbers somewhere and don’t want those to be saved in Coralogix.

We should start with creating a new Rule, as described above. Once we select the Replace rule in the rule type selection section, we will see the following screen:

 

Replace Rule Editor

 

Suppose we have an unstructured log line like this:

"info: Sender: sendSms: sending sms to phone number +12345678910 to user Andrew"

You want to remove the phone number and the name from this line. This RegEx will match the line above, starting with "sending sms":

sending sms to phone number \+*\d+ to user .*

Note that we need to escape the + symbol with \ because it has a special meaning in the RegEx syntax: “one or more of the preceding character”. The \d symbol matches any single digit. Remember that the * symbol means “zero or more of the preceding character”. Thus, \+*\d+ matches one or more digits, optionally preceded by one or more + symbols.

The following replacement string replaces the text matched by the RegEx above with the same text, but with the phone number and name masked:

sending sms to phone number * to user *

Replace phone number and name

And here is the result of applying the above rule:

Replace phone number and name result
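The masking step can be approximated in Python with `re.sub`, using the search pattern and the replacement text from this section. This only imitates what a Replace rule does.

```python
import re

# Search pattern from the text: optional "+" signs, digits, then the user name.
SMS_PATTERN = re.compile(r"sending sms to phone number \+*\d+ to user .*")

# Replacement text from the text: the phone number and name become "*".
MASKED_TEXT = "sending sms to phone number * to user *"

line = "info: Sender: sendSms: sending sms to phone number +12345678910 to user Andrew"
masked_line = SMS_PATTERN.sub(MASKED_TEXT, line)
print(masked_line)
```

The part of the line before "sending sms" is untouched because the pattern does not match it.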

 

Replacing JSON Values in Structured Logs

Replacing JSON values in structured logs is similar. The whole JSON string is used as the input for the Replace rule.

Let’s remember our structured log line from above:

{
   "type": `${text}`,
   "log": `${text}`,
   "region": "rg-europe-2"
}

Suppose you need to replace the value of the "type" field with any other value you want. Here is how you can match the value of the "type" field:

"type"\s*:\s*".*?"

Remember that we need the \s* symbols to make sure the RegEx works if there are whitespaces before or after the : character.

And here is the replacement string that sets any value of the "type" field to "newType":

"type":"newType"

Replace Type
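As a rough local check, the same search and replacement can be run in Python against the raw JSON text. The sample log line is hypothetical, and `json.loads` is used only to confirm that the result is still valid JSON.

```python
import json
import re

# Matches the "type" key, optional whitespace around ":", and its quoted value.
TYPE_PATTERN = re.compile(r'"type"\s*:\s*".*?"')
NEW_TYPE_TEXT = '"type":"newType"'

# Hypothetical structured log line.
raw_log = '{"type" : "oldType", "log": "some text", "region": "rg-europe-2"}'

replaced_log = TYPE_PATTERN.sub(NEW_TYPE_TEXT, raw_log)
parsed = json.loads(replaced_log)  # still parses as JSON after the replacement
print(replaced_log)
```

The lazy `.*?` keeps the match from running past the closing quote of the "type" value into the following fields.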

 

Using Backreferences

Suppose we need to replace “europe” with “eu” in strings of the format “west-europe-2”. And suppose we need to do this not only in the “region” field but in any other part of the log where it appears. Matching this pattern is rather easy:

.+?-europe-\d+

But replacing it using the methods we used before might be rather hard. This is because we need to insert two strings before and after the “europe” text, which might vary. To do this, we first need to capture this text into capturing groups:

(.+?)-europe-(\d+) 

Remember that the \d symbol means “any digit” and together with + it means “any digit one or more times”. Similarly, .+? means any character one or more times, but as few characters as needed to make the match.

This will capture the text before “europe” as capturing group 1, and the text after “europe” as capturing group 2. The following replacement string uses backreferences to insert the matched content of those groups:

$1-eu-$2

Replace Europe Editor

Logs before applying our rule:

Logs before applying Replace Europe

Logs after applying our rule:

Logs after applying Replace Europe
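Here is the same idea as a Python sketch. One difference to be aware of: Python's `re.sub` writes backreferences in the replacement as `\1` and `\2` rather than `$1` and `$2`. The sample line is made up for illustration.

```python
import re

# Group 1 is the text before "-europe-", group 2 is the number after it.
EUROPE_PATTERN = re.compile(r"(.+?)-europe-(\d+)")

# Python spells the backreferences \1 and \2 where the text above uses $1 and $2.
PYTHON_REPLACEMENT = r"\1-eu-\2"

line = "connected to west-europe-2 and north-europe-14"
shortened_line = EUROPE_PATTERN.sub(PYTHON_REPLACEMENT, line)
print(shortened_line)
```

Both occurrences are rewritten, and the varying text around “europe” is carried over by the two groups.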

 

 

Searching for Logs

Another place where you can use RegEx is the Logs search bar:

Logs search bar

You can enter the exact text you want to find there, without any RegEx. But if you look for a particular pattern, you need to use RegEx. RegEx queries have their own form, which is:

/${fieldName}.keyword:/REGEX//

Suppose we have many JSON-structured logs of the following form:

{    
    "log":  `${text}` ,
    "regionName":  `${text}`,
    "region":  `${text}`, 
    "type"  :  `ltest-w-${number}`
}

And we want to match only those entries where "type" is equal to “ltest-w-1”, “ltest-w-2” or “ltest-w-3”. The following search query will do this:

/type.keyword:/ltest-w-[1-3]//

Note: the text between the square brackets [ and ] is called a character class. It matches any one character listed between the brackets. A dash may be used as a shorthand to list a range of characters: [1-5] is the same as [12345].

Search query example

You can use the same syntax to match data in unstructured logs with RegEx. You only need to set the field name to text.keyword. For example:

 /text.keyword:/.*ltest-w-[1-3].*// 

The RegEx above will search for the text “ltest-w-1”, “ltest-w-2” or “ltest-w-3” in the main log body.
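The leading and trailing .* in the second query suggest that a search pattern has to cover the whole field value rather than a substring. Assuming that whole-value behavior (an assumption here, not something this guide states), Python's `re.fullmatch` can reproduce it locally. The field values below are hypothetical.

```python
import re

# Patterns taken from the two queries above, without the surrounding slashes.
TYPE_REGEX = re.compile(r"ltest-w-[1-3]")
TEXT_REGEX = re.compile(r".*ltest-w-[1-3].*")

# Hypothetical values of the "type" field.
type_values = ["ltest-w-1", "ltest-w-3", "ltest-w-4", "ltest-w-12"]
matching_types = [value for value in type_values if TYPE_REGEX.fullmatch(value)]

# Hypothetical unstructured log bodies.
text_values = ["run ltest-w-2 finished", "run ltest-w-7 finished"]
matching_texts = [value for value in text_values if TEXT_REGEX.fullmatch(value)]

print(matching_types)
print(matching_texts)
```

Under this assumption “ltest-w-12” is rejected by the first pattern, because the extra “2” is not covered by it.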

Similar RegEx syntax is used in the Kibana search bar. Click Kibana and then Discover to see this search input:

Kibana search bar

The only difference here is that you don’t need the / symbols in the beginning and at the end. Thus, our search query above will turn into:

type.keyword:/ltest-w-[1-3]/
 

Kibana search example

 

Note: sometimes you don’t see any logs after clicking Discover. It’s possible that there are no logs for the time interval set by default, which is 15 minutes. Try clicking the time interval selector and increasing it.

 

Triggering Alerts with RegEx

Another popular use case for RegEx in Coralogix is Alerts. Alerts syntax is the same as logs query syntax. Let’s start with creating an alert. First, click the Alerts tab, and then click the Plus button on the right:

 

Alerts Tab

 

Let’s say we want to be alerted on a line of the following form:

`App: init: World-${name}: generation error: ${err}`

And suppose we want to get alerts only for worlds "w-1", "w-2", "w-3" or "w-4". Our alert RegEx will look like this:

/text.keyword:/.*World-w-[1-4]: generation error.*//

Remember, that [1-4] matches any single character from 1 to 4, and .* matches any characters any number of times.

In the “Rules definitions” section select “Notify immediately” to get an alert on the first match. Or you can select More and then choose the number of matches needed to trigger the alert.

Alert Editor
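The alert condition can be sketched in Python as “count the matching lines and compare against a threshold”. The function names, the `min_matches` parameter, and the sample lines are all hypothetical, and whole-line matching with `re.fullmatch` is an assumption about how the query is evaluated.

```python
import re

# Pattern from the alert query above, without the surrounding slashes.
ALERT_REGEX = re.compile(r".*World-w-[1-4]: generation error.*")

def count_alert_matches(log_lines):
    """Count the lines that the alert pattern covers completely."""
    return sum(1 for line in log_lines if ALERT_REGEX.fullmatch(line))

def should_alert(log_lines, min_matches=1):
    """min_matches=1 imitates an immediate alert; larger values imitate a threshold."""
    return count_alert_matches(log_lines) >= min_matches

# Hypothetical log lines.
log_lines = [
    "App: init: World-w-2: generation error: out of memory",
    "App: init: World-w-7: generation error: timeout",
    "App: init: World-w-4: generation error: disk full",
    "App: init: World-w-1: generated",
]
print(count_alert_matches(log_lines), should_alert(log_lines), should_alert(log_lines, 3))
```

Two of the four lines match, so the immediate alert fires while a threshold of three does not.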

In the Notifications definition section, enter email addresses where a message should be sent on alert.

Notifications Definition

When an alert is triggered, you will:

  • Get an email
  • See it in the dashboard
  • See it in the alerts section

Triggered alert in the dashboard

Triggered alert in the alerts section

 

Summary

I hope you found this tutorial interesting and helpful. You can get started with regular expressions by copy-pasting the examples above and substituting parts of them as needed. You will soon find that knowing RegEx makes your life easier not just with Coralogix, but with many other products.

Characters
| Symbol(s) | Explanation | Example | Match | No Match |
|---|---|---|---|---|
| . | Matches any single character | m.n | man; men | moon; non |
| \d | Matches any single digit | user_\d\d | user_12; user_30 | user_123 (partial match); user_A2 |
| \D | Not a digit. Matches any character except a digit. | user_\D\d | user_A2; user_B0 | user_22; user_2A |
| \w | Word character. Matches any letter, digit or underscore. | user\w\w | user_3; user21; userP2 | usr_21; user_324 (partial match) |
| \W | Not a word character. Matches any character which is not a letter, digit or underscore. | user\W\w | user:3; user-2; user:A | user_2; user32; userA2 |
| \s | Matches a whitespace character: space, tab, newline or carriage return | {\s*"userName"\s*:\s*"user1"\s*} | {"userName":"user1"}; { "userName": "user1"}; { "userName": "user1" }; the same object with a line break after { and before } | {"userName":"user3"} |
| \S | Not a whitespace character. Matches any character except for whitespace characters: space, tab, newline or carriage return | "userName"\S"user1" | "userName":"user1"; "userName"A"user1"; "userName","user1" | "userName" "user1"; "userName": "user1" |


Quantifiers
| Symbol(s) | Explanation | Example | Match | No Match |
|---|---|---|---|---|
| * | Zero or more matches of the previous character or RegEx sequence | user_1* | user_; user_1111; user_1 | user1; user_21 (partial match); user_3 (partial match) |
| + | One or more matches of the previous character or RegEx sequence | user_1+ | user_1; user_1111 | user_; user_21; user_3 |
| ? | One or zero matches of the previous character or RegEx sequence. Also see greedy vs. non-greedy quantifiers | user_1? | user_; user_1 | user_2 (partial match); user_11 (partial match) |
| {X} | Exactly X matches of the previous character | user_1{2} | user_11 | user_; user_21; user_1; user_111 (partial match) |
| {X,Y} | Between X and Y matches of the previous character (inclusive) | user_1{2,3} | user_11; user_111 | user_21; user_1; user_1111 (partial match) |
| {X,} | X or more matches of the previous character | user_1{2,} | user_11; user_111; user_1111 | user_; user_21; user_1 |
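The “(partial match)” remarks in these tables mean that the pattern matches part of the text but not all of it. A minimal Python sketch of that distinction, using the `{X}` example and a hypothetical helper named `classify`:

```python
import re

# "user_" followed by exactly two "1" characters.
EXACTLY_TWO_ONES = re.compile(r"user_1{2}")

def classify(text):
    """Return 'full', 'partial' or 'none' for how the pattern matches the text."""
    if EXACTLY_TWO_ONES.fullmatch(text):
        return "full"      # the pattern covers the whole text
    if EXACTLY_TWO_ONES.search(text):
        return "partial"   # the pattern covers only part of the text
    return "none"

results = {text: classify(text) for text in ["user_11", "user_111", "user_1", "user_21"]}
print(results)
```

For “user_111” the pattern still finds “user_11” inside the text, which is why the table lists it as a partial match.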

Character Classes
| Symbol(s) | Explanation | Example | Match | No Match |
|---|---|---|---|---|
| [abcd...] | Matches any one character of those listed inside the square brackets | user_[a-z]+ | user_aikei; user_roma; user_hokfer | user_10; user_a23b (partial match) |
| [X-Y] | Matches any one character in the range between X and Y. | user_[2-4] | user_2; user_3; user_4 | user_; user_1; user_5 |
| [X-Yabcd...] | You can specify multiple ranges and sets in the same character class | user_[2-4a-cgj] | user_2; user_a; user_b | user_; user_1; user_d |
| [^...] | Matches any one character which is NOT in the class. The ^ symbol negates the character class. | user_[^0-9] | user_A; user_BC; user_hello | user_1; user_12; user_500 |
| [\xN] | Matches the character at a specific Unicode hexadecimal code point N | user_[\x48] (x48 is the character “H”) | user_H | user_a; user_1; user_2 |


Character Classes may be used with any Quantifiers:
| Symbol(s) | Explanation | Example | Match | No Match |
|---|---|---|---|---|
| [...]+ | Matches one or more characters of those listed inside the square brackets | user_[4-9]+ | user_4; user_56; user_678 | user_; user_1; user_40 (partial match) |
| [...]? | Matches one or zero characters of those listed inside the square brackets | user_[4-9]? | user_4; user_ | user_3 (partial match); user_44 (partial match) |

Anchors and Boundaries
| Symbol(s) | Explanation | Example | Match | No Match |
|---|---|---|---|---|
| ^ | Matches start of a string by default or start of a line if multiline mode is on. Note that this symbol does not work when searching for logs using Logs Query or Kibana | ^info: user.*logged in | info: user_20 logged in | _info: user_20 logged in |
| $ | Matches end of a string by default or end of a line if multiline mode is on. Note that this symbol does not work when searching for logs using Logs Query or Kibana | ^info: user.*logged in$ | info: user_20 logged in | info: user_20 logged in now |
| \b | Word boundary. Matches a point between two characters, one of which is a word character, and the other one is not a word character | user_.*\bin | user_20 logged in | user_20 loggedin; user_20 logged_in |
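The anchor and word-boundary examples from this table can be checked with a short Python sketch. It uses Python's `re` module as a stand-in engine, so it says nothing about the search-bar limitation noted in the table.

```python
import re

# Both anchors: the line must start with "info: user" and end with "logged in".
ANCHORED = re.compile(r"^info: user.*logged in$")
# Word boundary: "in" must start a new word.
BOUNDARY = re.compile(r"user_.*\bin")

anchored_results = {
    line: bool(ANCHORED.search(line))
    for line in [
        "info: user_20 logged in",
        "_info: user_20 logged in",
        "info: user_20 logged in now",
    ]
}
boundary_results = {
    line: bool(BOUNDARY.search(line))
    for line in ["user_20 logged in", "user_20 loggedin", "user_20 logged_in"]
}
print(anchored_results)
print(boundary_results)
```

The underscore counts as a word character, which is why “logged_in” has no word boundary before “in”.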

Groups and Backreferences
(...)
  Explanation: Matches any RegEx specified inside the parentheses and captures the match as a capturing group. Most useful for substitutions.
  Example: (user_)(.+?)\b
  Match: user_28 (the text “user_” is captured as capturing group 1 and the text “28” as capturing group 2)
  No match: user_

(...|...)
  Explanation: Matches either the RegEx on the left or the one on the right of the | symbol and captures the match as a capturing group.
  Example: user_(11|22)
  Match: user_11; user_22 (“11” or “22” is captured as capturing group 1)
  No match: user_12; user_23

(?:...)
  Explanation: Matches any RegEx specified inside the parentheses, but does not capture it as a capturing group.
  Example: user_(?:11|22)
  Match: user_11; user_22 (neither “11” nor “22” is captured as a capturing group)
  No match: user_12; user_23

(?P<NAME>...)
  Explanation: Captures a portion of a match as a capturing group with the name NAME. This is useful for extracting values from logs.
  Example: (user_)(?P<userId>.+?)\b
  Match: user_28 (the text “user_” is captured as capturing group 1 and the text “28” as the capturing group named “userId”)
  No match: user_

\NUMBER
  Explanation: Inserts the contents of matched group NUMBER into the search input. Note that you need to use $ instead of \ if you want to insert the group contents into the replace input instead. See the next item and also “Replacing and Removing Values” above.
  Example: (user_.*?) logged in, \1 name is.*
  Match: user_15 logged in, user_15 name is John
  No match: user_15 logged in, user_14 logged in

$NUMBER
  Explanation: Inserts the contents of matched group NUMBER into the replace input. Note that you need to use \ instead of $ if you want to insert the group contents into the search input instead. See the previous item and also “Replacing and Removing Values” above.
  Search input: (user_.*?) name is .+?\b
  Replace input: $1 name is *
  Match: user_15 name is John
  After replace: user_15 name is *
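Two of these constructs, the `\NUMBER` backreference in the search input and the named group, can be tried in Python, whose `re` module supports both spellings. The test strings come from the examples in this section.

```python
import re

# \1 must repeat exactly what group 1 captured earlier in the same match.
SAME_USER = re.compile(r"(user_.*?) logged in, \1 name is.*")
# Named group "userId" captures the part after "user_".
NAMED_ID = re.compile(r"(user_)(?P<userId>.+?)\b")

same_user_hit = SAME_USER.search("user_15 logged in, user_15 name is John")
different_user_hit = SAME_USER.search("user_15 logged in, user_14 logged in")

named_match = NAMED_ID.search("user_28")
user_id = named_match.group("userId")

print(same_user_hit is not None, different_user_hit is not None, user_id)
```

The second string fails because no choice of group 1 is followed by “ logged in, ” and then the same text again.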

Greedy vs. Lazy Quantifiers
*, +, {X,}, {X,Y}
  Explanation: All these quantifiers are “greedy” by default, meaning that they will match as many characters as possible.
  Example: user_.*\b
  Match: user_15 logged in, user_16 logged out (everything until the last word boundary is matched, which is at the end of the string)
  No match: usr_15 logged in, usr_16 logged out

*?, +?, {X,}?, {X,Y}?
  Explanation: The ? symbol after the quantifier makes it lazy, meaning that it will match as few characters as possible.
  Example: user_.*?\b
  Match: user_15 logged in, user_16 logged out (everything until the first word boundary is matched, that is, “user_15”, and “user_16” as a separate match)
  No match: usr_15 logged in, usr_16 logged out
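The difference between the two modes is easy to see by running both examples on the same string. This is a small Python sketch using the patterns and sample text from this section.

```python
import re

text = "user_15 logged in, user_16 logged out"

# Greedy: .* grabs as much as possible, so the match runs to the last word boundary.
greedy_match = re.search(r"user_.*\b", text).group()

# Lazy: .*? grabs as little as possible, so each match stops at the first word boundary.
lazy_matches = re.findall(r"user_.*?\b", text)

print(greedy_match)
print(lazy_matches)
```

The greedy pattern returns the whole string as a single match, while the lazy pattern returns the two user ids as separate matches.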

Inline Modifiers
Note that inline modifiers only work for Coralogix Rules and are not supported when searching for logs using Logs Query or Kibana.

(?i)
  Explanation: Ignore case.
  Example: (?i)user_.*
  Match: user_15; User_15; USER_15
  No match: usr_15; USR_16

(?s)
  Explanation: Dotall mode. Makes the . symbol match newline characters.
  Example: (?s)user_.*logged out
  Match: Suppose you have a single log entry like this:

    user_15 logged in
    user_15 created a document new_document
    user_15 logged out

  The RegEx above will match all three lines together as a single match.
  No match: Without the modifier, the RegEx user_.*logged out cannot span the line breaks, so it will not match the three lines together. At most it matches the last line on its own.

(?m)
  Explanation: Multiline mode. Makes the ^ and $ symbols match start and end of line respectively, instead of start and end of string.
  Example: (?m)^user_.*$
  Match: Suppose you have a single log entry like this:

    user_15 logged in
    user_15 created a document new_document
    user_15 logged out

  The RegEx above will match each of these lines as a separate match.
  No match: This RegEx will not match all of these lines together as a single match. Without (?m), ^user_.*$ is anchored to the start and end of the whole entry, and because . does not match newlines by default, it also needs (?s) to span all three lines.
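The three modifiers can be compared side by side in Python, which accepts the same `(?i)`, `(?s)` and `(?m)` prefixes at the start of a pattern. The log entry is the three-line example from this section. Other engines may differ in details such as how `$` treats a trailing newline.

```python
import re

entry = (
    "user_15 logged in\n"
    "user_15 created a document new_document\n"
    "user_15 logged out"
)

# (?i): case-insensitive matching.
ignore_case_hits = [s for s in ["user_15", "User_15", "USER_15", "usr_15"]
                    if re.search(r"(?i)user_.*", s)]

# (?s): the dot also matches newlines, so one match spans the whole entry.
dotall_match = re.search(r"(?s)user_.*logged out", entry).group()
# Without (?s) the dot stops at line breaks, so only the last line matches.
plain_match = re.search(r"user_.*logged out", entry).group()

# (?m): ^ and $ work per line, giving one match per line.
multiline_matches = re.findall(r"(?m)^user_.*$", entry)
# Without (?m) the anchors refer to the whole entry, and the dot cannot cross lines.
whole_entry_match = re.search(r"^user_.*$", entry)

print(ignore_case_hits, multiline_matches, whole_entry_match)
```

Combining the modifiers, for example `(?s)^user_.*$`, is one way to match the entire entry as a single match.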


About the author
Andrei Chernikov is a Full-Stack Developer and also writes helpful guides on Medium.
