You may be wondering: what are regular expressions (RegEx)? And how do they work when it comes to managing log data in Coralogix? We’ve got you covered.
Regular expressions can be crucial for wrangling log data efficiently. You may want to extract specific data from your logs to make it easier to analyze and visualize. Sometimes you might want to capture an email when a particular message is logged. Other times, you may need to hide sensitive data in logs before they are saved.
And more often than not, you need to match using a RegEx pattern rather than an exact text search.
This RegEx guide is split into three parts. We’ll start with a brief introduction to RegEx. The second part shows RegEx methods and how you can use them in a logging context. Finally, we end with a complete RegEx 101 table for your quick reference.
Regular expressions, also known as RegEx or RegExp, are a domain-specific language (DSL) used for pattern searches and replacements.
There are times when you may not need a regular expression; you can just search for specific text. For example, if you just want to find log lines with the text “user logged in”, you can simply enter this text into your log search tool. But if your log lines look like “user_32 logged in”, you can’t search for the exact text anymore, since the user ID is a variable that changes.
Fortunately, a RegEx pattern handles this case by matching any run of digits in place of the user ID:
user_\d+ logged in
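As a quick illustration of exact search versus pattern search, here is a small sketch using Python’s standard `re` module, which reads `\d+` the same way. The log lines are made up, and this is not how Coralogix runs searches internally.

```python
import re

# Hypothetical log lines; only the user ID differs between the first two.
lines = [
    "user_32 logged in",
    "user_7 logged in",
    "user_ logged in",   # no digits, so \d+ (one or more digits) fails
    "admin logged in",
]

# Exact text search: "user logged in" appears in none of the lines.
exact_hits = [line for line in lines if "user logged in" in line]

# Pattern search: \d+ stands in for any run of digits.
pattern = re.compile(r"user_\d+ logged in")
matching = [line for line in lines if pattern.search(line)]

print(exact_hits)  # []
print(matching)    # ['user_32 logged in', 'user_7 logged in']
```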
In this section, we list all places where RegEx may be used in Coralogix and give you some ready-made examples which you can start using immediately.
One of the most common needs is extracting data from logs, for example to convert unstructured logs into structured logs. For this, you’ll need to use the Extract Rule.
Suppose you have an unstructured log of the following form:
`${logLevel}: World-${worldName}: ${logText}`
and you would like to convert all entries of this format into a JSON object of the following form:
{ "level": `${logLevel}`, "tag": `World-${worldName}`, "text": `${logText}` }
The Extract Rule will extract all text matched by capturing groups and convert it into a JSON object. Names of the capturing groups will become JSON keys.
For example:
^(?P<level>.*?):\s*(?P<tag>.*?):
Some explanations:
- `^` is the start-of-line symbol: the match must begin at the start of a line.
- `(?P<name>X)`, where X is the matcher, is the named capturing group syntax. We have two capturing groups here, one for each JSON key we need to extract.
- `\s` means “any whitespace character”.
- `.` means “any one character”.
- `*` means “0 or more matches of the previous symbol/character”.
- `+` means “1 or more matches of the previous symbol/character”.
- `.*?` means “any characters any number of times, with the least amount of tokens necessary”.
Thus, the RegEx above means: capture everything from the start of the line up to the first `:` symbol as capturing group "level", skip any whitespace, then capture everything up to the next `:` symbol as capturing group "tag".
Coralogix automatically sets the whole log text as the "text" field, so we don’t need a capturing group for it.
The RegEx above will convert the following log:
"info: World-w-8: generate: new world"
into this JSON object:
{ "level": "info", "tag": "World-w-8", "text": "info: World-w-8: generate: new world" }
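To make the Extract Rule’s behavior concrete, the sketch below imitates it with Python’s `re` module, which also supports the `(?P<name>...)` syntax. The `extract` helper is hypothetical: it only mimics the outcome described above (named groups become JSON keys, and the whole line goes into "text") and is not Coralogix code.

```python
import json
import re

# Same named groups as the Extract Rule RegEx above.
EXTRACT = re.compile(r"^(?P<level>.*?):\s*(?P<tag>.*?):")

def extract(log_line: str) -> dict:
    """Hypothetical imitation of an Extract Rule: named groups become keys."""
    match = EXTRACT.search(log_line)
    fields = match.groupdict() if match else {}
    # Mirrors the text: the whole log line ends up in the "text" field.
    fields["text"] = log_line
    return fields

record = extract("info: World-w-8: generate: new world")
print(json.dumps(record))
# {"level": "info", "tag": "World-w-8", "text": "info: World-w-8: generate: new world"}
```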
But you don’t always want to extract text into custom JSON fields. Sometimes you need to assign matched text to predefined Coralogix fields. In the example above:
"info: World-w-8: generate: new world"
you might want to extract the "info" text into the Severity column, and the "World" text into the Class column.
You only need to set the correct names for the capturing groups:
^(?P<severity>.*?):\s*(?P<className>.*?)-(?P<worldName>.*?):
This RegEx will format the above message in the following way:
You can set other predefined columns the same way, such as Category and Method: use the keys "category" and "methodName". Here is a RegEx mapping these predefined fields:
^(?P<severity>.*?):\s*(?P<className>.*?)-(?P<category>.*?):\s*(?P<methodName>.*?)(?::|$)
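The following Python sketch checks what each named group captures from the example log. The keys "category" and "methodName" come from the text above; "severity" and "className" are our assumption for the Severity and Class keys. The sketch ends the pattern with `(?::|$)`, “a colon or the end of the line”, because inside square brackets, as in `[$:]`, the `$` is a literal dollar sign rather than an end-of-line anchor.

```python
import re

# Group names "severity" and "className" are assumed; "category" and
# "methodName" are the keys named in the text.
PREDEFINED = re.compile(
    r"^(?P<severity>.*?):\s*(?P<className>.*?)-(?P<category>.*?)"
    r":\s*(?P<methodName>.*?)(?::|$)"
)

fields = PREDEFINED.search("info: World-w-8: generate: new world").groupdict()
print(fields)
# {'severity': 'info', 'className': 'World', 'category': 'w-8', 'methodName': 'generate'}

# The end-of-line alternative also handles a hypothetical line with no trailing colon.
short = PREDEFINED.search("warn: Net-eu-1: connect").groupdict()
print(short["methodName"])  # connect
```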
And here is the result:
We already saw how to extract data from unstructured logs into custom JSON fields. The same trick works for structured logs too. The following example shows how to extract data from a specific JSON field.
Suppose you have a structured log line like this:
{ "type": `${text}`, "log": `${text}`, "region": "rg-europe-2" }
Now, knowing that the "region" field has the following form:
`rg-${"europe"|"asia"|"na"}-${number}`
we want to extract the part that tells us whether the region is "europe", "asia" or "na". The RegEx should look like this:
"region"\s*:\s*"rg-(?P<regionName>.*?)-
Note the named capturing group "regionName" here: this is what actually extracts the text. We place it after the key name "region" and the characters "rg-", right where we expect it to be according to our format. The purpose of the `\s*` symbols is to keep the RegEx working if there is any whitespace before or after the `:` symbol.
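A quick way to sanity-check the pattern outside Coralogix is Python’s `re` module. The "type" and "log" values below are hypothetical; both spaced and compact JSON are tried to show what the `\s*` symbols buy us.

```python
import re

# \s* on both sides of the colon tolerates "region":"..." and "region" : "...".
REGION = re.compile(r'"region"\s*:\s*"rg-(?P<regionName>.*?)-')

spaced = '{ "type": "login", "log": "user signed in", "region": "rg-europe-2" }'
compact = '{"type":"login","log":"user signed in","region":"rg-asia-11"}'

region_names = [REGION.search(log).group("regionName") for log in (spaced, compact)]
print(region_names)  # ['europe', 'asia']
```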
Here is the result of applying this rule:
One of the most common examples where we need to replace or remove values is hiding personal data. Suppose you log phone numbers somewhere and don’t want those to be saved in Coralogix.
We start by creating a new Rule, as described above. Once we select the Replace rule in the rule type selection section, we will see the following screen:
Suppose we have an unstructured log line like this:
"info: Sender: sendSms: sending sms to phone number +12345678910 to user Andrew"
You want to remove the phone number and the name from this line. This RegEx will match the line above, starting with "sending sms":
sending sms to phone number \+*\d+ to user .*
Note that we need to escape the `+` symbol with `\` because it has a special meaning in RegEx syntax: “1 or more of the previous character”. The symbol `\d` matches any single digit. Remember that the `*` symbol means “0 or more of the previous character”. Thus, `\+*\d+` matches one or more digits, which may or may not be preceded by a `+` symbol.
The following replacement string turns the text matched by the RegEx above into the same text without the phone number and name:
sending sms to phone number * to user *
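The same substitution can be tried locally with Python’s `re.sub`. This is a sketch for testing the pattern, not the Replace rule itself; note that `*` has no special meaning in a Python replacement string, so it is inserted literally.

```python
import re

# \+* allows an optional leading "+", \d+ the digits, .* the rest (the name).
PHONE = re.compile(r"sending sms to phone number \+*\d+ to user .*")
MASK = "sending sms to phone number * to user *"

line = "info: Sender: sendSms: sending sms to phone number +12345678910 to user Andrew"
masked = PHONE.sub(MASK, line)
print(masked)
# info: Sender: sendSms: sending sms to phone number * to user *
```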
And here is the result of applying the above rule:
Replacing JSON values in structured logs is similar. The whole JSON string is used as the input for the Replace rule.
Let’s remember our structured log line from above:
{ "type": `${text}`, "log": `${text}`, "region": "rg-europe-2" }
Suppose you need to replace the value of the "type" field with another value. Here is how you can match the value of the "type" field:
"type"\s*:\s*".*?"
Remember that we need the `\s*` symbols to make sure the RegEx works if there is whitespace before or after the `:` character.
And here is the replacement string that replaces any value of the "type" field with "newType":
"type":"newType"
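Here is a minimal Python sketch of the replacement, with hypothetical values for "type" and "log". One caveat of this design: `".*?"` ends at the first quote it meets, so a value containing escaped quotes (`\"`) would be cut short.

```python
import json
import re

# ".*?" stops at the first closing quote, so only the "type" value is replaced.
TYPE_VALUE = re.compile(r'"type"\s*:\s*".*?"')

log = '{ "type": "login", "log": "user signed in", "region": "rg-europe-2" }'
updated = TYPE_VALUE.sub('"type":"newType"', log)

print(updated)
# { "type":"newType", "log": "user signed in", "region": "rg-europe-2" }
print(json.loads(updated)["type"])  # newType
```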
Suppose we need to replace “europe” with “eu” in strings of the format “west-europe-2”, and suppose we need to do this not only in the “region” field but anywhere in the log where the pattern appears. Matching this pattern is easy:
.+?-europe-\d+
But replacing it with the methods we used before would be hard, because we need to keep the text before and after “europe”, and that text varies. To do this, we first need to capture it in capturing groups:
(.+?)-europe-(\d+)
Remember that the `\d` symbol means “any digit”, and together with `+` it means “any digit one or more times”. Similarly, `.+?` means any character one or more times, but with the fewest tokens needed to make the match.
This captures the text before “europe” as capturing group 1 and the digits after “europe” as capturing group 2. The following replacement string uses backreferences to insert the matched content of those groups:
$1-eu-$2
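Python’s `re.sub` can reproduce this substitution, with one syntax difference worth knowing: Python refers to groups in the replacement as `\1` and `\2` (or `\g<1>`) rather than `$1` and `$2`. The inputs below are made up.

```python
import re

EUROPE = re.compile(r"(.+?)-europe-(\d+)")

# Python writes backreferences in a replacement as \1 and \2,
# where the Coralogix replace input uses $1 and $2.
plain = EUROPE.sub(r"\1-eu-\2", "west-europe-2")
in_json = EUROPE.sub(r"\1-eu-\2", '{"region": "west-europe-2"}')

print(plain)    # west-eu-2
print(in_json)  # {"region": "west-eu-2"}
```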
Logs before applying our rule:
Logs after applying our rule:
Another place where you can use RegEx is the Logs search bar:
You can enter the exact text you want to find there, without any RegEx. But if you look for a particular pattern, you need to use RegEx. RegEx queries have their own form, which is:
/${fieldName}.keyword:/REGEX//
Suppose we have many JSON-structured logs of the following form:
{ "log": `${text}` , "regionName": `${text}`, "region": `${text}`, "type" : `ltest-w-${number}` }
And we want to match only those entries where "type" is equal to “ltest-w-1”, “ltest-w-2” or “ltest-w-3”. The following search query will do this:
/type.keyword:/ltest-w-[1-3]//
Note: the text between the square brackets [ and ] is called a character class. It matches any one character listed between the brackets. A dash can be used as shorthand for a range of characters: [1-5] is the same as [12345].
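A short Python check, using `re.fullmatch` so that the whole string has to match, confirms that the range form and the listed form accept the same characters, and that a character class stands for exactly one character. The "type" values are hypothetical.

```python
import re

digits = "0123456789"

# A range inside a character class is shorthand for listing every character.
range_class = [c for c in digits if re.fullmatch(r"[1-5]", c)]
listed_class = [c for c in digits if re.fullmatch(r"[12345]", c)]
print(range_class == listed_class, range_class)  # True ['1', '2', '3', '4', '5']

# The class stands for exactly one character, so "ltest-w-12" is rejected
# when the whole value has to match.
types = ["ltest-w-1", "ltest-w-3", "ltest-w-4", "ltest-w-12"]
selected = [t for t in types if re.fullmatch(r"ltest-w-[1-3]", t)]
print(selected)  # ['ltest-w-1', 'ltest-w-3']
```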
You can use the same syntax to match data in unstructured logs with RegEx. You only need to set the field name to text.keyword. For example:
/text.keyword:/.*ltest-w-[1-3].*//
The RegEx above searches for the text “ltest-w-1”, “ltest-w-2” or “ltest-w-3” in the main log body.
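The leading and trailing `.*` matter because the whole text has to match, not just a part of it. That is how Elasticsearch-style `regexp` queries behave, and we assume (without confirming it here) that this Logs query syntax follows them. Python’s `re.fullmatch` has the same all-or-nothing behavior, so it can illustrate the difference on a hypothetical log body:

```python
import re

body = "App: init: World-ltest-w-2 ready"  # hypothetical log body

# fullmatch requires the pattern to cover the entire string.
without_wildcards = re.fullmatch(r"ltest-w-[1-3]", body) is not None
with_wildcards = re.fullmatch(r".*ltest-w-[1-3].*", body) is not None

print(without_wildcards)  # False
print(with_wildcards)     # True
```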
Similar RegEx syntax is used in the Kibana search bar. Click Kibana and then Discover to see this search input:
The only difference here is that you don’t need the / symbols at the beginning and at the end. Thus, our search query above turns into:
type.keyword:/ltest-w-[1-3]/
Note: sometimes you don’t see any logs after clicking Discover. It’s possible that there are no logs for the time interval set by default, which is 15 minutes. Try clicking the time interval selector and increasing it.
Another popular use case for RegEx in Coralogix is Alerts. Alerts syntax is the same as logs query syntax. Let’s start with creating an alert. First, click the Alerts tab, and then click the Plus button on the right:
Let’s say we want to be alerted on a line of the following form:
`App: init: World-${name}: generation error: ${err}`
And suppose we want to get alerts only for worlds "w-1", "w-2", "w-3" or "w-4". Our alert RegEx will look like this:
/text.keyword:/.*World-w-[1-4]: generation error.*//
Remember that [1-4] matches any single character from 1 to 4, and .* matches any characters any number of times.
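Before saving an alert, you can test the RegEx against sample lines. The sketch below uses Python’s `re.fullmatch`, on the assumption that the alert query, like the Logs query, must cover the whole log text; the log lines are hypothetical.

```python
import re

ALERT = re.compile(r".*World-w-[1-4]: generation error.*")

# Hypothetical log lines in the format shown above.
lines = [
    "App: init: World-w-3: generation error: out of memory",
    "App: init: World-w-7: generation error: out of memory",
    "App: init: World-w-2: generation ok",
]

# fullmatch: the RegEx has to cover the entire line, as assumed above.
alerting = [line for line in lines if ALERT.fullmatch(line)]
print(alerting)  # ['App: init: World-w-3: generation error: out of memory']
```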
In the “Rules definitions” section, select “Notify immediately” to get an alert on the first match, or select More and choose the number of matches needed to trigger the alert.
In the Notifications definition section, enter the email addresses that should receive a message when the alert is triggered.
When an alert is triggered, you will receive a notification:
I hope you found this tutorial interesting and helpful. You can start using regular expressions by copy-pasting the examples above and adapting parts of them as needed. You will soon find that knowing RegEx makes your life easier not just with Coralogix, but with many other products.
Characters
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
. | Matches any single character | m.n | man men | moon non |
\d | Matches any single digit | user_\d\d | user_12 user_30 | user_123 (partial match) user_A2 |
\D | Not a digit. Matches any character except a digit. | user_\D\d | user_A2 user_B0 | user_22 user_2A |
\w | Word character. Matches any letter, digit or underscore. | user\w\w | user_3 user21 userP2 | usr_21 user_324 (partial match) |
\W | Not a word character. Matches any character which is not a letter, digit or underscore. | user\W\w | user:3 user-2 user:A | user_2 user32 userA2 |
\s | Matches a whitespace character: space, tab, newline, carriage return | {\s*"userName"\s*:\s*"user1"\s*} | {"userName":"user1"} { "userName": "user1"} { "userName": "user1" } | { |
\S | Not a whitespace character. Matches any character except for whitespace characters: space, tab, newline or carriage return | "userName"\S"user1" | "userName":"user1" "userName"A"user1" "userName","user1" | "userName" "user1" "userName": "user1" |
Quantifiers
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
* | Zero or more matches of the previous character or RegEx sequence | user_1* | user_ user_1111 user_1 | user1 user_21 (partial match) user_3 (partial match) |
+ | One or more matches of the previous character or RegEx sequence | user_1+ | user_1 user_1111 | user_ user_21 user_3 |
? | One or zero matches of the previous character or RegEx sequence. Also see greedy vs. non-greedy quantifiers | user_1? | user_ user_1 | user_2 (partial match) user_11 (partial match) |
{X} | Exactly X matches of the previous character | user_1{2} | user_11 | user_ user_21 user_1 user_111 (partial match) |
{X,Y} | Between X and Y matches of the previous character (inclusive) | user_1{2,3} | user_11 user_111 | user_21 user_1 user_1111 (partial match) |
{X,} | X or more matches of the previous character | user_1{2,} | user_11 user_111 user_1111 | user_ user_21 user_1 |
Character Classes
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
[abcd...] | Matches any one character of those listed inside the square brackets | user_[a-z]+ | user_aikei user_roma user_hokfer | user_10 user_a23b (partial match) |
[X-Y] | Matches any one character in the range between X and Y. | user_[2-4] | user_2 user_3 user_4 | user_ user_1 user_5 |
[X-Yabcd...] | You can specify multiple ranges and sets in the same character class | user_[2-4a-cgj] | user_2 user_a user_b | user_ user_1 user_d |
[^...] | Matches any one character which is NOT in the class. The ^ symbol negates the character class. | user_[^0-9] | user_A user_BC user_hello | user_1 user_12 user_500 |
[\xN] | Matches the character at a specific Unicode hexadecimal code point N | user_[\x48] (x48 is the character “H”) | user_H | user_a user_1 user_2 |
Character classes may be used with any quantifier:
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
[...]+ | Matches one or more characters of those listed inside the square brackets | user_[4-9]+ | user_4 user_56 user_678 | user_ user_1 user_40 (partial match) |
[...]? | Matches one or zero characters of those listed inside the square brackets | user_[4-9]? | user_4 user_ | user_3 (partial match) user_44 (partial match) |
Anchors and Boundaries
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
^ | Matches start of a string by default or start of a line if multiline mode is on. Note that this symbol does not work when searching for logs using Logs Query or Kibana | ^info: user.*logged in | info: user_20 logged in | _info: user_20 logged in |
$ | Matches end of a string by default or end of a line if multiline mode is on. Note that this symbol does not work when searching for logs using Logs Query or Kibana | ^info: user.*logged in$ | info: user_20 logged in | info: user_20 logged in now |
\b | Word boundary. Matches a point between two characters, one of which is a word character, and the other one is not a word character | user_.*\bin | user_20 logged in | user_20 loggedin user_20 logged_in |
Groups and Backreferences
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
(...) | Matches any RegEx specified inside the parentheses and captures the match as a capturing group | (user_)(.+?)\b | user_28 The text “user_” will be captured as the capturing group 1 and the text “28” will be captured as the capturing group 2. Most useful for substitutions | user_ |
(...\|...) | Matches either the RegEx on the left or the one on the right of the \| symbol and captures the match as a capturing group. | user_(11\|22) | user_11 user_22 “11” or “22” will be captured as capturing group 1. | user_12 user_23 |
(?:...) | Matches any RegEx specified inside the parentheses, but does not capture it as a capturing group. | user_(?:11\|22) | user_11 user_22 Neither “11” nor “22” will be captured as a capturing group. | user_12 user_23 |
(?P<NAME>...) | Captures a portion of a match as a capturing group with name NAME | (user_)(?P<userId>.+?)\b | user_28 The text “user_” will be captured as capturing group 1 and the text “28” will be captured as the capturing group named “userId”. This is useful for extracting values from logs | user_ |
\NUMBER | Backreference: matches the same text that capturing group NUMBER already matched in the search input. Note that you need to use $ instead of \ to insert the group contents into the replace input. See the next item and also “Replacing and Removing Values” above | (user_.*?) logged in, \1 name is.* | user_15 logged in, user_15 name is John | user_15 logged in, user_14 logged in |
$NUMBER | Inserts the contents of capturing group NUMBER into the replace input. Note that you need to use \ instead of $ to refer to the group contents in the search input. See the previous item and also “Replacing and Removing Values” above | Search input: (user_.*?) name is .+?\b Replace input: $1 name is * | Match: user_15 name is John After replace: user_15 name is * | |
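Python’s `re` module can show both sides of this in one place: `\1` inside the pattern requires the same text to appear again, and `\1` in the replacement (Python’s spelling of `$1`) inserts it. These are the table’s own examples run through Python, not through Coralogix.

```python
import re

# \1 inside the pattern: the second user ID must repeat the first one.
same_user = re.compile(r"(user_.*?) logged in, \1 name is.*")
repeated = bool(same_user.search("user_15 logged in, user_15 name is John"))
different = bool(same_user.search("user_15 logged in, user_14 logged in"))
print(repeated, different)  # True False

# \1 in the replacement inserts group 1 (written $1 in the replace input).
masked_name = re.sub(r"(user_.*?) name is .+?\b", r"\1 name is *", "user_15 name is John")
print(masked_name)  # user_15 name is *
```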
Greedy vs. Lazy Quantifiers
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
* + {X,} {X,Y} | All these quantifiers are “greedy” by default, meaning that they will match as many characters as possible | user_.*\b | user_15 logged in, user_16 logged out Everything until the last word boundary is matched, which is at the end of the string | usr_15 logged in, usr_16 logged out |
*? +? {X,}? {X,Y}? | The ? symbol after the quantifier makes it lazy, meaning that it will match as few characters as possible. | user_.*?\b | user_15 logged in, user_16 logged out Only “user_15” is matched: everything up to the first word boundary. | usr_15 logged in, usr_16 logged out |
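Running the table’s two examples through Python’s `re.search` shows the difference directly:

```python
import re

text = "user_15 logged in, user_16 logged out"

# Greedy: .* runs to the end, then backs off to the last word boundary.
greedy = re.search(r"user_.*\b", text).group()
# Lazy: .*? grows one character at a time and stops at the first boundary.
lazy = re.search(r"user_.*?\b", text).group()

print(greedy)  # user_15 logged in, user_16 logged out
print(lazy)    # user_15
```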
Inline Modifiers
Note that inline modifiers only work in Coralogix Rules; they are not supported when searching for logs using Logs Query or Kibana.
Symbol(s) | Explanation | Example | Match | No Match |
---|---|---|---|---|
(?i) | Ignore case. | (?i)user_.* | user_15 User_15 USER_15 | usr_15 USR_16 |
(?s) | Dotall mode. Makes the . symbol match newline characters. | (?s)user_.*logged out | Suppose you have a single log entry spanning three lines, where “user_” appears only on the first line and “logged out” only on the last. The regex above will match all three lines together as a single match. | The regex user_.*logged out (without (?s)) will not match anything in the example on the left |
(?m) | Multiline mode. Makes the ^ and $ symbols match the start and end of each line, instead of the start and end of the string. | (?m)^user_.*$ | Suppose you have a single log entry with several lines, each starting with “user_”. The regex above will match each of these lines as a separate match | This RegEx will not match all of these lines together as a single match. To match them together, use dotall mode instead: (?s)^user_.*$ |
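A small Python sketch of the three modifiers follows. The multi-line log entries are hypothetical, and Python is only a stand-in: in Python, these global inline flags must appear at the very start of the pattern.

```python
import re

# (?i): ignore case.
case_insensitive = re.fullmatch(r"(?i)user_.*", "USER_15") is not None  # True
case_sensitive = re.fullmatch(r"user_.*", "USER_15") is not None        # False

# (?s): "." also matches "\n", so a single match can span lines.
entry = "user_15 logged in\nopened settings\nlogged out"
without_dotall = re.search(r"user_.*logged out", entry)   # None
with_dotall = re.search(r"(?s)user_.*logged out", entry)  # spans all three lines

# (?m): ^ and $ match at each line's start and end, not only the string's.
block = "user_1 a\nuser_2 b\nuser_3 c"
per_line = re.findall(r"(?m)^user_.*$", block)  # ['user_1 a', 'user_2 b', 'user_3 c']
whole_string = re.findall(r"^user_.*$", block)  # []
together = re.findall(r"(?s)^user_.*$", block)  # [block]: one match, all three lines

print(case_insensitive, case_sensitive, without_dotall, with_dotall.group() == entry)
print(per_line, whole_string, together == [block])
```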
About the author
Andrei Chernikov is a Full-Stack Developer and also writes helpful guides on Medium.
(This article was updated in August 2023)