How to get the most out of your ELB logs

What is ELB

Amazon ELB (Elastic Load Balancing) allows you to make your applications highly available by using health checks and intelligently distributing traffic across a number of instances. It distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions. You might have heard the terms CLB, ALB, and NLB. All of them are types of load balancers under the ELB umbrella.

 

Types of Load Balancers

  • CLB: Classic Load Balancer is the previous generation of EC2 load balancer
  • ALB: Application Load Balancer is designed for web applications
  • NLB: Network Load Balancer operates at the network layer

This article focuses on ELB logs. You can get more in-depth information about ELB itself in this post.

ELB Logs

Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each ELB log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses.

 

ELB logs structure

Because ELB has evolved over time, the documentation can be a bit confusing. Not surprisingly, there are three variations of the access logs: ALB, NLB, and CLB. You need to rely on the header of each documentation page to understand which log variant it describes (the URL and body will usually reference ELB generically).

 

How to collect ELB logs

The ELB access logging capability is integrated with Coralogix. The logs can be easily collected and sent straight to the Coralogix log management solution. 

 

ALB Log Example

This is an example of a parsed ALB HTTP log entry:

{
 "type":"http",
 "timestamp":"2018-07-02T22:23:00.186641Z",
 "elb":"app/my-loadbalancer/50dc6c495c0c9188",
 "client_addr":"192.168.131.39",
 "client_port":"2817",
 "target_addr":"110.8.13.9",
 "target_port":"80",
 "request_processing_time":"0.000",
 "target_processing_time":"0.001",
 "response_processing_time":"0.000",
 "elb_status_code":"200",
 "target_status_code":"200",
 "received_bytes":"34",
 "sent_bytes":"366",
 "request":"GET http://www.example.com:80/ HTTP/1.1",
 "user_agent":"curl/7.46.0",
 "ssl_cipher":"-",
 "ssl_protocol":"-",
 "target_group_arn":"arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067",
 "trace_id":"Root=1-58337262-36d228ad5d99923122bbe354",
 "domain_name":"-",
 "chosen_cert_arn":"-",
 "matched_rule_priority":"0",
 "request_creation_time":"2018-07-02T22:22:48.364000Z",
 "actions_executed":"forward",
 "redirect_url":"-",
 "error_reason":"-",
 "target_port_list":"80",
 "target_status_code_list":"200"
}

Note that, compared to the AWS log syntax table, we split the combined client:port and target:port fields into four separate fields (client_addr, client_port, target_addr, and target_port) to make them easier to query.
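As a rough illustration of that split, the sketch below separates a raw `address:port` value into its two parts. The function name `split_addr_port` is hypothetical, and returning no port for a bare `-` is an assumption about how you might want to represent entries that carry no target.

```python
def split_addr_port(value):
    """Split 'ip:port' into (addr, port)."""
    if value == "-":
        # Assumption: a bare '-' means no target was recorded, so no port.
        return "-", None
    # rpartition splits on the last colon, so the port is always the tail.
    addr, _, port = value.rpartition(":")
    return addr, port


client_addr, client_port = split_addr_port("192.168.131.39:2817")
target_addr, target_port = split_addr_port("110.8.13.9:80")
```

Splitting on the last colon keeps the port intact even if the address part itself contains colons.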

 

CLB Log Example

This is an example of a parsed HTTPS CLB log:

{
 "timestamp":"2018-07-02T22:23:00.186641Z",
 "elb":"app/my-loadbalancer/50dc6c495c0c9188",
 "client_addr":"192.168.131.39",
 "client_port":"2817",
 "target_addr":"10.0.0.1",
 "target_port":"80",
 "request_processing_time":"0.001",
 "backend_processing_time":"0.021",
 "response_processing_time":"0.003",
 "elb_status_code":"200",
 "backend_status_code":"200",
 "received_bytes":"0",
 "sent_bytes":"366",
 "request":"GET http://www.example.com:80/ HTTP/1.1",
 "user_agent":"curl/7.46.0",
 "ssl_cipher":"DHE-RSA-AES128-SHA",
 "ssl_protocol":"TLSv1.2"
}

CLB logs contain a subset of the ALB fields. What ALB calls a target, CLB calls a backend, so the relevant field names use ‘backend’ instead of ‘target’ (for example, backend_processing_time and backend_status_code).

 

NLB Log Example

This is an example of an NLB log:

{
 "type":"tls",
 "version":"1.0",
 "timestamp":"2018-07-02T22:23:00.186641Z",
 "elb":"net/my-network-loadbalancer/c6e77e28c25b2234",
 "listener":"g3d4b5e8bb8464cd",
 "client_addr":"192.168.131.39",
 "client_port":"51341",
 "target_addr":"10.0.0.1",
 "target_port":"443",
 "connection_time":"5",
 "tls_handshake_time":"2",
 "received_bytes":"29",
 "sent_bytes":"366",
 "incoming_tls_alert":"-",
 "chosen_cert_arn":"arn:aws:elasticloadbalancing:us-east-2:123456789012:certificate/2a108f19-aded-46b0-8493-c63eb1ef4a99",
 "chosen_cert_serial":"-",
 "tls_cipher":"ECDHE-RSA-AES128-SHA",
 "tls_protocol_version":"TLSv12",
 "tls_named_group":"-",
 "domain_name":"my-network-loadbalancer-c6e77e28c25b2234.elb.us-east-2.amazonaws.com"
}

ELB Log Parsing

ELB logs contain unstructured data. Using Coralogix parsing rules, you can easily transform the unstructured ELB logs into JSON format to get the full power of Coralogix and the Elastic stack working for you. Parsing rules use RegEx, and I created the expressions for NLB, ALB-1, ALB-2, and CLB logs. The two ALB regexes cover “normal” ALB logs and the special cases of WAF, Lambda, or failed or partially fulfilled requests. In these cases, AWS assigns the value ‘-’ to the target_addr field with no port.

You will also see some time measurements assigned the value -1. Make sure you take this into account in your visualization filters; otherwise, averages and other aggregations could be skewed. Amazon may add fields and change the log structure from time to time, so always check the expressions against your own logs and make changes if needed. The examples should provide a solid foundation.
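To make the -1 caveat concrete, here is a minimal Python sketch that compares a naive average with one that skips the placeholder. The sample values and the helper name `average_time` are made up for illustration; in Kibana you would get the same effect with a filter on the field.

```python
from statistics import mean


def average_time(values):
    """Average processing times, ignoring the -1 placeholder."""
    valid = [v for v in values if v >= 0]
    # Return None rather than fail when every sample was a placeholder.
    return mean(valid) if valid else None


# Made-up target_processing_time samples; one request has the -1 placeholder.
samples = [0.001, 0.003, -1, 0.002]

skewed = mean(samples)         # the -1 drags the average below zero
clean = average_time(samples)  # only the three real measurements count
```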

The following section requires some familiarity with regular expressions, but you can skip directly to the examples if you prefer.

First, a bit about why the regexes were created the way they were. Naturally, we always want to use a regex that is simple and efficient. At the same time, we should make sure that each rule captures the correct logs in the correct way (correct values matched with the correct fields). Think about a regex that starts with the following expression:

^(?P<timestamp>[^\s]+)\s*

It will work as long as the first field in the unstructured log is a timestamp, as is the case with CLB logs. However, in the case of NLB and ALB logs, the expression will capture the ‘type’ field instead. Since the regex and rule have no context, the rule will simply place the wrong value in the wrong JSON key. Other differences, such as the number of fields or the field order, can cause similar problems. To avoid this, we use the fact that NLB logs always start with ‘tls 1.0’, standing for the fields ‘type’ and ‘version’, and that ALB logs start with a ‘type’ field that takes one of 6 possible values (http, https, h2, grpcs, ws, wss).
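The mis-capture is easy to reproduce. The Python sketch below runs the naive expression against shortened, made-up CLB and ALB prefixes, then shows an anchored variant that only matches when the line starts with one of the ALB type values listed in the AWS documentation. The `ALB_PREFIX` pattern is an illustrative assumption, not the actual Coralogix rule.

```python
import re

# The naive rule from the text: capture the first whitespace-delimited token.
NAIVE = re.compile(r"^(?P<timestamp>[^\s]+)\s*")

# Illustrative only: anchor on the ALB 'type' values before the timestamp.
ALB_PREFIX = re.compile(
    r"^(?P<type>http|https|h2|grpcs|ws|wss)\s+(?P<timestamp>[^\s]+)\s*"
)

# Shortened, made-up log prefixes (real entries have many more fields).
clb_line = "2018-07-02T22:23:00.186641Z my-loadbalancer 192.168.131.39:2817"
alb_line = "http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188"

naive_clb = NAIVE.match(clb_line).group("timestamp")  # a real timestamp
naive_alb = NAIVE.match(alb_line).group("timestamp")  # 'http', the wrong value

anchored_alb = ALB_PREFIX.match(alb_line).groupdict()  # type and timestamp
anchored_clb = ALB_PREFIX.match(clb_line)              # None: not an ALB line
```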

Note: As explained in the Coralogix rule tutorial, rules are organized in groups and executed in the order they appear within the group. When a rule matches a log, the log moves on to the next group without being processed by the remaining rules in the same group.

Taking all this into account, we should:

    • Create a rule group called ‘Parse ELB’
    • Put the ALB and NLB rules first (these are the rules that look for the specific beginning of the respective logs) in the group
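The effect of this ordering can be sketched in Python as a first-match-wins loop. The three patterns below are heavily shortened stand-ins that only recognize the start of an entry; they are assumptions for illustration, not the real parsing rules, and `apply_rule_group` is a hypothetical name.

```python
import re

# Specific rules first, the generic timestamp-first (CLB) rule last.
RULES = [
    ("NLB", re.compile(r"^tls\s+(?P<version>\d+\.\d+)\s+(?P<timestamp>\S+)")),
    ("ALB", re.compile(r"^(?P<type>http|https|h2|grpcs|ws|wss)\s+(?P<timestamp>\S+)")),
    ("CLB", re.compile(r"^(?P<timestamp>\S+)")),
]


def apply_rule_group(line):
    """Return (rule_name, fields) for the first matching rule, else None."""
    for name, pattern in RULES:
        match = pattern.match(line)
        if match:
            # Stop at the first match, like a log leaving the rule group.
            return name, match.groupdict()
    return None


nlb = apply_rule_group("tls 1.0 2018-07-02T22:23:00.186641Z net/my-network-loadbalancer/c6e77e28c25b2234")
alb = apply_rule_group("http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188")
clb = apply_rule_group("2018-07-02T22:23:00.186641Z my-loadbalancer 192.168.131.39:2817")
```

If the generic CLB pattern came first, it would match every line and capture ‘tls’ or ‘http’ as the timestamp.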

 

This approach ensures that each rule matches the correct log. Now we are ready for the main part of this post.

In the following examples, we’ll describe how different ELB log fields can be used to indicate operational status. In the examples, we assume that the logs were parsed into JSON format. The examples rely on the Coralogix alerts engine and on Kibana visualizations. They also provide additional insights into the different keys and values within the logs. As always, we give you ideas and guidance on how to get more value out of your logs. However, every business environment is different, and you are encouraged to take these ideas and build on top of them based on the best implementation for your infrastructure and goals.

Last but not least, Elastic Load Balancing logs requests on a best-effort basis. The logs should be used to understand the nature of the requests, not as a complete accounting of all requests. In some cases, we will use ‘notify immediately’ alerts, but you should use ELB logs as a backup and not as the main vehicle for these types of alerts.

Tip: To learn the details of how to create Coralogix alerts you can read this guide.

Alerts

Increase in ELB WAF errors

This alert identifies whether a specific ELB generates more 403 errors than usual. A 403 error results from a request that is blocked by AWS WAF (Web Application Firewall). The alert uses the ‘more than usual’ option. With this option, Coralogix’s ML algorithms identify normal behavior for every time period. The alert triggers if the number of errors is higher than normal and is above the optional threshold supplied by the user.

Alert Filter:

elb:"app/my-loadbalancer/50dc6c495c0c9188" AND elb_status_code:"403"

Alert Condition: ‘More than usual’. The field elb_status_code can be found in both ALB and CLB logs.

Outbound Traffic from a Restricted Address

In this example, we use the client_addr field. It contains the IP address of the requesting client. The alert will trigger if a request comes from a restricted address. For the purpose of this example, we assume that all permitted addresses are under the subnet 172.xxx.xxx.xxx, so any other address is restricted.

Alert Filter:

NOT client_addr:/172\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/

Note: client_addr is found across NLB, ALB, and CLB.

Alert Condition: ‘Notify immediately’.
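If you want to sanity-check the same logic outside the alert engine, a minimal Python sketch using the standard `ipaddress` module could look like this. It assumes, as in the example above, that every permitted client lives under 172.x.x.x; the name `is_restricted` is hypothetical.

```python
import ipaddress

# Assumption from the example: every permitted client is under 172.x.x.x.
PERMITTED = ipaddress.ip_network("172.0.0.0/8")


def is_restricted(client_addr):
    """True when the client address falls outside the permitted network."""
    return ipaddress.ip_address(client_addr) not in PERMITTED


flag_outside = is_restricted("192.168.131.39")  # outside 172.x.x.x
flag_inside = is_restricted("172.16.4.20")      # inside 172.x.x.x
```

A network membership test avoids the escaping pitfalls of matching dotted addresses with a regex.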

ELB Down

This alert identifies an inactive ELB. It uses the ‘less than’ alert condition. The threshold is set to no logs in 5 minutes. This should be adapted to your specific environment. 

Alert Filter:

elb:"app/my-loadbalancer/50dc6c495c0c9188"

Alert Condition: ‘less than 1 in 5 minutes’

This alert works across NLB, ALB, and CLB.
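The ‘less than 1 in 5 minutes’ condition boils down to checking whether any entry falls inside a sliding window. The Python sketch below shows that check under the assumption that timestamps use the layout from the log examples in this post; `is_silent` is a hypothetical helper, not part of any Coralogix API.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the log examples in this post.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def is_silent(timestamps, now, window=timedelta(minutes=5)):
    """True when no log timestamp falls inside the window ending at `now`."""
    cutoff = now - window
    return not any(
        cutoff <= datetime.strptime(ts, TS_FORMAT) <= now for ts in timestamps
    )


seen = ["2018-07-02T22:23:00.186641Z"]

# Seven minutes after the last entry: the load balancer looks inactive.
down = is_silent(seen, datetime(2018, 7, 2, 22, 30, 0))
# Two minutes after the last entry: still active.
up = is_silent(seen, datetime(2018, 7, 2, 22, 25, 0))
```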

Long Connection Time

Knowing the type of transactions running on a specific ELB, ops would like to be alerted if connection times are unusually long. Here again, the Coralogix ‘more than usual’ alert option will be very handy.

Alert Filter:

connection_time:[2 TO *]

Note: connection_time is specific to NLB logs. You can create similar alerts on any of the time-related fields in any of the logs.

Alert Condition: ‘more than usual’

A Surge in ‘no rule’ Requests

The field ‘matched_rule_priority’ indicates the priority value of the rule that matched the request. The value 0 indicates that no rule was applied and the load balancer resorted to the default action. Applying rules to requests is especially important in highly regulated or secured environments. In such environments, it is important to identify rule patterns and abnormal behavior. Coralogix has powerful ML algorithms focused on identifying deviation from a normal flow of logs. This alert will notify users if the number of requests not matched by a rule is higher than usual.

Alert Filter:

matched_rule_priority:0

Note: This is an ALB field.

Alert Condition:  ‘more than usual’

No Authentication

In this example, we assume a regulated environment. One of the requirements is that, for every ELB request, the load balancer should validate the session, authenticate the user, and add the user information to the request header, as specified by the rule configuration. This sequence of actions is indicated by the value ‘authenticate’ in the ‘actions_executed’ field. The field can include several actions separated by ‘,’. Though ELB doesn’t guarantee that every request will be recorded, it is important to be notified when such a problem exists, so we will use the ‘notify immediately’ condition.

Alert Filter:

NOT actions_executed:authenticate

Note: This is an ALB field.

Alert Condition:  ‘notify immediately’
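Because actions_executed can hold several comma-separated actions, a check on the parsed value needs to look at each action rather than compare the whole string. A minimal Python sketch, with the hypothetical helper name `was_authenticated`:

```python
def was_authenticated(actions_executed):
    """True when 'authenticate' is one of the comma-separated actions."""
    actions = [action.strip() for action in actions_executed.split(",")]
    return "authenticate" in actions


ok = was_authenticated("authenticate,forward")
missing = was_authenticated("forward")
```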

Visualizations

Traffic Type Distribution

Using the ‘type’ field, this visualization shows the distribution of the different request and connection types.
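Outside Kibana, the same distribution can be approximated in a few lines of Python. The entries below are made-up parsed logs reduced to their ‘type’ key.

```python
from collections import Counter

# Made-up parsed entries; only the 'type' key matters for this chart.
logs = [
    {"type": "http"},
    {"type": "https"},
    {"type": "https"},
    {"type": "tls"},
]

# Count how many entries were seen for each request or connection type.
distribution = Counter(entry["type"] for entry in logs)
```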

Bytes Sent

Summation of the number of bytes sent.

Average Request/Response Processing Time

Request processing time is the total time elapsed from the time the load balancer received the request until the time it sent it to a target. Response processing time is the total time elapsed from the time the load balancer received the response header from the target until it started to send the response to the client. In this visualization, we are using Timelion to track the average over time and generate a trend line.

Timelion expression:

.es(q=*,metric=avg:destination.request_processing_time.numeric).label("Avg request processing time").lines(width=3).color(green), .es(q=*,metric=avg:destination.request_processing_time.numeric).trend().lines(width=3).color(green).label("Avg request processing time trend"),.es(q=*,metric=avg:destination.response_processing_time.numeric).label("Avg response processing Time").lines(width=3).color(red), .es(q=*,metric=avg:destination.response_processing_time.numeric).trend().lines(width=3).color(red).label("Avg response processing time trend")

Average Response Processing Time by ELB

In this visualization, we show the average response processing time by ELB. We used the horizontal option. See the definition screens.

Top Client Requesters for a specific ELB

This table lists the top IP addresses generating requests to specific ELBs. The ELBs are separated by the metadata field applicationName. This metadata field is assigned to the load balancer when you configure the integration. We created a Kibana filter that looks only at these two devices. You can read about filtering and querying in our tutorial.

ELB Status Codes

This is an example showing the status code distribution for the last 24 hours.

 

You can also create a more dynamic representation showing how the distribution behaves over time.

This blog post covered the different types of services that AWS provides under the ELB umbrella: NLB, ALB, and CLB. We focused on the logs these services generate and their structure, and showed some examples of alerts and visualizations that can help you unlock the value of these logs. Remember that every user is unique and has their own use case and data. Your logs might be customized and configured differently, and you will most likely have your own requirements. So, you are encouraged to take the methods and concepts shown here and adapt them to your own needs. If you need help or have any questions, don’t hesitate to reach out to support@coralogix.com. You can learn more about unlocking the value embedded in AWS and other logs in some of our other blog posts.
