How to Perform Log Analysis

Log file monitoring tools play a central role in enhancing the observability of your IT estate, helping operations teams and SREs identify issues as they emerge and quickly track down the cause of failures.

Because even a medium-sized business easily generates thousands of log entries on any given day, viewing and analyzing logs manually to realize these benefits is not a realistic option. This is where automated real-time log analysis comes in.

In this article, we’ll go through the steps involved in conducting log analysis effectively. To find out more about what log analysis can do for your organization, head over to our Introduction to Log Analysis resource guide.

Generating log files

The very first step to implementing log analysis is to enable logging so that log entries are actually generated, and to configure the appropriate logging level.

The logic that determines when a log entry is generated forms part of the software itself, which means that unless you’re building the application or program in-house, you generally can’t add new triggers for writing a log.

However, you should be able to specify the logging level. This allows you to determine how much information is written to your log files.

While both the number and names of log levels can vary between systems, most will include:

  • ERROR – for problems that prevent the software from functioning. This could be a serious error that causes the system to crash, or a workflow not completing successfully.
  • WARNING (or WARN) – for unexpected behavior that does not prevent the program from functioning, but may do so in the future if the cause of the warning is not addressed. Examples include disk space reaching capacity or a query holding database locks.
  • INFORMATION (or INFO) – for normal behavior, such as recording user logins or access to files.
  • DEBUG – for more detailed information about what is happening in the background, useful when troubleshooting an issue, both in development and in production.

When you enable logging on a system, you can also specify the minimum logging level. For example, if you set the level to WARNING, any warning and error level logs will be output by the system, but information and debug logs will not. You may also come across TRACE, which is lower than DEBUG, and SEVERE, CRITICAL or FATAL, which are all higher than ERROR.
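
To make this concrete, here is a minimal sketch using Python’s standard logging module; the logger name and messages are purely illustrative, and most logging frameworks behave the same way:

import logging

# Set WARNING as the minimum logging level for this example.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)

logger = logging.getLogger("payments")

logger.debug("Cache refreshed in the background")     # suppressed: below WARNING
logger.info("User 42 logged in")                      # suppressed: below WARNING
logger.warning("Disk usage approaching capacity")     # written to the log
logger.error("Payment workflow failed to complete")   # written to the log

Raising or lowering that single level setting is usually all it takes to switch between quiet production logging and verbose troubleshooting output.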

Collecting logs

By using log file monitoring tools like Filebeat, you can centralize your logs in a single, queryable location. These tools listen for changes to your local log files and push new entries to a central store, commonly an Elasticsearch cluster, although there are many options out there. With all of your logs in one place, you can get the bigger picture without the toil of jumping between servers.
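
To illustrate what a shipper such as Filebeat automates for you, here is a deliberately simplified Python sketch that follows a local log file and forwards each new line to a hypothetical central HTTP endpoint; the file path and URL are placeholders, and a real shipper adds batching, retries, and back-pressure handling:

import os
import time
import urllib.request

LOG_PATH = "/var/log/myapp/app.log"               # placeholder path
COLLECTOR_URL = "http://logs.example.com/ingest"  # hypothetical endpoint

def ship(line):
    """Forward a single log line to the central collector."""
    request = urllib.request.Request(
        COLLECTOR_URL,
        data=line.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
    )
    urllib.request.urlopen(request, timeout=5)

def follow(path):
    """Tail the file and ship every newly appended line."""
    with open(path, "r") as handle:
        handle.seek(0, os.SEEK_END)   # start at the end of the file
        while True:
            line = handle.readline()
            if line:
                ship(line.rstrip("\n"))
            else:
                time.sleep(1)         # wait for new data to arrive

if __name__ == "__main__":
    follow(LOG_PATH)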

But now you’ve got to look after your logging platform

Elasticsearch is notoriously tricky to maintain. It has many different configuration options, and that’s before you look to optimize the cluster. Node outages can cause the loss of critical operational data, and the engineering effort, combined with hosting costs, can quickly become expensive. At Coralogix, we aim to make this simple for you: our experts in the Elasticsearch toolset ensure a smooth experience with no operational overhead.

Normalizing and parsing your logging data

The great challenge with your logs is making them consistent. Logs are naturally unstructured, so parsing them can become a complex task. One strategy that teams employ is to always log in the same format, for example JSON. Logs in JSON format are consistent and simple to parse, and you can add custom fields to surface application- or business-specific information.
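
For example, an in-house Python service could emit JSON logs using only the standard library. This is just a sketch: the logger name and the custom fields shown (service, customer_id) are illustrative rather than a required schema.

import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""
    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge any custom fields passed via the `extra` argument.
        payload.update(getattr(record, "custom_fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "Order placed",
    extra={"custom_fields": {"service": "checkout", "customer_id": 1234}},
)

Each line that comes out is a self-contained JSON object, which makes downstream parsing trivial.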

But what about 3rd party log analysis?

Our systems are increasingly made up of both homegrown and external services, and our observability platform needs visibility into all of them for log analysis to be possible. So what do we do about 3rd party logs? The challenge is that we can’t reliably mutate 3rd party logs, since their format may change beyond our control, but what if we can add to them?

Log enrichment is key to full log analysis

It’s difficult to parse, mutate, and normalize all of your 3rd party logs, but enrichment is a great way to add some basic fields that enable log analysis. If you’re debugging an issue, adding tracing data to your logs also helps you link multiple events into the same logical group, connecting your logs more closely to your business. Now that your logs are in great shape, it’s time to really unlock the power of log analysis.
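
As a concrete illustration before moving on, enrichment can be as simple as wrapping the original line with extra context rather than rewriting it. In this hypothetical Python sketch, the environment, region, and trace_id fields are the enrichment, and the 3rd party line itself is left untouched:

import json
import uuid
from datetime import datetime, timezone

def enrich(raw_line, trace_id=None):
    """Wrap a raw 3rd party log line with enrichment fields."""
    event = {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "environment": "production",      # illustrative static metadata
        "region": "eu-west-1",            # illustrative static metadata
        "trace_id": trace_id or str(uuid.uuid4()),
        "raw": raw_line,                  # original message left untouched
    }
    return json.dumps(event)

print(enrich('127.0.0.1 - - [23/Jun/2021:07:51:31 +0000] "GET /health HTTP/1.1" 200 2'))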

Visualizing log data

Data visualizations are a powerful tool for identifying trends and spotting anomalies. By collating your logs in a central location, you can plot data from multiple sources to run cross-analyses and identify correlations.

Your log analytics platform should provide you with the option to run queries and apply filters to dashboards so that you can interrogate your data. For example, by plotting log data over time, you can understand what normal currently looks like in your system or correlate that data with known events such as downtime or releases. 

Adding tags for these events will also make it easier to interpret the data in the future. Log analytics tools that allow you to drill down from the dashboard to specific data points significantly speed up the process of investigating anything unusual so that you can quickly determine whether it’s a sign of a real problem.

Using graphical representations of your log data can help you spot emerging trends, which is useful for capacity and resource planning. By staying ahead of the curve and anticipating spikes in demand, you can provision additional infrastructure or optimize particular workflows in order to maintain a good user experience and stay within your SLAs.
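
Behind most of these charts sits a simple time-bucketed aggregation. The Python sketch below shows that step in isolation; the hard-coded events stand in for whatever your log analytics platform would return from a query:

from collections import Counter
from datetime import datetime

# Stand-in for parsed log events returned by a dashboard query.
events = [
    {"timestamp": "2021-06-23T07:51:31", "level": "ERROR"},
    {"timestamp": "2021-06-23T07:59:02", "level": "INFO"},
    {"timestamp": "2021-06-23T08:03:47", "level": "ERROR"},
]

# Count ERROR events per hour: the series you would plot over time.
errors_per_hour = Counter(
    datetime.fromisoformat(event["timestamp"]).strftime("%Y-%m-%d %H:00")
    for event in events
    if event["level"] == "ERROR"
)

for hour, count in sorted(errors_per_hour.items()):
    print(hour, count)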

Actionable insights from your log analysis

This is where things become interesting. Now that you’ve got the data and the graphs, you can process your logs in entirely new ways. This is where the benefits of a mature, centralized logging platform become clear. So what can you do with a centralized logging platform?

Machine learning log analysis to detect unknown issues

Machine learning log analysis is difficult to master, but it can work wonders once you have a working ML platform. The problem is the upfront effort and cost: it takes a great deal of analysis and expertise to get an operational ML model running. A mature log analysis platform with this functionality built in lets you get straight to the benefit without the groundwork.

Setting up alerts when your log analysis reveals something scary

Sometimes your logs will indicate that there is a severe problem, and you don’t want to wait until you happen to glance at the monitoring board. Observability is all about giving your system a voice. By using a central log analysis platform, you can alert on complex patterns that span many applications, producing specific, tangible alerts that teams can act on quickly.
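
The simplest version of such a rule is a threshold on error volume within a sliding window. The Python sketch below is a hand-rolled illustration of the idea; the window size, threshold, and notify() stub are all arbitrary, and a central platform evaluates far richer rules like this across every application for you:

from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)    # illustrative window size
THRESHOLD = 10                   # illustrative error threshold

recent_errors = deque()

def notify(message):
    """Stand-in for paging an on-call engineer or posting to chat."""
    print("ALERT:", message)

def on_error_log(timestamp):
    """Call this for every ERROR-level log event as it arrives."""
    recent_errors.append(timestamp)
    # Drop events that have fallen outside the sliding window.
    while recent_errors and timestamp - recent_errors[0] > WINDOW:
        recent_errors.popleft()
    if len(recent_errors) > THRESHOLD:
        notify(f"{len(recent_errors)} errors in the last {WINDOW}")

# Example: feed in error timestamps as they are parsed from the log stream.
on_error_log(datetime(2021, 6, 23, 7, 51, 31))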

Conclusion

Log data analysis can provide you with a wealth of insights into the usage, health, and security of your systems, together with powerful and efficient tools for detecting and troubleshooting issues. Key to this endeavor is a log analytics platform that can not only simplify and accelerate the process of collating, normalizing, and parsing your log data to make it available for analysis, but also identify patterns and detect potential anomalies automatically.

By choosing a log analytics tool that leverages machine learning to keep pace with your systems as they evolve, you’ll ensure that you get maximum value from your logs while freeing up your operations and SRE teams to focus on investigating true positives or making targeted improvements to your platform and infrastructure.

Coralogix provides integrations for a wide range of log sources, including Windows Event Viewer, AWS S3, ECS and Lambda, Kubernetes, Akamai, and Heroku, support for popular log shipping agents such as Fluentd, Logstash, and Filebeat, as well as SDKs for Python, Ruby, Java, and others. Parsing rules enable you to normalize and structure your log data automatically on ingest, ready for filtering, sorting, and visualizing.

Coralogix includes multiple dashboards for visualizing, filtering, and querying log data, together with support for Kibana, Tableau, and Grafana. Our Loggregation feature uses machine learning to cluster logs based on patterns automatically, while flow and error volume anomaly alerts notify you of emerging issues while minimizing noise from false positives.

To find out more about how Coralogix can enhance the observability of your systems with log analytics, sign up for a free trial or request a demo.

An Introduction to Log Analysis

If you think log files are only necessary for satisfying audit and compliance requirements, or for helping software engineers debug issues during development, you’re certainly not alone.

Although log files may not sound like the most engaging or valuable assets, for many organizations, they are an untapped reservoir of insights that can offer significant benefits to your business.

With the proper analysis tools and techniques, your log data can help you prevent failures in your systems, reduce resolution times, improve security, and deliver a better user experience.

Understanding log files

Before we look at the benefits that log analysis can offer you, let’s take a moment to understand what logs actually are. Logs – or log entries – are messages generated automatically while software is running.

That software could be an application, operating system, firewall, networking logic, or embedded program running on an IoT device, to name just a few. Logs are generated from every level of the software stack.

Each entry (or log line) provides a record of what was happening or the state of the system at a given moment in time. They can be triggered by a wide range of events, from everyday routine behavior, such as users logging in to workstations or requests made to servers, to error conditions and unexpected failures. 

The precise format and content of a log entry vary, but a log line will typically include a timestamp, a log severity level, and a message. Each log line is written to a log file and stored – sometimes for a few days or weeks (if the data is not required for regulatory reasons) and sometimes for months or even years.
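
For instance, a line such as 2021-06-23 07:51:31 ERROR Payment service unavailable (a made-up example) splits neatly into those three parts:

import re

line = "2021-06-23 07:51:31 ERROR Payment service unavailable"

# Capture the timestamp, the severity level, and the free-text message.
match = re.match(r"^(\S+ \S+) (\w+) (.*)$", line)
if match:
    timestamp, level, message = match.groups()
    print(timestamp, level, message)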

Benefits of log analysis

Log analysis is the process of collating, normalizing, and parsing log data so that it can be queried and processed easily, and of visualizing that data to identify patterns and anomalies.

Analyzing the data recorded in the log files from across your organization’s systems and applications will help you improve the services you offer, enhance your security posture, and give you a better understanding of how your systems are used.

Troubleshooting failures

The primary use of log files is to provide visibility into how your software is behaving so that you can track down the cause of a problem. As computing trends towards more distributed systems, with applications made up of multiple services running on separate but connected machines, investigating the source of an issue has become more complex.

Collating and analyzing logs from the various components in a system makes it possible to join the dots and make sense of the events that led up to an error or failure. Automated log analysis speeds up this process by identifying patterns and anomalies to help you fix issues faster. It can also surface early warning signs that alert you to similar problems sooner in the future.

Proactive monitoring and observability

The benefits of automated log analysis go further than troubleshooting issues that have already occurred. With proactive log monitoring, you configure thresholds for key health metrics and trigger alerts when these are exceeded, so by analyzing log data in real time you can spot emerging issues before any real damage is done.

Observability solutions take these techniques a step further, using machine learning to maintain a constantly evolving picture of normal operations, with alerts triggered whenever anomalous behavior patterns are detected.

Taking a proactive approach to anomaly detection and troubleshooting can significantly reduce the number of serious and critical failures that occur in your production systems and reduce mean time to resolution (MTTR) for issues that arise. The result is a better experience for your users and fewer interruptions to business activities.

Security forensics

Observability and monitoring play an essential role in detecting early signs of an attack and containing threats. If a malicious actor does breach your defenses, log files often provide clues regarding how the attack was executed and the extent of the damage perpetrated or the data leaked.

Log data analysis expedites this process by drawing connections between activities, such as user account activity taking place out of hours coupled with unusual data access patterns or privilege escalation. 

As well as providing the data required for reporting and audit compliance, this knowledge of how an attack was executed is essential for strengthening your defenses against similar threats in the future.

System design

As users’ expectations of software systems continue to rise, maintaining high performance, stability, and uptime is essential. Analyzing log data from across your IT estate can help you build a fuller picture of how your systems are used, providing you with the data to inform targeted enhancements.

By tracking resource usage over time, you can be proactive about provisioning additional infrastructure to increase capacity or decommissioning it to save costs. Identifying slow-running database queries so that you can optimize them not only improves page load times but also reduces the risk of locks or resource saturation slowing down the rest of your system.

Using log data to understand how users interact with your application or website can also provide valuable insights into user behavior, including popular features, common requests, referring sites, and conversion rates. This information is invaluable when deciding where to next invest your development efforts.

Wrapping up

Log file analysis enables you to leverage the full benefits of your log data, transforming log files from a business cost required for regulatory reasons to a business asset that helps you streamline your operations and improve your services.

Elasticsearch Audit Logs and Analysis

Security is a top-of-mind topic for software companies, especially those that have experienced security breaches. This article will discuss how to set up Elasticsearch audit logging and explain what continuous auditing logs track.

Alternatively, you can use tools like the cloud security platform offered by Coralogix instead of internal audit logging to detect the same events with much less effort.

Companies must secure data to avoid nefarious attacks and meet standards such as HIPAA and GDPR. Audit logs record the actions of all agents against your Elasticsearch resources. Companies can use audit logs to track activity throughout their platform to ensure usage is valid and log when events are blocked. 

Elasticsearch can log security-related events for accounts with paid subscriptions. Elasticsearch auditing records events such as authentications and data access, which are critical to understanding who is accessing your clusters, and at what times. You can use machine learning tools such as the log analytics tool from Coralogix to analyze audit logs and detect attacks.

Turning on Audit Logging in Elasticsearch

Audit General Settings

Audit logs are off by default in your Elasticsearch node. They are turned on by configuring the static security flag in your elasticsearch.yml (or equivalent .yml file). Elasticsearch requires this setting for every node in your cluster. 

xpack.security.audit.enabled: true

Enabling audit logs is currently the only static setting needed. Static settings are only applied, or re-applied, to unstarted or shut down nodes. To turn on Elasticsearch audit logs, you will need to restart any existing nodes.

Audit Event Settings

You can decide which events are logged on each Elasticsearch node in your cluster. Using the events.include or events.exclude settings, you can choose which security events Elasticsearch logs to its audit file. Using _all as your include setting will track everything. The exclude setting is convenient when you want to log all audit event types except one or two.

xpack.security.audit.logfile.events.include: [_all]
xpack.security.audit.logfile.events.exclude: [run_as_granted]

You can also decide whether the request body that triggered the audit event is included in the audit log. By default, this data is not available in audit logs. If you need to audit search queries, turn this setting on so that the queries are available for analysis.

xpack.security.audit.logfile.events.emit_request_body: true

Audit Event Ignore Policies

Ignore policies let you specify audit events that should not be printed. Use the policy_name value to group several filters together into a single policy. Elasticsearch does not print events that match all conditions of a policy.

Each ignore filter takes a list of values or wildcards that are matched against the corresponding attribute of the audit event.

xpack.security.audit.logfile.events.ignore_filters.<policy_name>.users: ["*"]
xpack.security.audit.logfile.events.ignore_filters.<policy_name>.realms: ["*"]
xpack.security.audit.logfile.events.ignore_filters.<policy_name>.actions: ["*"]
xpack.security.audit.logfile.events.ignore_filters.<policy_name>.roles: ["*"]
xpack.security.audit.logfile.events.ignore_filters.<policy_name>.indices: ["*"]

Node Information Inclusion in Audit Logs

Information about the node can be included in each audit event. Each of the following settings turns on one piece of that information: the node name, the node’s IP address, the node’s host name, or the node id. By default, all are excluded except the node id.

xpack.security.audit.logfile.emit_node_name: true
xpack.security.audit.logfile.emit_node_host_address: true
xpack.security.audit.logfile.emit_node_host_name: true
xpack.security.audit.logfile.emit_node_id: true

Information Available in Elasticsearch Audit

Elasticsearch audit events are logged to a single JSON file. Each audit event is printed on a single line with no end-of-line delimiter. The format is similar to a CSV in that the fields appear in a fixed order, but each field is a JSON key-value pair using an ordered dot-notation name and a non-null string value. The purpose is to make the file easy for people to read, rather than only for machines to parse.

An example of an Elasticsearch audit log entry is below. It contains several of the fields needed for analysis. For a complete list of the audit event attributes available, see the Elasticsearch documentation.

{"type":"audit", "timestamp":"2021-06-23T07:51:31,526+0700", "node.id":
"1TAMuhilWUVv_hBf2H7yXW", "event.type":"ip_filter", "event.action":
"connection_granted", "origin.type":"rest", "origin.address":"::3",
"transport.profile":".http", "rule":"allow ::1,127.0.0.1"}

The event.type attribute shows the internal layer that generated the audit event. This may be rest, transport, ip_filter, or security_config_change. The event.action attribute shows what kind of event occurred. The actions available depend on the event.type value, with security_config_change types having a different list of available actions than the others. 

The origin.address attribute shows the IP address at the source of the request. This may be the address of a remote client, of another cluster, or of the local node. When the remote client connects to the cluster directly, you will see the remote IP address here; otherwise, the address shown is that of the first OSI layer 3 proxy in front of the cluster. The origin.type attribute shows the type of request made originally. This could be rest, transport, or local_node.

Where Elasticsearch Stores Audit Logs

A single log file is created for each node in your Elasticsearch cluster. Audit log files are written only to the local filesystem to keep the file secure and ensure durability. The default filename is <clustername>_audit.json.

You can configure Filebeat in the ELK stack to collect events from the JSON file and forward them to other locations, such as back to an Elasticsearch index or into Logstash. This approach replaced Elasticsearch’s older model, in which audit logs were sent directly to an index without queuing; that model caused logs to be dropped whenever the indexing rate of the audit log index fell below the rate of incoming logs.

Ideally, this index will be on a different node and cluster from the one where the logs were generated. Once the data is in Elasticsearch, it can be viewed on a Kibana audit logs dashboard or sent to another destination such as the Coralogix full-stack observability tool, which can ingest data from Logstash.

Configuring Filebeat to Write Audit Logs to Elasticsearch

After the Elasticsearch audit log settings are configured, you can configure the Filebeat settings to read those logs.

Here’s what you can do:

  1. Install Filebeat
  2. Enable the Elasticsearch module, which will ingest and parse the audit events
  3. Optionally customize the audit log paths in the elasticsearch.yml file within Filebeat’s modules.d folder. This is necessary if you have customized the name or path of the audit log file and will allow Filebeat to find the logs.
  4. Specify the Elasticsearch cluster that will index your audit logs by adding the configuration to the output.elasticsearch section of the filebeat.yml file.
  5. Start Filebeat

Analysis of Elasticsearch Audit Logs

Elasticsearch audit logs hold information about who or what is accessing your Elasticsearch resources. This information is required for compliance with many regulatory standards, such as HIPAA. For the data to be useful at scale, analysis and visualization are also needed.

The audit logs include events such as authorization successes and failures, connection requests, and data access events. They can also include search queries when the emit_request_body setting is turned on. Using this data, security professionals can monitor the Elasticsearch cluster for nefarious activity, prevent data breaches, or reconstruct events after the fact. Because the list of event types is comprehensive, analysis lets you follow any given entity’s activity across your cluster.
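
Even before you wire up a full pipeline, a short script can give you a feel for the data. The Python sketch below reads an audit file line by line and counts events whose action is authentication_failed (one of the event actions Elasticsearch can record) by origin address; the filename follows the <clustername>_audit.json convention mentioned above and should be treated as an assumption about your setup:

import json
from collections import Counter

AUDIT_FILE = "mycluster_audit.json"   # assumes the default <clustername>_audit.json naming

failures_by_origin = Counter()

with open(AUDIT_FILE) as handle:
    for line in handle:                              # one JSON event per line
        event = json.loads(line)
        if event.get("event.action") == "authentication_failed":
            failures_by_origin[event.get("origin.address", "unknown")] += 1

for origin, count in failures_by_origin.most_common(10):
    print(origin, count)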

If automatic streaming is available from Logstash or Elasticsearch, audit logs can be sent to other tools for analysis. Automatic detection of suspicious activity could allow companies to stop data breaches. Tools such as Coralogix’s log analysis can provide notifications for these events.

How does Coralogix fit in?

With Coralogix, you can send your logs to our log analytics tool, which uses machine learning to find where security breaches are occurring in your system. You can also set up the tool to send notifications when suspicious activity is detected.

In addition, the Coralogix security platform allows users to bypass the manual setup of Elasticsearch audit logging by detecting the same access events. This platform is a Security as Code tool that can be linked directly to your Elasticsearch cluster and will automatically monitor and analyze traffic for threats.

Summary

Elasticsearch audit logs require a paid Elasticsearch subscription and manual setup. The logs will track all requests made against your Elasticsearch node and log them into a single, locally stored JSON file. Your configuration determines what is and is not logged into the audit file. 

Your locally stored audit file is formatted with the intention of being human-readable. However, reading this file manually is not a scalable or recommended security measure. You can stream audit logs to other tools by setting up Filebeat.