Elasticsearch Release: Roundup of Changes in 7.13.3

Elastic made its latest minor Elasticsearch release on May 25, 2021. Elasticsearch version 7.13 promotes to general availability several features that were only in preview in earlier versions. There are also enhancements to existing features, critical bug fixes, and some breaking changes of note.

Three patch releases have since followed on this minor version, and more are expected before the next minor version ships.

A quick note before we dive into the new features and updates: the wildcard function in Event Query Language (EQL) has been deprecated. Elastic recommends using the like or regex keywords instead.

Users can find a complete list of release notes on the Elastic website.

New Features

Combined Fields search

The combined_fields query is a new addition to the search API. This query supports searching multiple text fields as though their contents were indexed in a single, combined field. The query automatically analyzes your query string into individual terms and then looks for each term in any of the requested fields. This feature is useful when the text you are searching for could live in many different fields.
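
As a minimal sketch (the index and field names here are hypothetical), a combined_fields query searching two terms across several text fields might look like this:

GET /articles/_search
{
  "query": {
    "combined_fields": {
      "query": "database index",
      "fields": ["title", "abstract", "body"],
      "operator": "and"
    }
  }
}

Note that the listed fields must be text fields that share the same analyzer.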

Frozen Tier

Elastic defines several data tiers. Each tier is a collection of nodes with the same role and typically the same hardware profile. The new frozen tier includes nodes that hold time-series data that are rarely accessed and never updated. These are kept in searchable snapshots. Indexed content generally starts in the content or hot tiers, then can cycle through warm, cold, and frozen tiers as the frequency of use is reduced over time.

The frozen tier uses partially mounted indices to store and load data from a snapshot. This storage method reduces storage and operating costs while still allowing you to search the data, albeit with slower responses. Elastic improves the search experience by retrieving only the minimal pieces of data necessary to answer a query. For more information about the frozen tier and how to query it, see this Elastic blog post.
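
For illustration, a snapshot is mounted as a partially mounted index by passing storage=shared_cache to the mount API; the repository, snapshot, and index names below are hypothetical:

POST /_snapshot/my_repository/my_snapshot/_mount?storage=shared_cache&wait_for_completion=true
{
  "index": "my-time-series-index"
}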

IPv4 and IPv6 Address Matching

Painless expressions can now match IPv4 and IPv6 addresses against Classless Inter-Domain Routing (CIDR) ranges. Once a range is defined, you can use the Painless contains method to determine whether an input IP address falls within it. This is very useful for grouping and classifying IP addresses when using Elastic for security and monitoring.
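
As a hedged sketch, the Painless CIDR class can back a runtime field that flags whether a document's IP falls inside a range; the index and the source.ip and is_internal field names are hypothetical:

PUT /network-logs/_mapping
{
  "runtime": {
    "is_internal": {
      "type": "boolean",
      "script": {
        "source": "CIDR cidr = new CIDR('10.0.0.0/8'); emit(cidr.contains(doc['source.ip'].value));"
      }
    }
  }
}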

Index Runtime Fields

Runtime fields are fields whose values are calculated at query time, typically from a document's source. They can be defined either in the search request or in the index mapping itself. Defining the field in the index mapping makes it available to every search against the index.

Runtime fields are helpful when you need to search on a calculated value. For example, if you have internal error codes, you may want to return specific text related to each code. A runtime field can translate a numerical code into its associated text string without that text ever being stored in the document.
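
Sticking with the error-code example, a minimal sketch of a mapping-defined runtime field (the index and field names are hypothetical) might be:

PUT /app-logs/_mapping
{
  "runtime": {
    "error_text": {
      "type": "keyword",
      "script": {
        "source": "long c = doc['error_code'].value; if (c == 404) { emit('Not Found'); } else if (c == 500) { emit('Server Error'); } else { emit('Unknown'); }"
      }
    }
  }
}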

Further, a runtime field can be updated by simply changing its script in the mapping, without reindexing any documents. This makes updating much more straightforward than having to update each document with new text.

Aliases for Trained Models

Aliases for Elasticsearch indices have been present since version 1. They are a convenient way to let applications point to a data set independently of the underlying index name. For example, you may keep several versions of an index; by assigning an alias to whichever version is current, clients can always fetch the logical data set without knowing which physical index holds it.

Aliases are now also available for trained models. Trained models are machine learning algorithms that have been trained against a sample set of existing, known data. The resulting model can then be applied to new, unknown data, theoretically classifying it the same way it classified the known data. Common examples include classification analysis and regression analysis.

Elastic now allows you to apply an alias to your trained models, just as you already could for indices. The new model_alias API lets users create and update aliases on trained models. Aliases make it easier to apply a specific algorithm to a data set by letting users refer to machine learning models by a logical name.
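
For example, pointing an alias at a newer model version is a single call, with reassign=true moving an alias that already exists; the model ID and alias name here are hypothetical:

PUT _ml/trained_models/sales-forecast-v2/model_aliases/sales_forecast?reassign=true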

Fields added to EQL search

Event Query Language (EQL) is a query language designed for searching event-based, time-series data. Typical uses include log analytics, time-series data processing, and threat detection.

In Elasticsearch 7.13.0, developers added the fields parameter as an alternative to the _source parameter. The fields option extracts values from the index mapping, while _source accesses the original data sent at index time. Elastic recommends the fields option (see the example after this list) because it:

  • returns values in a standardized way according to their mapping types, 
  • accepts both multi-fields and field aliases, 
  • formats dates and spatial types according to the format you request, 
  • returns runtime field values, and 
  • can also return fields calculated by a script at index time.
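
Here is a minimal sketch of an EQL search using the fields parameter; the index name and ECS-style field names are assumptions:

GET /my-logs/_eql/search
{
  "query": "process where process.name == \"cmd.exe\"",
  "fields": [
    "process.name",
    { "field": "@timestamp", "format": "epoch_millis" }
  ]
}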

Log analytics on Elastic can be tricky to set up. Third-party tools, like Coralogix’s log analytics platform, exist to help you analyze data without any complex setup. 

Audit Events Ignore Policies

Elasticsearch can log security-related events if you have a paid subscription. Audit events record the different authentication and data-access events that occur against your data. The logs can be used for incident response and for demonstrating regulatory compliance. With all the events available, however, auditing can bog down performance due to the sheer volume of log data.

In version 7.13, Elastic introduced audit events ignore policies, so users can suppress logging for certain audit events. Setting an ignore policy involves creating rules that match the audit events to ignore so that they are never printed.
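
In elasticsearch.yml, an ignore policy is a named set of match rules; the policy name and patterns below are hypothetical. An event is dropped only when it matches every attribute of a policy:

xpack.security.audit.logfile.events.ignore_filters:
  monitoring_noise:
    users: ["*-monitoring-user"]
    indices: ["monitoring-*"]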

Enhancements

Performance: Improved Speed of Terms Aggregation

The terms aggregation speed has been improved under certain circumstances. These conditions are common for time-series data, particularly when it lives in the cold or frozen storage tiers. Elastic has improved aggregation speed in the following cases (a minimal qualifying request is sketched after the list):

  • The aggregation has no parent or child aggregations
  • The indices have no deleted documents
  • There is no document-level security
  • There is no top-level query
  • The field has global ordinals (such as a keyword or ip field)
  • There are fewer than a thousand distinct terms
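
A minimal request matching these conditions is a lone terms aggregation on a keyword field with no top-level query; the index pattern and field name here are hypothetical:

GET /logs-*/_search
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": { "field": "host.name" }
    }
  }
}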

Security: Prevention of Denial of Service Attack

The Elasticsearch Grok parser contained a vulnerability that nefarious users could exploit to mount a denial-of-service attack. Users with arbitrary query permissions could craft Grok queries that would crash an Elasticsearch node. This security flaw is present in all Elasticsearch versions before 7.13.3.

Bug Fixes

Default Analyzer Overwrites Index Analyzer

Elasticsearch uses analyzers to determine when a document matches search criteria; they break the text fields in your index into searchable tokens. In version 7.12, a bug was introduced where Elasticsearch would use the default analyzer (the standard analyzer) on all searches.

According to the documentation, the analyzer configured in the index mapping should be used, with the default applied only if none is configured. Version 7.13 fixed this bug, so searches once again prefer the index analyzer.
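
To make the behavior concrete, here is a hedged sketch: a hypothetical index maps a text field with the english analyzer, and after the fix the match query below is analyzed with english rather than the standard default.

PUT /products
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "english" }
    }
  }
}

GET /products/_search
{
  "query": {
    "match": { "title": "running shoes" }
  }
}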

Epoch Date Timezone Formatting with Composite Aggregations

Composite aggregations compile data from multiple sources into buckets. A typical use of this analysis is creating graphs from compiled data, and such graphs may use time as the method for collecting data into the same set. If a user required a timezone to be applied, Elasticsearch behaved incorrectly when the stored times were in epoch format.

Epoch datetimes are always expressed in UTC. Applying a timezone requires formatting the date, a step Elasticsearch previously skipped internally. This bug was resolved in version 7.13.3.
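
For illustration, the composite aggregation below buckets timestamps by day in a requested timezone; the index and field names are hypothetical. Before 7.13.3, the time_zone was not applied correctly when the stored times were epoch values.

GET /metrics/_search
{
  "size": 0,
  "aggs": {
    "daily": {
      "composite": {
        "sources": [
          {
            "day": {
              "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "1d",
                "time_zone": "America/New_York"
              }
            }
          }
        ]
      }
    }
  }
}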

Fix Literal Projection with Conditions in SQL

SQL queries can combine literal projections with filters when selecting data. For example, the following statement projects the literal value 'vinyl' for each row where the format column matches 'record':

SELECT 'vinyl' FROM music WHERE format = 'record'

Elasticsearch was erroneously optimizing such queries into a local relation. This error caused only a single record to be returned even when multiple records matched the filter. Version 7.13.3 fixed this issue, which was first reported in November 2020.
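
For reference, the statement above can be run through the SQL endpoint (the music index is, as before, hypothetical); after the fix, one row is returned per matching document rather than a single row overall:

POST /_sql?format=txt
{
  "query": "SELECT 'vinyl' FROM music WHERE format = 'record'"
}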

Summary

Elastic shipped many new features, bug fixes, and enhancements in version 7.13 and has continued to apply small changes through version 7.13.3. The most significant new features include the frozen storage tier, ignore policies for audit events, and runtime fields defined in index mappings.

For a complete list of new features, see the Elastic release notes.

Announcing Streama: Get complete monitoring coverage without paying for the noise

With the new Streama capability announced today, you no longer have to choose what to monitor and what to drop to manage your logging cloud costs

For years, our customers have enjoyed the benefits of a log analytics platform that enables them to autonomously manage and analyze data in their cloud applications. Our machine-learning engine empowers users to improve their system stability and accelerate their release cycles. Thousands of leading global companies, including Masterclass, Monday.com, BookMyShow, Postman, and PayU, use Coralogix to power their businesses.

With the release of our new Streama real-time analytics solution, we’re excited to challenge the existing cost model of observability by allowing customers to pay according to data priority instead of solely on volume. 

Pay by priority, not by volume

Most data (as much as 99.5%, according to Cisco) is never queried or analyzed. That includes log data, which often contributes to high storage costs without ever being used. At the same time, some logs do contain important information. It just doesn’t make sense to pay the same price for critical log data that you pay for “irrelevant” log data.

By re-engineering the Elasticsearch engine, Coralogix is now able to offer users the ability to prioritize their logs and define how the data is routed and stored according to function and importance. No longer do you need to pay premium storage fees for logs that you only need for compliance or monitoring purposes.

Disguised as a simple triaging capability, Coralogix’s new Streama feature drastically reduces logging costs while simultaneously improving your ability to query, monitor, and manage your data. Thousands of Coralogix customers, from fast-growing start-ups to large enterprises, are already seeing cost savings of up to 70 percent.

Streama Priority Levels: How It Works

Perhaps the easiest way to understand the Streama priorities is to look at the current state: all log data is indexed and stored in the same way, even though it serves many different purposes. In this model, whether the logs are used for troubleshooting, monitoring, or simply compliance, storage costs are the same.

With the new Streama capability, logs can be classified as high, medium, or low priority, corresponding to the function of the data they contain.

High Priority logs are the most important logs, typically high severity or business-critical data that is stored on highly available SSDs, replicated, and ready to be queried within seconds. This is currently the default for traditional log management solutions. When everything is treated as a high priority, it’s not surprising that the costs are extreme.

Medium Priority logs are the real game-changer here. These logs are important for monitoring system metrics and statistics but aren’t needed in their entirety. Coralogix uses special Loggregation© technology to identify the logging template without needing to index the log itself. This gives users the ability to define alerts, build dashboards, view statistics, query the live data stream, and proactively identify anomalies while saving up to 80% on storage costs. 

Low Priority logs are less critical log data that must be kept for compliance or post-processing reasons. This data goes straight to your archive. Some alternative solutions offer this capability as a rudimentary ‘blocking’ feature. Storage cost savings for low priority logs are up to 95%.

By prioritizing log data based on functionality, companies are able to save massive amounts of money without losing monitoring coverage. Plus, you can always move your data from one level to another, even retroactively, so no regrets.

Some additional capabilities that are built-in around this new logging model include:

  • Query archived logs directly in S3 and reindex highly granular selections to minimize the use of your daily quota
  • Create new metrics indices that can be monitored across cohorts of up to 12 months, compared to tracking metrics only during the retention period
  • Continue to get ML-powered anomaly detection even for unstored logs

This new model enables you to get all of the benefits of an ML-powered logging solution at only a third of the cost and with more real-time analysis and alerting capabilities.

What This Means For Observability

These cost savings enable additional data routing and wider visibility into system behavior without requiring any additional budget. “Over the last few years, companies have had to forgo observability due to prohibitive costs,” says Ariel Assaraf, CEO and co-founder of Coralogix.

“This month, we started with 3 very big international clients spending half a million dollars a year for our service, and we reduced that to less than $200,000. So, we created massive savings, and that allows them to scale. Because they already have that budget, they can now stop thinking about whether or not to connect new data. They just pour in a lot more data and they get better observability.”

Unlike monitoring, which can only really provide insights into ‘known’ issues, observability is our ability to ask and answer questions about the unknown unknowns in our systems. In order to improve observability, we need access to more data along with ML capabilities that help identify signals in the noise. That’s exactly what Coralogix’s new Streama capability is all about.

Start saving up to 70% on logging costs today!

To see how the new Streama feature works, start a free trial. If you’re a current user, be sure to check out the new feature and start cutting costs and increasing observability today.