Elastic made their latest minor Elasticsearch release on May 25, 2021. Elasticsearch Version 7.13 contains the rollout of several features that were only in preview in earlier versions. There are also enhancements to existing features, critical bug fixes, and some breaking changes of note.
Three patch releases have followed on this minor version, and more are expected before the next minor version is released.
A quick note before we dive into the new features and updates: The wildcard function in Event Query Language (EQL) has been deprecated. Elastic recommends using like or regex keywords instead.
Users can find a complete list of release notes on the Elastic website.
The combined_fields query is a new addition to the search API. This query supports searching multiple text fields as though their contents were indexed in a single, combined field. The query automatically analyzes your query as individual terms and then looks for each term in any of the requested fields. This feature is useful when users are searching for text that could be in many different fields.
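As a rough illustration, the request body for such a search could be assembled as below. This is a minimal sketch: the helper name and the field names title, abstract, and body are assumptions made for the example, and the resulting dictionary would be sent to an index's _search endpoint by whatever client you use.

```python
import json


def combined_fields_body(text, fields, operator="or"):
    """Build a search body that treats several text fields as one combined field."""
    return {
        "query": {
            "combined_fields": {
                "query": text,           # analyzed into individual terms
                "fields": list(fields),  # each term may match in any of these fields
                "operator": operator,    # "and" requires every term to match somewhere
            }
        }
    }


# Hypothetical field names, for illustration only.
body = combined_fields_body(
    "database systems", ["title", "abstract", "body"], operator="and"
)
print(json.dumps(body, indent=2))
```

With operator set to "and", every term must be found, but each term may come from a different field.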
Elastic defines several data tiers. Each tier is a collection of nodes with the same role and typically the same hardware profile. The new frozen tier includes nodes that hold time-series data that are rarely accessed and never updated. These are kept in searchable snapshots. Indexed content generally starts in the content or hot tiers, then can cycle through warm, cold, and frozen tiers as the frequency of use is reduced over time.
The frozen tier uses partially mounted indices to store and load data from a snapshot. This method reduces storage and operating costs while still allowing you to search the data, albeit with a slower response. Elastic improves the search experience by retrieving only the minimal pieces of data necessary for a query. For more information about the frozen tier and how to query it, see this Elastic blog post.
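A partially mounted index is created by mounting a snapshot with the shared_cache storage option. The sketch below only builds the HTTP method, path, and body for that mount request; the repository, snapshot, and index names are placeholders, and sending the request is left to your client of choice.

```python
import json
from urllib.parse import quote, urlencode


def partial_mount_request(repository, snapshot, index):
    """Return (method, path, body) for mounting a snapshot index as partially mounted."""
    path = "/_snapshot/{}/{}/_mount".format(
        quote(repository, safe=""), quote(snapshot, safe="")
    )
    # storage=shared_cache asks for a partially mounted index,
    # as opposed to full_copy for a fully mounted one.
    query = urlencode({"storage": "shared_cache"})
    return "POST", path + "?" + query, {"index": index}


# Placeholder names, for illustration only.
method, path, body = partial_mount_request("my_repository", "my_snapshot", "logs-2021.01")
print(method, path)
print(json.dumps(body))
```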
Painless expressions can match IPv4 and IPv6 addresses automatically against Classless Inter-Domain Routing (CIDR) ranges. When a range is defined, you can use the Painless contains method to determine if the input IP address falls within the range. This is very useful for grouping and classifying IP addresses when using Elastic for security and monitoring.
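The containment check itself is easy to picture outside Elasticsearch. The following sketch uses Python's standard ipaddress module as a stand-in for the Painless logic; it is not the Painless API, and the range labels are invented for the example.

```python
import ipaddress


def in_cidr(address, cidr):
    """Return True if the IPv4 or IPv6 address falls inside the CIDR range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def classify(address, ranges):
    """Return the label of the first range containing the address, else 'external'."""
    for label, cidr in ranges.items():
        if in_cidr(address, cidr):
            return label
    return "external"


# Hypothetical labels and ranges, for illustration only.
RANGES = {"internal": "10.0.0.0/8", "docs-v6": "2001:db8::/32"}

for ip in ["10.1.2.3", "2001:db8::1", "192.168.1.1"]:
    print(ip, "->", classify(ip, RANGES))
```

An address is never reported as inside a range of the other IP version, so IPv4 and IPv6 ranges can be mixed in the same table.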
Runtime fields are fields in your document that are formed based on the source context. These fields can be defined by the search query or by the index mapping itself. Defining the field in the index mapping will give better performance.
Runtime fields are helpful when you need to search based on some calculated value. For example, if you have internal error codes, you may want to return specific text related to the code. A runtime field can translate a numerical code to its associated text string without that text being stored in the document.
Further, a runtime field can be updated by changing its definition in the index mapping, with no need to reindex documents. This makes updating much more straightforward than having to update each document with new text.
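To make the error-code example concrete, here is a minimal sketch of the two places a runtime field can be defined: in the index mapping and in a single search request. The field names error_code and error_text, the specific codes, and the message strings are all assumptions for illustration; the embedded script is Painless and is only carried as a string here.

```python
import json

# Painless source (hypothetical field and messages); emit() sets the runtime value.
ERROR_TEXT_SCRIPT = """
long code = doc['error_code'].value;
if (code == 404) { emit('Not found'); }
else if (code == 500) { emit('Internal server error'); }
else { emit('Unknown error'); }
""".strip()


def runtime_mapping(field_name, script_source):
    """Build the 'runtime' section of an index mapping for one keyword field."""
    return {
        "runtime": {
            field_name: {"type": "keyword", "script": {"source": script_source}}
        }
    }


# Option 1: body for updating the index mapping.
mapping = runtime_mapping("error_text", ERROR_TEXT_SCRIPT)

# Option 2: define the same field only for one search request.
search_body = {
    "runtime_mappings": mapping["runtime"],
    "query": {"term": {"error_text": "Not found"}},
    "fields": ["error_text"],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Changing a message later means editing the script in one place rather than touching every stored document.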
Aliases for Elasticsearch indices have been present since version 1. They are a convenient way to allow functions to point to different data sets independent of the index name. For example, you may wish to keep several versions of your index. By pointing an alias at whichever version is current, you can always fetch the right data through the same logical name.
Aliases are now also available for trained models. A trained model is the result of running a machine learning algorithm against a sample data set: the existing, known data train the model to produce a given output. The model can then be applied to new, unknown data, ideally classifying it in the same way as the known data. Common analyses include classification and regression.
Elastic now allows you to apply an alias to your trained models, as you could already do for indices. The new model_alias API allows you to create and update aliases on your trained models. An alias makes it easier to apply specific models to data sets by giving each model a logical name.
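A small sketch of how a request to assign or move a model alias could be built follows. It only constructs the HTTP method and path; the model IDs and alias name are made up, and the reassign flag is shown on the assumption that you want to move an alias that already points at another model.

```python
from urllib.parse import quote


def model_alias_request(model_id, model_alias, reassign=False):
    """Return (method, path) for pointing a model alias at a trained model."""
    path = "/_ml/trained_models/{}/model_aliases/{}".format(
        quote(model_id, safe=""), quote(model_alias, safe="")
    )
    if reassign:
        # Needed when the alias already refers to a different model.
        path += "?reassign=true"
    return "PUT", path


# Hypothetical model IDs and alias, for illustration only.
print(model_alias_request("flight-delay-model-v1", "flight_delay_model"))
print(model_alias_request("flight-delay-model-v2", "flight_delay_model", reassign=True))
```

Consumers that refer to the alias keep working while the model behind it is swapped.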
Event Query Language (EQL) is a query language designed for searching event-based time-series data. Typical uses include log analytics, time-series data processing, and threat detection.
In Elasticsearch 7.13.0, developers added the fields parameter as an alternative to the _source parameter. The fields option extracts values from the index mapping while _source accesses the original data sent at index time. The fields option is recommended by Elastic because it:
Log analytics on Elastic can be tricky to set up. Third-party tools, like Coralogix’s log analytics platform, exist to help you analyze data without any complex setup.
Elasticsearch can log security-related events if you have a paid subscription account. Audit events record the different authentication and data access events that occur against your data. The logs can be used for incident response and for demonstrating regulatory compliance. With every event enabled, however, the sheer volume of log data can bog down performance.
In Elastic Version 7.13, Elastic introduced audit events ignore policies, so users can choose to suppress logging for certain audit events. Setting an ignore policy involves creating rules that match the audit events to be ignored and not printed.
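As an illustration, an ignore policy is expressed as settings under the xpack.security.audit.logfile.events.ignore_filters prefix, keyed by a policy name. The sketch below just generates those setting keys; the policy name, user, and index pattern are invented, and only the users and indices attributes are shown.

```python
import json

PREFIX = "xpack.security.audit.logfile.events.ignore_filters"


def ignore_policy_settings(policy_name, users=None, indices=None):
    """Build the setting keys for one audit ignore policy.

    An event is suppressed only when it matches every attribute listed in the policy.
    """
    settings = {}
    if users:
        settings["{}.{}.users".format(PREFIX, policy_name)] = list(users)
    if indices:
        settings["{}.{}.indices".format(PREFIX, policy_name)] = list(indices)
    return settings


# Hypothetical policy: skip events from a monitoring user on monitoring indices.
policy = ignore_policy_settings(
    "monitoring_noise", users=["monitoring_user"], indices=[".monitoring-*"]
)
print(json.dumps(policy, indent=2))
```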
The speed of the terms aggregation has been improved under certain circumstances. These circumstances are common with time-series data, particularly when the data is in the cold or frozen storage tiers. The following are cases where Elastic has improved aggregation speed:
The Elasticsearch Grok parser contained a vulnerability that nefarious users could exploit to produce a denial of service attack. Users with arbitrary query permissions could create Grok queries that would crash your Elasticsearch node. This security flaw is present in all Elasticsearch versions before 7.13.3.
Elasticsearch uses analyzers to determine when a document matches search criteria. Analyzers are used when searching text fields in your index. In version 7.12, a bug was introduced where Elasticsearch would use the default analyzer (a standard analyzer) on all searches.
According to the documentation, the analyzer configured in the index mapping should be used, with the default only being used if none was configured. In version 7.13, this bug was fixed, so searches once again use the index analyzer preferentially.
Composite aggregations are used to compile data into buckets from multiple sources. A typical use is to create graphs from the compiled data. Graphs may also use time as a way to collect data into the same bucket. If the user required a timezone to be applied, Elasticsearch behaved incorrectly when the stored times were epoch values.
Epoch datetimes are always expressed in UTC. Applying a timezone requires formatting the date, which Elasticsearch did not previously do internally. This bug was resolved in version 7.13.3.
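For context, a composite aggregation that buckets by day in a given timezone could look like the sketch below. The field name, timezone, and bucket names are assumptions for the example, and the dictionary is only the request body for a search.

```python
import json


def daily_buckets_body(date_field, time_zone, page_size=100):
    """Build a composite aggregation that groups documents by local calendar day."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {
            "per_day": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {
                            "day": {
                                "date_histogram": {
                                    "field": date_field,
                                    "calendar_interval": "1d",
                                    # Day boundaries are computed in this zone.
                                    "time_zone": time_zone,
                                    "format": "yyyy-MM-dd",
                                }
                            }
                        }
                    ],
                }
            }
        },
    }


# Hypothetical field and timezone, for illustration only.
body = daily_buckets_body("timestamp", "America/New_York")
print(json.dumps(body, indent=2))
```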
SQL queries can use literal selections in combination with filters to select data. For example, the following statement uses a literal selection genre and a filter record:
SELECT genre FROM music WHERE format = 'record'
Elasticsearch was incorrectly optimizing such queries to use a local relation. This error caused only a single record to be returned even if multiple records matched the filter. Version 7.13.3 fixed this issue, which was first reported in November 2020.
Elastic shipped many new features, bug fixes, and enhancements in version 7.13 and has continued to apply small changes through version 7.13.3. The most significant new features include support for a frozen storage tier, ignore policies for audit events, and runtime fields defined in the index mapping.
For a complete list of new features, see the Elastic release notes.