Elastic released version 8.0.0, a new major version of its platform, on February 10, 2022. A minor release, 8.1.0, has already followed, and more minor and patch releases are expected as Elastic rolls out new features and fixes. Version 8.0.0 is the first major release since April 2019, when version 7.0.0 became generally available. Users can find the complete release notes on the Elastic website. This article highlights new features, breaking changes, and bugs.
The most important point about version 8.0 is that Elastic has added support for version 7 compatibility headers on REST API calls. Version 8.0 introduced breaking changes to REST headers and responses, but with this compatibility mode users can upgrade without immediately changing their entire code base.
To upgrade, first move to the latest version 7 release (7.17) and enable REST API compatibility using the Accept and Content-Type headers. Then review and resolve all critical issues listed in the Upgrade Assistant, and upgrade to Elasticsearch 8.0.0. Once your clients are also upgraded to version 8.x, only minor compatibility issues should remain.
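As a minimal sketch of what enabling compatibility looks like on the client side, the helper below builds the two headers with the `compatible-with` media type parameter. The function name `compat_headers` is illustrative and not part of any Elastic client; official clients may offer their own switch for this.

```python
def compat_headers(major_version: int = 7) -> dict:
    """Build headers that request REST API compatibility with a major version.

    compat_headers is a hypothetical helper name used only for this sketch.
    """
    media_type = f"application/vnd.elasticsearch+json;compatible-with={major_version}"
    # Both headers carry the same media type: Accept covers the response,
    # Content-Type covers the request body.
    return {"Accept": media_type, "Content-Type": media_type}


headers = compat_headers(7)
```

These headers would be attached to each request sent by a client that still speaks the version 7 API.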
These compatibility mode settings are not meant as a permanent fixture in your platform but as a way to smooth the upgrade to version 8. Elastic does not guarantee ongoing maintenance of this mode. Note also that Elastic prevents downgrading from version 8.x back to version 7.x, since such downgrades are untested and may break your implementation.
Recommendation engines can be implemented using k-nearest neighbor (kNN) vector search in Elasticsearch. Version 7.x of Elasticsearch supported exact kNN search through a script_score query over dense_vector fields. This method guarantees accurate results, but because it compares the query vector against every matching document, accuracy comes at the cost of speed and scaling.
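Elasticsearch 8.0 allows dense_vector fields to be indexed for search and adds a kNN search API, released as a technical preview. Together these let users run approximate kNN searches on larger datasets faster than script_score. The trade-off is that approximate searches can return less accurate results.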
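To make the contrast concrete, the sketch below builds the two request bodies as plain Python dictionaries: an exact search using a script_score query, and an approximate search in the shape accepted by the `_knn_search` endpoint that shipped as a technical preview in 8.0. The field name `item_vector` and the helper names are assumptions for illustration, and the approximate search assumes the field was mapped as a dense_vector with `index` set to true.

```python
VECTOR_FIELD = "item_vector"  # hypothetical dense_vector field name


def exact_knn_query(query_vector, size=10):
    """Exact kNN: score every matching document with a script (slow but precise)."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 keeps the score non-negative, which Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{VECTOR_FIELD}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def approximate_knn_request(query_vector, k=10, num_candidates=100):
    """Approximate kNN body for the 8.0 _knn_search endpoint (technical preview)."""
    return {
        "knn": {
            "field": VECTOR_FIELD,
            "query_vector": query_vector,
            "k": k,
            # More candidates per shard usually improves accuracy at the cost of speed.
            "num_candidates": num_candidates,
        }
    }


exact = exact_knn_query([0.1, 0.2, 0.3])
approx = approximate_knn_request([0.1, 0.2, 0.3])
```

Raising `num_candidates` is the usual way to trade some speed back for accuracy in the approximate case.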
PyTorch is a machine learning framework that uses tensor computation to build and train models. Elasticsearch 8.x allows users to upload machine learning models trained in PyTorch and use them for natural language processing (NLP). After deploying a model saved as TorchScript to a cluster, users can make predictions against incoming data and act on the results. The Elastic Stack supports text classification, text embedding, named entity recognition, and other search use cases.
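One way to run predictions against incoming data is an ingest pipeline with an inference processor. The sketch below only builds the pipeline definition as a dictionary; the model ID and source field are hypothetical, and mapping the source field to `text_field` is an assumption about the input name the deployed NLP model expects.

```python
def nlp_inference_pipeline(model_id, source_field):
    """Build an ingest pipeline body that runs a deployed NLP model on each document.

    model_id and source_field are placeholders chosen for this sketch.
    """
    return {
        "description": f"Run {model_id} on the '{source_field}' field",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # Assumption: the model reads its input from 'text_field'.
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


pipeline = nlp_inference_pipeline("my-ner-model", "message")
```

Documents indexed through such a pipeline would carry the model's output alongside the original text, which later queries or processors can act on.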
Elasticsearch uses Apache Log4j 2 for its JSON logs. In version 8.0, the configuration was updated to use EcsLayout instead of ESJsonLayout. Logging set up under previous versions will not break, because the previous infrastructure was retained. The change in layout does affect some fields in the Elasticsearch JSON logs.
When Metricbeat is switched to writing logs in the ECS-compliant format, it will stop supporting the legacy format. To keep data from Elasticsearch 7 and earlier usable, new mappings were added that index data under the new ECS fields. These mappings include alias fields for the legacy names, each pointing to the corresponding new ECS field. Four new mappings were created to match Metricbeat, Elasticsearch, Kibana, and Logstash logs.
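The alias approach can be sketched with the `alias` field type, which points one field name at another. The legacy and ECS field pairs below are illustrative assumptions, not the actual mappings Elastic shipped.

```python
def alias_properties(legacy_to_ecs):
    """Build mapping properties in which each legacy name aliases an ECS field.

    The target ECS fields must already exist in the same mapping for the
    aliases to be accepted; this helper only produces the alias entries.
    """
    return {
        legacy: {"type": "alias", "path": ecs_field}
        for legacy, ecs_field in legacy_to_ecs.items()
    }


# Hypothetical legacy-to-ECS pairs, chosen for illustration only.
aliases = alias_properties({"level": "log.level", "component": "log.logger"})
```

A query against the legacy name `level` would then resolve to `log.level`, so older dashboards and searches can keep working.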
Elasticsearch is built on top of the Lucene Java library. It provides a convenient REST API on top of Lucene so users can easily work with Lucene's search features. With Elasticsearch version 8.0, the team has upgraded to Lucene version 9. Even at its latest minor release, Elasticsearch version 7 was still running on Lucene version 8.
Lucene version 9 includes new language features for Japanese, Swedish, Serbian, and other languages. Lucene also added support for high-dimensionality numeric vectors in kNN searches.
Lucene version 9 also added several optimizations that will translate to Elasticsearch. These include faster taxonomy faceting, faster indexing of multi-dimensional points, and faster sorting of fields indexed as points.
Elasticsearch 8.0 will only start if every index on the cluster was created in Elasticsearch version 7.0 or later. If any index is older than that, the cluster will not start.
If you have an index created in an unsupported version, you can use the reindex API to copy its data into a new index created on a supported version.
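A rough sketch of this check and fix follows. It assumes the integer encoding used by the `index.version.created` setting in 6.x and 7.x, where the major version occupies the millions place (for example `6080099` for 6.8.0). The helper names and index names are hypothetical.

```python
def created_major_version(version_created):
    """Extract the major version from an index.version.created value.

    Assumes the 6.x/7.x integer encoding, e.g. 7000099 for 7.0.0.
    The setting is returned as a string by the API, so accept both forms.
    """
    return int(version_created) // 1_000_000


def needs_reindex(version_created, minimum_major=7):
    """True if the index predates the oldest major version that 8.0 can open."""
    return created_major_version(version_created) < minimum_major


def reindex_body(source_index, dest_index):
    """Request body for POST _reindex, copying documents into a new index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


body = reindex_body("logs-2018", "logs-2018-v7")
```

The reindex would need to run while the cluster is still on 7.17, before the upgrade to 8.0 is attempted.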
Snapshot repositories are used to store backups of your Elasticsearch cluster to protect your data. If the Elasticsearch cluster is corrupted, snapshots can be used to recover the data.
In previous versions of Elasticsearch, a plugin had to be installed for each of the available snapshot repository types: Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage. Users no longer need to install these plugins because they are included in Elasticsearch by default. The CLI has been updated to warn users of the change, and in a future release it will show an error instead.
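With the repository types built in, registering one comes down to a single `PUT _snapshot/<repository>` request. The sketch below builds that request body; the helper name, bucket, and container values are placeholders, and credentials are assumed to be configured separately in the Elasticsearch keystore.

```python
def repository_body(repo_type, **settings):
    """Build the body for PUT _snapshot/<repository_name>.

    repo_type is one of the built-in types such as "s3", "gcs", or "azure".
    """
    return {"type": repo_type, "settings": settings}


# Placeholder names; substitute your own bucket or container.
s3_repo = repository_body("s3", bucket="my-es-backups")
azure_repo = repository_body("azure", container="my-es-backups")
```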
Several REST endpoints were changed or removed as part of the upgrade to version 8.0. Many of these endpoints were deprecated in Elasticsearch version 7 and have now been removed altogether, so calls that previously returned deprecation warnings now return errors.