AWS OpenSearch is a managed service based on OpenSearch, Amazon’s fork of the last open source version of the popular Elasticsearch platform. It is a managed analytics service that makes it possible to search and analyze large volumes of data quickly.
With AWS OpenSearch, users get the capabilities inherited from Elasticsearch and Kibana (the latter forked as OpenSearch Dashboards), plus enhancements and enterprise-grade features added by Amazon, without needing to manage the underlying infrastructure. It is compatible with the open source OpenSearch APIs, which means existing tools and applications built for those APIs can work with the service.
OpenSearch offers the following features and capabilities:
The open source version of OpenSearch is a project that evolved from Elasticsearch 7.10, after Elasticsearch moved away from an open source license. It’s free to use and can be deployed on any infrastructure, giving users the flexibility to manage and scale their search and analytics workloads as they see fit. However, running the open source version yourself requires significant operational expertise, including provisioning infrastructure, configuring the software, and ensuring security and high availability.
Amazon OpenSearch Service is a managed service offered by AWS that simplifies the deployment, management, and scaling of OpenSearch clusters in the cloud. It automates time-consuming tasks such as hardware provisioning, software installation, and patching. Additionally, it provides features such as automated backups, monitoring, and security settings, including encryption at rest and in transit.
Amazon OpenSearch Serverless is a fully managed, serverless option for running OpenSearch workloads. It abstracts away all the infrastructure management tasks, allowing users to focus solely on their search and analytics workloads without worrying about provisioning, scaling, or managing clusters. It automatically scales up or down based on the workload, making it cost-effective for variable traffic patterns.
Note: In the remainder of this article, we focus on the Amazon OpenSearch Service.
The code examples in this tutorial were adapted from the official OpenSearch Service tutorial.
An Amazon OpenSearch Service domain, or OpenSearch cluster, represents a collection of settings, instance types, instance counts, and storage resources. To create one, start by navigating to the AWS Management Console. Under the Analytics section, select Amazon OpenSearch Service, then click on Create domain, and give your domain a name.
During creation, select the Standard create option for detailed configuration. This method is recommended for development and testing, although an Easy create option is available for quickly setting up a production domain. Choose the following options:
Complete domain creation and wait for the initialization process, which can take from 15 to 30 minutes depending on your configuration. After initialization, in the General information section, note the domain’s endpoint URL and copy it for use in the following steps.
To populate your OpenSearch Service domain with data, you can use various methods, including command line tools or programming languages. This tutorial demonstrates using curl. Note that curl cannot sign requests with IAM credentials, so it authenticates with a username and password instead. This requires fine-grained access control with a master user (which we configured in the previous step).
To upload a single document, use curl to send a PUT request with a JSON-formatted movie record to your domain. The example below writes the record to an index named movies, with a document ID of 1:
curl -XPUT -u 'master-user:master-user-password' 'domain-endpoint/movies/_doc/1' -d '{"director": "Spielberg, Steven", "genre": ["Action","Sci-Fi"], "year": 1993, "actor": ["Jeff Goldblum","Laura Dern","Sam Neill"], "title": "Jurassic Park"}' -H 'Content-Type: application/json'
Replace master-user:master-user-password with your created credentials and domain-endpoint with your domain’s endpoint.
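To upload multiple documents, prepare a file named bulk_movies.json in the bulk API’s newline-delimited JSON format: each movie record sits on its own line, preceded by an action line that names the target index and document ID, and the file ends with a newline. Use the curl command to post this bulk data to your domain: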
curl -XPOST -u 'master-user:master-user-password' 'domain-endpoint/_bulk' --data-binary @bulk_movies.json -H 'Content-Type: application/json'
Be sure to replace the placeholders with your specific domain details, as before.
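As a minimal sketch of the bulk format, the snippet below writes a small bulk_movies.json. The two records and their IDs are illustrative placeholders rather than data from the original tutorial.

```shell
# Write a small bulk file: one action line followed by one document line per record.
# The records and IDs below are illustrative placeholders.
cat > bulk_movies.json <<'EOF'
{ "index": { "_index": "movies", "_id": "2" } }
{ "director": "Spielberg, Steven", "genre": ["Action", "Sci-Fi"], "year": 1997, "actor": ["Jeff Goldblum", "Julianne Moore"], "title": "The Lost World: Jurassic Park" }
{ "index": { "_index": "movies", "_id": "3" } }
{ "director": "Spielberg, Steven", "genre": ["Thriller"], "year": 1975, "actor": ["Roy Scheider", "Richard Dreyfuss"], "title": "Jaws" }
EOF

# The bulk API requires the final line to end with a newline; the heredoc guarantees that.
wc -l < bulk_movies.json
```

Each document takes exactly two lines, so a file with N records has 2N lines.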
For document search, you can utilize the OpenSearch search API or OpenSearch Dashboards. To search from the command line, use a curl GET request:
curl -XGET -u 'master-user:master-user-password' 'domain-endpoint/movies/_search?q=jurassic&pretty=true'
This example searches for the term jurassic within the movies index.
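The q= parameter is convenient for quick checks. For more control, the search API also accepts a JSON request body. The sketch below builds a match query on a single field. The helper names match_query and search_movies are hypothetical, and the endpoint and credentials are the same placeholders used above.

```shell
# Build a request body for a match query on one field.
# Note: this simple helper does not escape quotes inside the search term.
match_query() {
  printf '{"query": {"match": {"%s": "%s"}}}' "$1" "$2"
}

# Send the query to the movies index (placeholder endpoint and credentials).
search_movies() {
  curl -XGET -u 'master-user:master-user-password' \
    'domain-endpoint/movies/_search?pretty=true' \
    -H 'Content-Type: application/json' \
    -d "$(match_query title "$1")"
}

# Print the request body that search_movies would send for the term "jurassic".
match_query title jurassic
```

Calling search_movies jurassic would then run the query against the title field only, rather than against all fields.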
Alternatively, OpenSearch Dashboards provides a user-friendly interface for data search and visualization. Access it through your domain’s Dashboards URL, log in with your master user credentials, and create an index pattern named movies. Use this pattern to navigate and search through your indexed data, experimenting with different search terms to explore the data you’ve uploaded.
The following best practices can help you get the most out of the AWS OpenSearch Service.
A shard strategy helps in optimizing the performance and scalability of your AWS OpenSearch cluster. Shards break down an index into smaller, more manageable pieces, impacting search and indexing efficiency. The decision on the number of primary shards should take into account the size of your data and your query throughput.
You can change the number of replicas at any time, but the primary shard count is fixed once an index is created, so plan it before indexing. A common target is up to 50 GB of data per shard. Replica shards, essential for redundancy and data availability, also enable parallel processing of search queries. It’s advisable to maintain at least one replica per primary shard.
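As a rough illustration of the sizing logic, the sketch below estimates a primary shard count from the data size and a target shard size, then builds the settings body for a new index. The 10% indexing overhead, the helper names, and the index name movies-v2 are assumptions for illustration.

```shell
# Estimate primary shards: ceil(data_gb * 1.10 / target_shard_gb).
# The 10% indexing overhead is an assumed rule of thumb; adjust for your data.
primary_shards() {
  data_gb=$1
  shard_gb=$2
  echo $(( (data_gb * 110 + shard_gb * 100 - 1) / (shard_gb * 100) ))
}

# Request body for creating an index with a given shard and replica count.
index_settings() {
  printf '{"settings": {"index": {"number_of_shards": %s, "number_of_replicas": %s}}}' "$1" "$2"
}

shards=$(primary_shards 600 50)   # 600 GB of data, 50 GB per shard
index_settings "$shards" 1
# To apply (movies-v2 is a hypothetical index name):
#   curl -XPUT -u 'master-user:master-user-password' 'domain-endpoint/movies-v2' \
#     -H 'Content-Type: application/json' -d "$(index_settings "$shards" 1)"
```

With 600 GB of data and a 50 GB target, this estimate comes to 14 primary shards. Treat the result as a starting point and validate it against your real workload.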
For large datasets, splitting indices based on time or logical partitions can enhance performance and manageability, especially for time-series data. Regularly reviewing and adjusting your sharding strategy is important as your data volume changes.
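One common way to partition time-series data is to write each day's documents to its own index. The sketch below shows a possible naming scheme. The app-logs prefix and the daily_index helper are hypothetical.

```shell
# Name an index after the day its data belongs to, e.g. app-logs-2024.05.01.
# The app-logs prefix is a hypothetical example.
daily_index() {
  printf '%s-%s' "$1" "$2"
}

today=$(date -u +%Y.%m.%d)
echo "Write today's documents to: $(daily_index app-logs "$today")"
# Old data can then be dropped one index at a time, for example:
#   curl -XDELETE -u 'master-user:master-user-password' 'domain-endpoint/app-logs-2024.05.01'
```

Deleting a whole index is much cheaper than deleting individual documents, which is one reason time-based indices are easier to manage.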
Optimizing the size of bulk requests and enabling compression can improve data ingestion efficiency in AWS OpenSearch. The appropriate bulk request size varies with your data and network conditions. Starting with between 5 and 15 MB per request balances efficiency against the risk of timeouts or rejections.
Enabling HTTP compression for data transmission can accelerate transfer rates and decrease bandwidth usage, benefiting the client and OpenSearch cluster. Monitoring the performance of your bulk requests through OpenSearch Dashboards or APIs is essential. This allows you to adjust the settings based on observed system strain, such as increased latency or error rates.
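The sketch below shows one way to combine both ideas with standard tools: split a large bulk file into smaller chunks, gzip each chunk, and send it with a Content-Encoding: gzip header. Lines per chunk is used here as a rough proxy for request size, and the function names and bulk_chunk_ prefix are hypothetical. Pick a line count that puts your chunks in the size range you are targeting.

```shell
# Split a bulk file into chunks of at most $1 lines, then gzip every chunk.
# The line count must be even so each action line stays with its document line.
split_and_compress() {
  lines_per_chunk=$1
  bulk_file=$2
  if [ $(( lines_per_chunk % 2 )) -ne 0 ]; then
    echo "lines per chunk must be even" >&2
    return 1
  fi
  rm -f bulk_chunk_*                      # clear chunks from earlier runs
  split -l "$lines_per_chunk" "$bulk_file" bulk_chunk_
  gzip -f bulk_chunk_*
}

# Upload each compressed chunk (placeholder endpoint and credentials).
upload_chunks() {
  for chunk in bulk_chunk_*.gz; do
    curl -XPOST -u 'master-user:master-user-password' 'domain-endpoint/_bulk' \
      -H 'Content-Type: application/json' -H 'Content-Encoding: gzip' \
      --data-binary "@$chunk"
  done
}
```

For example, split_and_compress 2000 bulk_movies.json followed by upload_chunks would send the file in requests of 1,000 documents each.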
Employing dedicated master nodes in your AWS OpenSearch cluster promotes stability and boosts performance. These nodes take on cluster management tasks, such as tracking cluster membership and deciding shard allocations, separating these functions from data processing tasks.
This delineation ensures that management processes do not compete with data processing for resources, thereby enhancing cluster stability. AWS recommends configuring three dedicated master nodes for production environments to achieve high availability and fault tolerance, preferably distributed across different availability zones.
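As a sketch, dedicated master nodes can be enabled on an existing domain with the AWS CLI. The domain name and instance type below are assumptions, so substitute values that fit your domain. The script only prints the command unless APPLY=1 is set, and spreading nodes across Availability Zones is configured separately through the domain's zone awareness settings.

```shell
# Cluster settings for three dedicated master nodes.
# The domain name and instance type are assumptions; adjust them for your domain.
DOMAIN_NAME="movies"
CLUSTER_CONFIG="DedicatedMasterEnabled=true,DedicatedMasterType=m6g.large.search,DedicatedMasterCount=3"

# Print the command by default; set APPLY=1 to run it against your AWS account.
if [ "${APPLY:-0}" = "1" ]; then
  aws opensearch update-domain-config --domain-name "$DOMAIN_NAME" \
    --cluster-config "$CLUSTER_CONFIG"
else
  echo "aws opensearch update-domain-config --domain-name $DOMAIN_NAME --cluster-config $CLUSTER_CONFIG"
fi
```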
Auto-Tune in AWS OpenSearch is a feature designed to automatically optimize your cluster’s performance, reducing the need for manual tuning. By monitoring the cluster and adjusting settings such as JVM heap, queue, and cache sizes, Auto-Tune helps adapt your setup to your current workload.
This automation significantly cuts down on the operational overhead, making it a recommended feature for production clusters to maintain optimal performance with minimal manual intervention. Auto-Tune can be enabled through the AWS Management Console or the OpenSearch Service API.
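A minimal sketch of enabling Auto-Tune from the AWS CLI follows. The domain name and the auto_tune_options helper are assumptions, and the script prints the command unless APPLY=1 is set.

```shell
# Build the Auto-Tune option string; accepts ENABLED or DISABLED.
auto_tune_options() {
  case "$1" in
    ENABLED|DISABLED) printf 'DesiredState=%s' "$1" ;;
    *) echo "state must be ENABLED or DISABLED" >&2; return 1 ;;
  esac
}

DOMAIN_NAME="movies"   # assumed domain name
# Print the command by default; set APPLY=1 to run it against your AWS account.
if [ "${APPLY:-0}" = "1" ]; then
  aws opensearch update-domain-config --domain-name "$DOMAIN_NAME" \
    --auto-tune-options "$(auto_tune_options ENABLED)"
else
  echo "aws opensearch update-domain-config --domain-name $DOMAIN_NAME --auto-tune-options $(auto_tune_options ENABLED)"
fi
```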
Opting for the latest generation Amazon EC2 instance types when setting up your AWS OpenSearch cluster offers advantages in performance and cost-efficiency. Newer instances typically provide superior compute, memory, and network performance, benefiting indexing and search operations in OpenSearch.
These latest generations also tend to be more cost-effective, delivering higher processing power and throughput at lower costs than older models. Ensuring your cluster runs on updated and supported hardware by choosing the latest instance types also future-proofs your setup.