
AWS OpenSearch Service: Quick Guide and Tutorial

Coralogix Team Apr 04, 2024


Key Features of AWS OpenSearch

OpenSearch offers the following features and capabilities:

Built-in search capabilities: Enables users to perform various search operations such as text search, faceted search, and geospatial search. OpenSearch leverages Apache Lucene’s powerful search functionalities, allowing for complex queries and aggregations.
Data Prepper: Enables the transformation and enrichment of log and trace data before it is indexed. This includes operations like filtering, modification, and aggregation of data. Data Prepper is designed to improve the quality and usefulness of data by preparing it to enhance search and analysis operations.
Trace analytics: Allows for the collection, visualization, and analysis of trace data, helping developers better understand and optimize application performance. It provides insights into latency issues and application dependencies.
Application analytics: Provides tools for analyzing application logs and metrics in real-time. Developers and operations teams can gain insights into application performance, identify issues, and understand user behavior. Users can create custom dashboards and visualizations to monitor the health and performance of their apps.
Event analytics: Allows users to analyze and visualize event data in real-time. This includes exploring data patterns, identifying trends, and detecting anomalies across various event data sources.
Anomaly detection: Utilizes machine learning algorithms to identify patterns in data that do not conform to expected behavior. This feature is critical for use cases such as fraud detection, security monitoring, and operational issues, allowing for timely alerts and actions on potential problems.

OpenSearch OSS Version vs. Amazon OpenSearch Service vs. Amazon OpenSearch Serverless

The open source version of OpenSearch is a project forked from Elasticsearch 7.10 after Elastic moved Elasticsearch to a non-open source license. It’s free to use and can be deployed on any infrastructure, giving users the flexibility to manage and scale their search and analytics workloads as they see fit. However, running the open source version yourself requires significant operational expertise, including provisioning infrastructure, configuring the software, and ensuring security and high availability.

Amazon OpenSearch Service is a managed service offered by AWS that simplifies the deployment, management, and scaling of OpenSearch clusters in the cloud. It automates time-consuming tasks such as hardware provisioning, software installation, and patching. Additionally, it provides features such as automated backups, monitoring, and security settings, including encryption at rest and in transit.

Amazon OpenSearch Serverless is a fully managed, serverless option for running OpenSearch workloads. It abstracts away all the infrastructure management tasks, allowing users to focus solely on their search and analytics workloads without worrying about provisioning, scaling, or managing clusters. It automatically scales up or down based on the workload, making it cost-effective for variable traffic patterns.

Learn more in our detailed guide to OpenSearch Serverless

Note: In the remainder of this article, we focus on the Amazon OpenSearch Service.

Quick Tutorial: Getting Started with Amazon OpenSearch Service

The code examples in this tutorial were adapted from the official OpenSearch Service tutorial.

Step 1: Creating an OpenSearch Service Domain

An Amazon OpenSearch Service domain, or OpenSearch cluster, represents a collection of settings, instance types, instance counts, and storage resources. To create one, start by navigating to the AWS Management Console. Under the Analytics section, select Amazon OpenSearch Service, then click on Create domain, and give your domain a name.

During creation, select the Standard create option for detailed configuration, which suits the development and testing setup used in this tutorial. An Easy create option is also available for quickly setting up a production domain. Choose the following options:

Under templates, select Dev/test.
Under deployment option, select Domain with standby to ensure reliability.
Select the OpenSearch version you want to run.
Under network settings, opt for Public access.
Set your access policy to Only use fine-grained access control and set master admin credentials (we’ll skip more advanced configurations like SAML or Amazon Cognito authentication).

Complete domain creation and wait for the initialization process, which can take from 15 to 30 minutes depending on your configuration. After initialization, in the General information section, note the domain’s endpoint URL and copy it for use in the following steps.
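
If you prefer the command line to the console for looking up the endpoint, the following is a minimal sketch using the AWS CLI. It assumes the AWS CLI is installed and configured with credentials allowed to describe the domain; the function name get_domain_endpoint and the domain name movies are hypothetical placeholders, not part of the official tutorial.

```shell
#!/bin/sh
# Print the HTTPS endpoint of a public-access OpenSearch Service domain.
# Assumption: the AWS CLI is installed and has permission to describe the domain.
get_domain_endpoint() {
  domain_name="$1"
  # DomainStatus.Endpoint holds the host name for public-access domains.
  host=$(aws opensearch describe-domain \
    --domain-name "$domain_name" \
    --query 'DomainStatus.Endpoint' \
    --output text) || return 1
  printf 'https://%s\n' "$host"
}

# Usage ("movies" is a placeholder domain name):
#   DOMAIN_ENDPOINT=$(get_domain_endpoint movies)
```

While the domain is still initializing, the endpoint may not be available yet, so it is worth checking the printed value before using it in later steps.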

Step 2: Uploading Data to OpenSearch Service for Indexing

To populate your OpenSearch Service domain with data, you can use various methods, including command line tools or programming languages. This tutorial uses curl for convenience. Note that curl with basic authentication does not sign requests with IAM credentials, so this approach relies on fine-grained access control with a master username and password (which we configured in the previous step).

To upload a single document, use the curl command to put a JSON-formatted movie record into the domain. The example below assumes that the index name is movies. Here’s an example command:

curl -XPUT -u 'master-user:master-user-password' 'domain-endpoint/movies/_doc/1' -d '{"director": "Spielberg, Steven", "genre": ["Action","Sci-Fi"], "year": 1993, "actor": ["Jeff Goldblum","Laura Dern","Sam Neill"], "title": "Jurassic Park"}' -H 'Content-Type: application/json'

Replace master-user:master-user-password with your created credentials and domain-endpoint with your domain’s endpoint.

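To upload multiple documents, prepare a file named bulk_movies.json in the bulk API’s newline-delimited JSON format: each movie takes two lines, an action line naming the target index and ID followed by the document itself, and the file must end with a newline. Use the curl command to post this bulk data to your domain: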

curl -XPOST -u 'master-user:master-user-password' 'domain-endpoint/_bulk' --data-binary @bulk_movies.json -H 'Content-Type: application/json'

Be sure to replace the placeholders with your specific domain details, as before.
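
As a rough illustration of what bulk_movies.json can look like, the sketch below writes two sample records in the bulk format. The records and document IDs are illustrative, not data from the original tutorial; each document is preceded by an index action line that targets the movies index.

```shell
#!/bin/sh
# Write two illustrative records in bulk (newline-delimited JSON) format.
# Each document line is preceded by an action line with the index and ID.
# The quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > bulk_movies.json <<'EOF'
{ "index": { "_index": "movies", "_id": "2" } }
{"director": "Frankenheimer, John", "genre": ["Drama", "Thriller"], "year": 1962, "title": "The Manchurian Candidate"}
{ "index": { "_index": "movies", "_id": "3" } }
{"director": "Spielberg, Steven", "genre": ["Adventure", "Thriller"], "year": 1975, "title": "Jaws"}
EOF
```

The here-document ends the last line with a newline, which the bulk API expects.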

Step 3: Searching Documents in OpenSearch

For document search, you can utilize the OpenSearch search API or OpenSearch Dashboards. To search from the command line, use a curl GET request:

curl -XGET -u 'master-user:master-user-password' 'domain-endpoint/movies/_search?q=jurassic&pretty=true'

This example searches for the term jurassic within the movies index.
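
Beyond the q= URL parameter, the search API also accepts a JSON request body. The following is a minimal sketch of a match query sent that way. The helper names match_query_body and search_movies, and the environment variables DOMAIN_ENDPOINT and OS_AUTH (in user:password form), are assumptions introduced for illustration.

```shell
#!/bin/sh
# Build a match query body for one field, with an optional result size.
# Values are inserted verbatim, so this sketch only suits simple terms
# that contain no quotes or backslashes.
match_query_body() {
  printf '{"query":{"match":{"%s":"%s"}},"size":%s}\n' "$1" "$2" "${3:-10}"
}

# Send the query to the movies index.
# DOMAIN_ENDPOINT and OS_AUTH are hypothetical environment variables.
search_movies() {
  match_query_body "$1" "$2" "$3" |
    curl -s -XGET -u "$OS_AUTH" "$DOMAIN_ENDPOINT/movies/_search?pretty=true" \
      -H 'Content-Type: application/json' --data-binary @-
}

# Usage:
#   search_movies title jurassic
```

A request body becomes more useful than q= once you need to combine several clauses, filters, or aggregations in one search.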

Alternatively, OpenSearch Dashboards provides a user-friendly interface for data search and visualization. Access it through your domain’s Dashboards URL, log in with your master user credentials, and create an index pattern named movies. Use this pattern to navigate and search through your indexed data, experimenting with different search terms to explore the data you’ve uploaded.

Related content: Read our guide to OpenSearch Dashboards

Best Practices for Amazon OpenSearch Service

The following best practices can help you get the most out of the AWS OpenSearch Service.

Implement a Shard Strategy

A shard strategy helps in optimizing the performance and scalability of your AWS OpenSearch cluster. Shards break down an index into smaller, more manageable pieces, impacting search and indexing efficiency. The decision on the number of primary shards should take into account the size of your data and your query throughput.

You can change the number of replicas dynamically, but the primary shard count is fixed once an index is created, so choose it up front; a typical target is up to 50 GB of data per shard. Replica shards provide redundancy and higher data availability, and they also allow search queries to be processed in parallel. It’s advisable to maintain at least one replica per primary shard.

For large datasets, splitting indices based on time or logical partitions can enhance performance and manageability, especially for time-series data. Regularly reviewing and adjusting your sharding strategy is important as your data volume changes.
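
The sizing guidance above can be turned into a rough calculation. The sketch below estimates a primary shard count by dividing the expected data size by a target shard size (50 GB by default) and rounding up, then creates an index with explicit shard and replica settings. It is a simple heuristic that ignores growth and indexing overhead; the function names and the DOMAIN_ENDPOINT and OS_AUTH variables are hypothetical.

```shell
#!/bin/sh
# Estimate primary shards as ceil(data_gb / target_shard_gb).
# Rough heuristic only: it ignores data growth and indexing overhead.
estimate_primary_shards() {
  data_gb="$1"
  target_gb="${2:-50}"
  echo $(( (data_gb + target_gb - 1) / target_gb ))
}

# Create an index with an explicit primary shard and replica count.
# DOMAIN_ENDPOINT and OS_AUTH ("user:password") are hypothetical variables.
create_index() {
  index="$1"; shards="$2"; replicas="${3:-1}"
  curl -s -XPUT -u "$OS_AUTH" "$DOMAIN_ENDPOINT/$index" \
    -H 'Content-Type: application/json' \
    -d "{\"settings\":{\"index\":{\"number_of_shards\":$shards,\"number_of_replicas\":$replicas}}}"
}

# Usage: size an index for roughly 200 GB of data, one replica per primary.
#   create_index movies "$(estimate_primary_shards 200)" 1
```

Because the primary shard count cannot be changed in place later, it is worth running this kind of estimate against projected data volume rather than the current one.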

Optimize the Bulk Request Size and Compression

Optimizing the size of bulk requests and enabling compression can enhance data ingestion efficiency in AWS OpenSearch. The appropriate bulk request size varies with your specific data and network conditions. Starting with between 5 and 15 MB per request balances efficiency against the risk of timeouts or rejections.

Enabling HTTP compression for data transmission can accelerate transfer rates and decrease bandwidth usage, benefiting the client and OpenSearch cluster. Monitoring the performance of your bulk requests through OpenSearch Dashboards or APIs is essential. This allows you to adjust the settings based on observed system strain, such as increased latency or error rates.
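
One way to apply both ideas from the command line is sketched below: split a large bulk file into smaller chunks, then gzip each chunk before sending it. The sketch assumes every action in the file is an index action followed by a document line, so an even line count per chunk keeps each pair together. The function names and the DOMAIN_ENDPOINT and OS_AUTH variables are hypothetical, and the lines-per-chunk value needs tuning until chunk sizes fall in the range you are targeting.

```shell
#!/bin/sh
# Split a bulk file into chunks of N lines each.
# N must be even so an action line is never separated from its document.
split_bulk_file() {
  file="$1"; lines_per_chunk="$2"; prefix="${3:-bulk_chunk_}"
  if [ $(( lines_per_chunk % 2 )) -ne 0 ]; then
    echo "lines_per_chunk must be even" >&2
    return 1
  fi
  split -l "$lines_per_chunk" "$file" "$prefix"
}

# Gzip one chunk and post it to the bulk API.
# Content-Encoding marks the request body as gzip; --compressed asks
# for a compressed response and decompresses it locally.
# DOMAIN_ENDPOINT and OS_AUTH ("user:password") are hypothetical variables.
send_chunk_gzip() {
  gzip -c "$1" |
    curl -s -XPOST -u "$OS_AUTH" "$DOMAIN_ENDPOINT/_bulk" \
      -H 'Content-Type: application/json' \
      -H 'Content-Encoding: gzip' \
      --compressed --data-binary @-
}

# Usage:
#   split_bulk_file bulk_movies.json 2000
#   for chunk in bulk_chunk_*; do send_chunk_gzip "$chunk"; done
```

Checking chunk sizes with wc -c before sending is a quick way to confirm they land near the intended request size.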

Enable Dedicated Master Nodes

Employing dedicated master nodes in your AWS OpenSearch cluster promotes stability and boosts performance. These nodes take on cluster management tasks, such as tracking cluster membership and deciding shard allocations, separating these functions from data processing tasks.

This delineation ensures that management processes do not compete with data processing for resources, thereby enhancing cluster stability. AWS recommends configuring three dedicated master nodes for production environments to achieve high availability and fault tolerance, preferably distributed across different availability zones.
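
For an existing domain, dedicated master nodes can also be turned on from the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is configured with permission to update the domain. The function name is hypothetical, and the default instance type m6g.large.search is only an example; pick one sized for your cluster.

```shell
#!/bin/sh
# Enable three dedicated master nodes on an existing domain.
# Assumption: the AWS CLI is configured and allowed to update the domain.
# The default instance type below is an example, not a recommendation.
enable_dedicated_masters() {
  domain_name="$1"
  master_type="${2:-m6g.large.search}"
  aws opensearch update-domain-config \
    --domain-name "$domain_name" \
    --cluster-config "DedicatedMasterEnabled=true,DedicatedMasterType=$master_type,DedicatedMasterCount=3"
}

# Usage ("movies" is a placeholder domain name):
#   enable_dedicated_masters movies
```

A configuration change like this triggers a domain update, so it is best scheduled outside peak traffic.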

Enable Auto-Tune

Auto-Tune in AWS OpenSearch is a feature designed to automatically optimize your cluster’s performance, reducing the need for manual tuning. By monitoring the cluster’s performance and usage metrics and adjusting memory-related settings such as JVM configuration, queue sizes, and cache sizes, Auto-Tune helps adapt your setup to best serve your current workload.

This automation significantly cuts down on the operational overhead, making it a recommended feature for production clusters to maintain optimal performance with minimal manual intervention. Auto-Tune can be enabled through the AWS Management Console or the OpenSearch Service API.
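
As a sketch of the API route, the AWS CLI can enable Auto-Tune and report its state on an existing domain. This assumes the AWS CLI is configured with permission to update and describe the domain; the function names and the domain name movies are hypothetical.

```shell
#!/bin/sh
# Turn Auto-Tune on for an existing domain.
# Assumption: the AWS CLI is configured and allowed to update the domain.
enable_auto_tune() {
  aws opensearch update-domain-config \
    --domain-name "$1" \
    --auto-tune-options "DesiredState=ENABLED"
}

# Print the current Auto-Tune state reported for the domain.
auto_tune_state() {
  aws opensearch describe-domain \
    --domain-name "$1" \
    --query 'DomainStatus.AutoTuneOptions.State' \
    --output text
}

# Usage ("movies" is a placeholder domain name):
#   enable_auto_tune movies
#   auto_tune_state movies
```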

Use the Latest Generation Instance Types

Opting for the latest generation Amazon EC2 instance types when setting up your AWS OpenSearch cluster offers advantages in performance and cost-efficiency. Newer instances typically provide superior compute, memory, and network performance, benefiting indexing and search operations in OpenSearch.

These latest generations also tend to be more cost-effective, delivering higher processing power and throughput at lower costs than older models. Ensuring your cluster runs on updated and supported hardware by choosing the latest instance types also future-proofs your setup.

From OpenSearch to Coralogix

Explore the benefits of Coralogix and how easy it is to migrate from OpenSearch to Coralogix.

Learn more about the Coralogix platform
