
OpenSearch with Python: The Basics and a Quick Tutorial


What Is OpenSearch?

OpenSearch is a community-driven, open-source search and analytics suite. Originally a fork of Elasticsearch and Kibana, it was created to ensure the community retains an open-source alternative for indexing and searching large datasets.

OpenSearch lets you store, search, and visualize data in near real time, and it scales horizontally to handle large datasets. Common applications include log analytics, business intelligence, and full-text search.

Its features include a RESTful API, integrated security, alerting, and machine learning capabilities, making it suitable for operational and business use cases. OpenSearch supports customization, along with community-contributed plugins and extensions that can further enhance its functionalities.


OpenSearch Python Clients

There are several clients that can be used to interface with OpenSearch in Python.

Low-Level Python Client

The low-level Python client, opensearch-py, serves as an interface to the OpenSearch REST API, enabling more natural interactions with an OpenSearch cluster within a Python environment. Instead of sending raw HTTP requests, users can create an OpenSearch client instance and utilize built-in functions to perform various operations. This approach streamlines the process of managing OpenSearch clusters and executing API requests.

Machine Learning Python Client

opensearch-py-ml is a specialized Python client designed to enhance data analytics and natural language processing (NLP) capabilities within OpenSearch. This client allows data analysts to leverage the following features:

  • DataFrame APIs: Wraps OpenSearch indexes into an API that mirrors the functionality of Pandas DataFrames, enabling the manipulation of large datasets from OpenSearch within environments such as Jupyter Notebooks.
  • NLP model integration: Users can upload NLP models, specifically SentenceTransformer models, into OpenSearch using the ML Commons plugin.
  • Model training and tuning: Supports the training and tuning of SentenceTransformer models using synthetic queries, providing tools for NLP tasks.

High-Level Python Client (Deprecated)

The high-level Python client for OpenSearch, known as opensearch-dsl-py, offers convenient wrapper classes for handling OpenSearch entities, such as documents, as Python objects. This client simplifies the process of writing queries and provides accessible Python methods for frequent OpenSearch tasks, such as creating, indexing, and updating documents, as well as conducting searches with and without filters.

Note, however, that opensearch-dsl-py will be deprecated after version 2.1.0. Users are encouraged to switch to the low-level Python client, opensearch-py, which has incorporated the functionality of the high-level client.

Tutorial: Getting Started with OpenSearch and Python

In this tutorial, we’ll walk through setting up OpenSearch with Python: connecting to an OpenSearch cluster, creating an index, indexing a document, performing bulk operations, searching for documents, and deleting documents and indexes. The steps are adapted from the official documentation.

Setup

First, install the OpenSearch Python client using pip:

pip3 install opensearch-py

Once installed, import the client in your Python script:

from opensearchpy import OpenSearch

Connect to OpenSearch

To connect the client to the OpenSearch host, create a client object. If using the Security plugin, enable SSL and provide authentication details. Here’s an example:

host = 'localhost'
port = 9200
auth = ('admin', 'admin')  # For testing only; don't store credentials in code
ca_certs_path = '/full/path/to/root-ca.pem'  # Path to the root CA certificate
client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
    ca_certs=ca_certs_path
)
# Confirm the connection by printing cluster information
print(client.info())

If your organization uses client certificates, include them as follows:

client_cert_path = '/full/path/to/client.pem'
client_key_path = '/full/path/to/client-key.pem'
client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    http_auth=auth,
    client_cert=client_cert_path,
    client_key=client_key_path,
    use_ssl=True,
    verify_certs=True,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
    ca_certs=ca_certs_path
)
# Confirm the connection by printing cluster information
print(client.info())

For connections without the Security plugin, disable SSL:

client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    use_ssl=False,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False
)
# Confirm the connection by printing cluster information
print(client.info())
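
The examples above hardcode the host and the default admin credentials, which is convenient for a local test cluster but not something to carry into shared code. Here is a minimal sketch that builds the same keyword arguments from environment-style settings. The variable names (OPENSEARCH_HOST, OPENSEARCH_PORT, OPENSEARCH_USER, OPENSEARCH_PASSWORD, OPENSEARCH_CA_CERTS) and the helper name are assumptions made for this illustration; none of them are defined by opensearch-py.

```python
import os


def build_client_kwargs(env=None):
    """Build keyword arguments for OpenSearch(...) from environment-style settings.

    The variable names are hypothetical, chosen for this sketch only.
    """
    env = os.environ if env is None else env
    kwargs = {
        'hosts': [{
            'host': env.get('OPENSEARCH_HOST', 'localhost'),
            'port': int(env.get('OPENSEARCH_PORT', '9200')),
        }],
        'http_compress': True,
    }
    user = env.get('OPENSEARCH_USER')
    password = env.get('OPENSEARCH_PASSWORD')
    if user and password:
        # Credentials present: assume the Security plugin is enabled, so use TLS.
        kwargs.update(
            http_auth=(user, password),
            use_ssl=True,
            verify_certs=True,
        )
        ca_certs = env.get('OPENSEARCH_CA_CERTS')
        if ca_certs:
            kwargs['ca_certs'] = ca_certs
    else:
        # No credentials: mirror the plain-HTTP example above.
        kwargs.update(use_ssl=False, verify_certs=False)
    return kwargs


# Example with explicit settings instead of the real environment
settings = {
    'OPENSEARCH_HOST': 'localhost',
    'OPENSEARCH_USER': 'admin',
    'OPENSEARCH_PASSWORD': 'admin',
}
client_kwargs = build_client_kwargs(settings)
print(client_kwargs['hosts'])
```

The resulting dictionary can then be unpacked into the constructor, as in OpenSearch(**client_kwargs).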


Connect to Amazon OpenSearch Service

Before proceeding, please make sure:

  • The Boto3 library is installed. If it is not, run: pip3 install boto3
  • The AWS CLI is installed and configured with your access key ID and secret access key
  • The OpenSearch security settings allow your AWS user (whose credentials are pulled in via boto3). To edit them, navigate to AWS OpenSearch Console > Domains > [Your Domain].

To connect to Amazon OpenSearch Service, use AWS credentials and specify the connection details:

from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import boto3
host = 'my-example-domain.us-east-1.es.amazonaws.com'
region = 'us-east-1'
service = 'es'
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, service)
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=30
)
# Confirm the connection by printing cluster information
print(client.info())
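
The region and service values have to match the domain endpoint, since they are part of the request signature. Here is a small sketch that derives both from the host name rather than repeating them. It assumes the endpoint follows the <domain>.<region>.<service>.amazonaws.com pattern seen in the example above. The helper name is hypothetical, and custom endpoints will not match this pattern.

```python
def region_and_service(host):
    """Split an Amazon OpenSearch endpoint host into (region, service).

    Assumes the form <domain>.<region>.<service>.amazonaws.com.
    """
    parts = host.split('.')
    if len(parts) < 5 or parts[-2:] != ['amazonaws', 'com']:
        raise ValueError(f'Unexpected endpoint format: {host}')
    return parts[-4], parts[-3]


host = 'my-example-domain.us-east-1.es.amazonaws.com'
region, service = region_and_service(host)
print(region, service)
```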


The policy document denies access by default. To allow access for your IAM user, use the following JSON:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::XXXXXXXX:user/YOUR_USER"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:XXXXXXXXX:domain/myostestagile/*"
    }
  ]
}

Create an OpenSearch Index

To create an index, use a script similar to this:

index_name = 'python-example-index'
index_body = {
    'settings': {
        'index': {
            'number_of_shards': 5
        }
    }
}
response = client.indices.create(index=index_name, body=index_body)
print('Creating index:', response)
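
Creating an index that already exists raises an error, so scripts that run more than once often check for it first. Here is a minimal sketch, assuming a connected client like the one created earlier. The helper name create_index_if_missing is hypothetical; client.indices.exists() and client.indices.create() are the opensearch-py calls it relies on.

```python
def create_index_if_missing(client, index_name, index_body=None):
    """Create the index only when it does not exist yet.

    Returns the create response, or None if the index was already there.
    """
    if client.indices.exists(index=index_name):
        return None
    return client.indices.create(index=index_name, body=index_body)
```

Another process could still create the index between the check and the create call, so concurrent scripts may need to handle the resulting error as well.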


Index a Document

To index a document, use the client.index() method. For example:

document = {
    'title': 'To Kill a Mockingbird',
    'author': 'Harper Lee',
    'year': '1960'
}
response = client.index(
    index='python-example-index',
    body=document,
    id='1',
    refresh=True
)
print('Adding document:', response)


Perform Bulk Operations

For bulk operations, use the client.bulk() method. A single request can carry multiple operations of the same or different types. The body is newline-delimited JSON: each action line, and the document line that follows it where one is required, is separated by \n. For example:

books = '{ "index" : { "_index" : "example-dsl-index", "_id" : "1" } } \n { "title" : "To Kill a Mockingbird", "author" : "Harper Lee", "year" : "1960"} \n { "create" : { "_index" : "example-dsl-index", "_id" : "2" } } \n { "title" : "1984", "author" : "George Orwell", "year" : "1949"} \n { "update" : {"_id" : "2", "_index" : "example-dsl-index" } } \n { "doc" : {"year" : "1950"} }'

client.bulk(body=books)
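
Writing the newline-delimited payload by hand is error-prone. Here is a minimal sketch that builds the same kind of body from Python dictionaries with the standard json module. The helper name build_bulk_body is hypothetical and not part of opensearch-py.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, payload) pairs into a newline-delimited bulk body.

    `action` is a dict such as {'index': {'_index': ..., '_id': ...}};
    `payload` is the document (or {'doc': ...} for updates), or None for
    actions such as delete that take no payload line.
    """
    lines = []
    for action, payload in operations:
        lines.append(json.dumps(action))
        if payload is not None:
            lines.append(json.dumps(payload))
    # The bulk API expects every line, including the last, to end with \n.
    return '\n'.join(lines) + '\n'


operations = [
    ({'index': {'_index': 'example-dsl-index', '_id': '1'}},
     {'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'year': '1960'}),
    ({'create': {'_index': 'example-dsl-index', '_id': '2'}},
     {'title': '1984', 'author': 'George Orwell', 'year': '1949'}),
    ({'update': {'_index': 'example-dsl-index', '_id': '2'}},
     {'doc': {'year': '1950'}}),
]
books = build_bulk_body(operations)
print(books)
```

The resulting string can be passed as client.bulk(body=books). The bulk response contains an errors flag and per-item results, so it is worth checking these rather than assuming every operation succeeded.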


Search for Documents

To search for documents, build a query body and pass it to the client.search() method:

q = 'Harper Lee'
query = {
    'size': 5,
    'query': {
        'multi_match': {
            'query': q,
            'fields': ['title^2', 'author']
        }
    }
}
response = client.search(
    body=query,
    index='python-example-index'
)
print('Search results:', response)
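
The search response is a nested dictionary. Matching documents sit under response['hits']['hits'], each with its _id, its _score, and the original document in _source. Here is a minimal sketch of pulling those out. It uses a hand-written sample response for illustration: the score and timing values are made up, not output from a real cluster, and the helper name summarize_hits is hypothetical.

```python
def summarize_hits(response):
    """Return (id, score, source) tuples for each hit in a search response."""
    return [
        (hit['_id'], hit['_score'], hit['_source'])
        for hit in response['hits']['hits']
    ]


# Hand-written sample in the shape of a search response; values are illustrative.
sample_response = {
    'took': 3,
    'timed_out': False,
    'hits': {
        'total': {'value': 1, 'relation': 'eq'},
        'max_score': 0.5,
        'hits': [
            {
                '_index': 'python-example-index',
                '_id': '1',
                '_score': 0.5,
                '_source': {
                    'title': 'To Kill a Mockingbird',
                    'author': 'Harper Lee',
                    'year': '1960',
                },
            }
        ],
    },
}

for doc_id, score, source in summarize_hits(sample_response):
    print(doc_id, score, source['title'])
```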

Delete a Document

The client.delete() method can be used to delete documents:

response = client.delete(
    index='python-example-index',
    id='1'
)
print('Deleting document:', response)


Delete an Index

To delete an index, use the client.indices.delete() method:

response = client.indices.delete(
    index='python-example-index'
)
print('Deleting index:', response)


Managed Application Observability with Coralogix

Coralogix sets itself apart in observability with its modern architecture, enabling real-time insights into logs, metrics, and traces with built-in cost optimization. Coralogix’s straightforward pricing covers all its platform offerings including APM, RUM, SIEM, infrastructure monitoring and much more. With unparalleled support that features less than 1 minute response times and 1 hour resolution times, Coralogix is a leading choice for thousands of organizations across the globe.
