Whenever you build a service and expose a set of endpoints to provide API access to it, you’ll likely need to track their availability and response times, in addition to verifying that they function correctly. To actually know that something is “down” or simply “not performing”, you need to monitor your services consistently, day in and day out. That’s where Heartbeat, from the Elastic Beats family, helps you with Uptime Monitoring.
Heartbeat helps you monitor your service availability. It works by defining Monitors that check your hosts to ensure they’re alive.
When discussing Monitors, there are three main monitor types to consider. Each one is named after the underlying protocol it uses. Each of these protocols operates at a different network level, so each offers different options for what it can check.
So let’s go through them one by one:
ICMP (ping): checks whether a host is reachable at the network layer.
TCP: checks whether a port on a host accepts connections at the transport layer.
HTTP: checks whether a URL responds as expected at the application layer.
Now that we know the toolkit at our disposal, let’s dive deeper and see it in action!
First we need to start with the installation of Heartbeat. It is a very similar process to the installation of Elasticsearch, but let’s reiterate the main steps here.
We’ll use the APT repositories to do this, so we need to install and add the public signing key. This command should result in an OK confirmation:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Next, install apt-transport-https so that we can use repositories served over HTTPS:
sudo apt-get install apt-transport-https
And now let’s add the Elasticsearch apt repository to our list of available repositories:
echo "deb https://artifacts.elastic.co/packages/oss-7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
Finally, you can install a specific version of Heartbeat with apt-get:
sudo apt-get update && sudo apt-get install heartbeat-elastic=7.5.2
Optionally, you can set Heartbeat to start at system boot like this:
sudo systemctl enable heartbeat-elastic
Now that we have the prerequisites covered we should review the main configuration file for Heartbeat:
sudo vim /etc/heartbeat/heartbeat.yml
Here you’ll find a ton of options, but don’t worry: we’ll get by leaving most of them on their defaults. First and foremost, let’s define Heartbeat’s output, which is the Elasticsearch host. Here we are OK with the default, as Elasticsearch is running on localhost:9200.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
Pro Tip: In production setups you’ll likely need to pay attention to the SSL and authentication sections.
The second part is the path to the Monitors directory. Although you can define your Monitors straight in the heartbeat.yml file, it’s not a very good idea, as it can get messy. It is better to keep them in a dedicated directory where every YAML file (*.yml) gets picked up.
For the configuration we’ll just enable Monitor reloading by setting it to true.
############################# Heartbeat ######################################
# Define a directory to load monitor definitions from. Definitions take the form
# of individual yaml files.
heartbeat.config.monitors:
  # Directory + glob pattern to search for configuration files
  path: ${path.config}/monitors.d/*.yml
  # If enabled, heartbeat will periodically check the config.monitors path for changes
  reload.enabled: true
  # How often to check for changes
  reload.period: 5s
Great! Now all potential changes will automatically be reloaded every 5 seconds without the need to restart Heartbeat.
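To make the reload behaviour a bit more concrete, here is a minimal Python sketch of the general technique: poll a glob pattern and compare file modification times between two passes. It only illustrates the idea and is not Heartbeat's actual implementation; the function names snapshot and diff_snapshots are made up for this example.

```python
import glob
import os


def snapshot(pattern):
    """Map every file matching the glob pattern to its modification time (ns)."""
    return {path: os.stat(path).st_mtime_ns for path in glob.glob(pattern)}


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, changed) paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A file counts as changed when it exists in both snapshots
    # but its modification time differs.
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

A reloader built this way would take a snapshot once per reload period (5 seconds in our config), diff it against the previous one, and only touch the Monitors whose files were added, removed or changed.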
Lastly, to keep our config file clean, comment out everything under the heartbeat.monitors field. We will define our monitors separately.
#heartbeat.monitors:
#- type: http
#  urls: ["http://localhost:9200"]
#  schedule: '@every 10s'
We can now start up our Heartbeat instance like this:
sudo systemctl start heartbeat-elastic.service
If you want to watch the Heartbeat logs to be sure everything went smoothly, you can do so with the journalctl utility (if you are running Heartbeat under systemd).
sudo journalctl -u heartbeat-elastic.service -f
Everything should be ready to define our first Monitor!
Let’s start easy and create a simple Monitor of the TCP type.
For lack of a better shared option in our local environment, we’ll perform our tests against the Elastic stack running in our VM.
We will start by changing into the monitors.d directory and creating a new yml file like this:
cd /etc/heartbeat/monitors.d/
sudo vim lecture-monitors.yml
Let’s define a very basic TCP monitor:
- id: elasticstack-tcp-monitor
  type: tcp
  hosts: ["localhost"]
  ports: [9200, 5601]
  schedule: '@every 10s'
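The key parts are:
id: a unique identifier for the Monitor
type: the protocol to use, here tcp
hosts and ports: the hosts to check and the ports to try on each of them
schedule: how often the check runs, here every 10 seconds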
And that’s it, we have our first Heartbeat monitor!
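If you are curious what a TCP check boils down to, here is a rough Python sketch under simple assumptions: a target counts as "up" if a TCP connection can be opened within a timeout, and "down" otherwise. The function names and the tcp://host:port key format are chosen for this illustration only; Heartbeat's real checks record far more detail.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return "up" if a TCP connection can be opened, otherwise "down"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return "down"


def run_tcp_monitor(hosts, ports, timeout=2.0):
    """Check every host/port combination, like hosts + ports in the Monitor."""
    return {
        f"tcp://{host}:{port}": tcp_check(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Calling run_tcp_monitor(["localhost"], [9200, 5601]) would perform the same two connection attempts that our Monitor schedules every 10 seconds.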
Now save your yaml file and just wait a while 🙂
:wq
Then if you request the list of your indices…
curl 'localhost:9200/_cat/indices/heartbeat*?v'
… you should see a new heartbeat index and that it already has some documents.
We can quickly review them with the following query.
curl --silent localhost:9200/heartbeat-7.5.2/_search | jq -c '.hits.hits[]._source | [ .url.full, .monitor.status ]'
With the jq utility, we are just extracting two fields from the search response: url.full (the host:port combination) and monitor.status.
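If jq is not available, the same extraction can be sketched in a few lines of Python. This assumes the usual shape of an Elasticsearch search response (hits.hits[]._source); the function name summarize_hits is made up for this example.

```python
def summarize_hits(search_response):
    """Return [url.full, monitor.status] for every hit, like the jq filter."""
    rows = []
    for hit in search_response["hits"]["hits"]:
        source = hit["_source"]
        # Missing fields become None, mirroring jq's null.
        full_url = source.get("url", {}).get("full")
        status = source.get("monitor", {}).get("status")
        rows.append([full_url, status])
    return rows
```

You would feed it the parsed JSON body of the search request, for example the result of json.load on the curl output.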
As you can see both Elasticsearch and Kibana seem to be up, or in other words they can be connected to:
Now we can test out the HTTP monitor, which will likely be the one used with your set of HTTP/REST services. Generally you need to define what it means for your specific service to be “alive and well” in order to design the Monitor properly.
It may be a specific status code or JSON response with specific contents, or all of these conditions combined.
To try this let’s define a Monitor that will watch the _cluster/health endpoint of our Elasticsearch cluster. It is a good example of a “status” endpoint.
- id: elasticsearch-cluster-health
  type: http
  urls: ["http://localhost:9200/_cluster/health"]
  schedule: '@every 10s'
  check.request:
    method: GET
  check.response:
    status: 200
    json:
      - description: check status
        condition:
          equals:
            status: green
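You can see that it is fairly similar to the TCP one we did earlier, but besides the different type, it also adds some extra parameters:
urls: the list of URLs to request
check.request: the request to send, here a plain GET
check.response: the expected response, here status code 200 and a JSON body whose status field equals green
You can combine multiple conditions with or, and, and not, and you can perform various checks such as equals, contains, regexp, etc.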
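To illustrate how such conditions compose, here is a simplified Python evaluator for the operators mentioned above. It is a sketch for intuition only, not the Beats implementation: it assumes flat field names, that and/or take a list of conditions, and that not takes a single condition.

```python
import re


def evaluate(condition, doc):
    """Return True if the parsed JSON body `doc` satisfies `condition`."""
    for op, arg in condition.items():
        if op == "equals":
            ok = all(doc.get(field) == value for field, value in arg.items())
        elif op == "contains":
            ok = all(value in str(doc.get(field, "")) for field, value in arg.items())
        elif op == "regexp":
            ok = all(
                re.search(pattern, str(doc.get(field, ""))) is not None
                for field, pattern in arg.items()
            )
        elif op == "and":
            ok = all(evaluate(sub, doc) for sub in arg)
        elif op == "or":
            ok = any(evaluate(sub, doc) for sub in arg)
        elif op == "not":
            ok = not evaluate(arg, doc)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


condition = {"equals": {"status": "green"}}
print(evaluate(condition, {"status": "green"}))  # True
print(evaluate(condition, {"status": "red"}))    # False
```

The Monitor is reported as up only when the status code check and every JSON condition hold at the same time.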
Let’s save the yaml file, and wait a bit.
:wq
Afterwards, you can check the monitoring data points. For example, you can fetch those from the last 30 seconds with the following command:
curl -g --silent localhost:9200/heartbeat-7.5.2/_search?q=@timestamp:["now-30s"+TO+"now"] | jq -c '.hits.hits[]._source | [ .url.full, .monitor.status ]'
Note: it is the same query as before but it adds a time condition. Also, notice the -g flag for curl which allows us to use the square brackets in the query.
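If you build such URLs from a script, it is usually easier to percent-encode the query string so that no square brackets or quotes are left for the shell or curl to interpret. A small sketch, assuming the same host and index name as above (the helper name recent_search_url is made up):

```python
from urllib.parse import urlencode


def recent_search_url(base, index, seconds):
    """Build a _search URL limited to the last `seconds` seconds."""
    query = f"@timestamp:[now-{seconds}s TO now]"
    # urlencode percent-encodes the brackets and turns spaces into "+".
    return f"{base}/{index}/_search?{urlencode({'q': query})}"


print(recent_search_url("http://localhost:9200", "heartbeat-7.5.2", 30))
# http://localhost:9200/heartbeat-7.5.2/_search?q=%40timestamp%3A%5Bnow-30s+TO+now%5D
```

The resulting URL can be passed to curl inside single quotes.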
The curl query should now return results from both of our Monitors. And as you can see, our HTTP monitor is informing us that the service is down… can you think why that might be? 🙂
Hint: We have some unassigned replicas, which you can resolve by changing the dynamic index setting index.number_of_replicas to 0:
curl --request PUT 'http://localhost:9200/date*/_settings' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "index.number_of_replicas" : 0 }'
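The reasoning behind the hint can be sketched in Python. The rule is simplified here: the cluster is red if any primary shard is not active, yellow if all primaries are active but some replica is not, and green otherwise. The prirep and state keys follow the column names of the _cat/shards output; the sample shard list is invented for illustration.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def cluster_status(shards):
    """Derive a green/yellow/red status from a list of shard descriptions."""
    inactive = [s for s in shards if s["state"] not in ACTIVE_STATES]
    if any(s["prirep"] == "p" for s in inactive):
        return "red"
    if inactive:
        return "yellow"
    return "green"


# Invented example: on a single node, a replica has nowhere to be allocated.
one_node = [
    {"prirep": "p", "state": "STARTED"},
    {"prirep": "r", "state": "UNASSIGNED"},
]
print(cluster_status(one_node))      # yellow
print(cluster_status(one_node[:1]))  # green, as after setting replicas to 0
```

Since our Monitor requires the status to equal green, a yellow cluster is reported as down even though Elasticsearch itself is responding.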
Finally, we can also try visualizing the collected data. To save some initial setup work, we will use a predefined Heartbeat dashboard that is available as an open source project in the elastic/uptime-contrib GitHub repo.
You can download the config file with wget:
wget https://raw.githubusercontent.com/elastic/uptime-contrib/master/dashboards/http_dashboard.json
Now let’s go to Management → Kibana / Saved Objects → Import, find the downloaded JSON file and import it.
In the Saved Objects you can see what was created via the configuration file. It is a set of visualizations in a dashboard, and importantly, an index pattern that is an interface for our data in the heartbeat-* indices.
Now go to the Uptime section (in the left menu of Kibana) and pick the Heartbeat HTTP monitoring dashboard.
The out-of-the-box dashboard should look something like this. It shows the distribution of status codes, round-trip times of requests and other related data.
Not bad at all, if you want to dazzle your colleagues after only 5 minutes of work!
If you need to add or tweak the individual visualizations you can do so in the Visualize section. Also remember that in the Discover section you can inspect the raw data points.
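For a feel of what the dashboard panels aggregate, here is a small Python sketch that computes a status code distribution and an average check duration from raw documents. The field names http.response.status_code and monitor.duration.us are assumptions about the Heartbeat document layout; verify them against your own data in Discover before relying on them.

```python
from collections import Counter
from statistics import mean


def summarize_http(docs):
    """Return (status code counts, average duration in ms) for _source docs."""
    codes = Counter()
    durations_ms = []
    for doc in docs:
        code = doc.get("http", {}).get("response", {}).get("status_code")
        if code is not None:
            codes[code] += 1
        micros = doc.get("monitor", {}).get("duration", {}).get("us")
        if micros is not None:
            durations_ms.append(micros / 1000)
    average = mean(durations_ms) if durations_ms else None
    return dict(codes), average
```

In practice Elasticsearch aggregations do this work on the server side; the sketch only shows which numbers end up in the charts.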
Now, as a final step you should stop the Heartbeat instance like this:
sudo systemctl stop heartbeat-elastic.service
… and optionally (if short on space) remove its indices to have your table clean :).
curl --request DELETE 'http://localhost:9200/heartbeat*'
Very good! Now you know how to keep the availability and response times of your services under control and how to quickly visualize the collected data to get valuable insights.