Using AWS Timestream for System Health Monitoring

Amazon Web Services (AWS) introduced a preview of Timestream in November 2018 before releasing the full version in October 2020. AWS Timestream is a time series database that can process trillions of events daily. For processing time-series information, it is faster and less costly than the relational databases AWS offers.

In this article, we will look at what Timestream can do compared to some other AWS observability tools, and how to use Timestream to help monitor the health of your system.

What Can Timestream Do?

AWS Timestream offers its users many benefits. The most significant include:

High Performance

Time-series data is commonly used for real-time processing, meaning it needs to be fast. AWS claims Timestream queries can run up to 1,000 times faster than those of relational databases.

Low Cost

Since a time-series workload is bound to generate more events than other types of tracking, keeping the cost low is crucial. AWS Timestream can cost as little as 1/10th of what a relational database could cost.

Automatic Scaling

Like other AWS services, Timestream is serverless. Users do not need to configure capacity requirements or manage servers. The Timestream database will automatically scale up and down as your capacity changes.

Use SQL Queries with Built-In Functions

Timestream allows users to query data using SQL. It comes with built-in analytics capabilities as well: users can smooth, interpolate, or approximate data while querying, and Timestream also supports aggregation and other analytic functions.
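
As a minimal sketch, here is what such a query might look like from Python with boto3 (the "monitoring" database, "cpu_metrics" table, and measure name are hypothetical):

import boto3

# Query client for Timestream; assumes AWS credentials are configured.
query_client = boto3.client("timestream-query")

# Average CPU utilization in 1-minute bins over the last hour.
query = """
SELECT BIN(time, 1m) AS binned_time,
       AVG(measure_value::double) AS avg_cpu
FROM "monitoring"."cpu_metrics"
WHERE measure_name = 'cpu_utilization'
  AND time > ago(1h)
GROUP BY BIN(time, 1m)
ORDER BY binned_time
"""

response = query_client.query(QueryString=query)
for row in response["Rows"]:
    print(row["Data"])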

Automatic Data Lifecycle Management

AWS Timestream uses two storage tiers: a memory store and a magnetic store. The memory store is faster to access, and Timestream uses it for recent data. The magnetic store is slower to access and used for historical data.

Simple Data Access

AWS Timestream places each event in one of its two data stores based on the event’s age. Queries on a Timestream table will access both data stores without users needing to specify the data’s location.

Automatic Encryption

AWS automatically encrypts Timestream data. Users can choose to manage their encryption keys or use an AWS managed key. Encryption is applied both for data at rest and in transit.

When Should You Choose Timestream?

AWS offers many database types, including relational and key-value options. Whether or not to choose Timestream over these other options depends on your use case. AWS designed Timestream for time series data, and for that data it can provide lower cost, higher efficiency, and more features than other database types.

Time series use cases include:

  • Recording weather patterns over time
  • Logging IoT sensor data from devices over time
  • Monitoring analytics from an app or webpage over time

AWS Timestream stores continuous data for a given measure, logging a time with each event. Each event is also associated with metadata (dimensions), which is available in SQL queries. Events cannot be updated or deleted once written, since Timestream is an append-only database.
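
To illustrate the data model, here is a sketch of writing one event with dimensions using boto3 (the database, table, dimension, and measure names are all hypothetical):

import time

import boto3

# Write client for Timestream; assumes AWS credentials are configured.
write_client = boto3.client("timestream-write")

record = {
    # Dimensions are the event's metadata, queryable as columns in SQL.
    "Dimensions": [
        {"Name": "host", "Value": "web-01"},
        {"Name": "region", "Value": "us-east-1"},
    ],
    "MeasureName": "cpu_utilization",
    "MeasureValue": "72.5",
    "MeasureValueType": "DOUBLE",
    "Time": str(int(time.time() * 1000)),  # event timestamp in milliseconds
    "TimeUnit": "MILLISECONDS",
}

write_client.write_records(
    DatabaseName="monitoring", TableName="cpu_metrics", Records=[record]
)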

Timestream vs. DynamoDB

AWS DynamoDB tables store data as key-value pairs. Users can insert and update information based on timestamps. Each item in the database can have any number of attributes, and those attributes can even differ between items.

Amazon designed DynamoDB’s data structure for fast key-value lookups, which Timestream does not match. However, Timestream comes with built-in analytics like interpolation, which can be especially useful for time-series analyses.

Timestream vs. Aurora

Amazon Relational Database Service (RDS) provides tools to set up relational databases like PostgreSQL, MySQL, or Amazon Aurora. Aurora databases are managed by AWS and can reach 128 TB in size. Data is replicated multiple times across multiple availability zones to ensure availability and durability. Users can update and delete information from tables if needed.

Relational databases have limited capabilities for time-series analysis because they lack native support for storing and querying data by time interval. Also, Aurora will back up data to S3 after reaching its size limit, making that data unavailable to the same SQL commands used for querying the table’s contents.

Using AWS Timestream to Monitor System Health

Amazon designed AWS Timestream to ingest trillions of time-series events daily. This capacity, along with its built-in analytical functions, positions Timestream as a superior database for collecting system health data.

AWS Timestream Configuration

Timestream configuration includes databases and tables. Databases are a top-level container for data. Users create tables within a database and ingest events from one or more related time series into a given table. For system health metrics, create a database for each discrete environment (development, staging, production).

Table definitions are simple since Timestream is serverless and does not require capacity settings on creation. Users must, however, configure a data retention duration for each of the two storage tiers: the memory store and the magnetic store.
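
A minimal sketch of that configuration with boto3, assuming hypothetical names and example retention values:

import boto3

write_client = boto3.client("timestream-write")

# Top-level container for the tables of one environment.
write_client.create_database(DatabaseName="monitoring")

# Serverless table: no capacity settings, only retention for each storage tier.
write_client.create_table(
    DatabaseName="monitoring",
    TableName="cpu_metrics",
    RetentionProperties={
        "MemoryStoreRetentionPeriodInHours": 24,    # recent data, fast queries
        "MagneticStoreRetentionPeriodInDays": 365,  # historical data, low cost
    },
)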

AWS Timestream Data Retention and Memory

The memory store is the short-term data store. AWS ensures the durability of this data by replicating it across multiple availability zones. The memory store retention period must be at least as long as the time between an event’s timestamp and its ingestion into Timestream.

Older data can be loaded into Timestream, but its timestamps must fall within the memory store retention period. Since the memory store retention maximum is one year, only data with timestamps from the past year may be loaded into Timestream.

The magnetic store is the long-term, low-cost data store. AWS will automatically move data from the memory store to the magnetic store based on your configuration. Data is automatically backed up into S3 to ensure durability.

Queries on Timestream tables will access both memory types automatically without any extra settings from users.

Data Analysis

The data loaded will vary based on which aspects of system health are monitored; it may include web page load times, memory utilization, AWS Lambda function durations, or app analytics. Users can monitor many aspects by adding more tables to their Timestream database. Each data set can attach dimensions, or metadata, to a Timestream event for better analytical capabilities. With a Timestream setup and data in place, users can begin to assess their system’s health.

Queries use SQL syntax and can include built-in analysis: Timestream allows users to run interpolations, smoothing functions, and approximation functions, and AWS added join capabilities some months after Timestream’s general release. Users can also export data to a visualization tool such as AWS Managed Grafana.

While Timestream allows straightforward analytics, further analysis may be needed to gain insights into potential service issues. Users can choose to export Timestream data to third-party services like Coralogix’s scalable metrics analytics service to automatically gain insight from gathered data.

Coralogix uses machine learning algorithms to analyze your Timestream data further, surfacing deeper insights from your collected metrics than Timestream’s built-in functions provide.

Summary

Timestream is a generally available, serverless database designed for efficient, cost-effective, and scalable time-series logging and analysis. Timestream uses SQL queries with built-in analysis tools to help users monitor system health metrics and other time-series data.

Users can also export Timestream data to other services for further insights. Coralogix’s ingestion tools for AWS can send Timestream data to Coralogix via AWS Kinesis or AWS Lambda for more complex analyses.

Grafana vs. Graphite

The amount of data being generated today is unprecedented. In fact, more data has been created in the last 2 years than in the entire history of the human race. With such volume, it’s crucial for companies to be able to harness their data in order to further their business goals.

A big part of this is analyzing data and seeing trends, and this is where solutions such as Graphite and Grafana become critical.

We’ll look at the 2 solutions, learning more about each one and examining their similarities and differences.

Graphite

Graphite was designed and written by Chris Davis in 2006. It started as a side project but ultimately was released under the open source Apache 2.0 license in 2008. It has gone on to gain much popularity and is used by companies such as Booking.com and Salesforce.

It is essentially a data collection and visualization system that helps teams visualize large amounts of time-series data.

Technically, Graphite does 2 main things: it stores numeric time-series data, and it renders graphs of this data on demand.

It’s open source, has a powerful querying API, and contains a number of useful features. It has won over fans with its almost endless customization options: it can render any graph, has well-supported integrations, includes event tracking, and its rolling aggregation makes storage manageable.

Who is Graphite for? Anybody who wants to track the value of anything over time. If you have a number that could change over time, and you might want to represent that value on a graph, then Graphite can probably meet your needs. For example, it would be excellent for graphing stock prices, as they are numbers that change over time.

Graphite’s scalability is an asset: it scales horizontally on both the frontend and the backend, so you can simply add more machines to get more throughput. From a few data points to performance metrics from thousands of servers, Graphite will be able to handle the task.

Criticisms of Graphite, in general, include difficulty in deployment, issues with scaling, and its graphs not being the most visually appealing.

Graphite has 3 main, distinct components. But because Graphite doesn’t actually gather metrics itself (rather, it has metrics sent to it), there is a 4th component to consider: Metrics Gathering.

[Image: Graphite infrastructure]

We’ll take a more in-depth look at the various components of Graphite, their implementations, and alternatives where relevant.

  1. Metrics Gathering: The fact that Graphite does not gather its own metrics is offset by the number of metric gatherers available that deliver metrics in the Graphite format. 
  2. Carbon – listens for time-series data: Carbon comprises the metric processing daemons responsible for receiving metrics over the network and writing them to disk using a storage backend.

Getting data into Graphite (data is actually sent to Carbon or Carbon-Relay, which then manages it) is relatively easy, and there are 3 main methods: Plaintext, Pickle, and AMQP.

For a single script, or for test data, the plaintext protocol is the most straightforward. For large amounts of data, batch the data up and send it to Carbon’s pickle receiver. Alternatively, Carbon can listen to a message bus via AMQP, or various tools and APIs can feed this data in for you.

Using the plaintext protocol, data sent must be in the following format: <metric path> <metric value> <metric timestamp>. Carbon translates this line of text into a metric that the web interface and Whisper understand.
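
As a sketch, sending one plaintext metric from Python might look like this (the host and metric path are examples; Carbon’s plaintext listener defaults to port 2003):

import socket
import time

CARBON_HOST, CARBON_PORT = "localhost", 2003

# <metric path> <metric value> <metric timestamp>, terminated by a newline.
message = f"local.random.diceroll 4 {int(time.time())}\n"

sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
sock.sendall(message.encode("utf-8"))
sock.close()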

The pickle protocol is a much more efficient take on the plaintext protocol, and it supports sending batches of metrics to Carbon. The general idea is that the pickled data forms a list of multi-level tuples:

[(path, (timestamp, value)), …]
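
A sketch of a batched pickle send (Carbon’s pickle listener defaults to port 2004; the metric path is an example):

import pickle
import socket
import struct
import time

CARBON_HOST, CARBON_PORT = "localhost", 2004

now = int(time.time())
metrics = [
    ("local.random.diceroll", (now, 4)),       # (path, (timestamp, value))
    ("local.random.diceroll", (now + 10, 2)),
]

# The pickled payload is prefixed with a 4-byte big-endian length header.
payload = pickle.dumps(metrics, protocol=2)
header = struct.pack("!L", len(payload))

sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
sock.sendall(header + payload)
sock.close()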

When AMQP_METRIC_NAME_IN_BODY is set to True in your carbon.conf file, the data should be in the same format as the plaintext protocol, e.g. echo "local.random.diceroll 4 `date +%s`". When AMQP_METRIC_NAME_IN_BODY is set to False, you should omit ‘local.random.diceroll’ from the message body.
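
As a hedged sketch using the pika library, publishing a metric over AMQP might look like this (the broker host and the "graphite" exchange name are assumptions; the exchange must match AMQP_EXCHANGE in your carbon.conf):

import time

import pika

# Assumed broker location and exchange name; match these to your carbon.conf.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

metric = "local.random.diceroll"
# With AMQP_METRIC_NAME_IN_BODY = True, the body carries the full plaintext line.
body = f"{metric} 4 {int(time.time())}"

channel.basic_publish(exchange="graphite", routing_key=metric, body=body)
connection.close()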

The following steps should be followed when feeding data into Carbon:

  1. Plan a Naming Hierarchy: Every series stored in Graphite has a unique identifier; decide what your naming scheme will be, ensuring that each path component has a clear and well-defined purpose.
  2. Configure your Data Retention: With Graphite being built on fixed-size databases, you have to configure, in advance, how much data you intend to store and at what level of precision (see the sample configuration after this list).
  3. Understand the Graphite Message Format: Graphite understands messages with the format metric_path value timestamp\n, where “metric_path” is the metric namespace that you want to populate, “value” is the value that you want to assign to the metric, “timestamp” is the number of seconds since Unix epoch time, and “\n” is a terminating newline.
  3. Whisper – a simple database library for storing time-series data: Graphite has its own specialized database library called Whisper, a fixed-size database that provides fast, reliable storage of numeric data over time. Whisper was created to let Graphite visualize application metrics that do not always occur regularly, as well as for speed.
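
For the data retention step above, a hypothetical storage-schemas.conf entry might look like this (the section name and pattern are examples):

[server_metrics]
pattern = ^servers\.
retentions = 10s:14d,1m:90d,10m:5y

This keeps 10-second resolution for 14 days, 1-minute resolution for 90 days, and 10-minute resolution for 5 years, with Whisper rolling the data up as it ages.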

[Image: Graphite user interface]

Whisper, while technically slower than RRD (by less than a millisecond for simple cases), has a number of distinct advantages, including the fact that RRD is unable to make updates to a time slot prior to its most recent update, and that RRD was not designed with irregular updates in mind.

  4. Graphite web app – renders graphs on demand: The web app is a Django application that renders graphs on demand using the Cairo library. Once data has been fed in and stored, it can be visualized (a sketch of calling the render API follows below). Graphite has endured criticism of its front-end visualizations, and there are many tools that leverage Graphite but provide their own visualizations. One of the most popular of these is Grafana.
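
As a quick sketch, fetching rendered data from the web app’s render API might look like this in Python (the host and metric path are examples):

import requests

response = requests.get(
    "http://graphite.example.com/render",
    params={
        "target": "local.random.diceroll",  # metric path to plot
        "from": "-1h",                      # last hour of data
        "format": "json",                   # "png" returns a rendered image instead
    },
)

# Each series carries its target name and a list of [value, timestamp] pairs.
for series in response.json():
    print(series["target"], series["datapoints"][:3])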

Grafana

Grafana is an open source visualization tool that can be integrated with a number of different data stores, but it is most commonly used together with Graphite. Its focus is on providing rich ways to visualize time series metrics.

Connecting Grafana to a Graphite data source is relatively easy:

  1. Click the Grafana icon in the top header to open the side menu
  2. Under the Configuration link, find Data Sources
  3. Click the “Add data source” button in the top header
  4. Select Graphite from the dropdown

Grafana enables you to take your graphs to the next level, including charts with smart axis formats, and offers multiple add-ons and features. There is also a large variety of ready-made and pre-built dashboards for different types of data and sources. It’s simple to set up and maintain, is easy to use, and has won much praise for its display style.

[Image: Grafana dashboard view]

Grafana is traditionally strong at analyzing and visualizing metrics such as memory and system CPU, but it does not allow full-text data querying. For general monitoring, Grafana is good; for logs specifically, it is not recommended.

Documentation is excellent, from getting started – which explains all the basic concepts you’ll need to get to grips with the system – to tutorials and plugins. There is even a “GrafanaCon”, a conference where the Grafana team, data scientists, and others from across the Grafana ecosystem gather to discuss monitoring and data visualization.

The place to start is with the “new Dashboard” link, found on the right-hand side of the Dashboard picker. You’ll see the Top Header, with options ranging from adding panels to saving your dashboard.

[Image: Grafana top header]

With drag-and-drop functionality, panels can be easily moved around, and you can zoom in and out.

Grafana recently launched version 5.0, which includes the following features and updates:

  • New Dashboard Layout Engine: enables an easier drag, drop, and resize experience
  • New UX: big improvements in the UI, in both look and function
  • New Light Theme
  • Dashboard Folders: to help keep dashboards organized
  • Permissions on folders
  • Group users into teams
  • Datasource provisioning: makes it possible to set up data sources via config files (see the sample provisioning file after this list)
  • Persistent dashboard URLs: now it’s easier to rename dashboards without breaking links
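
As a sketch of datasource provisioning, a minimal file dropped into Grafana’s provisioning/datasources directory might look like this (the URL is an example):

apiVersion: 1

datasources:
  - name: Graphite
    type: graphite
    access: proxy
    url: http://localhost:8080   # your Graphite-web endpoint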

Differences and Similarities

Graphite has proved itself over time to be a reliable way to collect and portray data. It has its quirks, and many newer solutions on the market offer more features or are easier to use; however, it has managed to stay relevant and is still preferred by many.

Grafana has been steadily improving its offering, with a number of plugins available, and is being used by more and more companies.

Graphite is often used in combination with Grafana. Graphite is used for data storage and collection, while Grafana handles the visualization. This way, the best of both worlds (at least in this context) is achieved. Graphite reliably provides the metrics, while Grafana provides a beautiful dashboard for displaying these metrics through a web browser.

Making Data Work For You

Every company, from large to small, is generating significant amounts of extremely useful data. This data can be generated from many sources, such as the use of the company’s product, or from its infrastructure. 

Whatever the data being generated, successful businesses are learning from this data to make successful decisions and monitor their performance. This is where tools like Graphite and Grafana come into play; they enable organizations to monitor their data visually, see macro trends, identify abnormal trends, and make informed decisions.

Tools like Graphite and Grafana are not catch-all solutions, however. Some data – such as logs – require specific tools to enable companies to get the most from their analysis. Coralogix maps software flows, automatically detects production problems, and clusters log data back into its original patterns so that hours of data can be viewed in seconds. Coralogix can be used to query data, view the live log stream, and define dashboard widgets, for maximum control over data, giving a whole lot more than just data visualization.

Using the right tool to visualize data can significantly increase your ability to detect abnormal behavior in production, track business KPIs, and accelerate your delivery lifecycle.