Has your ELK Stack become too unwieldy to manage?

The ELK stack has become a staple of log analytics in recent years, but so too have the stories of complex maintenance and poor scalability. We’re going to discuss some of the problems with a self-hosted ELK stack and the advantages of a SaaS offering like Coralogix.

The Challenge of the ELK Stack

Whilst ELK does a very good job of collecting logs and visualizing data, it presents a complex set of challenges when it comes to care and feeding. Once you start to scale and ingest large volumes of data, the platform requires a well-planned, powerful infrastructure backend. To demonstrate what we mean, let's look at the infrastructure required to ingest 500MB a day from a total of 15 servers.

A production environment requires a minimum of 3 nodes. You will want to ensure they have SSD storage, at least 16GB of RAM (according to Elastic's website, the sweet spot is actually 64GB), and a powerful multi-core CPU (a minimum of 4 cores per node, with 8 being the sweet spot, again according to Elastic's website).

To gauge prices easily, let's look at what this infrastructure would cost in AWS. In this example, we spec'd the servers with 500GB SSDs to allow for 60 days' worth of log retention and future growth. The total is $360 a month, and that's before taking into account any maintenance or backups! It's important to note that as you ingest more logs, you will likely require more nodes and a load balancer. This example is a small environment, but it provides a good benchmark for the minimum infrastructure required to run an ELK stack.
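To see where the 500GB SSD figure sits, here is a rough storage estimate as a short Python sketch. It assumes one replica per shard (Elasticsearch's default, and our assumption here), treats on-disk index size as equal to raw log volume, and ignores indexing overhead and the free space Elasticsearch needs below its disk watermarks, so treat the result as a floor rather than a sizing recommendation.

```python
def storage_per_node_gb(daily_ingest_gb: float, retention_days: int,
                        replicas: int, nodes: int) -> float:
    """Rough disk needed per node to hold a retention window.

    Assumption: on-disk index size roughly equals raw log size. Indexing
    overhead, compression and disk-watermark headroom are ignored.
    """
    primary_gb = daily_ingest_gb * retention_days   # one copy of the data
    total_gb = primary_gb * (1 + replicas)          # primaries plus replica copies
    return total_gb / nodes                         # spread evenly across the nodes

# The example above: 500MB a day, 60 days' retention, 3 nodes,
# plus an assumed single replica per shard.
print(storage_per_node_gb(0.5, 60, replicas=1, nodes=3))  # 20.0
```

Under those assumptions each node stores about 20GB, so most of each 500GB SSD is headroom for overhead, growth and higher ingest volumes.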

A Cost Analysis

Deploying a production-ready stack will take an experienced engineer about 5 working days, and that is very optimistic. Then you need to think about maintenance, which we would average out at about 4 days a month (another very generous estimate). At about $550 a day for an ELK engineer, you are looking at $2,750 to set up the stack and about $2,200 a month to maintain it. In total, your first year is going to cost the following:

Item              | Upfront Cost | Monthly Cost | Yearly Total
AWS Hosting Costs | N/A          | $360         | $4,320
Engineering Costs | $2,750       | $2,200       | $29,150
Total             | $2,750       | $2,560       | $33,470
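The first-year figures in the table follow from a handful of inputs, and the Python sketch below recomputes them. The $2,200 monthly maintenance figure corresponds to 4 engineer days a month at $550 a day, which is the value used here; the constant names are ours, for illustration only.

```python
# Inputs taken from the cost analysis above (names are illustrative).
DAY_RATE = 550              # ELK engineer day rate, USD
SETUP_DAYS = 5              # initial deployment effort
MAINT_DAYS_PER_MONTH = 4    # implied by $2,200 a month at $550 a day
AWS_MONTHLY = 360           # three-node AWS estimate

def first_year_cost() -> dict:
    """Upfront, monthly and first-year totals for the self-hosted stack."""
    upfront = SETUP_DAYS * DAY_RATE
    monthly = AWS_MONTHLY + MAINT_DAYS_PER_MONTH * DAY_RATE
    return {"upfront": upfront, "monthly": monthly,
            "first_year": upfront + 12 * monthly}

print(first_year_cost())
# {'upfront': 2750, 'monthly': 2560, 'first_year': 33470}
```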

Effectively, a small logging and analytics platform is going to cost more than a junior developer for a year! The way ELK is designed, you need to be operating at a large scale for it to make sense. This is, of course, the economy of scale, but with scale comes even more complexity and cost.

Running out of Space

Outside of deploying the solution, organizations face other challenges, one of which is backend infrastructure problems. A common issue we see is environments ingesting too much data and running out of space. The technical challenge is not so much purging older logs, or even adding storage; the real issue is the critical data that is lost while the backend is unavailable. This can even have a knock-on effect. If a business has enabled a buffer or cache on the servers sending logs but has not configured it correctly, the loss of connectivity causes logs to pile up locally, and virtual machines or containers with small drives can then run out of space too! Taking down production systems in this way can seriously impact an organization and often results in lost profit.
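One way to keep a sender-side buffer from filling a small disk is to cap it and decide in advance what gets dropped. The Python class below is a hypothetical, in-memory illustration of that idea, not the configuration of any real log shipper; real tools expose their own limits (Logstash's persistent queue, for example, has a `queue.max_bytes` setting). When the cap is reached it drops the oldest lines first and counts what it drops, so the loss is at least visible.

```python
from collections import deque

class BoundedLogBuffer:
    """Hypothetical local buffer holding at most max_bytes of log lines.

    While the backend is unreachable, new lines are still accepted, but the
    oldest buffered lines are evicted once the cap is exceeded, so the buffer
    never grows past the space budgeted for it.
    """

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.size = 0          # bytes currently buffered
        self.lines = deque()   # oldest line on the left
        self.dropped = 0       # lines discarded because of the cap

    def append(self, line: str) -> None:
        encoded = len(line.encode("utf-8"))
        if encoded > self.max_bytes:
            self.dropped += 1  # a single line bigger than the whole budget
            return
        self.lines.append(line)
        self.size += encoded
        while self.size > self.max_bytes:  # evict oldest until under the cap
            old = self.lines.popleft()
            self.size -= len(old.encode("utf-8"))
            self.dropped += 1

    def drain(self):
        """Yield buffered lines oldest-first, e.g. once the backend is back."""
        while self.lines:
            line = self.lines.popleft()
            self.size -= len(line.encode("utf-8"))
            yield line

buf = BoundedLogBuffer(max_bytes=10)
for msg in ["aaaa", "bbbb", "cccc"]:
    buf.append(msg)
print(list(buf.drain()), buf.dropped)  # ['bbbb', 'cccc'] 1
```

Dropping the oldest lines favours recent data, which is usually what you need during an incident; a shipper could just as reasonably reject new lines instead, trading recent visibility for a complete early record.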

Patching your ELK Stack

Another area that can cause significant pain for an organization is keeping the backend up to date. The technical challenges of upgrading are complex, and the process can be very time-consuming. You need to update each component individually, and it's important to remember that Elastic only supports rolling upgrades between minor versions, or from the last minor release of one major version to the next major version (for example, 6.8 to 7.x). As a result, failing to keep your system updated can turn a future upgrade into a serious piece of work!
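Elastic's upgrade documentation supports rolling upgrades between minor versions of the same major, and from the last minor release of one major to the next major (5.6 to 6.x, 6.8 to 7.x, 7.17 to 8.x); other paths need a full cluster restart or an intermediate upgrade. The Python sketch below encodes that rule as we understand it. The `LAST_MINOR` table is our assumption and should be checked against the upgrade guide for your target version.

```python
# Assumed: last minor of each major from which a rolling upgrade to the
# next major is documented (5.6 -> 6.x, 6.8 -> 7.x, 7.17 -> 8.x).
LAST_MINOR = {5: 6, 6: 8, 7: 17}

def parse(version: str) -> tuple:
    """Turn 'major.minor.patch' into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def rolling_upgrade_supported(current: str, target: str) -> bool:
    cur, tgt = parse(current), parse(target)
    if tgt <= cur:
        return False   # same version or a downgrade
    if tgt[0] == cur[0]:
        return True    # minor or patch upgrade within one major
    if tgt[0] == cur[0] + 1:
        # Crossing one major: only from that major's last minor.
        # Unknown majors return False, which errs on the cautious side.
        return cur[1] == LAST_MINOR.get(cur[0])
    return False       # skipping a major needs more than one upgrade

print(rolling_upgrade_supported("6.8.23", "7.17.0"))  # True
print(rolling_upgrade_supported("6.5.4", "7.10.2"))   # False
```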

We often see businesses running seriously out-of-date versions because the risks of upgrading prevent them from applying critical patches and updates. A common knock-on effect of upgrading the stack is that Kibana plugins no longer function correctly, or that you have to completely rewrite visualizations. This, whilst annoying, is not as serious as running unpatched software. We won't go into the risks of running out-of-date software here, but it can have massive consequences for a business when it inevitably goes wrong.

Configuring your ELK Stack Correctly

If we put costs aside and dig into the analytics, we find that the story is much the same. Beyond the horror stories of organizations that fail to secure or deploy their ELK stack correctly comes the challenge of building business value. The core reason for deploying such a system is to create powerful data and visualizations that can be used to improve infrastructure and applications. To do this, you need to customize your ELK stack to suit the needs of your organization, and this is where the complexity of managing the platform commonly causes issues. For each component you wish to monitor, you need to configure the Elastic infrastructure to support that source, and you can lose countless hours diagnosing why a system is not correctly sending data to Elastic. Coralogix addresses this area with its array of out-of-the-box plugins and integrations.

The most important consideration when running your own ELK stack, though, is security. Your logs are likely to contain sensitive information about you, your customers, your business, or all of the above! Authentication and authorization are a must for any business running ELK. The challenge is that the ELK stack doesn't provide an easy way to implement these security practices.

A common approach is to deploy ELK's basic security features; however, they are very limited. If you are going to run ELK securely, you need to add your own security to the platform. This is extremely costly and often overlooked, yet a compromised ELK stack is often devastating for an organization. As a result, we rank this as the most unwieldy part of managing an ELK stack internally.
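To make the authorization point concrete, here is a Python sketch that builds a least-privilege role body for Elasticsearch's create-role API (`PUT _security/role/<name>`), granting read-only access to log indices and nothing else. The role name `logs_reader` and the `logs-*` index pattern are hypothetical. This covers only one slice of the problem: authentication, TLS between nodes and clients, and access to Kibana all need separate configuration.

```python
import json

def read_only_log_role(index_pattern: str) -> dict:
    """Request body for Elasticsearch's create-role API.

    Grants read access to indices matching index_pattern, with no
    cluster-level privileges and no write, delete or index-management rights.
    """
    return {
        "cluster": [],  # no cluster-wide privileges
        "indices": [
            {
                "names": [index_pattern],
                "privileges": ["read", "view_index_metadata"],
            }
        ],
    }

# Hypothetical role name and index pattern, for illustration only.
print("PUT _security/role/logs_reader")
print(json.dumps(read_only_log_role("logs-*"), indent=2))
```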

Other Common Problems

Here are the common pitfalls organizations find after running Elastic for 1 to 3 years:

  • The amount of maintenance required was significantly higher than originally expected.
  • Upgrades are really time-consuming and often cause knock-on issues.
  • Logstash regularly eats all the available memory on their servers and often requires the service to be restarted.
  • Organizations spend an excessive amount of time performance tuning the platform.
  • Organizations regularly run out of storage in the early stages and, as a result, lose critical data.
  • Security features are extremely complex to implement and require custom solutions.
  • Organizations' networks carry large volumes of data transferred between ELK nodes, resulting in knock-on performance impacts on other services on the network.
  • When organizations need to scale, the complexity of the infrastructure more than doubles.
  • The infrastructure becomes a full-time job!

A Cost Comparison with your Coralogix ELK Stack

Using our example above of $2,560 a month for an internally hosted ELK stack with 60 days' retention and around 100GB of logs a month, you could use Coralogix for just $320! That's a saving of $2,240 a month, which is $26,880 a year!

Item                             | Upfront Cost | Yearly Cost
AWS & Internally Managed         | $2,750       | $30,720 ($2,560 x 12)
Coralogix                        | N/A          | $3,840 ($320 x 12)
Yearly Saving with Coralogix     | $2,750       | $26,880
Three-Year Saving with Coralogix | $2,750       | $80,640
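The savings figures are straight multiplication of the monthly costs, and the Python sketch below reproduces them. The table keeps the $2,750 upfront cost in its own column; the optional `internal_upfront` argument (our own, for illustration) shows the total if you fold it in.

```python
def savings(internal_monthly: int, coralogix_monthly: int, years: int,
            internal_upfront: int = 0) -> int:
    """Difference in total spend over `years`, using flat monthly figures."""
    internal = internal_upfront + internal_monthly * 12 * years
    coralogix = coralogix_monthly * 12 * years
    return internal - coralogix

print(savings(2560, 320, years=1))                         # 26880
print(savings(2560, 320, years=3))                         # 80640
print(savings(2560, 320, years=3, internal_upfront=2750))  # 83390
```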

Not only is the solution more cost-effective, but it also reduces the risk to the business, and your engineers can finally get a good night's sleep! As we have discussed, running the system internally can damage a business when things go wrong. Consuming Coralogix as a service means that the backend infrastructure is monitored and maintained by ELK experts, allowing you to concentrate on what you do best! Coralogix also offers a number of value-added features that simplify deployment and provide additional capabilities, such as machine learning.

Ultimately, if your ELK stack has become too unwieldy to manage, or you are looking to reduce your operational costs and increase your capabilities, then Coralogix is here to help you avoid the pitfalls of running your own platform!
