[Live Webinar] Next-Level O11y: Why Every DevOps Team Needs a RUM Strategy Register today!

How Cognism consolidated their observability tools with Coralogix to improve monitoring efficiency, reduce downtime and lower costs.

case study
1800+
customers worldwide
65+
engineering users
Reduce list building time by 99%

The Challenge

Overview

The CTO organization at Cognism goes beyond defining architecture and processes and is in constant pursuit of new technologies that improve its stack. Its end goal is to switch to fully managed services so that the development team can spend its time building the product rather than worrying about what is going wrong. Today, the entire engineering team uses Coralogix, including developers, analytics, DevOps and the security team.

Previous Set-Up

Cognism was previously using several different tools along with a home-grown solution. The microservices architecture kept all the data in clusters, using Kafka, ElasticSearch and a graph on the cloud. This setup became complex, and along the way they were also losing logs. There was always a lag between Kafka and the custom solution. This made them more reactive than proactive: an alert would arrive almost 15 minutes after a spike in traffic happened.

Coralogix was already a customer of Cognism’s sales intelligence services, so the Coralogix team reached out, and this led to an exploratory conversation between the two teams.

“It was a perfect match because we now don’t have anything else except Coralogix for custom metrics and logging, and we started implementing more.”

Dejan Sijakovic, Chief Software Architect, Cognism

The Solution

Consolidation and Resource Optimization

Cognism’s main goal was to have everything in a single place without it being too expensive. This was their expectation of the Coralogix platform. They particularly liked certain features of the solution, such as sending all the logs to an Amazon S3 bucket, which their analytics team can now easily access. With the data in a table they can do all kinds of reporting from logs, whereas before they needed to go to Grafana. They previously needed engineers to help write the queries, and there was also the problem of missing logs. If their internal API was down, this was difficult to find out. They successfully replaced all of that with Coralogix. It was only after this migration, when they were able to compare the graphs with the earlier solution, that they realized how much data and how many custom metrics they had been missing.

“I must say it’s really affordable for all the things that you are offering. We have cut down the costs in this area and our SRE team doesn’t have to manage this. Now we’re going to also turn off the custom metrics tool, because now all we need to do is just send the logs to Coralogix, and that’s it. So we will cut down expenses even more.”

Dejan Sijakovic, Chief Software Architect, Cognism

When they called their internal API, there was a high risk of failure with every deployment, as the API could be shut down. As a result they could lose logs, which impacted customer metrics because some events were not tracked. By their estimate, this could be almost half the data. After moving to Coralogix, they no longer need this API or Kafka. What caused further pain was the 10-15 minute delay in getting alerts or, in some cases, not getting alerts at all. If there were no logs available to analyze, there was no way they could even know about the issue, let alone propose a solution. With Coralogix, all these issues were resolved.

Results and Benefits

Benefits to Cognism customers

Cognism is a SaaS tool that runs perpetually on the systems of its end users, so product uptime is critical. They had a commitment to help their end users reduce search time by 99.5%. The engineering team contributed to quality of service by moving to a highly proactive approach. They did this by improving their visibility into what is happening in their systems, which allows them to react in under 5 minutes as opposed to 30 minutes after receiving a complaint. The trigger for this was the discovery of missing logs from data clusters in their previous observability solution. The engineering team had to maintain all of this, add more nodes and also decide what should be kept and what could be deleted. This was cumbersome.

“Data management is critical for the SRE team. We now don’t have to think much about it as it is fully managed by Coralogix. This engineering-led decision has a positive impact on the business because we saved on costs.”

Dejan Sijakovic, Chief Software Architect, Cognism

Transition to Coralogix

The migration plan was set up to stop sending logs to the current provider and start sending everything to Coralogix. They were easily able to learn how to start using the solution and build workarounds where needed. Wherever they needed more information, Coralogix’s support team was available to help with solutions. 

“The move was straightforward because Coralogix has excellent support and documentation. It was pretty much plug-and-play”

Marko Dzepina, Head of Site Reliability Engineering, Cognism

They were already using a number of tools to collect and ship logs, and now started utilizing them to their full extent. This put less load on the application and still got the job done.

It is also worth mentioning that Cognism started collecting metrics from Kafka servers, ElasticSearch, Rabbit and the queue service they used. Cognism benefited from Coralogix’s out-of-the-box integrations, which helped them easily hook up their data sources and correlate across data. They felt that was a game changer.

Cognism started with a POC which lasted 14 days, and was able to connect all data sources within that time frame. After that it took just ~2 months to get the whole engineering team ramped up and actively using it. Developers would raise queries like “Why can’t I see last year’s events?” and they were quickly guided to where the data was in the Amazon S3 bucket, which they just needed to query. They didn’t need to go to the previous solution provider and export JSON logs to Amazon S3. They now look forward to utilizing Coralogix even more across metrics, tracing and security.

“Hardest thing is to force developers to change and start using something new. But Coralogix was so easy to adopt and now they love it.”

Dejan Sijakovic, Chief Software Architect, Cognism

Ease of Use

Cognism’s engineers felt the Coralogix platform was really easy to use. They found the visualizations very useful: while other platforms had many colorful charts, there were times when those didn’t convey important information. They said that Coralogix had 3 critical charts that gave them all the insight they needed. Today, their teams start the day with a view of the dashboard. If any data returns an alert, they are able to catch it and resolve it immediately on the floor. To test this, they sent the same logs to both Coralogix and the other provider simultaneously, and Coralogix was able to return the reports instantly while the other took time. They even discovered events that they didn’t know were happening.

“You have everything in a single place. That’s a beautiful thing.”

Dejan Sijakovic, Chief Software Architect, Cognism

Today they collect data from Scala applications, where custom logs are generated, and from Python projects, where they use Fluent Bit to ship the logs out. They also collect metrics from their cloud service providers, Kafka, Rabbit and ElasticSearch.
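To make the Python side of this concrete, here is a minimal sketch of how a Python service might emit structured logs for a shipper such as Fluent Bit to pick up. It is an illustration only and not Cognism’s actual setup: the field names (`timestamp`, `severity`, `logger`, `message`), the `build_logger` helper and the service name `search-service` are all assumptions. The idea is simply that one JSON object per line on standard output is easy for a log shipper to forward without custom parsing.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            # Field names are illustrative assumptions, not a required schema.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name, stream=sys.stdout):
    """Hypothetical helper: a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep output out of the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("search-service")  # service name is made up
    log.info("list build finished in %d ms", 42)
```

Writing to standard output keeps the application unaware of where logs end up, so the shipping destination can change without touching application code.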

Coralogix Support Advantage

Cognism feels that Coralogix’s support is a true value add to this engagement and considers it a feature in its own right. While they had heard about this from the Coralogix sales team, they didn’t fully believe it until they experienced it first hand.

“When I heard they respond in seconds, I first thought ‘but do they fix your problem?’. I contacted them a few times and whether it was to do with migration, or parsing they were able to solve queries quickly via chat or phone”

Marko Dzepina, Head of Site Reliability Engineering, Cognism

This is an important part of the migration process because teams always have to learn a new tool by themselves, and the prompt response from Coralogix Support gave them a lot of reassurance. Being able to get relevant help whenever they were stuck was a real differentiator.

Powerful features

With real-time logs in the dashboard, they liked the instant view it gave of top applications, so they didn’t need to navigate around the tool to see what was happening. They could see metrics visualized in Coralogix’s hosted Grafana and analyze it all in one place. Cognism also set up custom dashboards to manage their specific data needs. 

Cognism further integrated their CI/CD data, which gave them real-time visibility into their release data. They were able to get proactive insights into the impact of new versions and releases, which helps them correlate events to new releases, reduce issue resolution time, decrease maintenance costs and improve end-customer satisfaction. The GitHub integration which Coralogix provided out-of-the-box allowed them to identify Application Performance Monitoring (APM) related issues down to the version level at no incremental cost. For example, if they observed a 500% increase in high severity logs, they were able to analyze the spike and identify the cause in 5 minutes using the GitHub version tags, without needing to search through every microservice they have.
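The arithmetic behind such a spike check can be sketched in a few lines of Python. This is a simplified illustration rather than how Coralogix computes it: the record fields (`severity`, `version`), the function names, the version strings and the sample counts are all assumptions. The sketch compares the number of high severity logs in a baseline window with a current window, and if the increase reaches the threshold, reports which version tag contributes the most high severity logs.

```python
from collections import Counter


def percent_increase(baseline, current):
    """Percentage increase of `current` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return (current - baseline) / baseline * 100.0


def high_severity_by_version(records):
    """Count high severity records per version tag."""
    return Counter(r["version"] for r in records if r["severity"] == "high")


def find_spike(baseline_records, current_records, threshold_pct=500.0):
    """Return (percent increase, top version) if the threshold is reached, else None."""
    base = high_severity_by_version(baseline_records)
    cur = high_severity_by_version(current_records)
    pct = percent_increase(sum(base.values()), sum(cur.values()))
    if pct < threshold_pct:
        return None
    version, _count = cur.most_common(1)[0]
    return pct, version


if __name__ == "__main__":
    # Made-up sample data: 10 high severity logs before, 60 after a release.
    baseline = [{"severity": "high", "version": "v1.4.0"}] * 10
    current = (
        [{"severity": "high", "version": "v1.4.0"}] * 8
        + [{"severity": "high", "version": "v1.5.0"}] * 52
        + [{"severity": "info", "version": "v1.5.0"}] * 100
    )
    print(find_spike(baseline, current))  # (500.0, 'v1.5.0')
```

Here the count goes from 10 to 60, which is (60 - 10) / 10 × 100 = 500%, and most of the new high severity logs carry the newer version tag, which points at that release as the place to look first.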

Because all the data is archived in an open source format, it can be accessed by several engineering teams, including the BI team that manages business analytics, which even displays the data in its Tableau dashboard.

Ongoing expansion

Based on their experience, Cognism has now adopted Coralogix’s Security offering after doing a one-month POC, and also has plans to move data from AWS CloudTrail and is further considering frontend logs. They will continue to explore additional custom metrics as some data needs to be kept for a year for compliance, and having a consistent Log ID will help in Tracing. 

Summary

In addition to the cost savings from consolidation, Cognism also saw business value in freeing up engineering resources so that they can operate proactively and reduce downtime. This helps them get buy-in across the organization for constant improvement. They estimate that the savings across both platform and people costs, compared to the earlier solution, are almost 50%. The earlier costs were skyrocketing due to the ingestion of high log volumes, which is now optimized with a focus on priority logs. They had 2 DevOps and 1 engineering resource on this project who were tasked with building and managing the custom metrics and APIs (e.g. Scala, Confluent Cloud); they are now freed up and able to focus on new things like bug detection. Their plan to shut down Kafka will reduce costs further.

“Coralogix is the perfect match for our needs”

Dejan Sijakovic, Chief Software Architect, Cognism