Tocaro is a team collaboration platform provided by ITOCHU Techno-Solutions (“CTC”). The platform provides customers with real-time messaging, video conferencing, voice chat, task management, file sharing, and customizable bank-level security.
ITOCHU Techno-Solutions (CTC) is one of the largest systems integrators in Japan, providing a wide range of business and IT services to organizations worldwide. ITOCHU Techno-Solutions America, the Silicon Valley arm of CTC, specializes in cloud-native and open source technologies and recently partnered with Coralogix to bring its best-in-class observability platform to CTC’s clients.
In the early stages of the relationship, the Silicon Valley team introduced Coralogix to their colleagues managing Tocaro and worked with them to establish Coralogix as an essential piece in their observability stack.
Tocaro is a collaboration tool aimed at increasing productivity in organizations of all sizes and in all industries. The team responsible for developing and managing the Tocaro system uses Coralogix to collect and centralize trace logs from each API and manage web errors such as 5XX HTTP status codes.
They are now leveraging Coralogix’s Logs2Metrics feature to aggregate their logs into trackable metrics and relying on dynamic alerting to get real-time alerts when a critical error occurs. With real-time identification and immediate access to the log data, the team can proactively investigate and remediate errors that otherwise may have taken hours or days to resolve.
During the implementation process, the team also began to explore more use cases for how their log data could be leveraged to provide business insights for additional teams within the organization such as their customer success team.
Previously, the team managing Tocaro was using CloudWatch to aggregate logs and detect errors, but they struggled with the lack of real-time issue identification. There was a 10+ minute delay between an event occurring and the logs being available in CloudWatch, meaning that issues were not identified immediately.
With the alerting delay from CloudWatch, the team could not be sure whether a release was successful. After the release, they would occasionally start to see a few errors, but it would not be until 10+ minutes later that they would begin to see thousands of logs coming in.
The delay in log visibility led the team to develop a release process that would allow them to maintain the stability of their systems but essentially doubled the work required.
During the release, the team would have a tool like Datadog open in another window for step-by-step monitoring. This gave the team an immediate indication of whether the release caused any performance issues.
Still, the team did not have the information needed to investigate and understand the issue fully. Depending on the magnitude of the symptom, the next steps would be to either stop and wait for the logs or begin to roll back the release.
At this point, with a small and focused team, it was possible to do an immediate investigation, but the lack of centralized logging made it almost impossible to determine whether the issue was local or affecting customers on a wider scale. On top of that, with limited resources, automation and cloud innovation are top priorities for the team.
With the high cost of managing their logs and the team struggling to realize the full potential of the data, it was clear that the ROI of their log management solution could be dramatically improved.
The implementation itself was very straightforward: all the team had to do was migrate the existing Fluent Bit DaemonSet. During the initial setup, the team managing Tocaro worked closely with their colleagues at CTC as well as with Coralogix’s customer support to leverage many different features for streamlined integration into the team’s workflow.
With Coralogix, there’s no need to have any dashboard open to monitor the release process step-by-step. The team knows that if something goes wrong, they will be alerted immediately and they will have all of the information they need to understand what happened.
Coralogix’s error detection, proactive alerting, and visualization of the status of each API in Kibana help the team ensure the stability and health of their application. Once an issue has been identified, the team opens their Kibana dashboard with metrics generated from their API logs to begin the investigation.
In cases where additional information is needed, the team uses the Explore Screen in the Coralogix platform to easily filter their data for specific applications, subsystems, error types, and more.
The direct impact [of Coralogix] is saving around 20 hours of engineering resources per month, but I believe the benefit is greater than that because we can now understand what we did not understand before.
The team initially added Coralogix to their workflow to improve their ability to identify and investigate critical events in real time. Once they began using the platform and optimizing their usage, they saw the potential for added value with additional use cases.
The team’s use of Coralogix’s TCO Optimization feature shows how the platform can be leveraged to turn raw log data into actionable insights across multiple teams and functions.
Data used to monitor and alert on errors is kept in the Frequent Search pipeline so that the team has everything readily available to troubleshoot and resolve issues immediately. That accounts for approximately 20% of the log data.
Around 10% of the data is being leveraged in the monitoring pipeline. This data is being processed using the Logs2Metrics feature, which generates metrics from the log data for system trend analysis. For example, the team can see how many errors happen in a service at a particular time of day or over a few months.
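To make the idea of turning logs into metrics more concrete, here is a minimal sketch of the kind of aggregation described above: counting 5XX responses per service and hour of day. It is only an illustration of the concept, not Coralogix’s Logs2Metrics implementation or API. The record fields (`service`, `status`, `timestamp`), the function name, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def errors_by_service_and_hour(records):
    """Count 5XX responses per (service, hour of day).

    Each record is a dict with hypothetical keys: 'service',
    'status' (HTTP status code as int), and 'timestamp'
    (ISO 8601 string). These field names are assumptions.
    """
    counts = Counter()
    for rec in records:
        if 500 <= rec["status"] <= 599:
            hour = datetime.fromisoformat(rec["timestamp"]).hour
            counts[(rec["service"], hour)] += 1
    return counts


# Made-up sample logs, for illustration only.
sample_logs = [
    {"service": "messages-api", "status": 200, "timestamp": "2021-03-01T09:15:00"},
    {"service": "messages-api", "status": 502, "timestamp": "2021-03-01T09:20:00"},
    {"service": "messages-api", "status": 500, "timestamp": "2021-03-01T09:45:00"},
    {"service": "files-api", "status": 503, "timestamp": "2021-03-01T14:05:00"},
]

metrics = errors_by_service_and_hour(sample_logs)
for (service, hour), count in sorted(metrics.items()):
    print(f"{service} hour={hour:02d} errors_5xx={count}")
```

Keeping only the aggregated counts rather than the raw log lines is what makes this kind of data cheap to retain for long-term trend analysis.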
Furthermore, metrics are generated to measure user activity. This has opened a new door to producing insights from log data. Using these metrics, the team has created a dashboard for the Customer Success teams to understand, per account, how many active users there are, what actions they are doing in the platform, and more.
Using dynamic alerting, the team can be notified if an account’s daily active users have decreased or increased compared to usual. This can indicate potential churn or potential for upselling, respectively.
The majority of the data is allocated to the Compliance pipeline, which allows the team to archive the data while still having it available if it’s needed later on. Overall, the organization has dramatically increased its return on investment through improved developer productivity, lower data storage costs, and a shorter time to identify and resolve issues.
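The case study does not describe how dynamic alerting decides what counts as "usual", so the following is only a simplified sketch of the general idea: compare today’s daily active users for an account against a trailing average and flag large deviations. The function name, the 30% threshold, and the sample numbers are assumptions made for illustration.

```python
from statistics import mean


def dau_change(history, today, threshold=0.3):
    """Classify today's daily active users against the trailing average.

    Returns 'decrease', 'increase', or 'normal'. The 30% threshold is
    an arbitrary illustrative value, not one taken from the case study.
    """
    if not history:
        raise ValueError("history must contain at least one day")
    baseline = mean(history)
    if baseline == 0:
        # No usual activity: any active user counts as an increase.
        return "increase" if today > 0 else "normal"
    change = (today - baseline) / baseline
    if change <= -threshold:
        return "decrease"  # possible churn signal
    if change >= threshold:
        return "increase"  # possible upsell signal
    return "normal"


usual = [40, 42, 38, 41, 39]  # made-up DAU counts for one account

print(dau_change(usual, 25))
print(dau_change(usual, 60))
print(dau_change(usual, 41))
```

A fixed percentage threshold is the simplest possible rule; a production alerting system would more likely account for weekly seasonality and account size.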
The most immediate, and obvious, results of moving the team’s log monitoring to Coralogix were the real-time alerts and the ability to check the related logs as soon as an event occurred. Beyond removing the 10+ minute delay, the team was able to use Logs2Metrics to aggregate their API data into trackable metrics that can be monitored and alerted on.
With the full integration into their workflow, the team has reduced the time required to detect and resolve issues by more than 50%. Since the same group of engineers is responsible for both error diagnosis and development, this achievement has allowed them to dedicate more time to the development of new and innovative features.
In addition to the original use case of improving error detection and troubleshooting workflows, the team has improved visibility into customer behavior and expanded the organization’s understanding of application health and business health.
A lot of resellers might be selling but not actually using the product. We wanted to make sure that these technologies were consumed, and we wanted to really verify the real value of the platform before bringing it to our customers.
With Coralogix, the team is confident that the logs are analyzed in real time, the alerting is immediate, and they can view all of the data in their dashboards. This way, they can instantly know whether a symptom is limited to a specific service that was just released or is impacting everyone.
Beyond replacing the previous logging solution, Coralogix enables the team to extract net-new value from their log data. The advanced features, flexibility, and cost optimization enabled the team to build entirely new use cases that they hadn’t previously explored.