
Intro to AIOps: Leveraging AI and Machine Learning in DevOps

  • Alex Mair
  • July 14, 2021

AIOps is a DevOps strategy that brings the power of machine learning to bear on observability and system management. It’s not surprising that an increasing number of companies are now adopting this approach.  

AIOps first came onto the scene in 2015 (coincidentally the same year as Coralogix) and has been gaining momentum for the past half-decade. In this post, we’ll talk about what AIOps is, and why a business might want to use it for their log analytics.

AIOps Explained

AIOps builds on the advances made in AI and machine learning over recent decades. Because enterprise applications are complex yet predictable systems, AI and machine learning can analyze their data and extract patterns to great effect. The AIOps Manifesto spells out five dimensions of AIOps:

  1. Data set selection – machine learning algorithms can parse vast quantities of noisy data and provide Ops teams with a curated sample of clean data.  It’s then much easier to extract trustworthy insights and make effective business decisions.
  2. Pattern discovery – this generally occurs after a data set has been appropriately curated. It involves using a variety of ML techniques to extract patterns. These range from rule-based approaches to neural networks trained with supervised or unsupervised learning.
  3. Inference – AIOps uses a range of inference algorithms to draw conclusions from patterns found in the data. These algorithms can make causal inferences about system processes ‘behind the data.’  Combining expert systems with pattern-matching neural networks creates highly effective inference engines.
  4. Communication – For AIOps to be of value, it’s not enough for the AI to have the knowledge; it also needs to explain its findings to a human engineer. AIOps has a variety of strategies for doing this, including visualization and natural language summaries.
  5. Automation – AIOps achieves its power by automating problem-solving and operational decisions. Because modern IT systems are so complex and fast-changing, automated systems need to be intelligent. They need machine learning to respond to quickly changing conditions in an adaptive fashion.

Why IT needs AIOps

As IT has advanced, it has shouldered more and more of the essential processes of business organizations.  Not only has technology become more sophisticated, it has also woven itself into business practice in increasingly intricate ways.

The ‘IT department’ of the ‘90s, responsible for a few niche business applications, has virtually gone. 21st century IT lives in the cloud. Enterprise applications are virtual, consisting of thousands of ephemeral components.  Businesses are so dependent on them that many business processes are IT processes.

This means that DevOps has had to upgrade. Automation is essential to managing the fast-changing complexity of modern IT. AIOps is an idea whose time has come. 

How companies are using AIOps

Over the past few years, AIOps has been adopted by many organizations. In a recent survey, OpsRamp found that 68% of surveyed businesses were experimenting with AIOps due to its potential to eliminate manual labor and extract data insights.

William Hill, COTY, and KPN are three companies that have adopted AIOps, and their experiences make for fascinating reading:

AIOps Case Study: William Hill

William Hill started using AIOps to combat game and bonus abuse. As a betting and gaming company, their revenues depend on people playing by the rules. With so many customers, no human could keep track of the data.

William Hill’s head of Capacity and Monitoring Engineering, Andrew Longmuir, explains the benefits of adopting AIOps. First, it helped with automation, and in particular with what Andrew calls “silo-busting”. AI and machine learning allowed William Hill to integrate nonstandard data sources into their toolchain.

Andrew uses the analogy of a jigsaw. Unintegrated data sources are like missing pieces of a puzzle. Using machine learning allows William Hill to bring them back into the fold and create a complete picture of the system.

Second, AIOps enables William Hill’s team to solve problems faster.  Machine learning can be used to window data streams, reducing alert volumes, and eliminating operational noise.  It can also detect correlations between alerts, helping the team prevent problems before they arise.
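To make the idea of windowing concrete, here is a minimal sketch in Python. It is not William Hill's implementation and it involves no machine learning: it simply groups alerts into fixed time windows and collapses repeats into one summary each, which is the basic mechanism behind reducing alert volume. The `Alert` fields, the `window_alerts` function, and the 60-second default are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Alert(NamedTuple):
    """Hypothetical alert record; the field names are illustrative."""
    timestamp: float  # seconds since an arbitrary epoch
    source: str       # component that raised the alert
    message: str


def window_alerts(alerts: Iterable[Alert], window_seconds: float = 60.0) -> list[dict]:
    """Collapse alerts with the same source and message that fall into
    the same fixed (tumbling) window into a single summary with a count."""
    buckets = defaultdict(list)
    for alert in alerts:
        # Every alert belongs to exactly one window.
        window_index = int(alert.timestamp // window_seconds)
        buckets[(window_index, alert.source, alert.message)].append(alert)

    summaries = []
    for (window_index, source, message), group in sorted(buckets.items()):
        summaries.append({
            "window_start": window_index * window_seconds,
            "source": source,
            "message": message,
            "count": len(group),
        })
    return summaries


if __name__ == "__main__":
    raw = [
        Alert(0, "db", "timeout"),
        Alert(10, "db", "timeout"),
        Alert(59, "db", "timeout"),
        Alert(61, "db", "timeout"),
        Alert(5, "web", "500"),
    ]
    for summary in window_alerts(raw):
        print(summary)
```

Here five raw alerts become three summaries. A production system would typically go further, for example by learning which alert types tend to occur together, but the windowing step is what turns a stream of individual events into something an engineer can scan.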

Finally, incorporating AI and machine learning into William Hill’s IT strategy has even improved their customer experience. The team uses insights extracted from their analytics data to improve the design of their website.

Andrew has some words of wisdom for other organizations considering AIOps. He recommends focusing on a use case that is central to your company.  Teams need to be willing to trial multiple different solutions to find the optimum setup.

AIOps Case Study: COTY

COTY adopted AIOps to take the agility and scalability of their IT strategy to the next level. COTY is a major player in the cosmetics space, with clients that include Max Factor and Calvin Klein. As a dynamic business, they rely on flawless and versatile performance from their IT infrastructure to manage everything from payrolls to wireless networks.

With over 4,000 servers and a cloud-based infrastructure, COTY’s IT system is far too complex for traditional DevOps strategies to handle. To deal with it they’ve chosen AIOps.

AIOps has improved the way COTY handles and analyzes data. Data sources are integrated into a ‘data lake’, and machine learning algorithms can crunch its contents to extract patterns.

This has allowed them to minimize noise, so their operations department isn’t bombarded with irrelevant and untrustworthy information. 

AIOps has transformed the way COTY’s DevOps team thinks about visibility. Instead of a traditional events-based model, they now use a global, service-orientated model.  This allows the team to analyze their business and IT holistically.

COTY’s Enterprise Management Architect, Dan Ellsweig, wants to take things further. Dan is using his AIOps toolchain to create a dashboard for executives to view. For example, the dashboard might show the CTO what issues are being dealt with at a particular point in time.

AIOps Case Study: KPN

KPN is a Dutch telecoms business with operating experience in many European countries.  They adopted AIOps because the amount of data they were required to process was more than a human could handle.

KPN’s Chief Product Owner Software Tooling, Arnold Hoogerwerf, explains the benefits of using AIOps. First, leveraging AI and machine learning can increase automation and reduce operational complexity. This means that KPN’s DevOps team can do more with the same number of people.

Secondly, AI and machine learning can speed up the process of investigating problems. With traditional strategies, it may take weeks or months to investigate a problem and find the root cause. The capacity of AI tools to correlate multiple data sources allows the team to make crucial links in days that otherwise would have taken weeks.

Finally, Hoogerwerf has a philosophical reason for using AIOps.  He believes that while data is important, it’s even more important to keep sight of what’s going on behind the data.

Data on its own is meaningless if you don’t have the knowledge and wisdom with which to interpret it.

Implementing AIOps with Coralogix

Although the three companies we’ve looked at are much larger than the average business, AIOps is not just for big companies. The increasing number of platforms and vendors supporting AIOps tooling means that any business can take advantage of what AIOps has to offer.

The Coralogix platform launched two years after the birth of AIOps and our philosophy has always paralleled the principles of AIOps.  As Coralogix’s CEO Ariel Assaraf explains, organizations are burdened with the need to analyze increasing quantities of data. They often can’t do this with existing infrastructure, resulting in more than 99% of data remaining completely untapped.

In this context, the Coralogix platform is a game-changer. It allows organizations to analyze data without relying on storage or indexing. This enables significant cost savings and greater data coverage. Adding machine learning capabilities on top of that makes Coralogix much more powerful than any alternative in the market. Instead of cherry-picking data to analyze, stateful stream analysis occurs in real-time.  

How Coralogix can help with pattern discovery

One of the five dimensions of AIOps is pattern discovery. Due to the ability of machine learning to analyze large quantities of data, the Coralogix platform is tailor-made for discovering patterns in logs. As a case in point, gaming company AGS uses Coralogix to analyze 100 million logs a day.

The patterns extracted have allowed their DevOps team to reduce MTTR by 70% and their development team to create enhanced user experiences that have tripled their user base.

Another case is the neural science and ML company Biocatch. With exponentially increasing log volumes, their plight was a vivid illustration of the complexity that 21st century DevOps teams increasingly face.

Coralogix could handle these logs by clustering entries into patterns and finding connections between them. This allowed Biocatch to handle bugs and solve problems much faster than before.
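As a rough illustration of what clustering log entries into patterns can mean, the Python sketch below masks the parts of a log line that usually vary (numbers, IP addresses, hex values) and counts how many lines share each resulting template. This is a simple rule-based stand-in, not Coralogix's algorithm. The masking rules and the function names `to_template` and `cluster_logs` are assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order. Each replaces a token that
# tends to vary between otherwise identical log lines with a placeholder.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Reduce a log line to its template by masking variable tokens."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line


def cluster_logs(lines) -> Counter:
    """Count how many log lines fall under each template."""
    return Counter(to_template(line) for line in lines)


if __name__ == "__main__":
    logs = [
        "User 42 logged in from 10.0.0.1",
        "User 7 logged in from 192.168.1.20",
        "Payment 1001 failed after 3 retries",
        "Payment 1002 failed after 5 retries",
    ]
    for template, count in cluster_logs(logs).most_common():
        print(count, template)
```

Four distinct lines collapse into two templates. Once logs are grouped this way, a sudden rise in one template's count, or the appearance of a template never seen before, becomes easy to spot.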

How Coralogix can communicate insights

Once patterns have been extracted, DevOps engineers receive automated insights and alerts about anomalies in the system behavior.  Coralogix achieves this by integrating with a variety of dashboards and visualization solutions such as Prometheus and CloudWatch.

Coralogix also implements a smarter alerting system that flags anomalies to DevOps engineers in real time.  Conventional alerting systems require DevOps engineers to set alerting thresholds manually. However, as we saw at the start of this article, modern IT is too complex and fast-changing for this approach to work.

Coralogix solves this with dynamic alerts. These use machine learning to adjust thresholds in response to data.  This enables a much more effective approach to anomaly detection, one that is tailored to the DevOps landscape of the 21st century.
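The difference between a fixed threshold and one that adjusts to the data can be sketched in a few lines of Python. The example below is a deliberately simple statistical baseline (rolling mean plus a multiple of the standard deviation), not a description of how Coralogix's dynamic alerts work. The class name `DynamicThreshold` and the parameters `window`, `k`, and `min_samples` are assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicThreshold:
    """Flags a value as anomalous when it exceeds the rolling mean of
    recent values by more than k standard deviations."""

    def __init__(self, window: int = 20, k: float = 3.0, min_samples: int = 5):
        self.history = deque(maxlen=window)  # keeps only the most recent values
        self.k = k
        self.min_samples = min_samples

    def update(self, value: float) -> bool:
        """Check the value against the current threshold, then record it."""
        is_anomaly = False
        if len(self.history) >= self.min_samples:
            # The threshold is recomputed from recent data on every call,
            # so it follows the metric as its normal level drifts.
            threshold = mean(self.history) + self.k * pstdev(self.history)
            is_anomaly = value > threshold
        self.history.append(value)
        return is_anomaly


if __name__ == "__main__":
    detector = DynamicThreshold(window=10, k=3.0, min_samples=5)
    latencies_ms = [100, 102, 98, 101, 99, 100, 500, 101]
    for value in latencies_ms:
        print(value, detector.update(value))
```

In this run only the 500 ms reading is flagged. One limitation of this sketch is that the anomalous value is added to the history, which temporarily inflates the threshold; more robust approaches exclude flagged values or use statistics that are less sensitive to outliers.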

Wrapping Up

The increasing complexity and volumes of data faced by modern DevOps teams mean that humans can no longer handle IT operations without help. AIOps uses AI and machine learning to convert high-volume data streams into insights that human engineers can act on.

AIOps fits with Coralogix’s own approach to DevOps, which is to use machine learning to help organizations effectively use the increasing volumes of data they generate.  Observability should be for the many, not just a few.
