Intro to AIOps: Leveraging AI and Machine Learning in DevOps

AIOps (artificial intelligence for IT operations) is a DevOps strategy that brings machine learning to bear on observability and system management. It’s no surprise that a growing number of companies are adopting this approach.

AIOps first came onto the scene in 2015 (coincidentally the same year as Coralogix) and has been gaining momentum for the past half-decade. In this post, we’ll talk about what AIOps is, and why a business might want to use it for its log analytics.

AIOps Explained

AIOps builds on the major advances in AI and machine learning of recent decades. Because enterprise applications are complex yet predictable systems, AI and machine learning can be used to great effect to analyze their data and extract patterns. The AIOps Manifesto spells out five dimensions of AIOps:

  1. Data set selection – machine learning algorithms can parse vast quantities of noisy data and provide Ops teams with a curated sample of clean data.  It’s then much easier to extract trustworthy insights and make effective business decisions.
  2. Pattern discovery – this generally occurs after a data set has been appropriately curated. It involves using a variety of ML techniques to extract patterns, ranging from rule-based approaches to neural networks trained with supervised or unsupervised learning.
  3. Inference – AIOps uses a range of inference algorithms to draw conclusions from patterns found in the data. These algorithms can make causal inferences about system processes ‘behind the data.’  Combining expert systems with pattern-matching neural networks creates highly effective inference engines.
  4. Communication – For AIOps to be of value, it’s not enough for the AI to have the knowledge; it needs to be able to explain its findings to a human engineer! AIOps uses a variety of strategies for this, including visualization and natural language summaries.
  5. Automation – AIOps achieves its power by automating problem-solving and operational decisions. Because modern IT systems are so complex and fast-changing, automated systems need to be intelligent. They need machine learning to respond to quickly changing conditions in an adaptive fashion.

Why IT needs AIOps

As IT has advanced, it has shouldered more and more of the essential processes of business organizations.  Not only has technology become more sophisticated, it has also woven itself into business practice in increasingly intricate ways.

The ‘IT department’ of the ’90s, responsible for a few niche business applications, has all but disappeared. 21st-century IT lives in the cloud. Enterprise applications are virtual, consisting of thousands of ephemeral components. Businesses are so dependent on them that many business processes are IT processes.

This means that DevOps has had to evolve. Automation is essential to managing the fast-changing complexity of modern IT. AIOps is an idea whose time has come.

How companies are using AIOps

Over the past few years, AIOps has been adopted by many organizations. In a recent survey, OpsRamp found that 68% of surveyed businesses were experimenting with AIOps due to its potential to eliminate manual labor and extract data insights.

William Hill, COTY, and KPN are three companies that have adopted AIOps, and their experiences make fascinating reading:

AIOps Case Study: William Hill

William Hill started using AIOps to combat game and bonus abuse. As a betting and gaming company, William Hill depended on customers playing by the rules for its revenue, and with so many customers, no human could keep track of the data.

William Hill’s head of Capacity and Monitoring Engineering, Andrew Longmuir, explains the benefits of adopting AIOps. First, it helped with automation, and in particular with what Andrew calls “silo-busting”. AI and machine learning allowed William Hill to integrate nonstandard data sources into their toolchain.

Andrew uses the analogy of a jigsaw. Unintegrated data sources are like missing pieces of a puzzle. Using machine learning allows William Hill to bring them back into the fold and create a complete picture of the system.

Second, AIOps enables William Hill’s team to solve problems faster. Machine learning can be used to window data streams, reducing alert volumes and eliminating operational noise. It can also detect correlations between alerts, helping the team prevent problems before they arise.
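Windowing is easiest to see in a non-ML form. The sketch below is a minimal illustration, not William Hill’s actual setup: it groups alerts that share a source and type into fixed five-minute (tumbling) windows and reports one entry per window with a count. The `Alert` type, the `window_alerts` function, and the 300-second default are hypothetical; an ML-driven system would learn groupings and window sizes rather than hard-coding them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since epoch
    source: str       # e.g. a hostname or service name (hypothetical)
    kind: str         # e.g. "high_cpu", "db_timeout" (hypothetical)


def window_alerts(alerts, window_seconds=300):
    """Collapse alerts sharing (source, kind) into one entry per tumbling window.

    Returns a time-sorted list of (window_start, source, kind, count) tuples.
    """
    buckets = defaultdict(int)
    for alert in alerts:
        # Tumbling window: align each timestamp to the start of its window.
        window_start = int(alert.timestamp // window_seconds) * window_seconds
        buckets[(window_start, alert.source, alert.kind)] += 1
    return sorted((start, src, kind, n) for (start, src, kind), n in buckets.items())


alerts = [
    Alert(0, "web-1", "high_cpu"),
    Alert(30, "web-1", "high_cpu"),
    Alert(200, "web-1", "high_cpu"),
    Alert(310, "web-1", "high_cpu"),
    Alert(40, "db-1", "db_timeout"),
]
print(window_alerts(alerts))
```

Here five raw alerts collapse into three entries: the three `high_cpu` alerts from `web-1` in the first window become a single entry with a count of 3.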

Finally, incorporating AI and machine learning into William Hill’s IT strategy has even improved their customer experience: the team uses insights extracted from their analytics data to improve the design of their website.

Andrew has some words of wisdom for other organizations considering AIOps. He recommends focusing on a use case that is central to your company.  Teams need to be willing to trial multiple different solutions to find the optimum setup.

AIOps Case Study: COTY

COTY adopted AIOps to take the agility and scalability of their IT strategy to the next level. COTY is a major player in the cosmetics space, with brands that include Max Factor and Calvin Klein. As a dynamic business, they relied on flawless and versatile performance from their IT infrastructure to manage everything from payrolls to wireless networks.

With over 4,000 servers and a cloud-based infrastructure, COTY’s IT system is far too complex for traditional DevOps strategies to handle. To deal with it they’ve chosen AIOps.

AIOps has improved the way COTY handles and analyzes data. Data sources are integrated into a ‘data lake’, and machine learning algorithms can crunch its contents to extract patterns.

This has allowed them to minimize noise, so their operations department isn’t bombarded with irrelevant and untrustworthy information. 

AIOps has transformed the way COTY’s DevOps team thinks about visibility. Instead of a traditional events-based model, they now use a global, service-oriented model. This allows the team to analyze their business and IT holistically.

COTY’s Enterprise Management Architect, Dan Ellsweig, wants to take things further. Dan is using his AIOps toolchain to create a dashboard for executives to view. For example, the dashboard might show the CTO what issues are being dealt with at a particular point in time.

AIOps Case Study: KPN

KPN is a Dutch telecoms business with operating experience in many European countries.  They adopted AIOps because the amount of data they were required to process was more than a human could handle.

KPN’s Chief Product Owner Software Tooling, Arnold Hoogerwerf, explains the benefits of using AIOps. First, leveraging AI and machine learning can increase automation and reduce operational complexity. This means that KPN’s DevOps team can do more with the same number of people.

Secondly, AI and machine learning can speed up the process of investigating problems. With traditional strategies, it may take weeks or months to investigate a problem and find the root cause. The capacity of AI tools to correlate multiple data sources allows the team to make crucial links in days that otherwise would have taken weeks.

Finally, Hoogerwerf has a philosophical reason for using AIOps.  He believes that while data is important, it’s even more important to keep sight of what’s going on behind the data.

Data on its own is meaningless if you don’t have the knowledge and wisdom with which to interpret it.

Implementing AIOps with Coralogix

Although the three companies we’ve looked at are much larger than the average business, AIOps is not just for big companies. The increasing number of platforms and vendors supporting AIOps tooling means that any business can take advantage of what AIOps has to offer.

Since its launch, the Coralogix platform has followed a philosophy that parallels the principles of AIOps. As Coralogix CEO Ariel Assaraf explains, organizations are burdened with the need to analyze increasing quantities of data. They often can’t do this with existing infrastructure, resulting in more than 99% of data remaining completely untapped.

In this context, the Coralogix platform is a game-changer. It allows organizations to analyze data without relying on storage or indexing. This enables significant cost savings and greater data coverage. Adding machine learning capabilities on top of that makes Coralogix much more powerful than any alternative in the market. Instead of cherry-picking data to analyze, Coralogix performs stateful stream analysis in real time.
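To make “stateful stream analysis” concrete: each event updates a small piece of running state and is then discarded, so nothing has to be stored or indexed first. The sketch below is a generic illustration of that idea, not Coralogix’s implementation. It assumes plain-text log lines that begin with a severity level (a hypothetical format), and the `stream_level_shares` function and its `every` parameter are likewise hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def stream_level_shares(lines: Iterable[str], every: int = 1000) -> Iterator[dict]:
    """Consume log lines one at a time, keeping only running counts as state.

    Every `every` lines, yield a snapshot mapping level -> share of lines seen
    so far. Raw lines are never kept, only the counters.
    Assumes each line looks like "LEVEL message" (hypothetical format).
    """
    counts = Counter()
    total = 0
    for line in lines:
        level = line.split(" ", 1)[0]
        counts[level] += 1
        total += 1
        if total % every == 0:
            yield {lvl: n / total for lvl, n in counts.items()}


stream = ["INFO ok", "ERROR boom", "INFO ok", "INFO ok"]
for snapshot in stream_level_shares(stream, every=2):
    print(snapshot)
```

Because the function is a generator, it works the same on a list or on an unbounded iterator such as lines read from a socket.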

How Coralogix can help with pattern discovery

One of the five dimensions of AIOps is pattern discovery. Because machine learning can analyze large quantities of data, the Coralogix platform is tailor-made for discovering patterns in logs. As a case in point, gaming company AGS uses Coralogix to analyze 100 million logs a day.

The patterns extracted have allowed their DevOps team to reduce MTTR by 70% and their development team to create enhanced user experiences that have tripled their user base.

Another case is the neural science and ML company Biocatch. With exponentially increasing log volumes, their plight was a vivid illustration of the complexity that 21st century DevOps teams increasingly face.

Coralogix handled these logs by clustering entries into patterns and finding connections between them. This allowed Biocatch to fix bugs and solve problems much faster than before.
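One common, simple way to cluster log entries into patterns is template extraction: mask the parts of each line that vary (IP addresses, hex values, numbers) so that lines differing only in those values collapse into a single template, then count each template. The post doesn’t describe Coralogix’s clustering method, so treat this as a generic stand-in; the masking rules and the `to_template` and `cluster_logs` names are hypothetical.

```python
import re
from collections import Counter

# Ordered masking rules (illustrative assumptions): IPs and hex values are
# masked before plain numbers so their digits aren't caught by the number rule.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens so lines that differ only in values match."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line


def cluster_logs(lines):
    """Count how many raw log lines fall into each template."""
    return Counter(to_template(line) for line in lines)


logs = [
    "Timeout after 3000 ms connecting to 10.0.0.5",
    "Timeout after 120 ms connecting to 10.0.0.7",
    "User 42 logged in",
]
for template, count in cluster_logs(logs).most_common():
    print(count, template)
```

Three raw lines reduce to two templates, and the counts show which pattern dominates. Real log clustering usually adds token-level similarity so that lines with slightly different wording can still group together.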

How Coralogix can communicate insights

Once patterns have been extracted, DevOps engineers receive automated insights and alerts about anomalies in the system behavior. Coralogix achieves this by integrating with a variety of monitoring and visualization tools such as Prometheus and CloudWatch.

Coralogix also implements a smarter alerting system that flags anomalies to DevOps engineers in real time.  Conventional alerting systems require DevOps engineers to set alerting thresholds manually. However, as we saw at the start of this article, modern IT is too complex and fast-changing for this approach to work.

Coralogix solves this with dynamic alerts. These use machine learning to adjust thresholds in response to data.  This enables a much more effective approach to anomaly detection, one that is tailored to the DevOps landscape of the 21st century.
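The post doesn’t say which machine learning model drives Coralogix’s dynamic alerts. As a much simpler stand-in that still shows the core idea of a threshold that moves with the data, the sketch below flags a value when it falls more than `k` standard deviations from the mean of a sliding window of recent values. The `DynamicThreshold` class and its default parameters are hypothetical.

```python
import math
from collections import deque


class DynamicThreshold:
    """Flag values outside mean +/- k * stddev of a sliding window.

    A simple statistical baseline, not Coralogix's model. The window size,
    k, and min_samples defaults are illustrative.
    """

    def __init__(self, window: int = 30, k: float = 3.0, min_samples: int = 10):
        self.values = deque(maxlen=window)  # oldest values drop out automatically
        self.k = k
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to recent history."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            mean = sum(self.values) / len(self.values)
            variance = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            anomalous = abs(value - mean) > self.k * math.sqrt(variance)
        # Every observation updates the window, so the threshold adapts.
        self.values.append(value)
        return anomalous


detector = DynamicThreshold(window=20, k=3.0, min_samples=10)
for cpu in [50, 52, 49, 51, 50, 48, 52, 50, 49, 51, 51, 95]:
    if detector.observe(cpu):
        print("anomaly:", cpu)
```

Because every observation enters the window, a sustained shift in the metric gradually becomes the new baseline; a production system would likely exclude flagged values or use a more robust statistic.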

Wrapping Up

The increasing complexity and volumes of data faced by modern DevOps teams mean that humans can no longer handle IT operations without help.  AIOps aims to leverage AI and machine learning with a view to converting high-volume data streams into insights that human engineers can act on.

AIOps fits with Coralogix’s own approach to DevOps, which is to use machine learning to help organizations effectively use the increasing volumes of data they generate.  Observability should be for the many, not just a few.

The Secret Ingredient That Converts Metrics Into Insights

Metrics and Insight have been the obsession of every sector for decades now. Using data to drive growth has been a staple of boardroom meetings the world over. The promise of a data-driven approach has captured our imaginations.

What’s also a subject of these meetings, however, is why investment in data analysis hasn’t yielded results. Directors give the go-ahead to sink thousands of dollars into observability and analytics solutions, with no returns. Yet all they see on the news and their LinkedIn feeds is competitors making millions, maybe even billions, by placing Analytics and Insight at the top of their agenda.

These directors and business leaders are confused. They have teams of data scientists and business analysts working with the most cutting-edge tools on the market. But have those end-of-year figures moved? Has performance improved by more than a hard-fought one, maybe two, percent?

All metrics, no insights

The problem lies in those two words: Metrics and Insight.

More specifically, the problem is that most businesses love to dive into the metrics part. What they don’t realize is that without the ‘Insight’ half of the equation, all data analysis provides is endless logs of numbers and graphs. In a word, noise.

Sound familiar? Pumping your time, energy, and finance into the metrics part of your process will yield diminishing returns very quickly if that’s all you’re doing. If you want to dig yourself out of the ‘we just need more data’ trap, maybe you should switch your focus to the insights?

Data alone won’t solve this problem. To gain insight, what you need is something new. Context.

Why you NEED context

Metrics and Insight is business slang for keeping an extra-close eye on things. Whether you’re using that log stack for financial data or monitoring a system or network, the fundamentals are the same. The How can be incredibly complex, but the What is straightforward. You’re using tech as a microscope for a hyper-focused view.

Without proper context it is impossible to gain any insight. Be it a warning about continued RAM spikes from your system monitoring log, or your e-commerce dashboard flagging a drop in sales of your flagship product, nothing actionable can be salvaged from the data alone.

If your metrics tell you that your CPU is spiking, you remain entirely unaware of why this is happening or what is going on. When you combine that spike in CPU with application logs indicating thread locking due to database timeouts, you suddenly have context. CPU spikes are good for getting you out of bed, but your logs are where you will find why you’re out of bed.
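Combining a metric spike with nearby log lines is essentially a time-window join. A minimal sketch, assuming spike timestamps and log events are already available as in-memory lists: for each spike, collect the log messages recorded within a tolerance on either side. The `correlate` function, the sample messages, and the 60-second tolerance are hypothetical.

```python
from bisect import bisect_left, bisect_right


def correlate(spike_times, log_events, tolerance=60.0):
    """Attach to each metric spike the log messages within +/- tolerance seconds.

    spike_times: timestamps (seconds) where a metric such as CPU crossed a threshold.
    log_events: (timestamp, message) tuples from application logs.
    Returns {spike_time: [messages logged near that spike]}.
    """
    events = sorted(log_events)           # sort by timestamp for binary search
    times = [t for t, _ in events]
    context = {}
    for spike in spike_times:
        lo = bisect_left(times, spike - tolerance)
        hi = bisect_right(times, spike + tolerance)
        context[spike] = [msg for _, msg in events[lo:hi]]
    return context


logs = [
    (100.0, "db timeout on orders"),
    (130.0, "thread pool exhausted waiting for lock"),
    (400.0, "cache warmed"),
]
print(correlate([120.0, 900.0], logs))
```

The spike at 120 seconds picks up the database timeout and thread-locking messages, which is exactly the kind of context that explains a CPU spike; the spike at 900 seconds has no nearby logs.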

But how do you get context?

Context – Creating insight from metrics

With platforms like Coralogix, the endless sea of data noise can be condensed and transformed into understandable results, recommendations, and solutions. With a single platform, your Analysis & Insight investments can yield actionable observations rather than more noise. By collecting logs based on predetermined criteria, and delivering messages and alerts for the changes that impact your goals, Coralogix makes every minute you spend with your data cost-effective. The platform provides context.

Platforms like Coralogix create context from the noise, filtering out what’s irrelevant to give you a clear picture of your data landscape. From context comes clarity, and from clarity comes insight, strategy, and growth.