
Apache Log4j Vulnerability – How We Addressed it For Our Customers

The log4j vulnerability lets attackers type a specially crafted string into something as simple as a message box and execute a malicious attack remotely. This can include installing malware, stealing user data, and more. It was originally discovered in Minecraft on December 9 and was officially announced to the world as a zero-day, critical-severity exploit in the log4j2 logging library: CVE-2021-44228, also known as “Log4Shell.”

As soon as we heard of the vulnerability, we pulled our team in and began working on a solution to ensure our systems and customers would not be at risk. So, how did we do it?

A Plan of ~~Attack~~ Defense

According to the initial reports, log4shell can be mitigated in one of three ways:

  1. The vulnerability is partially mitigated by ensuring that your JVM-based services are running an up-to-date version of the JVM.
  2. Where possible, the log4j2 library itself should be updated to 2.15.0.
  3. If the JVM-based service is running a version of log4j2 that is at least 2.10, then the JVM can be run with the flag `-Dlog4j2.formatMsgNoLookups=true`. Otherwise, the log4j2 properties file can be rewritten so that every logging pattern that uses `%m` uses `%m{nolookups}` instead.
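The third option can be sketched in a few lines of Python. This is a minimal illustration and not the tooling we actually ran: the function names `fallback_mitigation` and `add_nolookups` are hypothetical, and the regular expression assumes the message converter appears as `%m`, `%msg`, or `%message`, optionally with a width modifier such as `%-5m`. Converters that already carry options (for example `%m{nolookups}`) are left alone, and escaped percent signs (`%%`) are not handled.

```python
import re

# Matches %m, %msg or %message (with an optional width/precision modifier)
# when it is NOT followed by a letter (e.g. %marker, %mdc) or by an
# existing option block such as {nolookups}.
_MESSAGE_CONVERTER = re.compile(
    r"%(-?\d+(?:\.\d+)?|\.\d+)?(message|msg|m)(?![A-Za-z{])"
)


def fallback_mitigation(log4j_version: str) -> str:
    """Pick the non-upgrade mitigation for a given log4j2 version.

    Returns "jvm_flag" for versions >= 2.10 (the
    -Dlog4j2.formatMsgNoLookups=true flag applies) and
    "pattern_rewrite" for older versions.
    """
    numbers = [int(n) for n in re.findall(r"\d+", log4j_version)]
    major_minor = tuple((numbers + [0, 0])[:2])
    return "jvm_flag" if major_minor >= (2, 10) else "pattern_rewrite"


def add_nolookups(properties_text: str) -> str:
    """Rewrite every bare message converter to its {nolookups} form."""
    def _rewrite(match: re.Match) -> str:
        modifier = match.group(1) or ""
        name = match.group(2)
        return f"%{modifier}{name}{{nolookups}}"

    return _MESSAGE_CONVERTER.sub(_rewrite, properties_text)
```

For example, `fallback_mitigation("2.8.2")` selects the pattern rewrite, and passing the contents of a `log4j2.properties` file through `add_nolookups` returns the rewritten text, which should still be reviewed by hand before it is deployed.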

Of course, this results in three equally tricky questions:

  1. How can we quickly upgrade our JVM services to use a patched JVM?
  2. Where are all of our usages of `log4j2` across our system?
  3. Where do we need to rely on changing JVM flags or log4j2.properties files, and how can we efficiently make those changes?

Where were we vulnerable?

JVM

At Coralogix, our infrastructure is based on Kubernetes, and all of our services run on our Kubernetes clusters. We follow containerization best practices and ensure our services are built on top of a series of “base images,” some including the JVM.


To patch our JVMs across production, we opened a pull request that brought each of our JVM-based base images up to date, pushed them, and then triggered new builds and deploys for the rest of our services.

log4j

First, let’s distinguish between services that we built in-house and services provided by our vendors.

At Coralogix, we hold the engineering value that everything we run in production must be written in code somewhere and version-controlled. In the event of a vulnerability like log4shell, this makes it simple to find every place that uses log4j with a short script that:

  1. Iterates over each of our code repositories and clones them locally
  2. Searches the code base for the string `org.apache.logging.log4j`
  3. Adds the repository to a list of repositories that need to be updated if it contains that string.
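The steps above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the script we ran: the helper names (`clone_repo`, `repo_uses_log4j`, `repos_needing_update`) are hypothetical, the repository URLs are assumed to come from wherever your organization keeps its repository list, and `git` is assumed to be installed and authenticated.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

NEEDLE = "org.apache.logging.log4j"


def clone_repo(url: str, dest_root: Path) -> Path:
    """Shallow-clone a repository into dest_root and return its local path."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    dest = dest_root / name
    if not dest.exists():
        subprocess.run(
            ["git", "clone", "--depth", "1", url, str(dest)], check=True
        )
    return dest


def repo_uses_log4j(repo: Path, needle: str = NEEDLE) -> bool:
    """Return True if any file outside .git contains the search string."""
    for path in repo.rglob("*"):
        if ".git" in path.relative_to(repo).parts or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if needle in text:
            return True
    return False


def repos_needing_update(repos: Iterable[Path]) -> List[Path]:
    """Filter local clones down to those that reference log4j."""
    return [repo for repo in repos if repo_uses_log4j(repo)]
```

A run would clone each URL with `clone_repo` and pass the resulting paths to `repos_needing_update`. Note that a plain string search like this only finds direct references in the code base; a dependency pulled in transitively would need a build-tool dependency report to surface.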

By sending the list of repositories that required updating to all of R&D, we quickly mobilized our developers to take immediate action and patch the library in the repositories for which they are responsible. This helped us update hundreds of JVM-based microservices to log4j2-2.15.0 in a matter of hours.

As for the services that our vendors provide, our platform team keeps a shortlist of them. That let us quickly review the list, see which services were vulnerable (services that are not JVM-based are not vulnerable), manually apply the fix to each one, and push to production.

What did we learn from log4shell?


a) We own our stack from top to bottom. 

The MIT open-source license clearly states, in all capital letters, THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. This never becomes more painfully apparent than when a major vulnerability hits, your vendors are incommunicado, and you’re a sitting duck. When a major security vulnerability is published, we need to be able to take responsibility for each element of our stack, locally fork the code if necessary, and get a fix out the door quickly.

b) There are significant benefits to having a small, tight-knit engineering team. 

As we verified that we were fully secure, many larger industry players were already suffering embarrassing, publicly visible attacks. By quickly reaching every developer in R&D with the exact knowledge of what they needed to do and why it had to be done quickly, we were able to leverage more manpower than most organizations that make it solely the responsibility of their security teams to get their systems secure.

To learn more about how we pushed through this challenge, you can check our status page.
