How do Observability and Security Work Together?

There’s no question that the last 18 months have seen a pronounced increase in the sophistication of cyber threats. Global events are accelerating the development of ransomware and wiperware, leaving many enterprise security systems struggling to keep pace. This is where enterprise network monitoring comes in.

Here at Coralogix, we’re passionate about data observability and security and what the former can do for the latter. We’ve previously outlined key cyber threat trends such as trojans/supply chain threats, ransomware, the hybrid cloud attack vector, insider threats, and more. 

This article will revisit some of those threats and highlight new ones while showing why observability and security should be considered codependent. 

Firewall Observability

Firewalls are a critical part of any network’s security. They can give some of the most helpful information regarding your system’s security. A firewall is different from an intrusion detection system (which we discuss below) – you can think of a firewall as your front door and the intrusion detection system as the internal motion sensors. 

Firewalls are typically configured with a series of user-defined or pre-configured rules to block unauthorized network traffic.

Layer 3 vs. Layer 7 Firewalls

Two types of firewalls are common in the market today: Layer 3 and Layer 7. Layer 3 firewalls typically block specific IP addresses, either from a vendor-supplied list that is automatically updated for the user or a custom-made allow/deny list. A mixture of the two is also typical, allowing customers to benefit from global intelligence on malicious IP addresses while being able to block specific addresses that have previously attempted DDoS attacks, for example. 
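
To make the allow/deny model concrete, here is a minimal Python sketch of the per-packet decision a layer 3 firewall effectively makes. The CIDR ranges are hypothetical, and a real firewall enforces this in the kernel or in hardware rather than in application code.

```python
# A minimal sketch of layer 3 filtering logic: checking a source IP
# against deny and allow lists. The CIDR ranges below are hypothetical.
import ipaddress

DENY_LIST = [ipaddress.ip_network("203.0.113.0/24")]    # e.g. a known-bad range
ALLOW_LIST = [ipaddress.ip_network("198.51.100.0/24")]  # e.g. a trusted partner

def is_blocked(source_ip: str) -> bool:
    """Return True if traffic from source_ip should be dropped."""
    ip = ipaddress.ip_address(source_ip)
    if any(ip in net for net in ALLOW_LIST):
        return False  # an explicit allow wins
    return any(ip in net for net in DENY_LIST)

print(is_blocked("203.0.113.7"))  # True: inside the deny list
print(is_blocked("192.0.2.1"))    # False: unlisted traffic passes here
```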

Layer 7 firewalls are more advanced. They can analyze data entering and leaving your network at a packet level and filter the contents of those packets. Traditionally, this capability has been used to filter out malware signatures, preventing malicious actors from disrupting or encrypting a system. Today, more organizations are using layer 7 firewalls to inspect data egress as well as ingress. This is particularly useful in protecting against data breaches, insider threats, and ransomware, where data may be leaving your network.
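
As a toy illustration of that packet-level inspection, the sketch below scans a reassembled payload for known byte signatures. Real layer 7 firewalls rely on large, vendor-maintained signature databases and stream reassembly; the lone signature here is the well-known EICAR antivirus test string.

```python
# A toy illustration of deep packet inspection: scanning a reassembled
# payload for known byte signatures.
from typing import Optional

SIGNATURES = {
    b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE": "eicar-test-file",
}

def inspect_payload(payload: bytes) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for pattern, name in SIGNATURES.items():
        if pattern in payload:
            return name
    return None

if inspect_payload(b"...EICAR-STANDARD-ANTIVIRUS-TEST-FILE...") is not None:
    print("drop the packet and log the event")
```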

Given that it’s best practice to run both a layer 3 and a layer 7 firewall, and given the volume of data the latter generates, having an observability platform like Coralogix to collate and contextualize this data is critical.

Just a piece of the puzzle

Given that a firewall is just one tool in a security team’s arsenal, it’s essential to be able to correlate events at a firewall level with other system events, such as database failures, malware detection, or data egress. Fortunately, Coralogix ingests firewall logs and metrics using either Logstash or its own syslog agent, which means it can work with a wide variety of firewalls. Additionally, Coralogix’s advanced log parsing and visualization technologies allow security teams to easily overlay firewall events with other security metrics. Coralogix also provides bespoke integrations for a number of the most popular firewalls.
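
To illustrate the syslog route, Python’s standard library can forward a structured firewall event to a syslog collector. The endpoint below is a placeholder; in practice, the firewall or its agent ships these logs for you.

```python
# A minimal sketch of forwarding a firewall event to a syslog collector
# using Python's standard library. The endpoint is a placeholder.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(address=("syslog.example.com", 514))
logger = logging.getLogger("firewall")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Structured key=value messages are easier to parse and correlate downstream.
logger.info("action=deny src=203.0.113.7 dst=10.0.0.12 dport=443 rule=geo-block")
```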

Firewall data in isolation isn’t that helpful. It can tell you what malicious traffic you’ve successfully blocked, but not what you’ve missed. That’s why adding context from other security tools is vital.

Intrusion Detection Systems and Observability

As mentioned above, if firewalls are the first line of defense, then intrusion detection systems are next in line. Intrusion detection is key because it can tell you the nature of the threat that’s breached your system and highlight what your firewall might have missed. Remember, a firewall will only be able to tell you what didn’t get in or what was let in.

Adding an intrusion detection system allows you to assess and neutralize threats that bypass other network security controls. Some intrusion detection systems pull data from OWASP to hunt for the most common malware and vulnerabilities, while others use crowdsourced data. 

By layering intrusion detection data, like that from Suricata, your SRE or security team will be able to detect attacks and identify the point of entry. Such context is vital in reengineering cyber defenses after an attack.
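
Suricata, for example, writes alerts as JSON lines to its EVE output. The sketch below pulls out the fields you might overlay on firewall data; the file path is an assumption, and the field names follow Suricata’s documented EVE schema, which you should verify against the version you run.

```python
# A sketch of reading Suricata's EVE JSON output and extracting alerts.
import json

with open("/var/log/suricata/eve.json") as fh:
    for line in fh:
        event = json.loads(line)
        if event.get("event_type") != "alert":
            continue  # EVE also logs flows, DNS, TLS, and more
        print(
            event.get("timestamp"),
            event.get("src_ip"),
            event.get("dest_ip"),
            event.get("alert", {}).get("signature"),
        )
```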

Kubernetes Observability and Security 

A recent Red Hat survey found that 55% of Kubernetes deployments have been delayed due to security concerns. The same study found that 93% of respondents experienced at least one security incident in a Kubernetes environment over the last year.

Those two statistics tell you everything you need to know. Kubernetes security is important. Monitoring Kubernetes is vital to maintaining cluster security, as we will explore below.

Pod Configuration Security

By default, Kubernetes applies no network policy restricting pod-to-pod traffic: any pod can communicate with any other. Beyond the network, pod security is heavily defined by role-based access control (RBAC), and it’s possible to monitor the security permissions assigned to a given user to ensure there isn’t over-provisioning of access.
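
As one way to watch for over-provisioned access, the official Kubernetes Python client can enumerate cluster-wide role bindings. A minimal sketch, assuming credentials are available via kubeconfig or in-cluster config:

```python
# A minimal sketch of auditing RBAC with the official Kubernetes Python
# client (pip install kubernetes). Flagging every subject bound to
# cluster-admin is one crude check for over-provisioned access.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
rbac = client.RbacAuthorizationV1Api()

for binding in rbac.list_cluster_role_binding().items:
    if binding.role_ref.name == "cluster-admin":
        for subject in binding.subjects or []:
            print(f"cluster-admin granted to {subject.kind}/{subject.name}")
```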

Malicious Code

A common attack vector into a Kubernetes cluster is via the containerized application itself. By monitoring requests at the host or IP level, you can limit your vulnerability to DDoS attacks, which could otherwise take the cluster offline. Using Prometheus as an operational, enterprise network monitoring tool is a good way of picking up vital metrics from containerized environments.
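
As a sketch of how such monitoring might look, Prometheus’s HTTP query API can be polled for a per-pod request rate. The server URL, metric name, and threshold below are assumptions; substitute whatever your workloads actually expose.

```python
# A sketch of querying Prometheus's HTTP API for per-pod request rates.
import requests

PROM = "http://prometheus.example.com:9090"
query = "sum(rate(http_requests_total[5m])) by (pod)"

resp = requests.get(f"{PROM}/api/v1/query", params={"query": query})
for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod", "unknown")
    rate = float(series["value"][1])
    if rate > 1000:  # crude threshold; tune against your baseline
        print(f"possible request flood: {pod} at {rate:.0f} req/s")
```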

Runtime Monitoring

A container’s runtime metrics will give you a good idea of whether it’s also running a secondary, malicious process. Runtime metrics to look out for include network connections, endpoints, and audit logs. By monitoring these metrics and using an ML-powered log analyzer, such as Loggregation, you can spot any anomalies which may indicate malicious activity.
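
As one example of those runtime signals, a host’s open sockets can be sampled with the psutil library. Here is a toy check that flags listeners outside an expected set; the allowed ports are hypothetical.

```python
# A toy runtime check: flag listening sockets on ports this workload is
# not expected to use (pip install psutil).
import psutil

EXPECTED_PORTS = {8080, 9090}  # hypothetical: app traffic and a metrics endpoint

for conn in psutil.net_connections(kind="inet"):
    if conn.status == psutil.CONN_LISTEN and conn.laddr.port not in EXPECTED_PORTS:
        print(f"unexpected listener on port {conn.laddr.port} (pid={conn.pid})")
```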

Monitoring for protection

With Kubernetes, several off-the-shelf security products can help harden a deployment. However, as you can see above, there is no substitute for effective monitoring in Kubernetes security.

Network Traffic Observability and Security

It should be abundantly clear why an effective observability strategy for your network traffic is critical. On top of the fundamentals discussed so far, Coralogix has many bespoke integrations designed to assist your network security and observability. 

Zeek

Zeek is an open-source network monitoring tool whose security capabilities are driven by community participation. You can ship Zeek logs to Coralogix via Filebeat so that every time Zeek performs a scan, the results are pushed to a single dashboard overlaid with other network metrics.
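
In practice, Filebeat tails and ships Zeek’s log files for you, but it helps to know what they contain. Here is a sketch of reading Zeek’s tab-separated conn.log; the path is an assumption for a default install, and the field names follow Zeek’s documented schema.

```python
# A sketch of reading Zeek's tab-separated conn.log. Zeek prefixes
# metadata lines with '#', including a '#fields' header naming columns.
fields = []
with open("/opt/zeek/logs/current/conn.log") as fh:
    for line in fh:
        if line.startswith("#fields"):
            fields = line.rstrip("\n").split("\t")[1:]
            continue
        if line.startswith("#") or not fields:
            continue
        record = dict(zip(fields, line.rstrip("\n").split("\t")))
        print(record.get("id.orig_h"), "->", record.get("id.resp_h"),
              "port", record.get("id.resp_p"))
```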

Cloudflare

Organizations around the world use Cloudflare for DDoS and other network security protection. However, your network is only as secure as the tools you use to secure it. Using the Coralogix audit log integration for Cloudflare, you can ensure that access to Cloudflare is monitored and any changes are flagged in a network security dashboard.

Security Traffic Analyzer

Coralogix has built a traffic analyzer specifically for monitoring the security of your AWS infrastructure. The Security Traffic Analyzer connects directly to your AWS environment and collates information from numerous AWS services, including network load balancers and VPC traffic.

Application-level Observability

Often overlooked, application-level security is more important than ever. With zero-day exploits like Log4Shell (the Log4j vulnerability) becoming more and more common, having a robust approach to security from the code level up is vital. You guessed it, though – observability can help.

To the edge, and beyond

Edge computing and serverless infrastructure are just two examples of the growing complexities you must consider with application-level security. Running applications on the edge can generate vast amounts of data, requiring advanced observability solutions to identify anomalies. Equally, serverless applications can lead to security and IAM issues, which have been the causes of some of the world’s biggest data breaches. 

Observability for Hybrid Cloud Security

In the world of hybrid cloud, observability and security are closely intertwined. The complexities of running systems in a mixture of on-premise and cloud environments give malicious actors, and your own security teams, a lot to work with. 

Centralized Logging

It’s unlikely that the security tooling for your cloud environments will be the same as that used on-premise. Across different systems, vendors will likely have different security tools, all with varied log outputs. A single repository for these outputs, one that also parses them in a standardized fashion, is a key part of effective defense. Without this, your security teams may spend unnecessary time deciphering the nuances of two different products’ logs, trying to find a connection.
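
To show what parsing in a standardized fashion means in practice, here is a toy normalizer that maps two invented vendor log formats onto one shared schema. Neither input format corresponds to a real product.

```python
# A toy normalizer: two invented vendor log formats mapped onto a single
# shared schema, so downstream dashboards don't care where events originate.

def parse_vendor_a(line: str) -> dict:
    # e.g. "2024-05-01T10:00:00Z DENY 203.0.113.7 -> 10.0.0.12:443"
    ts, action, src, _, dst_port = line.split()
    dst, port = dst_port.rsplit(":", 1)
    return {"time": ts, "action": action.lower(), "src": src,
            "dst": dst, "dst_port": int(port), "origin": "vendor_a"}

def parse_vendor_b(line: str) -> dict:
    # e.g. "ts=...Z src=203.0.113.7 dst=10.0.0.12 dport=443 verdict=drop"
    kv = dict(pair.split("=", 1) for pair in line.split())
    return {"time": kv["ts"],
            "action": "deny" if kv["verdict"] == "drop" else "allow",
            "src": kv["src"], "dst": kv["dst"],
            "dst_port": int(kv["dport"]), "origin": "vendor_b"}

print(parse_vendor_a("2024-05-01T10:00:00Z DENY 203.0.113.7 -> 10.0.0.12:443"))
print(parse_vendor_b("ts=2024-05-01T10:00:00Z src=203.0.113.7 dst=10.0.0.12 dport=443 verdict=drop"))
```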

Dashboarding

A single pane of glass is the only way to implement observability in a complex environment. Dashboards help spot trends and identify outliers, making sure that two teams with different perspectives are “singing from the same hymn sheet.” Combine effective dashboarding with holistic data collection, and you’re onto a winner.

Observability is Security 

At Coralogix, we firmly believe that the most important tool in your security arsenal is effective monitoring and observability. But effectiveness isn’t the only thing that’s key; so is pragmatism. We also believe in the value of collecting holistic data, such as from Slack and PagerDuty, to tackle security incidents as well as to detect them.

The bulk of this piece has covered how observability can help detect malicious actors and security incidents. Our breadth of out-of-the-box integrations and the openness of our platform give organizations free rein to build security-centered SIEM tools. However, by analyzing otherwise overlooked data, such as internal communications, website information, and marketing data, in conjunction with traditionally monitored metrics, you can really supercharge your defense and response.

Summary

Hopefully, you can see that security and observability are no longer separate concepts. As companies exploit increasingly complex technologies, generate more data, and deploy more applications, observability and security become bywords for one another. However, for your observability strategy to become part of your security strategy, you need the right platform. That platform will collate logs for you automatically while highlighting anomalies, integrate with every security tool in your arsenal and contextualize their data into one dashboard, and bring your engineers together to combat the technical threats facing your organization. 

On-premise vs. On the Cloud

Since its emergence in the mid-2000s, the cloud computing market has evolved significantly. The benefits of reliability, scalability, and cost reduction using cloud computing have created a demand to fuel an ever-growing range of “as-a-service” offerings, resulting in an option to suit most requirements. But despite the advantages, the question of cloud or on-premise remains valid.

As an organization, you can choose whether it’s best to host and manage your computing infrastructure, data, and services in-house or engage a third party to supply, host, and maintain the hardware and – optionally – provide additional services on top.

While some enterprises have opted for a wholesale migration to the cloud, others have taken a piecemeal approach – maintaining their infrastructure for some systems and using cloud-hosted software and services where it makes sense for them. 

What is clear is that there is no one-size-fits-all approach – what’s right for your business will depend on a range of factors, which we’ll come to shortly. But first, let’s clarify what we mean by on-premise and on the cloud.

What is on-premise?

On-premise refers to computing infrastructure – servers and other hardware – physically located in your company’s offices (or another location to which you have access). You run operating systems and software that you have licensed or developed in-house.

Depending on your organization’s purpose, you may run several different systems on those machines – from end-user software and databases to email servers and firewalls – and make them available to those within your company via a private network.  Because the physical hardware and everything running on it is managed in-house, you have control over (and responsibility for) how it is secured, accessed, and maintained.

What is the cloud? Understanding computing-as-a-service

Cloud computing refers to computer hardware owned and managed by a third-party provider and the services running on that hardware. 

When you opt to use cloud computing, you have little visibility over where the hardware you’re using is located (although, for legal reasons, you will probably know the country or region) – your interaction is with the virtual machines, containers, functions, or software running on those resources.

Cloud computing breaks down into multiple layers of services, allowing you to choose the degree of control you want:

  • Infrastructure-as-a-service (IaaS) – at the lowest or most basic level, we have infrastructure hosted and maintained by a third party. As a customer, you have access to one or more virtual machines (VMs) and can decide how to provision them and what to run on them. At the same time, the cloud service provider supplies the physical hardware and takes care of networking, storage, and processing power. Examples include Amazon EC2, Google Compute Engine, and Azure Virtual Machine instances.
  • Container-as-a-service (CaaS) – if you’re deploying containerized applications or services and do not need control of the virtual machines hosting those containers, then CaaS may be the ideal level of abstraction. As with IaaS, the cloud service provider supplies, provisions, and maintains the hardware and provides the VMs to host the containers. Examples include Amazon ECS and AWS Fargate, Google Cloud Run, and Azure Container Instances.
  • Platform-as-a-service (PaaS) – with PaaS, you’re provided with a computing environment, complete with the operating system and some relevant software, which you can use to develop and deliver applications. Examples include AWS Elastic Beanstalk, Google App Engine, and Azure App Service.
  • Function-as-a-service (FaaS) – for enterprises that require computing resources to execute programs and scripts in response to events but do not require control of the environment or platform, FaaS offers a highly flexible solution that scales automatically. Examples include AWS Lambda, Google Cloud Functions, and Azure Functions (see the minimal handler sketch after this list).
  • Software-as-a-service (SaaS) – as the most hands-off cloud service offering, SaaS allows you to use software without installing or updating it. Examples include Google Workspace and Office 365.
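
To make the FaaS model concrete, here is a minimal AWS Lambda handler in Python. The event shape assumes an HTTP trigger via API Gateway’s proxy integration; other triggers deliver differently shaped events.

```python
# A minimal AWS Lambda handler: the platform provisions and scales the
# runtime; you supply only this function.
import json

def handler(event, context):
    # queryStringParameters is how API Gateway's proxy integration
    # passes HTTP query parameters (assumed trigger).
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```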

Typically, cloud computing refers to a public cloud, where multiple customers may share the same underlying hardware. An alternative is to use a private cloud, where resources are restricted to a single customer or “tenant.”

Private clouds allow scope for greater customization and increased security, but at a higher cost. Overall, cloud computing can reduce costs, but realizing those savings depends on good internet connectivity.

Cloud vs. On-Premise: Key Differences

When choosing between on-premise and cloud computing, there are several factors to consider. Here we’ll look at the main ones.

Deployment and maintenance

On-premise: When your computing infrastructure is hosted on-site, you need a dedicated IT team to manage the procurement, installation, networking, upgrades, and maintenance of servers and other hardware, as well as the operating systems and applications running on those machines.

Cloud: With the cloud, the purchase, security, and maintenance of physical hardware are handled by the cloud provider. How much of the software side you manage depends on the service you choose. With IaaS, you retain a high degree of control and flexibility, but you need to manage everything from the operating system upwards. With FaaS and SaaS, you have far less control over the environment, but you only need to manage your application or functions.

Scalability

On-premise: Managing your computing infrastructure in-house means planning ahead to ensure you have capacity as your organization grows. The balance can be challenging to find: provision too few resources and infrastructure becomes a limiting factor when demand for a service increases; overestimate your future needs and you waste time and money.

Cloud: A key benefit of cloud computing is the ease and speed of bringing more instances online to meet demand, thanks to the vast resources available. While there is always some degree of ramp-up time, it’s measured in seconds rather than hours and days.

Reliability

On-premise: Closely related to the question of scalability is redundancy. Can you fail over to other instances in the event of a hardware or system failure, and how quickly can you bring additional resources online to return to normal operations? For on-premise infrastructure, you need to assess the risk regularly and provision resources accordingly, trading off the actual cost against the potential harm from unscheduled downtime.

Cloud: With cloud hosting, the scale of the resources available means that redundancy is built-in. For high-level services such as FaaS and SaaS, the cloud service provider takes responsibility for uptime, so failover modes are not something you need to worry about. For lower-level services, you typically specify how you want infrastructure to behave in the event of failure as part of the configuration, with the price tag varying accordingly.

Cost

On-premise: Buying and maintaining computer infrastructure involves capital outlay and operational expenditure (including running costs and staff expertise). That includes the cost of additional hardware required to allow capacity for future expansion or failover, even when that infrastructure is not in use.

Cloud: Moving to the cloud shifts costs from capital (CAPEX) to operational expenditure (OPEX) and means that you only pay for what you use. Cloud costs can vary considerably depending on whether you’re using public cloud resources or require the security of a dedicated private cloud, the speed of scale-up, and the amount of CPU, memory, and storage you need. As it’s easy for consumption and storage to escalate quickly, it’s important to monitor usage and optimize your use of cloud services to keep costs under control.

Security

On-premise: Security is one of the main drivers for enterprises keeping IT infrastructure on-site. Organizations handling critical systems or very sensitive data require enhanced levels of security. In these cases, the need to retain physical control of crucial infrastructure means cloud computing is often not a viable option.

Cloud: Although security concerns are often raised as a reason not to move to the cloud, in some cases, it can improve an organization’s security posture. Cloud service providers benefit from economies of scale, which applies to their security expertise and defenses (both physical and online). For some businesses, the cloud may offer more security than in-house infrastructure. When moving to the cloud, the key is to remain alert to potential security risks, invest in security training for your staff, and apply security best practices.

Compliance

On-premise: For organizations working in heavily regulated industries such as finance or healthcare, rules regarding the location in which data is stored and the controls in place to prevent misuse can prove a blocker to moving to the cloud.

Cloud: While cloud solutions exist that allow enterprises to comply with regulatory regimes – including storing data in particular jurisdictions and recognizing ownership of that data – the onus is on the organization procuring the service to perform their due diligence and implement adequate measures to ensure compliance. 

Splunk Indexer Vulnerability: What You Need to Know

A new vulnerability, CVE-2021-3422, has been discovered in the Splunk indexer component, which is a commonly utilized part of the Splunk Enterprise suite. We’re going to explain the affected components, the severity of the vulnerability, mitigations you can put in place, and long-term considerations you may wish to make when using Splunk.

What is the affected component?

The Splunk indexer is responsible for sorting and indexing data that the Splunk forwarder sends to it. It is a central place where much of your observability data will flow as part of your Splunk setup. The forwarder and the indexer communicate with one another using the Splunk-to-Splunk (S2S) protocol.

The vulnerability itself lies within the validation inherent in the S2S protocol. The S2S protocol allows for a field type called field_enum_dynamic, which lets you send a numerical value in your payload and have it automatically mapped to a corresponding text value. This is useful because your machines can talk in status codes while those codes are dynamically mapped to human-readable text.
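
Conceptually, the mapping behaves like an enum lookup. Here is a toy illustration in Python; the codes and labels are invented, not the real S2S wire format.

```python
# A toy illustration of dynamic enum mapping: a numeric code in the
# payload is translated to human-readable text on the receiving side.
STATUS_LABELS = ["ok", "degraded", "failed"]

def label_for(code: int) -> str:
    # No bounds check: label_for(-1) already returns an entry
    # the sender never named.
    return STATUS_LABELS[code]

print(label_for(1))  # "degraded"
```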

What is the impact of the vulnerability?

This field type, field_enum_dynamic, is not validated properly, which means that a specially crafted value can enable a malicious attacker to read memory that they shouldn’t be able to access. This is called an out-of-bounds (OOB) read vulnerability, and it essentially means an attacker can read beyond their intended boundaries.
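
Continuing the toy example, the fix is precisely the bounds check whose absence creates the problem. Python would raise an IndexError rather than leak adjacent memory, but in C-style code an unchecked index reads whatever happens to sit next to the array.

```python
# Validating the index before the lookup is exactly the check whose
# absence permits an out-of-bounds read.
STATUS_LABELS = ["ok", "degraded", "failed"]  # as in the sketch above

def label_for_safe(code: int) -> str:
    if not 0 <= code < len(STATUS_LABELS):
        raise ValueError(f"enum index {code} out of range")
    return STATUS_LABELS[code]
```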

An alternative attack might be to intentionally trigger a page fault, which would shut down the Splunk service. Doing this repeatedly would result in a Denial of Service (DoS) attack. For these reasons, this CVE is considered high severity with a score of 7.5. 

What mitigations can you put in place?

Splunk has released patches for the impacted components. Versions 7.3.9, 8.0.9, 8.1.3, and 8.2.0 are not vulnerable to this attack. Upgrading to one of these versions, where the attack has been totally mitigated, should be priority number one.

If upgrading is not something you can do, then you may wish to implement SSL for your Splunk forwarders or enable forwarder access control using a token. These steps will make it more difficult for a malicious attacker to send specially crafted packets to your indexer because they’ll need to compromise the SSL certificate or the token first.

What do we need to think about in the long term?

OOB vulnerabilities can be particularly nasty, and not just because of the possibility of leaking information in Splunk: if your attacker has some specialist knowledge of your system, they can expand to memory being used by completely different applications. For example, if an attacker knows that your SSL certificates are loaded into a particular area of memory, they can target that region directly. They might even be able to extract a full memory dump, one packet at a time.

This means that the danger of this Splunk vulnerability isn’t just in the Splunk data you may leak, but in the much more sensitive information that the attacker may be able to access. This means that you immediately become dependent on the security of the underlying infrastructure.

How secure is your on-premise infrastructure?

It is tempting to think that your on-premise data centers are a bastion of security. You have complete control over their configuration, so you’re able to tune them finely. In reality, this may not be the case. It’s very easy to forget to enable Address Space Layout Randomization (ASLR) or Data Execution Prevention (DEP) on your instances, both of which would make these types of vulnerabilities more difficult to exploit. These are just two of a number of switches that you need to understand to build and deploy secure hardware in your data center.

A cloud provider like AWS will automatically enable these types of features for you, so that your virtual machine is immediately more secure. If this type of attack occurred in a cloud-based environment, it would be much more difficult to exploit adjacent applications in memory, because cloud environments often come with a lot of very sensible security defaults to prevent processes from reading beyond their allotted memory. This is part of the reason why 61% of security researchers say that a breach in a cloud environment is usually equally or less dangerous than the same breach in an on-premise environment. 

Would a SaaS observability tool be impacted by this?

Splunk indexers operate within the customer’s own infrastructure, which means that a vulnerability in a Splunk component is an inherent vulnerability in the user’s software stack. This reduces the control that users have, because they aren’t the ones producing patches.

Coralogix is a central, multi-tenant, full-stack observability platform that provides a layer of abstraction between the internal workings of your system and your observability data, preventing vulnerabilities like CVE-2021-3422 from being chained with other attacks.