The AWS logs you miss during an incident
Incident response in the cloud is derailed not by a lack of skill, but by a lack of visibility. Security teams frequently discover critical blind spots only after an incident is already underway, leading to delayed containment, inaccurate attribution, and incomplete forensic analysis.
This report walks through six realistic, real-world inspired scenarios where missing log sources prevented effective investigations. Each scenario highlights the specific AWS logs required to answer the most important questions during a security incident.
The “basic logging trifecta” is essential but insufficient
AWS CloudTrail (management events), VPC Flow Logs, and Route 53 Resolver Query Logs provide a strong foundational baseline for cloud security visibility. However, modern AWS environments rely heavily on serverless, containerized, and data-plane services that require additional logging.
Relying only on this baseline leaves critical gaps in areas such as object-level access, Kubernetes control-plane activity, Lambda invocations, host-level telemetry, and DNS analysis.
Critical visibility gaps and solutions
| Blind Spot | Missing Log Source | Impact |
|---|---|---|
| Network Traffic Origin | VPC Flow Logs | Inability to trace the internal source IP responsible for suspicious egress traffic |
| Data Exfiltration Scope | S3 Server Access Logs | Inability to determine which specific S3 objects were accessed or downloaded |
| Kubernetes Attribution | EKS Audit Logs | Inability to attribute Kubernetes control-plane actions, such as pod creation, to a specific Kubernetes identity |
| Serverless Abuse | CloudTrail Lambda Data Events | Inability to detect and attribute direct lambda:InvokeFunction API calls outside normal triggers |
| Host-Level Forensics | OS/App Logs via CloudWatch Agent | Lack of host-level evidence such as successful logins, privilege changes, and system activity |
| DNS C2 Channels | Route 53 Resolver Query Logs | Inability to see the actual domains being queried and identify DNS-based command-and-control activity |
Scenario 1: Missing VPC Flow Logs expose a blind spot in network traffic
At 03:12 UTC, an automated alert fires for a sudden and sustained spike in data transfer costs, traced back to a production NAT Gateway. The security team confirms the high-volume egress traffic but quickly reaches a dead end.
They have GuardDuty findings and basic AWS monitoring data, but the most important log source, VPC Flow Logs, was never enabled for the VPC. As a result, they have no way to determine which internal resource is responsible for the traffic.
Investigation breakdown
Step 1: Initial anomaly detected
- Question: “Why did our cloud bill suddenly spike?”
- Log/Data: CloudWatch Metrics for NAT Gateway
- Signal: A sudden, sustained spike in the BytesOutToDestination metric for the production NAT Gateway.
Step 2: Corroborating threat intelligence
- Question: “Is this traffic associated with a known threat?”
- Log/Data: GuardDuty Findings
- Signal: A GuardDuty alert such as Backdoor:EC2/C&CActivity.B appears, indicating suspicious outbound communication. The public IP corresponds to the NAT Gateway, but the private source inside the VPC remains unknown.
Step 3: The search for the source (the blind spot)
- Question: “Which instance inside the VPC is sending all this traffic?”
- Log/Data: VPC Flow Logs
- Signal: LOGS MISSING. Flow Logs were never configured, leaving no record of the internal source IP.
Step 4: The painstaking manual search
- Question: “Without flow logs, how can we find the source?”
- Action: Manually SSH into multiple EC2 instances to inspect active connections
- Outcome: After hours of manual effort, a compromised instance is finally identified.
Missing log highlight
Without VPC Flow Logs, the team lacked visibility into the internal source of the traffic. These logs would have provided exact srcaddr and dstaddr details, enabling immediate identification of the affected host.
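As a concrete illustration, here is a minimal Python sketch of the kind of triage flow logs make possible. It assumes the default version 2 space-separated record format (fields 4 and 10 are srcaddr and bytes); the sample records and addresses are made up for the example:

```python
from collections import Counter

def top_talkers(flow_log_lines, top_n=3):
    """Aggregate bytes by source address from default-format (v2) VPC Flow Log records."""
    bytes_by_src = Counter()
    for line in flow_log_lines:
        fields = line.split()
        if len(fields) < 14 or fields[0] == "version":
            continue  # skip header lines and malformed records
        srcaddr, byte_count = fields[3], fields[9]
        if byte_count.isdigit():
            bytes_by_src[srcaddr] += int(byte_count)
    return bytes_by_src.most_common(top_n)

records = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 203.0.113.9 44321 443 6 900 72000000 1692000000 1692000600 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.2.7 198.51.100.4 51515 443 6 10 8000 1692000000 1692000600 ACCEPT OK",
]
print(top_talkers(records))
```

With flow logs present, a query like this turns an hours-long manual hunt into a single aggregation that points directly at the loudest internal host.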
Lesson learned
Enable VPC Flow Logs for all production VPCs and critical subnets. They are essential for tracing network activity and drastically reducing investigation time.
Scenario 2: S3 data exfiltration: Which files were stolen?
A GuardDuty alert flags suspicious S3 activity from a malicious IP address. Investigation reveals that IAM credentials were compromised. CloudTrail management events show activity against a critical S3 bucket, and S3 data events are enabled, but S3 Server Access Logs were never configured.
Investigation breakdown
Step 1: Initial detection
- Question: “Why is an IAM user accessing S3 from a malicious IP?”
- Log/Data: GuardDuty Findings
- Signal: Alert Exfiltration:S3/MaliciousIPCaller is generated.
Step 2: Identify compromised principal
- Log/Data: CloudTrail Management Events
- Signal: CloudTrail confirms bucket activity but does not reveal which objects were downloaded.
Step 3: Attempt object-level visibility
- Log/Data: CloudTrail S3 Data Events
- Signal: Data events show API calls like GetObject, but attribution is incomplete.
Step 4: The practical forensic gap
- Log/Data: S3 Server Access Logs
- Signal: LOGS MISSING. No detailed object access information is available.
Missing log highlight
CloudTrail captures API activity but lacks the detailed context needed for real-world forensics. S3 Server Access Logs provide crucial evidence such as:
- Requester IP address
- Object key accessed
- User agent
- Amount of data transferred
Without them, precise breach scope cannot be determined.
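To show what that evidence looks like in practice, here is a simplified Python sketch that extracts the requester IP, object key, and bytes sent from an S3 server access log entry. The regex covers only the leading fields of the documented format (real entries carry many more trailing fields), and the sample line is fabricated:

```python
import re

# Simplified pattern for the leading fields of an S3 server access log entry.
LOG_PATTERN = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error>\S+) (?P<bytes_sent>\S+)'
)

def parse_entry(line):
    """Return the forensic fields of one access log entry, or None if unparseable."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    d = m.groupdict()
    d["bytes_sent"] = int(d["bytes_sent"]) if d["bytes_sent"].isdigit() else 0
    return d

entry = parse_entry(
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'arn:aws:iam::123456789012:user/alice 3E57427F3EXAMPLE REST.GET.OBJECT customers.csv '
    '"GET /examplebucket/customers.csv HTTP/1.1" 200 - 4406583'
)
print(entry["remote_ip"], entry["key"], entry["bytes_sent"])
```

Summing `bytes_sent` per object key across such entries is exactly how an analyst would scope which files left the bucket, and how much data each transfer moved.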
Lesson learned
Treat S3 Server Access Logs as mandatory for all sensitive buckets, using CloudTrail data events only as a complementary source.
Scenario 3: Unmasking a rogue pod with EKS Audit Logs
An attacker, having gained initial access, moves laterally toward an EKS environment. The security team is alerted to suspicious outbound traffic from a pod but finds themselves in a difficult position. They have access to VPC Flow Logs, which show the anomalous network connections, and CloudTrail management events, which track AWS API activity. However, they soon discover a critical blind spot: EKS Kubernetes audit logs were never enabled. This missing log source prevents them from attributing control-plane actions, such as who created the rogue pod and how it was introduced into the cluster.
Investigation breakdown
Step 1: Network anomaly detected
- Question: “What is this suspicious outbound traffic?”
- Log/Data: VPC Flow Logs
- Where: AWS Console → CloudWatch → Log Groups
- Signal: High-volume, periodic outbound traffic from a pod-related ENI to a known malicious C2 IP address.
Step 2: ENI to node mapping
- Question: “What owns this network interface?”
- Log/Data: EC2 Network Interface Information
- Where: AWS Console → EC2 → Network Interfaces
- Signal: The ENI is identified as being attached to a specific EKS worker node.
Step 3: Node to pod mapping
- Question: “Which pod on this node is likely responsible for the traffic?”
- Log/Data: Kubernetes API / CloudWatch Container Insights
- Where: CLI: kubectl get pods or CloudWatch Container Insights
- Signal: An unfamiliar pod named rogue-cryptominer-pod is discovered running on the identified worker node. Attribution to the exact pod is possible in many environments, but can be challenging depending on network configuration and CNI setup.
Step 4: The control plane blind spot
- Question: “Who created this rogue pod and through what action?”
- Log/Data: EKS Audit Logs (Missing)
- Where: CloudWatch Log Groups for EKS control plane logs (not enabled)
- Signal: No audit log data is available. There is no record of the Kubernetes API call such as verb=create for the rogue-cryptominer-pod, leaving the originating Kubernetes identity and request context completely unknown.
Missing log highlight
The absence of EKS audit logs was the critical failure in this investigation. Had they been enabled, the team would have had records of Kubernetes API server activity, including fields such as:
- verb: create
- objectRef: name of the pod
- user.username: the Kubernetes user or service account that initiated the request
- source IP and request metadata
These logs would have identified which Kubernetes identity created the pod and when. While audit logs do not always map directly to an AWS IAM user, they provide the essential control-plane visibility required to trace activity back to a service account, automation system, or compromised cluster credential.
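EKS delivers audit events as JSON documents in CloudWatch Logs, so attribution is a matter of pulling a few fields out of each event. Here is a minimal Python sketch using the standard Kubernetes audit event shape; the sample event and service account name are invented for illustration:

```python
import json

def summarize_audit_event(raw):
    """Extract the attribution fields from a Kubernetes audit log event (JSON)."""
    event = json.loads(raw)
    obj = event.get("objectRef", {})
    return {
        "verb": event.get("verb"),
        "resource": obj.get("resource"),
        "name": obj.get("name"),
        "user": event.get("user", {}).get("username"),
        "source_ips": event.get("sourceIPs", []),
    }

sample = json.dumps({
    "kind": "Event",
    "verb": "create",
    "objectRef": {"resource": "pods", "namespace": "default", "name": "rogue-cryptominer-pod"},
    "user": {"username": "system:serviceaccount:default:ci-deployer"},
    "sourceIPs": ["10.0.3.44"],
})
print(summarize_audit_event(sample))
```

Had audit logging been on, one filtered query for `verb=create` against the rogue pod's name would have yielded the responsible identity in seconds.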
Lesson learned
Always enable EKS API server and audit logging for production clusters. These logs are indispensable for understanding who performed actions in the Kubernetes control plane and for reconstructing malicious activity during incident response.
Scenario 4: Tracing serverless abuse with Lambda Invoke Events
The security team receives an alert from an external partner reporting suspicious activity originating from one of the company’s public APIs. The partner claims they are seeing automated requests performing unexpected actions at high speed. Internal monitoring confirms that a specific Lambda function is executing far more frequently than normal.
CloudWatch Logs show unusual behavior inside the function, and API Gateway access logs explain some of the activity. However, the team soon realizes a critical gap: CloudTrail data events for Lambda Invoke were never enabled. As a result, they have no visibility into whether the function is being invoked directly through the Lambda API rather than through its intended triggers.
Investigation breakdown
Step 1: Anomalous activity detected
- Question: “Why is this Lambda function executing so frequently?”
- Log/Data: CloudWatch Metrics for Lambda
- Where: AWS Console → CloudWatch → Metrics → Lambda
- Signal: A sudden and sustained spike in invocation count and duration for a function that normally runs only a few times per minute.
Step 2: Function runtime analysis
- Question: “What is the function doing during these executions?”
- Log/Data: CloudWatch Logs (Lambda Function Output)
- Where: AWS Console → CloudWatch → Log Groups
- Signal: Runtime logs show the function making unexpected outbound connections and processing requests that do not match normal application behavior.
Step 3: Trigger analysis
- Question: “Which known services are triggering this function?”
- Log/Data: API Gateway Access Logs and other configured trigger logs
- Where: API Gateway → Access Logs / CloudWatch
- Signal: API Gateway logs explain some invocations, but the volume of requests does not match the total Lambda execution count. A large portion of invocations appear to be coming from an unknown source.
Step 4: The invocation blind spot
- Question: “Is the function being invoked directly through the Lambda API?”
- Log/Data: CloudTrail Data Events for Lambda (Missing)
- Where: CloudTrail Trail destination (S3 or CloudWatch Logs)
- Signal: No visibility into direct lambda:InvokeFunction API calls. The team cannot determine whether an IAM principal, compromised credentials, automation script, or an external actor is invoking the function outside of normal application pathways.
Missing log highlight
CloudTrail is enabled by default in AWS accounts, but only management events are captured automatically. Lambda invocations (InvokeFunction) are data-plane events, which are not logged unless CloudTrail data events are explicitly configured.
Because Lambda Invoke data events were not enabled, the team lacked the only native AWS mechanism for identifying direct API invocations of the function, including:
- The eventName: Invoke action
- The source IP address of the caller
- The IAM principal or service making the call
- The exact time and frequency of direct invocations
It is important to note that CloudTrail Lambda data events do not capture every possible execution. Invocations triggered by services such as API Gateway, SQS, EventBridge, or SNS rely on their own trigger-layer logs for attribution. However, data events are essential for detecting unexpected or unauthorized direct invocations that bypass those services.
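With data events enabled, isolating direct invocations is a straightforward filter over the delivered records. The following Python sketch uses the standard CloudTrail record fields (`eventSource`, `eventName`, `userIdentity`, `sourceIPAddress`); the sample records and ARNs are fabricated:

```python
def direct_invokes(records):
    """Filter CloudTrail records down to direct lambda:InvokeFunction calls
    and extract who made them, from where, and when."""
    hits = []
    for r in records:
        if r.get("eventSource") == "lambda.amazonaws.com" and r.get("eventName") == "Invoke":
            hits.append({
                "caller": r.get("userIdentity", {}).get("arn"),
                "source_ip": r.get("sourceIPAddress"),
                "time": r.get("eventTime"),
            })
    return hits

trail = [
    {"eventSource": "lambda.amazonaws.com", "eventName": "Invoke",
     "eventTime": "2024-05-01T03:12:44Z", "sourceIPAddress": "198.51.100.77",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/compromised-ci"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
]
print(direct_invokes(trail))
```

Comparing the count of such records against the trigger-layer logs is what closes the gap the team hit in Step 3: any invocations not explained by API Gateway must show up here as direct API calls.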
Lesson learned
Enable CloudTrail data events for critical or publicly exposed Lambda functions. Default CloudTrail logging does not include invocation activity. Trigger-specific logs (API Gateway, ALB, SQS, EventBridge) are necessary for normal attribution, but only CloudTrail Lambda Invoke data events can reveal direct API abuse. Relying solely on runtime logs and trigger logs leaves a dangerous blind spot.
Scenario 5: The compromised instance with no host-level footprints
A security operations center receives a high-severity alert from AWS GuardDuty for ‘UnauthorizedAccess:EC2/SSHBruteForce’ targeting a production EC2 instance. Shortly after, VPC Flow Logs show suspicious outbound traffic from the same instance to an unknown IP address over a non-standard port. The incident response team has access to GuardDuty findings, VPC Flow Logs, and CloudTrail management events. However, they quickly discover that the CloudWatch Agent was not configured to collect operating system (OS) or application logs from the instance, leaving them with almost no visibility into what occurred on the host after the suspected compromise.
Investigation breakdown
Step 1: Initial detection
- Question: “What triggered the investigation?”
- Log/Data: AWS GuardDuty Finding
- Where: AWS Console → GuardDuty → Findings
- Signal: GuardDuty finding UnauthorizedAccess:EC2/SSHBruteForce identifies multiple failed SSH login attempts from a malicious IP address.
Step 2: Network activity corroboration
- Question: “Can we see the suspicious network traffic?”
- Log/Data: VPC Flow Logs
- Where: CloudWatch Log Groups (Flow Log destination)
- Signal: Flow logs confirm a high volume of REJECT records for TCP port 22, followed by an ACCEPT record, and then new outbound ACCEPT records to a suspicious external IP.
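The REJECT-then-ACCEPT pattern on port 22 can be detected mechanically. Here is a hedged Python sketch operating on `(srcaddr, dstport, action)` tuples, assumed to be pre-extracted from flow log records; the threshold and sample addresses are illustrative only:

```python
from collections import defaultdict

def brute_force_sources(records, reject_threshold=10):
    """Flag sources with many REJECTed SSH attempts followed by an ACCEPT.
    records: (srcaddr, dstport, action) tuples distilled from flow log fields."""
    rejects = defaultdict(int)
    accepted = set()
    for srcaddr, dstport, action in records:
        if dstport != 22:
            continue
        if action == "REJECT":
            rejects[srcaddr] += 1
        elif action == "ACCEPT":
            accepted.add(srcaddr)
    return [ip for ip, n in rejects.items() if n >= reject_threshold and ip in accepted]

events = [("203.0.113.50", 22, "REJECT")] * 12 + [
    ("203.0.113.50", 22, "ACCEPT"),  # the brute force eventually succeeds
    ("198.51.100.9", 22, "REJECT"),  # a one-off probe, below threshold
]
print(brute_force_sources(events))
```

A flagged source whose rejects culminate in an ACCEPT is precisely the signal that the brute force likely succeeded, which is what escalates this finding from noise to incident.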
Step 3: Host-level activity investigation (the blind spot)
- Question: “What did the attacker do after gaining access to the instance?”
- Log/Data: OS Logs (e.g., Linux /var/log/auth.log, /var/log/syslog)
- Where: CloudWatch Log Groups (if CloudWatch Agent is configured)
- Signal: No data available. The investigation hits a dead end. The team has no visibility into the successful login event, the user account used, privilege escalation activity, service changes, or other system actions that followed the intrusion.
Missing log highlight
The absence of OS and application logs collected by the CloudWatch Agent created a major investigative blind spot. Without centralized host logs, the team could not determine:
- Which Linux user account successfully authenticated
- Whether sudo or other privilege escalation occurred
- What services were modified or started
- Whether new user accounts or SSH keys were added
- Basic system activity following the intrusion
It is important to note that the CloudWatch Agent primarily collects and centralizes existing logs. It does not generate detailed process execution or command history telemetry on its own. Deeper visibility into commands run and processes spawned would require additional tooling such as auditd, EDR solutions, osquery, or AWS Systems Manager Session Manager logging. However, even standard OS logs would have provided critical foundational evidence that was completely absent in this case.
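To make the missing evidence concrete, here is a minimal Python sketch of the kind of query the team could not run: extracting successful SSH logins from sshd entries in a centralized /var/log/auth.log stream. The log lines are fabricated samples in the standard sshd format:

```python
import re

# sshd success lines look like:
#   "Accepted password for <user> from <ip> port <port> ssh2"
ACCEPTED = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<ip>\S+) port (?P<port>\d+)"
)

def successful_logins(auth_lines):
    """Extract successful SSH logins from sshd entries in auth.log."""
    return [m.groupdict() for line in auth_lines if (m := ACCEPTED.search(line))]

lines = [
    "May  1 03:14:02 ip-10-0-1-5 sshd[1187]: Failed password for root from 203.0.113.50 port 40022 ssh2",
    "May  1 03:14:09 ip-10-0-1-5 sshd[1187]: Accepted password for ubuntu from 203.0.113.50 port 40031 ssh2",
]
print(successful_logins(lines))
```

One such match would have answered the team's first dead-end question immediately: which account the attacker authenticated as, from which IP, and when.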
Lesson learned
Deploy the unified CloudWatch Agent to all production EC2 instances to collect essential OS and application logs (e.g., Linux auth.log and syslog, Windows Security logs). Centralizing these logs in CloudWatch provides the minimum host-level telemetry required for incident response. For advanced investigations, complement this with process-level monitoring solutions such as auditd, EDR tools, or SSM Session Manager logging to capture detailed attacker activity.
Scenario 6: Detecting C2 channels with Route 53 Resolver Query Logs
During a routine review of network traffic, security analysts notice an unusual pattern: small, periodic bursts of UDP traffic on port 53 egressing from a production VPC. The team has access to VPC Flow Logs, which confirm the source instance and destination IPs. However, they are missing a critical piece of the puzzle: Route 53 Resolver query logs were never enabled for the VPC. This means that while they can see DNS traffic is occurring, they have no visibility into what domains are being queried, leaving them blind to a potential DNS-based command-and-control (C2) channel.
Investigation breakdown
Step 1: Network anomaly detected
- Question: “What is causing the periodic outbound UDP/53 traffic?”
- Log/Data: VPC Flow Logs
- Where: CloudWatch Log Groups (Flow Log destination)
- Signal: Logs show an instance sending UDP packets to multiple external IPs on port 53 at regular intervals, behavior consistent with possible C2 beaconing.
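Beaconing regularity itself can be scored from flow log timestamps alone. The sketch below is a simplified heuristic, not a production detector: it flags a source whose inter-packet gaps are suspiciously uniform, with an illustrative jitter threshold:

```python
from statistics import mean, pstdev

def looks_like_beaconing(timestamps, max_jitter_ratio=0.1):
    """Heuristic: flag traffic whose inter-arrival times are suspiciously regular.
    timestamps: sorted epoch seconds of outbound packets from one source."""
    if len(timestamps) < 4:
        return False  # too few samples to judge periodicity
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg < max_jitter_ratio

# Packets roughly every 60 seconds with tiny jitter: consistent with C2 beaconing.
print(looks_like_beaconing([0, 60, 121, 180, 241, 300]))
```

Normal DNS traffic is bursty and user-driven, so a near-zero coefficient of variation across gaps is a useful first-pass discriminator before deeper analysis.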
Step 2: Source instance identification
- Question: “Which resource is generating this DNS traffic?”
- Log/Data: EC2 Network Interface (ENI) Information
- Where: AWS Console → EC2 → Network Interfaces
- Signal: The source IP is mapped to a specific ENI attached to an EC2 instance.
Step 3: DNS query analysis (the blind spot)
- Question: “What domains is the compromised instance querying?”
- Log/Data: Route 53 Resolver Query Logs
- Where: AWS Console → Route 53 → Resolver → Query Logging
- Signal: No data available. Because Resolver query logging was not enabled for the VPC, the team cannot see the actual DNS queries. They are unable to determine whether the instance is querying legitimate domains, known malicious domains, or algorithmically generated domains (DGAs).
Missing log highlight
The absence of Route 53 Resolver query logs created a critical visibility gap, obscuring the adversary’s DNS infrastructure and severely slowing down containment. The team could identify which instance was generating suspicious DNS traffic, but not the domains being contacted. Without this data, they also could not analyze for techniques such as DNS tunneling, where attackers embed exfiltrated data within DNS queries themselves.
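Had query logs existed, one common first pass is to score queried names for randomness, since DGA and tunneling labels tend to have much higher character entropy than human-chosen hostnames. A minimal Python sketch, with an illustrative (not tuned) threshold and made-up domains:

```python
import math
from collections import Counter

def label_entropy(domain):
    """Shannon entropy (bits/char) of the leftmost DNS label."""
    label = domain.split(".")[0]
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def suspicious_domains(queries, threshold=3.5):
    # Threshold is an illustrative starting point, not a tuned detection value.
    return [q for q in queries if label_entropy(q) > threshold]

queried = ["mail.example.com", "x9f3kq7zt2w8p1vb6.badnet.example", "intranet.example.com"]
print(suspicious_domains(queried))
```

Entropy alone produces false positives (CDN hostnames, for example), so in practice this is combined with query volume, label length, and threat intelligence lookups, all of which require the Resolver query logs that were missing here.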
Lesson learned
Enable Route 53 Resolver query logs on all production VPCs and centralize them for analysis. Integrating these logs with a SIEM and Route 53 Resolver DNS Firewall provides powerful capabilities to detect and block advanced threats such as DGA-based command-and-control and DNS tunneling.
Conclusion
Across every scenario, the pattern is the same: investigations stall when logs are missing.
Effective AWS forensics depends on answering a few core questions:
- Who did what? – CloudTrail
- Who talked to whom? – VPC Flow Logs
- Who queried what? – Resolver Query Logs
- Who accessed which objects? – S3 Server Access Logs
- What happened on the host? – OS Logs
- What happened in the cluster? – EKS Audit Logs
Don’t wait for a real incident to discover your blind spots. Enable comprehensive logging now, centralize it securely, and ensure immutability using tools such as S3 Object Lock.
A proactive logging strategy is one of the highest-impact investments you can make in cloud security. Your future incident response team will be grateful you did.