
AWS Incident Response Plan: Building the Process Before You Need It

Viktor B.

Co-founder & CEO · December 27, 2025 · 10 min read

The worst time to design your incident response process is during an incident. When credentials are compromised, resources are being provisioned for cryptocurrency mining, and your AWS bill is climbing by the minute, you don't have time to figure out which IAM policy to modify, who needs to be on the incident call, or where to find the CloudTrail logs. These decisions need to be made in advance, documented, and practiced.

An AWS incident response plan doesn't need to be a 50-page document. It needs to be a set of documented processes that are fast to execute under stress, tested against realistic scenarios, and known to the people who will execute them.

Incident Classification

Not all incidents require the same response. Define severity levels that match your response effort:

P1 (Critical): Active credential compromise, data exfiltration in progress, account-level billing anomalies indicating major unauthorized resource use, GuardDuty findings indicating active attacker presence. Requires immediate response 24/7, CEO/CTO notification, possibly AWS Support escalation.

P2 (High): Policy violations with potential for data access, IAM escalation attempts, significant GuardDuty findings (not yet confirmed as active compromise), billing anomalies above a major threshold. Requires response within 1-4 hours during business hours.

P3 (Medium): Compliance drift, repeated authentication failures, low-severity GuardDuty findings, minor Config violations. Requires investigation and remediation within 24 hours.

P4 (Low): Informational findings, best practice deviations, non-critical Config rule violations. Review and remediate within 7 days.

Define these tiers in your runbooks and configure your alerting to route alerts to the appropriate channels and contacts based on severity. A P1 finding at 2 AM should page an on-call engineer; a P4 finding can go to a Slack channel for morning review.
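Here is a minimal Python sketch of severity-based routing for GuardDuty findings. It assumes GuardDuty's numeric severity bands (Low 1.0-3.9, Medium 4.0-6.9, High 7.0-8.9). The tier thresholds, the helper names, and the destinations in ROUTES are hypothetical placeholders, not a standard mapping. P4 items such as non-critical Config violations would come from other sources.

```python
# Hypothetical destinations; substitute your own paging and chat targets.
ROUTES = {
    "P1": "pager:security-oncall-24x7",
    "P2": "pager:security-business-hours",
    "P3": "ticket:security-queue",
    "P4": "chat:#security-review",
}


def classify_guardduty(severity: float, confirmed_active: bool = False) -> str:
    """Map a GuardDuty numeric severity to a P-tier (assumed thresholds)."""
    if confirmed_active:
        # Confirmed attacker activity is P1 whatever the finding's score.
        return "P1"
    if severity >= 7.0:
        # High (7.0-8.9) and anything scored above it: significant, not yet confirmed.
        return "P2"
    # Medium (4.0-6.9) and Low (1.0-3.9): investigate within 24 hours.
    return "P3"


def route(tier: str) -> str:
    """Return the destination for a tier; unknown tiers fall back to the P4 channel."""
    return ROUTES.get(tier, ROUTES["P4"])


if __name__ == "__main__":
    for severity, active in [(8.0, True), (7.5, False), (2.0, False)]:
        tier = classify_guardduty(severity, active)
        print(f"severity={severity} active={active} -> {tier} via {route(tier)}")
```

Keeping the tier logic separate from the routing table lets you change who gets paged without touching the classification rules.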

The Incident Response Process

AWS security incidents follow a standard lifecycle: Detection → Containment → Investigation → Eradication → Recovery → Post-Incident. Each phase has specific objectives and AWS-specific actions.

Detection: The faster you detect, the smaller the damage. Layer your detection: GuardDuty for threat intelligence-based findings, CloudTrail alerting for specific API call patterns, billing anomaly alerts for cost-based signals. Vigilare provides unified detection across these sources with pre-built alert rules for common compromise patterns. See AWS security monitoring tools for the full detection stack.
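One way to wire up the GuardDuty and CloudTrail layers is with Amazon EventBridge rules. The sketch below defines two event patterns: one for GuardDuty findings scored 7 or higher, and one for IAM API calls commonly used for persistence. The event name list, the rule-name prefix, and the helper name are assumptions for illustration. It also assumes CloudTrail is enabled so that API calls reach EventBridge. The `events` argument is a boto3 EventBridge client.

```python
import json

# Fires on GuardDuty findings scored 7 or higher (High and above).
GUARDDUTY_HIGH = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}

# Fires on IAM calls often used to create backdoors (an assumed watch list).
IAM_PERSISTENCE = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["iam.amazonaws.com"],
        "eventName": [
            "CreateUser", "CreateAccessKey", "CreateLoginProfile",
            "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy",
            "PutRolePolicy", "UpdateAssumeRolePolicy",
        ],
    },
}


def create_detection_rules(events, prefix: str = "ir") -> list[str]:
    """Create one EventBridge rule per pattern and return the rule ARNs.

    Targets (for example an SNS topic that pages on-call) still need to be
    attached separately with put_targets.
    """
    arns = []
    for name, pattern in [("guardduty-high", GUARDDUTY_HIGH), ("iam-persistence", IAM_PERSISTENCE)]:
        response = events.put_rule(
            Name=f"{prefix}-{name}",
            EventPattern=json.dumps(pattern),
            State="ENABLED",
        )
        arns.append(response["RuleArn"])
    return arns
```

Usage might look like `create_detection_rules(boto3.client("events", region_name="us-east-1"))`. IAM is a global service whose CloudTrail events are delivered to EventBridge in us-east-1. GuardDuty findings, however, are emitted in the region where they occur, so the GuardDuty rule needs to exist in every region where GuardDuty runs.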

Containment: Stop the damage before investigating the cause. For credential compromise, the first action is disabling or deleting the compromised credentials; don't leave them active just to preserve investigation access. Deactivate or rotate the access key, revoke active sessions for the role, or disable the user. Then use CloudTrail to understand what was done with the compromised credentials. For unauthorized resource provisioning, terminate the unauthorized resources first, then investigate.
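The two most common containment actions map to two IAM API calls. The sketch below takes a boto3 IAM client as `iam`; the helper names are hypothetical. Deactivating a key, rather than deleting it, keeps the key ID available for CloudTrail queries and can be reversed if the alert turns out to be a false positive. Revoking role sessions is done the way the IAM console's "Revoke active sessions" action does it: an inline policy named AWSRevokeOlderSessions that denies everything to sessions issued before a cutoff time.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def deactivate_access_key(iam, user_name: str, access_key_id: str) -> None:
    """Set the key to Inactive; unlike deletion, this can be undone."""
    iam.update_access_key(UserName=user_name, AccessKeyId=access_key_id, Status="Inactive")


def revoke_role_sessions(iam, role_name: str, now: Optional[datetime] = None) -> dict:
    """Deny every action to role sessions issued before `now`.

    Sessions assumed after `now` are unaffected, so legitimate workloads
    can recover by assuming the role again.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": ["*"],
            "Resource": ["*"],
            "Condition": {
                "DateLessThan": {"aws:TokenIssueTime": now.strftime("%Y-%m-%dT%H:%M:%SZ")}
            },
        }],
    }
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="AWSRevokeOlderSessions",
        PolicyDocument=json.dumps(policy),
    )
    return policy
```

Revoking sessions does not stop an attacker who can still assume the role, so pair it with removing whatever credential or trust relationship they used to get in.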

Investigation: Use CloudTrail to establish a timeline of events. Query CloudTrail for all actions taken by the compromised principal during the incident window. Look for: what data was accessed, what resources were created or modified, whether any IAM changes were made (creating backdoor users or roles), and whether any external data transfers occurred. Management events are recorded by default, but object-level S3 access only appears in CloudTrail if data events were enabled on a trail before the incident.
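A first-pass timeline for one access key can be pulled with CloudTrail's `lookup_events` API. This sketch assumes a boto3 CloudTrail client. Note that `lookup_events` only covers management events from the last 90 days in the client's region, so older or data-event activity has to be queried from the trail's logs instead. The IAM_BACKDOOR_EVENTS watch list and the helper name are illustrative assumptions.

```python
import json
from datetime import datetime

# IAM write calls that commonly indicate persistence (an assumed watch list).
IAM_BACKDOOR_EVENTS = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile", "CreateRole",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy", "PutRolePolicy",
    "UpdateAssumeRolePolicy",
}


def build_timeline(cloudtrail, access_key_id: str, start: datetime, end: datetime) -> list[dict]:
    """Return events for one access key, oldest first, flagging IAM persistence calls."""
    pages = cloudtrail.get_paginator("lookup_events").paginate(
        LookupAttributes=[{"AttributeKey": "AccessKeyId", "AttributeValue": access_key_id}],
        StartTime=start,
        EndTime=end,
    )
    timeline = []
    for page in pages:
        for event in page.get("Events", []):
            # The full CloudTrail record arrives as a JSON string.
            detail = json.loads(event.get("CloudTrailEvent", "{}"))
            timeline.append({
                "time": event["EventTime"],
                "event": event["EventName"],
                "service": event.get("EventSource"),
                "source_ip": detail.get("sourceIPAddress"),
                "suspicious": event["EventName"] in IAM_BACKDOOR_EVENTS,
            })
    timeline.sort(key=lambda e: e["time"])
    return timeline
```

Since IAM is a global service, its events are recorded in us-east-1, so run the lookup there as well as in each region where the key may have been used.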

Eradication: Remove everything the attacker created. Terminate unauthorized EC2 instances. Delete unauthorized IAM users, roles, and access keys. Detach any policies the attacker attached. Check for persistence mechanisms: Lambda functions, EC2 user data scripts, unauthorized SSM parameters, CloudFormation stacks containing attacker resources.
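To build the eradication checklist, it helps to list what appeared after the incident started. This sketch assumes boto3 IAM and Lambda clients and a known incident start time; the helper name is hypothetical. It only surfaces candidates. `list_roles` also returns service-linked roles that AWS creates when a service is first used, and Lambda is regional, so run the function check in every region.

```python
from datetime import datetime, timezone


def created_since(iam, lambda_client, since: datetime) -> dict:
    """IAM users and roles created, and Lambda functions modified, at or after `since`."""
    if since.tzinfo is None:
        # boto3 returns timezone-aware datetimes; treat a naive `since` as UTC.
        since = since.replace(tzinfo=timezone.utc)
    found = {"users": [], "roles": [], "lambda_functions": []}
    for page in iam.get_paginator("list_users").paginate():
        found["users"] += [u["UserName"] for u in page["Users"] if u["CreateDate"] >= since]
    for page in iam.get_paginator("list_roles").paginate():
        found["roles"] += [r["RoleName"] for r in page["Roles"] if r["CreateDate"] >= since]
    for page in lambda_client.get_paginator("list_functions").paginate():
        for fn in page["Functions"]:
            # LastModified is a string such as "2025-12-27T02:15:00.000+0000".
            modified = datetime.strptime(fn["LastModified"], "%Y-%m-%dT%H:%M:%S.%f%z")
            if modified >= since:
                found["lambda_functions"].append(fn["FunctionName"])
    return found
```

Treat each result as a lead to confirm against the CloudTrail timeline before deleting anything, since a legitimate deploy during the window looks the same here.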

Recovery: Restore to normal operations with verified clean state. Rotate all potentially compromised credentials — not just the one known to be compromised, but any credential that might have been accessible from the compromised principal. Enable MFA on accounts that didn't have it. Verify the monitoring controls that should have caught the incident earlier and fix any gaps.
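For the credential side of recovery, a per-user inventory of active access keys and MFA status gives you a rotation list and an MFA gap list in one pass. This sketch assumes a boto3 IAM client; the helper name is hypothetical. It covers IAM users only, so other credentials the compromised principal could reach still need their own review.

```python
def credential_audit(iam) -> list[dict]:
    """Per IAM user: active access keys to rotate and whether any MFA device is registered."""
    report = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            name = user["UserName"]
            # A user can hold at most two access keys, so one call is enough.
            keys = iam.list_access_keys(UserName=name)["AccessKeyMetadata"]
            mfa_devices = iam.list_mfa_devices(UserName=name)["MFADevices"]
            report.append({
                "user": name,
                "rotate": [k["AccessKeyId"] for k in keys if k["Status"] == "Active"],
                "has_mfa": bool(mfa_devices),
            })
    return report
```

For large accounts, the IAM credential report gives similar data in a single CSV and avoids per-user API calls.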

Containment Playbooks

Pre-built containment playbooks reduce response time for common incident types. For each playbook, document:

  • The indicator that triggers this playbook (which GuardDuty finding, which alert)
  • The immediate containment action (disable credential, security group isolation, etc.)
  • The investigation queries (CloudTrail queries, specific fields to check)
  • The eradication checklist (what to look for and remove)
  • Verification steps (how to confirm the incident is contained)

For the compromised IAM credential playbook: disable the access key, query CloudTrail for all actions by that key in the past 90 days, review for unauthorized resource creation, check for IAM backdoors (new users/roles/policies), check for data access, and rotate all credentials in the affected account as a precaution.
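Keeping playbooks as structured data (rather than free-form wiki pages) makes it easy to check that every playbook documents all five elements listed above. The following is a minimal sketch; the Playbook class is hypothetical. The GuardDuty finding types in the example are real finding types, but they are only examples of what might trigger this playbook.

```python
from dataclasses import dataclass, fields


@dataclass
class Playbook:
    name: str
    triggers: list[str]        # GuardDuty finding types or alert names
    containment: list[str]
    investigation: list[str]
    eradication: list[str]
    verification: list[str]

    def missing_sections(self) -> list[str]:
        """Names of empty sections; a complete playbook returns []."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


COMPROMISED_IAM_KEY = Playbook(
    name="Compromised IAM access key",
    triggers=["UnauthorizedAccess:IAMUser/MaliciousIPCaller",
              "CredentialAccess:IAMUser/AnomalousBehavior"],
    containment=["Deactivate the access key"],
    investigation=["Query CloudTrail for all actions by the key in the past 90 days",
                   "Review resource creation and data access"],
    eradication=["Remove new IAM users, roles, and policies",
                 "Rotate all credentials in the affected account"],
    verification=["No further API calls from the key in CloudTrail"],
)
```

A check such as `missing_sections()` can run in CI so that an incomplete playbook is caught during review rather than during an incident.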

AWS Support and Communication

For significant incidents, AWS Support can accelerate investigation and access to internal information about compromised resources. File a support case when: unauthorized resources have been created in quantities that impact your bill significantly, you need AWS to take direct action on your behalf (emergency account restrictions, accelerated recovery), or you believe the incident may involve AWS infrastructure (not just your configuration).
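If you want case creation in your runbook tooling, the AWS Support API can open a case programmatically. This sketch assumes a boto3 Support client (the Support API is served from us-east-1) and an account on a Business, Enterprise On-Ramp, or Enterprise Support plan, which the API requires. The helper name is hypothetical. Which severity codes you may use depends on your plan. The sketch omits serviceCode and categoryCode, whose valid values you can look up with describe_services if you want to route the case more precisely.

```python
VALID_SEVERITIES = {"low", "normal", "high", "urgent", "critical"}


def open_security_case(support, subject: str, body: str, severity: str = "urgent") -> str:
    """Open a technical AWS Support case and return its case ID."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity code: {severity}")
    response = support.create_case(
        subject=subject,
        communicationBody=body,
        severityCode=severity,
        issueType="technical",
        language="en",
    )
    return response["caseId"]
```

Prepare the case body template in advance (account ID, affected resources, timeline so far) so the on-call engineer fills in blanks instead of writing from scratch.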

Maintain a communication plan for incidents involving customer data. GDPR, HIPAA, and other regulations require breach notification within defined windows: under GDPR, notification to the supervisory authority within 72 hours of becoming aware of the breach; under HIPAA, notification to affected individuals no later than 60 days after discovery. Your incident response plan needs to include the notification decision tree: what constitutes a notifiable breach, who makes that determination, and how notifications are sent.
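The deadline arithmetic is simple but easy to get wrong under pressure, so it can be worth scripting. This is a minimal sketch covering only the two windows named above (72 hours for GDPR, 60 days for HIPAA) as outer limits; whether a breach is notifiable at all remains the call of whoever your plan designates. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Outer limits only; both regimes also expect notice without undue delay.
DEADLINES = {
    "GDPR": timedelta(hours=72),   # to the supervisory authority, from awareness
    "HIPAA": timedelta(days=60),   # to affected individuals, from discovery
}


def notification_deadlines(discovered_at: datetime, regimes: list[str]) -> dict[str, datetime]:
    """Latest notification time for each applicable regime."""
    if discovered_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp to avoid off-by-hours errors")
    return {regime: discovered_at + DEADLINES[regime] for regime in regimes}


if __name__ == "__main__":
    found = datetime(2025, 12, 27, 2, 0, tzinfo=timezone.utc)
    for regime, due in notification_deadlines(found, ["GDPR", "HIPAA"]).items():
        print(regime, due.isoformat())
```

Recording the discovery timestamp in the incident channel at the moment of discovery gives this calculation a defensible starting point.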

Tabletop Exercises

Plans that are never tested fail under stress. Run tabletop exercises quarterly: present a realistic incident scenario to your security and engineering teams, walk through the response process, and identify gaps in documentation, tooling, or team knowledge. Common exercise scenarios for AWS environments: compromised IAM access key used for crypto mining, S3 bucket made public containing customer data, insider threat creating unauthorized backdoor access.

After each exercise, document what was unclear or missing, and update your runbooks accordingly. The goal is that when a real incident occurs, the team can execute the documented process without improvising.


FAQ

How quickly do I need to respond to a GuardDuty HIGH finding?

For HIGH severity findings, initial assessment should happen within 15-30 minutes during business hours and within 60 minutes outside business hours if you have on-call coverage. The initial assessment determines whether the finding is a true positive requiring containment or a false positive requiring suppression. Not all HIGH findings are true compromises; some are normal operational patterns that GuardDuty doesn't have the context to recognize as legitimate. But treat each one as real until proven otherwise.

Should I notify AWS if I discover a security incident?

You're not required to notify AWS about incidents that only affect your own account. However, if the incident involves potential abuse of AWS infrastructure (DDoS attacks launched from your account, spam sent through SES, phishing infrastructure), notifying AWS helps them take protective action. AWS also has account security resources that can assist with incident investigation and recovery — contact AWS Support for significant incidents regardless of notification requirements.

What's the first thing to do in the first 5 minutes of a suspected credential compromise?

Disable the suspected compromised credential immediately. If it's an IAM access key, deactivate it. If it's an IAM role session, revoke active sessions for that role. The speed of credential invalidation directly determines the damage window: every minute the credential stays active is a minute the attacker can keep taking actions. Investigation can wait 5 minutes; credential disablement cannot. Once the credential is disabled, begin investigating to understand the scope and what needs to be remediated.

