
AWS Security Runbooks: Pre-Built Response Procedures for Common Findings

Vigilare Engineering

Platform Team · December 29, 2025 · 10 min read

Security runbooks eliminate the most dangerous aspect of incident response: improvisation under pressure. When an engineer receives a GuardDuty alert at 11 PM, they shouldn't need to think about what to check first, which AWS console to open, or whether the action they're about to take is reversible. A well-written runbook makes the right response obvious, fast, and consistent.

Runbooks are also documentation of institutional knowledge. The senior engineer who knows instinctively how to investigate a compromised IAM role won't always be available. Runbooks transfer that knowledge into a form that's accessible to the entire team.

Runbook Structure

A security runbook should answer four questions: What triggered this? What do I check first? What actions do I take? How do I know it's resolved? The format doesn't need to be complex. A numbered checklist with the relevant AWS console links and CLI commands is more useful than a narrative document.

Standard sections for each runbook:

  • Trigger: The specific finding or alert that activates this runbook (GuardDuty finding type, CloudWatch alarm name, billing anomaly threshold)
  • Initial assessment (2-5 minutes): Quick checks to determine if this is a true positive requiring response or a false positive requiring suppression
  • Containment actions: Immediate actions to stop ongoing damage if confirmed true positive
  • Investigation queries: CloudTrail queries, Config checks, and other investigation steps to understand scope
  • Remediation checklist: Specific items to remove, rotate, or correct
  • Verification: How to confirm the incident is fully resolved
  • Escalation criteria: When to call in senior resources or AWS Support

Runbook: UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration

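This GuardDuty finding fires when credentials issued to an EC2 instance through its IAM role are used from somewhere other than that instance, which indicates the credentials may have been extracted and used elsewhere. GuardDuty reports it in two variants: .OutsideAWS, when the credentials are used from an IP address outside AWS, and .InsideAWS, when they are used from another AWS account.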

Initial assessment: Check the remote IP address in the finding (service.action.awsApiCallAction.remoteIpDetails.ipAddressV4). Is it a known VPN or corporate egress IP? If so, this may be a developer who copied instance credentials to their laptop: poor practice, but not an attack. If the IP is unfamiliar, treat the finding as a true positive.
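The egress-IP check is easy to script so that it gives the same answer at 11 PM as at 11 AM. A minimal sketch, assuming the finding JSON has the standard GuardDuty shape for API-call findings; the CIDR allowlist and the verdict labels are hypothetical placeholders for your own values.

```python
import ipaddress
from typing import Optional

# Hypothetical allowlist: replace with your own VPN and corporate egress ranges.
KNOWN_EGRESS_CIDRS = ["203.0.113.0/24", "2001:db8:100::/48"]


def finding_remote_ip(finding: dict) -> Optional[str]:
    """Return the remote IPv4 address recorded in a GuardDuty API-call finding."""
    api_call = finding.get("service", {}).get("action", {}).get("awsApiCallAction", {})
    return api_call.get("remoteIpDetails", {}).get("ipAddressV4")


def is_known_egress(ip: str, cidrs=KNOWN_EGRESS_CIDRS) -> bool:
    """True if ip falls inside any allowlisted network (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions is simply False, so mixed lists are safe.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def initial_assessment(finding: dict) -> str:
    """Map a finding to a hypothetical triage verdict."""
    ip = finding_remote_ip(finding)
    if ip is None:
        return "needs-manual-review"  # no remote IP recorded in the finding
    if is_known_egress(ip):
        return "likely-misuse"  # known egress: credential misuse, probably not an attack
    return "true-positive"  # unfamiliar IP: continue with containment
```

Keeping the allowlist in code (or in a config file next to it) means it gets reviewed along with the runbook instead of living in someone's memory.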

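Containment: Revoke the role's active sessions. Editing the role's trust policy does not help here: the stolen credentials are temporary session credentials that have already been issued, and a trust policy only controls who can assume the role in the future. The IAM console's Revoke active sessions action attaches an inline policy (AWSRevokeOlderSessions) that denies every action for session tokens issued before the moment of revocation. If the role has sensitive permissions and you need an even harder stop, attach a deny-all inline policy, accepting that this also breaks the legitimate workload on the instance. Then check whether the instance itself is compromised: an attacker who can still reach the instance metadata service can simply fetch fresh credentials, so the role isn't the only concern.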
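Stolen session credentials are cut off by a deny statement conditioned on aws:TokenIssueTime, which has the same shape as the inline policy the IAM console attaches when you choose Revoke active sessions. A minimal sketch that generates that document for scripted use; the function name is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def revoke_older_sessions_policy(now: Optional[datetime] = None) -> dict:
    """Deny every action for session tokens issued before `now`, expressed in UTC.

    Sessions issued after the cutoff are not matched by this statement.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    cutoff = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["*"],
                "Resource": ["*"],
                "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff}},
            }
        ],
    }


def policy_json() -> str:
    """Serialize the policy for use with the AWS CLI."""
    return json.dumps(revoke_older_sessions_policy(), indent=2)
```

Save the output of policy_json() to revoke.json and attach it with `aws iam put-role-policy --role-name <role-name> --policy-name AWSRevokeOlderSessions --policy-document file://revoke.json`.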

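Investigation queries: Query CloudTrail for every action made with the stolen session credentials. Filter on userIdentity.accessKeyId, using the temporary access key ID reported in the finding's resource.accessKeyDetails.accessKeyId, and confirm the role through userIdentity.sessionContext.sessionIssuer.arn. For the last 90 days of management events, `aws cloudtrail lookup-events --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=<access-key-id>` is a quick starting point. Identify which resources were accessed and whether any new resources were created with the stolen credentials.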
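For log files already delivered to S3, a short script can summarize everything one access key did. A sketch assuming standard CloudTrail log files (a top-level "Records" array, optionally gzipped); records returned by lookup-events carry the same structure as a JSON string in their CloudTrailEvent field. The function names are hypothetical.

```python
import gzip
import json
from typing import Iterable, List


def load_records(path: str) -> List[dict]:
    """Load the Records array from a CloudTrail log file (.json or .json.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh).get("Records", [])


def summarize_key_activity(records: Iterable[dict], access_key_id: str) -> dict:
    """Summarize every event made with one (temporary) access key ID."""
    matched = [r for r in records
               if r.get("userIdentity", {}).get("accessKeyId") == access_key_id]
    # readOnly is false for state-changing calls; treat a missing flag as a write.
    writes = [r for r in matched if r.get("readOnly") is not True]
    issuers = {r.get("userIdentity", {}).get("sessionContext", {})
                .get("sessionIssuer", {}).get("arn") for r in matched}
    return {
        "event_count": len(matched),
        "role_arns": sorted(a for a in issuers if a),
        "source_ips": sorted({r.get("sourceIPAddress", "unknown") for r in matched}),
        "write_events": sorted({r.get("eventName", "unknown") for r in writes}),
        # ISO 8601 timestamps in the same format sort chronologically as strings.
        "first_seen": min((r["eventTime"] for r in matched if "eventTime" in r), default=None),
    }
```

The write_events list is usually the fastest way to answer "did they create anything?", which decides how much cleanup the remediation step needs.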

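Remediation: An IAM role has no long-term credentials to rotate; revoking its active sessions is what invalidates the stolen credentials. If the attacker used them to create IAM users, access keys, or roles, delete those as well. Review why the credentials could be retrieved from the instance metadata service without IMDSv2 protection, and require IMDSv2 (HttpTokens set to required) on all instances per the IMDSv2 migration guide.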
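To find the instances that still accept IMDSv1, you can check the MetadataOptions block that describe-instances returns for each instance. A sketch assuming the JSON output of `aws ec2 describe-instances --output json` has been loaded into a dict; the function name is hypothetical.

```python
from typing import List


def instances_missing_imdsv2(describe_output: dict) -> List[str]:
    """Return IDs of instances whose metadata service still accepts IMDSv1 requests."""
    missing = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            if options.get("HttpEndpoint") == "disabled":
                continue  # metadata service is off, so there is nothing to fetch
            if options.get("HttpTokens") != "required":
                missing.append(instance["InstanceId"])
    return missing
```

Each reported instance can then be switched with `aws ec2 modify-instance-metadata-options --instance-id <id> --http-tokens required --http-endpoint enabled`, after confirming that the software on it uses an IMDSv2-capable SDK.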

Runbook: CryptoCurrency:EC2/BitcoinTool.B

GuardDuty fires this finding when an EC2 instance communicates with an IP address or domain associated with cryptocurrency-related activity, such as a mining pool.

Initial assessment: Is the instance running legitimate cryptocurrency-related software, such as a blockchain node or a wallet service? If no such workload is expected on this instance, this is almost certainly unauthorized mining.

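Containment: Isolate the instance immediately by replacing its security groups with a dedicated isolation security group that has no inbound and no outbound rules. Security groups are stateful, so connections that are already established and tracked, such as the miner's session to its pool, may persist after the rules change; if traffic continues, add a deny rule to the subnet's network ACL, keeping in mind that it applies to every instance in that subnet. Do not terminate the instance yet: snapshot its EBS volumes first so the evidence survives. Check whether other instances in the account show similar findings.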
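These containment steps map onto a handful of EC2 API calls. A minimal sketch, assuming a boto3 EC2 client is passed in, the instance has a single network interface, and an isolation security group already exists in the same VPC with its default allow-all outbound rule removed (new security groups are created with one). The function name is hypothetical.

```python
from typing import Any, List


def contain_instance(ec2: Any, instance_id: str, isolation_sg_id: str) -> List[str]:
    """Isolate an instance, protect it from termination, and snapshot its EBS volumes.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2"). Returns the snapshot IDs.
    """
    # 1. Replace all of the instance's security groups with the isolation group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg_id])
    # 2. Block accidental termination until forensics are complete.
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  DisableApiTermination={"Value": True})
    # 3. Snapshot every attached EBS volume for later analysis.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping.get("Ebs", {}).get("VolumeId")
        if volume_id:
            snapshot = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=f"IR forensics for {instance_id}",
            )
            snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```

Taking the client as a parameter keeps the function testable with a stub and lets you run it under whichever incident-response role and region the responder has assumed.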

Investigation: Determine how the mining software was installed. Check CloudTrail for unusual API calls that launched or modified the instance, such as RunInstances or ModifyInstanceAttribute. Check whether the instance was launched from a compromised AMI, ran user data that installed the miner, or was modified after launch.
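User data is a common installation path and is quick to inspect. A sketch assuming the JSON output of `aws ec2 describe-instance-attribute --instance-id <id> --attribute userData`, whose UserData.Value field is base64-encoded. The indicator patterns are hypothetical examples, not a complete detection list.

```python
import base64
import gzip
import re
from typing import List

# Hypothetical indicator patterns; extend these from your own threat intelligence.
MINER_PATTERNS = [r"xmrig", r"stratum\+(tcp|ssl)://", r"cpuminer", r"minerd"]


def decode_user_data(attribute_response: dict) -> str:
    """Decode the UserData.Value field returned by describe-instance-attribute."""
    encoded = attribute_response.get("UserData", {}).get("Value")
    if not encoded:
        return ""
    raw = base64.b64decode(encoded)
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes: user data is sometimes compressed
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def miner_indicators(user_data: str) -> List[str]:
    """Return the patterns that match the decoded user data (case-insensitive)."""
    return [p for p in MINER_PATTERNS if re.search(p, user_data, re.IGNORECASE)]
```

A clean result only rules out user data; the miner may still have arrived through the AMI or a post-launch change.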

Remediation: Terminate the mining instance. If the instance was part of a legitimate workload that was compromised, restore from a known-good snapshot after investigating how the compromise occurred. If the instance was provisioned entirely by an attacker, focus on the compromised credential that provisioned it. See account compromise response for the full credential investigation process.

Runbook: Policy:IAMUser/RootCredentialUsage

GuardDuty fires when the root account credentials are used for any action. Root credentials should essentially never be used in normal operations.

Initial assessment: Who used the root credentials, and for what? Check CloudTrail for the root ConsoleLogin event and the actions that followed (events where userIdentity.type is Root). Was this a known scenario, such as recovering a locked-out account, changing the support plan, or another task that requires root?
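Not every Root entry in CloudTrail is a person: AWS services sometimes act on root's behalf. A sketch of the root-activity filter, written to mirror the logic of the widely used CIS benchmark metric filter (userIdentity.type is Root, no userIdentity.invokedBy, eventType is not AwsServiceEvent), applied to CloudTrail records already loaded as dicts. The function name and output fields are hypothetical.

```python
from typing import Iterable, List


def root_activity(records: Iterable[dict]) -> List[dict]:
    """Root-user events in time order, excluding calls AWS services make on root's behalf."""
    hits = []
    for record in records:
        identity = record.get("userIdentity", {})
        if (identity.get("type") == "Root"
                and "invokedBy" not in identity
                and record.get("eventType") != "AwsServiceEvent"):
            hits.append({
                "time": record.get("eventTime"),
                "event": record.get("eventName"),
                "source_ip": record.get("sourceIPAddress"),
                "console_login": record.get("eventName") == "ConsoleLogin",
            })
    return sorted(hits, key=lambda h: h["time"] or "")
```

Reading the result in time order shows the sign-in first and then exactly what was done with it.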

Response: If the root usage was unauthorized or unexpected, immediately change the root password, delete any root access keys, enable MFA on the root account if it isn't already enabled (and remove any MFA devices you don't recognize), and investigate what actions were taken with the root credentials. Notify leadership: root is the highest-privilege identity in an AWS account.

Prevention: Secure the root credentials with a strong password stored in a password manager that only designated administrators can access, enable hardware MFA, and delete any root access keys. Create a recurring calendar reminder to verify that root MFA is still in place. Vigilare monitors for root account usage and alerts immediately when root activity appears in CloudTrail.
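The recurring MFA check can be scripted against `aws iam get-account-summary`, whose SummaryMap includes AccountMFAEnabled and AccountAccessKeysPresent. A sketch assuming that JSON output has been loaded into a dict; note that AccountMFAEnabled reports whether root has any MFA device, not specifically a hardware key. The function name is hypothetical.

```python
from typing import List


def root_hygiene_issues(account_summary: dict) -> List[str]:
    """Check get-account-summary output for root MFA and root access keys."""
    summary = account_summary.get("SummaryMap", {})
    issues = []
    if summary.get("AccountMFAEnabled") != 1:
        issues.append("root MFA is not enabled")
    if summary.get("AccountAccessKeysPresent", 0) != 0:
        issues.append("root access keys are present")
    return issues
```

An empty list means both checks passed; anything else is worth a ticket, even if no one has used root recently.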

Runbook: Config Non-Compliance: restricted-ssh

AWS Config flags a security group as noncompliant when it is created or modified to allow inbound SSH (TCP port 22) from 0.0.0.0/0 or ::/0.

Initial assessment: Which security group is affected? Is it attached to any running instances? Was this change intentional (debugging session) or accidental?

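Immediate action: Remove the offending rule from the security group. This is generally safe to do right away: new connections are blocked, while existing tracked connections typically continue. That cuts both ways, because an SSH session opened while the rule was in place may survive its removal; if the rule was open long enough to be exploited, check the attached instances for active sessions. If the change was deliberate and there's a legitimate access need, replace the broad rule with a specific source IP or CIDR range.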
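Finding the exact rule to remove is slightly trickier than it looks, because port 22 can also be opened by a port range or by an "all traffic" rule. A sketch that builds the IpPermissions argument for EC2's RevokeSecurityGroupIngress from one group in the describe-security-groups output; the function names are hypothetical.

```python
from typing import List

OPEN_CIDRS_V4 = {"0.0.0.0/0"}
OPEN_CIDRS_V6 = {"::/0"}


def _covers_port(permission: dict, port: int) -> bool:
    protocol = str(permission.get("IpProtocol"))
    if protocol == "-1":  # "all traffic" rules include every port
        return True
    if protocol not in ("tcp", "6"):
        return False
    return permission.get("FromPort", 0) <= port <= permission.get("ToPort", 65535)


def open_ssh_permissions(group: dict, port: int = 22) -> List[dict]:
    """Build IpPermissions for revoke_security_group_ingress covering world-open SSH."""
    revocations = []
    for perm in group.get("IpPermissions", []):
        if not _covers_port(perm, port):
            continue
        v4 = [{"CidrIp": r["CidrIp"]} for r in perm.get("IpRanges", [])
              if r.get("CidrIp") in OPEN_CIDRS_V4]
        v6 = [{"CidrIpv6": r["CidrIpv6"]} for r in perm.get("Ipv6Ranges", [])
              if r.get("CidrIpv6") in OPEN_CIDRS_V6]
        if not (v4 or v6):
            continue  # port is covered, but only from restricted sources
        entry = {"IpProtocol": perm["IpProtocol"]}
        if "FromPort" in perm:
            entry["FromPort"], entry["ToPort"] = perm["FromPort"], perm["ToPort"]
        if v4:
            entry["IpRanges"] = v4
        if v6:
            entry["Ipv6Ranges"] = v6
        revocations.append(entry)
    return revocations
```

Pass the result to `ec2.revoke_security_group_ingress(GroupId=group["GroupId"], IpPermissions=...)` with a boto3 client. Revoking removes the whole matching rule, so a world-open port range such as 0-65535 disappears entirely, which is usually what you want in this situation.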

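Process improvement: If this was accidental, investigate how it happened; a manual console change during a debugging session is common. Consider AWS Config auto-remediation for this rule, for example with the AWS-managed Systems Manager Automation runbook AWS-DisablePublicAccessForSecurityGroup. For prevention, keep in mind that an SCP cannot match on the port or CIDR of a rule being added, so SCP guardrails here usually restrict which principals may call ec2:AuthorizeSecurityGroupIngress rather than blocking port 22 from 0.0.0.0/0 specifically.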

Maintaining Runbooks

Runbooks become outdated as your environment evolves. Schedule a quarterly runbook review that: tests each runbook's investigation steps against the current account configuration (queries that worked 6 months ago may no longer return the right data), updates CLI commands when AWS changes APIs, and adds new runbooks for new finding types or monitoring capabilities added since the last review.

Store runbooks in version-controlled documentation alongside your infrastructure code. Changes to runbooks go through the same review process as infrastructure changes. When an incident reveals that a runbook was incomplete or incorrect, update the runbook as part of the post-incident review and commit the update.
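Because runbooks live in version control, the standard structure can be enforced in CI like any other lint. A minimal sketch that checks each runbook for the standard sections listed under Runbook Structure; the section-matching rule (a line starting with the section name, optionally after heading or bullet markers) and the function names are assumptions about how your runbooks are written.

```python
import re
from pathlib import Path
from typing import List

# Section names from the standard runbook structure, matched at the start of a line.
REQUIRED_SECTIONS = [
    "Trigger", "Initial assessment", "Containment", "Investigation",
    "Remediation", "Verification", "Escalation",
]


def missing_sections(text: str) -> List[str]:
    """Return required sections that no line of the runbook starts with."""
    return [
        name for name in REQUIRED_SECTIONS
        if not re.search(rf"^[#*\-•\s]*{re.escape(name)}", text,
                         re.IGNORECASE | re.MULTILINE)
    ]


def check_files(paths: List[str]) -> int:
    """Print problems for each runbook file; return 1 if any file fails, else 0."""
    failures = 0
    for path in paths:
        missing = missing_sections(Path(path).read_text(encoding="utf-8"))
        if missing:
            failures += 1
            print(f"{path}: missing {', '.join(missing)}")
    return 1 if failures else 0
```

A CI step can call check_files() with the runbook paths and fail the build on a nonzero result, so an incomplete runbook is caught in review rather than during an incident.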


FAQ

How detailed should runbooks be?

Detailed enough that a competent engineer unfamiliar with the specific incident type can execute them correctly. "Check CloudTrail" is too vague; a specific CloudTrail Lake or Athena query with example output is about right. Include screenshots of the relevant AWS console views when they help orient the reader. The test: can a new team member follow the runbook correctly with no verbal guidance? If not, it needs more detail.

Should runbooks be automated?

Automate the investigation queries (Lambda functions or scripts that run standard CloudTrail queries on demand), but be cautious about automating containment actions. Automated containment (a Lambda function that automatically revokes credentials when GuardDuty fires) has significant blast radius if it triggers on false positives. Automate data gathering; leave final response decisions to humans until you have high confidence in the false-positive rate.
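Automating data gathering while leaving containment to humans looks like this in practice: a Lambda function subscribed to GuardDuty findings through EventBridge collects the key fields and points at the right runbook, and nothing else. A sketch assuming the standard "GuardDuty Finding" event shape; the runbook identifiers are hypothetical.

```python
from typing import Optional

# Hypothetical runbook identifiers keyed by GuardDuty finding-type prefix.
# Prefix matching also covers variants such as ".OutsideAWS" and "BitcoinTool.B!DNS".
RUNBOOK_BY_PREFIX = {
    "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration": "instance-credential-exfiltration",
    "CryptoCurrency:EC2/BitcoinTool.B": "ec2-crypto-mining",
    "Policy:IAMUser/RootCredentialUsage": "root-credential-usage",
}


def runbook_for(finding_type: str) -> Optional[str]:
    """Return the runbook for a finding type, or None if no runbook matches."""
    for prefix, runbook in RUNBOOK_BY_PREFIX.items():
        if finding_type.startswith(prefix):
            return runbook
    return None


def handler(event: dict, context=None) -> dict:
    """Lambda entry point for an EventBridge GuardDuty finding event.

    Gathers triage data only; it deliberately takes no containment action.
    """
    finding = event.get("detail", {})
    resource = finding.get("resource", {})
    api_call = finding.get("service", {}).get("action", {}).get("awsApiCallAction", {})
    return {
        "finding_id": finding.get("id"),
        "type": finding.get("type"),
        "severity": finding.get("severity"),
        "account_id": finding.get("accountId"),
        "region": finding.get("region"),
        "access_key_id": resource.get("accessKeyDetails", {}).get("accessKeyId"),
        "instance_id": resource.get("instanceDetails", {}).get("instanceId"),
        "remote_ip": api_call.get("remoteIpDetails", {}).get("ipAddressV4"),
        "runbook": runbook_for(finding.get("type", "")),
    }
```

The returned summary can be posted to the on-call channel or attached to the incident ticket, so the responder starts from the right runbook with the key identifiers already in hand.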

Who should be responsible for runbooks?

Runbooks should be owned by the security or operations team but written with input from the teams who built the affected systems. A runbook for an ECS container finding is more useful when it is written with input from the team that knows that ECS deployment's architecture. Assign a DRI (directly responsible individual) for each runbook category who reviews and updates those runbooks quarterly and acts as the subject-matter expert (SME) when one of them is executed.

