AWS Cost Optimization: A Practical Guide for Production Workloads

AWS bills grow in two ways: intentionally (you're running more stuff) and unintentionally (you're running the wrong stuff or running the right stuff inefficiently). Intentional growth is fine — it means your service is scaling. Unintentional growth is the problem: idle resources, oversized instances, data transfer patterns you didn't plan for, services that accumulated cost through configuration drift.

Most AWS accounts have 15-30% of spend that could be eliminated or reduced without affecting application performance or reliability. This isn't a theoretical number — it's what shows up consistently in AWS cost reviews. The optimizations that produce the largest savings are not particularly complex, but they require someone to actually look at the data and act on it.

Start With Cost Explorer and Tagging

AWS Cost Explorer shows spend by service, account, region, and tag. Before optimizing anything, understand where money is going. Sort services by cost and focus on the top 5 — they represent the majority of spend for most accounts. For each high-cost service, break down spend by the available dimensions: by region, by resource type, by usage type. This drill-down usually reveals one or two specific cost drivers that aren't obvious from the top-level view.
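The "sort by cost, focus on the top 5" step can be sketched in a few lines. This is a minimal sketch, assuming the input has the shape of a Cost Explorer `get_cost_and_usage` response grouped by the SERVICE dimension; the helper name `top_services` and the sample figures are hypothetical.

```python
from collections import defaultdict

def top_services(response, n=5, metric="UnblendedCost"):
    """Rank services by total cost across all periods in the response.

    Returns a list of (service, cost, share_of_total) tuples.
    """
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings.
            totals[service] += float(group["Metrics"][metric]["Amount"])
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [
        (name, round(cost, 2), round(cost / grand_total, 3) if grand_total else 0.0)
        for name, cost in ranked
    ]

# Hand-written sample in the response shape; the figures are illustrative only.
sample = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
             "Metrics": {"UnblendedCost": {"Amount": "6000.00", "Unit": "USD"}}},
            {"Keys": ["Amazon Relational Database Service"],
             "Metrics": {"UnblendedCost": {"Amount": "3000.00", "Unit": "USD"}}},
            {"Keys": ["Amazon Simple Storage Service"],
             "Metrics": {"UnblendedCost": {"Amount": "1000.00", "Unit": "USD"}}},
        ]}
    ]
}

for name, cost, share in top_services(sample, n=2):
    print(f"{name}: ${cost:,.2f} ({share:.0%})")
```

In real use the response would come from the Cost Explorer API rather than a literal. Grouping by a second dimension such as usage type gives the drill-down described above.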

Tagging is a prerequisite for cost attribution. Resources without cost allocation tags appear as a single undifferentiated blob in Cost Explorer, making it impossible to understand which team or application drives which costs. Implement a tagging strategy with at minimum: Environment (production/staging/development), Application (name of the service or application), and Owner (team or individual responsible). Tags only become available in Cost Explorer once you activate them as cost allocation tags in the Billing console. Use AWS Config (for example, the required-tags managed rule) to detect and alert on untagged resources.
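A check for the three minimum tags might look like the following. This is a sketch, assuming each resource is a dict with `ResourceARN` and `Tags` keys, which is the shape the Resource Groups Tagging API uses in its `ResourceTagMappingList`. The function name and the sample ARNs are hypothetical.

```python
REQUIRED_TAGS = {"Environment", "Application", "Owner"}

def find_untagged(resource_mappings, required=REQUIRED_TAGS):
    """Return {arn: sorted list of missing tag keys} for non-compliant resources.

    A tag with an empty value is treated as missing, since it carries
    no attribution information.
    """
    gaps = {}
    for resource in resource_mappings:
        present = {
            tag["Key"] for tag in resource.get("Tags", []) if tag.get("Value")
        }
        missing = set(required) - present
        if missing:
            gaps[resource["ResourceARN"]] = sorted(missing)
    return gaps

# Illustrative sample; account ID and resource IDs are placeholders.
sample = [
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0example1",
     "Tags": [{"Key": "Environment", "Value": "production"},
              {"Key": "Application", "Value": "checkout"},
              {"Key": "Owner", "Value": "payments-team"}]},
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:volume/vol-0example2",
     "Tags": [{"Key": "Environment", "Value": "staging"},
              {"Key": "Owner", "Value": ""}]},
]

for arn, missing in find_untagged(sample).items():
    print(arn, "is missing", ", ".join(missing))
```

A script like this complements AWS Config rather than replacing it: Config alerts continuously, while a one-off scan is handy for sizing the backlog before a tagging cleanup.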

Eliminate Idle and Orphaned Resources

Idle resources — things that exist but aren't used — are the lowest-hanging fruit in any cost optimization effort. They typically persist because nobody remembers to clean them up after their original purpose ends: a POC environment spun up six months ago, a development database that a team member left running when they changed projects, EBS volumes that remain after instance termination.

Systematically inventory idle resources using AWS Cost Explorer, AWS Trusted Advisor, and Compute Optimizer:

EBS volumes with no attached instance (or attached but with zero read/write operations) are pure waste. Snapshot any you might still need, then delete them.
Unattached Elastic IPs cost $0.005/hour (about $3.65/month each): tiny per resource, but accounts accumulate them. Release any that are not being deliberately held for reattachment.
RDS instances with no connections over 7+ days are candidates for termination or right-sizing to smaller types. Check the CloudWatch DatabaseConnections metric (and RDS Performance Insights, if it is enabled) to verify there are no connections before acting.
S3 buckets without lifecycle policies, or with rules that never transition or expire data, accumulate storage cost indefinitely. Review old buckets and implement lifecycle policies that move infrequently accessed data to cheaper storage classes or expire it entirely.
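The first two items in the list above lend themselves to a quick estimate of monthly waste. This is a rough sketch, assuming volume records shaped like the entries in an EC2 `describe_volumes` response (`VolumeId`, `State`, `Size` in GiB, `Attachments`). The per-GB storage price is an assumption (roughly gp3 list price in US East) and should be replaced with your own rate.

```python
HOURS_PER_MONTH = 730
EIP_HOURLY_USD = 0.005      # rate quoted in the list above
EBS_GB_MONTH_USD = 0.08     # ASSUMPTION: approximate gp3 price; adjust per region/type

def idle_volume_report(volumes, gb_month_usd=EBS_GB_MONTH_USD):
    """List unattached volumes as (volume_id, size_gib, est_monthly_usd), costliest first."""
    idle = [
        v for v in volumes
        if v["State"] == "available" and not v.get("Attachments")
    ]
    rows = [(v["VolumeId"], v["Size"], round(v["Size"] * gb_month_usd, 2)) for v in idle]
    return sorted(rows, key=lambda row: row[2], reverse=True)

def idle_eip_monthly_cost(count, hourly_usd=EIP_HOURLY_USD):
    """Estimated monthly cost of `count` unattached Elastic IPs."""
    return round(count * hourly_usd * HOURS_PER_MONTH, 2)

# Illustrative sample; the volume IDs are placeholders.
volumes = [
    {"VolumeId": "vol-a", "State": "available", "Size": 500, "Attachments": []},
    {"VolumeId": "vol-b", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-0example"}]},
    {"VolumeId": "vol-c", "State": "available", "Size": 50, "Attachments": []},
]

for volume_id, size, cost in idle_volume_report(volumes):
    print(f"{volume_id}: {size} GiB, about ${cost:.2f}/month")
print(f"10 idle EIPs: about ${idle_eip_monthly_cost(10):.2f}/month")
```

The estimate ignores provisioned IOPS and throughput charges, so it is a floor rather than an exact figure.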

Right-Sizing with Compute Optimizer

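Right-sizing means matching instance size to actual workload requirements rather than provisioning for theoretical peak. AWS Compute Optimizer analyzes CloudWatch metrics (the last 14 days by default) for EC2 instances, RDS instances, Lambda functions, ECS services, and Auto Scaling groups, then recommends changes based on actual utilization patterns. For EC2, memory utilization is only considered if the CloudWatch agent publishes it, so without the agent the recommendations rest mainly on CPU, network, and disk metrics.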

The Compute Optimizer findings are sorted by projected monthly savings. Start with the highest-savings recommendations and evaluate each: does the recommended instance type have enough memory? Is the recommended size appropriate for traffic spikes that might not appear in a 14-day window? Is the instance part of a purchase commitment that would make changing the type costly?

A common finding is EC2 instances running at 5-15% average CPU utilization but sized for bursting. Compute Optimizer may recommend a smaller instance type. Before making that change, check whether the application is bursty enough that the smaller type would be insufficient during peak periods. 14-day averages can miss seasonal patterns.
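The gap between average and peak can be checked before accepting a downsize recommendation. The sketch below is a heuristic of my own, not Compute Optimizer's method: it assumes CPU utilization scales linearly with the vCPU ratio, which is a crude approximation, and the 80% limit is an arbitrary illustrative threshold.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile (pct is an integer 1..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def downsize_check(cpu_samples, current_vcpus, target_vcpus, peak_limit=80.0):
    """Project p99 CPU onto a smaller instance and flag whether it stays under the limit."""
    scale = current_vcpus / target_vcpus  # ASSUMPTION: linear scaling with vCPU count
    p99 = percentile(cpu_samples, 99)
    projected_p99 = p99 * scale
    return {
        "avg": sum(cpu_samples) / len(cpu_samples),
        "p99": p99,
        "projected_p99": round(projected_p99, 1),
        "safe": projected_p99 <= peak_limit,
    }

# Bursty workload: mostly idle at 8% CPU, with occasional spikes to 55%.
bursty = [8.0] * 95 + [55.0] * 5
# Steady workload: flat 10% CPU.
steady = [10.0] * 100

print(downsize_check(bursty, current_vcpus=8, target_vcpus=4))
print(downsize_check(steady, current_vcpus=8, target_vcpus=4))
```

Both workloads average roughly 10% CPU, yet only the steady one passes the check when halving the vCPU count. Feeding in samples from a longer window than 14 days helps catch the seasonal patterns mentioned above.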

Commitment Discounts: When to Buy Reserved Instances or Savings Plans

On-Demand pricing is the most expensive AWS pricing model. Reserved Instances (RIs) and Savings Plans provide 30-60% discounts in exchange for a 1-year or 3-year commitment. The ROI calculation is simple: a commitment is billed for every hour whether or not the resource runs, so the break-even utilization is the complement of the discount. At a 40% discount, a resource that runs more than 60% of the time is cheaper under a commitment than On-Demand.
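The break-even arithmetic can be made explicit. This is a minimal sketch under simplifying assumptions: a flat hourly On-Demand rate, a commitment billed for every hour of the month, and no upfront-payment effects. The function names and the $0.10/hour rate are hypothetical.

```python
HOURS_PER_MONTH = 730

def break_even_utilization(discount):
    """Fraction of hours a resource must run for a commitment to beat On-Demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount

def monthly_costs(on_demand_hourly, discount, utilization, hours=HOURS_PER_MONTH):
    """Return (on_demand_cost, committed_cost) for one month.

    On-Demand is billed only for the hours actually used; the commitment
    is billed for every hour at the discounted rate.
    """
    on_demand = on_demand_hourly * hours * utilization
    committed = on_demand_hourly * (1 - discount) * hours
    return round(on_demand, 2), round(committed, 2)

print(break_even_utilization(0.40))                 # break-even near 60% utilization
print(monthly_costs(0.10, 0.40, utilization=0.5))   # below break-even: On-Demand is cheaper
print(monthly_costs(0.10, 0.40, utilization=0.9))   # above break-even: commitment is cheaper
```

A deeper discount lowers the break-even point, which is why 3-year commitments can pay off even for resources that run only part of the time, at the price of a longer lock-in.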

For new workloads, run On-Demand for 2-3 months to understand actual usage patterns before purchasing commitments. Then analyze your usage with the AWS Cost Explorer RI recommendation tool, which calculates optimal RI coverage based on your actual usage. Start with a 1-year commitment and cover 70-80% of your baseline usage — leave some On-Demand capacity for growth and variability.
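One way to turn "cover 70-80% of your baseline" into a number is sketched below. It is an illustration, not a substitute for the Cost Explorer recommendation tool: it assumes "baseline" means a low percentile of hourly On-Demand-equivalent spend, and it converts that figure to a discounted hourly commitment. The function name, the 10th-percentile choice, and the 30% discount are all assumptions.

```python
def recommended_commitment(hourly_on_demand_spend, discount,
                           coverage=0.75, baseline_pct=10):
    """Suggest an hourly commitment (in discounted dollars per hour).

    hourly_on_demand_spend: one On-Demand-equivalent dollar figure per hour.
    discount: expected commitment discount as a fraction, e.g. 0.30.
    coverage: share of the baseline to commit to (0.70-0.80 per the text).
    baseline_pct: percentile of hourly spend treated as the always-on baseline.
    """
    if not hourly_on_demand_spend:
        raise ValueError("need at least one hourly sample")
    ordered = sorted(hourly_on_demand_spend)
    index = min(len(ordered) - 1, baseline_pct * len(ordered) // 100)
    baseline = ordered[index]
    return round(baseline * coverage * (1 - discount), 2)

# One illustrative day: $4/hour overnight, $6/hour daytime, $9/hour at peak.
day = [4.0] * 8 + [6.0] * 10 + [9.0] * 6
print(recommended_commitment(day, discount=0.30))
```

In practice the input would span the 2-3 months of On-Demand history recommended above, so that weekly and monthly cycles are reflected in the baseline.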

Savings Plans are generally preferable to RIs for most workloads because they apply automatically to eligible usage, up to the committed hourly amount. Compute Savings Plans are the most flexible: they apply to EC2 usage regardless of instance family, region, or OS, and also cover Fargate and Lambda. EC2 Instance Savings Plans offer a larger discount but are tied to one instance family in one region. RIs still make sense for specific use cases: RDS RIs apply to specific database engines, and the RI pricing model fits well for database workloads that tend to be stable.

Data Transfer Cost Optimization

Data transfer costs are the most commonly underestimated AWS cost. Traffic between services in the same region and Availability Zone costs nothing for most services, but certain patterns are expensive: data transfer out to the internet ($0.09/GB for the first 10TB), cross-region data transfer, cross-AZ traffic (typically $0.01/GB in each direction), and NAT gateway data processing ($0.045/GB). These are US East rates; other regions differ.

Audit data transfer costs in Cost Explorer by filtering on the data transfer usage types. Common expensive patterns: services reading from S3 buckets or databases in a different region than the one they run in, EC2 instances downloading dependencies directly from the internet instead of through a shared proxy or VPC endpoint, and applications that pull large data payloads across NAT gateways that could instead use S3 for object delivery.

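Gateway VPC endpoints for S3 and DynamoDB eliminate NAT gateway data processing charges for traffic to those services. Gateway endpoints themselves are free; the savings come from the eliminated NAT gateway charges. Interface endpoints for other services carry an hourly charge plus a per-GB fee, so they pay off only once traffic through the NAT gateway is large enough. For accounts with heavy DynamoDB or S3 traffic from EC2 or VPC-attached Lambda functions, gateway endpoints are usually among the quickest savings to realize.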
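The potential saving is straightforward to estimate from traffic volumes. This is a sketch, assuming you already know (for example from VPC Flow Logs or Cost Explorer usage data) how many GB per month reach each service through a NAT gateway. The function name and the traffic figures are hypothetical; the rate is the NAT data processing price quoted earlier.

```python
NAT_PROCESSING_USD_PER_GB = 0.045   # rate quoted earlier in this section
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}  # services with free gateway endpoints

def gateway_endpoint_savings(monthly_gb_by_service, nat_rate=NAT_PROCESSING_USD_PER_GB):
    """Estimate monthly NAT processing charges avoided by adding gateway endpoints.

    Only S3 and DynamoDB traffic is counted; traffic to other services would
    need interface endpoints, which have their own charges and are ignored here.
    """
    return {
        service: round(gb * nat_rate, 2)
        for service, gb in monthly_gb_by_service.items()
        if service in GATEWAY_ENDPOINT_SERVICES
    }

# Illustrative monthly traffic through the NAT gateway, in GB.
traffic = {"s3": 20_000, "dynamodb": 2_000, "ecr": 1_000}
savings = gateway_endpoint_savings(traffic)
print(savings, "total:", round(sum(savings.values()), 2))
```

Traffic to services outside the gateway-endpoint set is deliberately left out, since the break-even for interface endpoints depends on their hourly and per-GB fees.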

FAQ

How much can realistically be saved through optimization?

Organizations new to cost optimization typically find 15-30% savings in a first-pass review. Accounts that have undergone previous optimization rounds typically find 5-15% in subsequent reviews. The largest savings usually come from right-sizing (10-20% of compute spend), commitment discounts (30-60% on committed resources), and eliminating idle resources (varies widely). Focus on high-spend categories first — even a 10% improvement on a $50,000/month EC2 bill is more valuable than a 50% improvement on a $1,000/month service.

Should I use a third-party FinOps tool or stick with native AWS tools?

Native AWS tools (Cost Explorer, Trusted Advisor, Compute Optimizer) cover 80% of what most organizations need and are free or close to it: Cost Explorer API requests are billed per call, and the full set of Trusted Advisor checks requires a higher-tier Support plan. Third-party tools like CloudHealth, Apptio Cloudability, or Spot.io add value for larger organizations with complex multi-account environments, chargeback requirements, or sophisticated commitment management needs. Start with native tools and move to a third-party tool once you've outgrown what the native ones can do.

What's the fastest way to reduce an unexpectedly high AWS bill?

If you have an unexpectedly high bill, start with Cost Explorer and drill into the service and usage type causing the spike. Common culprits are data transfer costs (large data movements you didn't account for), EC2 instances launched accidentally or not cleaned up, and storage costs from data accumulation without lifecycle policies. Identify the specific resource or usage pattern and address that first before doing broader optimization.
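Finding the spike amounts to diffing two billing periods. This is a minimal sketch, assuming you have already reduced each period to a dict mapping (service, usage type) to cost, for instance from Cost Explorer data grouped by those two dimensions. The function name and the figures are illustrative.

```python
def biggest_increases(previous, current, n=3):
    """Return the n (key, delta) pairs with the largest cost increase.

    previous/current: dicts mapping (service, usage_type) -> cost in USD.
    Keys present in only one period are treated as 0 in the other.
    """
    keys = set(previous) | set(current)
    deltas = [
        (key, round(current.get(key, 0.0) - previous.get(key, 0.0), 2))
        for key in keys
    ]
    increases = [item for item in deltas if item[1] > 0]
    # Largest increase first; ties broken by key for a stable order.
    return sorted(increases, key=lambda item: (-item[1], item[0]))[:n]

# Illustrative figures for last month versus this month.
last_month = {
    ("Amazon EC2", "BoxUsage:m5.large"): 1200.0,
    ("EC2 - Other", "NatGateway-Bytes"): 90.0,
    ("Amazon S3", "TimedStorage-ByteHrs"): 300.0,
}
this_month = {
    ("Amazon EC2", "BoxUsage:m5.large"): 1210.0,
    ("EC2 - Other", "NatGateway-Bytes"): 840.0,
    ("Amazon S3", "TimedStorage-ByteHrs"): 290.0,
    ("Amazon EC2", "DataTransfer-Out-Bytes"): 45.0,
}

for (service, usage_type), delta in biggest_increases(last_month, this_month, n=2):
    print(f"{service} / {usage_type}: +${delta:,.2f}")
```

Ranking by absolute increase rather than percentage keeps attention on the line items that actually moved the bill.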


Written by Viktor B.

Co-founder & CEO of Vigilare. Works on turning the AWS signals that predict account enforcement — billing anomalies, IAM drift, GuardDuty findings, SES reputation, and CloudTrail activity — into a single risk score teams can act on before AWS does.