Every AWS account comes with CloudWatch. It collects metrics, stores logs, triggers alarms, and provides basic dashboards — all integrated with every AWS service. For many teams, it's the first and only monitoring tool they set up. For some teams, it should stay that way. For others, it's a trap that gives the illusion of monitoring while leaving dangerous gaps.
This guide helps you figure out which camp you're in. We'll cover what CloudWatch does well, where it falls short, and the specific signals that tell you it's time to invest in third-party monitoring.
What CloudWatch Does Well
Infrastructure metrics: CloudWatch automatically collects CPU utilization, disk I/O, network traffic, and status checks for EC2 instances. It monitors RDS database metrics, Lambda invocation counts and error rates, and S3 bucket metrics. For basic "is my infrastructure healthy?" monitoring, CloudWatch is competent and free (within the free tier limits).
Alarms: CloudWatch alarms trigger when a metric crosses a threshold. You can alert on CPU > 80%, disk space < 20% (on EC2 this metric is only available once the CloudWatch agent is installed), or error rate > 1%. Alarms can notify via SNS (email, SMS, Lambda, PagerDuty integration) and can trigger Auto Scaling actions. This is table stakes for infrastructure monitoring, and CloudWatch handles it adequately.
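As a rough illustration of the "CPU > 80%" case, the sketch below builds the arguments for CloudWatch's PutMetricAlarm API. The instance ID, SNS topic ARN, alarm name, and the 5-minute period with two evaluation periods are placeholder assumptions, not recommendations.

```python
from typing import Any, Dict


def cpu_alarm_params(instance_id: str, topic_arn: str, threshold: float = 80.0) -> Dict[str, Any]:
    """Build keyword arguments for a CPU alarm on a single EC2 instance."""
    return {
        "AlarmName": f"cpu-high-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,     # consecutive breaching datapoints (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that fans out to email, SMS, etc.
    }


# Placeholder identifiers for illustration only.
params = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain function makes it easy to stamp out the same alarm for every instance rather than clicking through the console.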
Log aggregation: CloudWatch Logs collects logs from EC2 instances (via the CloudWatch agent), Lambda functions (automatic), ECS containers, API Gateway, and dozens of other services. Metric filters on log groups enable alerting on log patterns — useful for detecting application errors, security events, and operational anomalies.
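A metric filter of the kind described above might look like the following sketch, which builds the arguments for the CloudWatch Logs PutMetricFilter API. The log group name, filter name, metric name, and namespace are made-up placeholders.

```python
from typing import Any, Dict


def error_metric_filter_params(log_group: str) -> Dict[str, Any]:
    """Build keyword arguments that count log events containing the term ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "application-errors",       # hypothetical name
        "filterPattern": "ERROR",                 # matches events containing this term
        "metricTransformations": [
            {
                "metricName": "ApplicationErrorCount",  # hypothetical metric
                "metricNamespace": "MyApp",             # hypothetical namespace
                "metricValue": "1",                     # emit 1 per matching event
                "defaultValue": 0,                      # emit 0 when nothing matches
            }
        ],
    }


filter_params = error_metric_filter_params("/my-app/production")
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("logs").put_metric_filter(**filter_params)
```

The resulting metric can then be alarmed on like any other CloudWatch metric.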
Cost: The free tier includes 10 custom metrics, 10 alarms, 1 million API requests, 5 GB of log data, and 3 dashboards. For a small team running a handful of services, this covers a lot of ground for zero additional cost.
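To check where a team stands against those limits, a small helper like the one below can compare current usage with the free-tier figures quoted above. The usage numbers are invented for the example.

```python
from typing import Dict, List

# Free-tier limits as listed in this article.
FREE_TIER: Dict[str, float] = {
    "custom_metrics": 10,
    "alarms": 10,
    "api_requests": 1_000_000,
    "log_data_gb": 5,
    "dashboards": 3,
}


def exceeded_free_tier(usage: Dict[str, float]) -> List[str]:
    """Return the names of the free-tier items that the given usage exceeds."""
    return sorted(
        item for item, limit in FREE_TIER.items() if usage.get(item, 0) > limit
    )


# Hypothetical monthly usage for a small team.
usage = {
    "custom_metrics": 8,
    "alarms": 14,
    "api_requests": 250_000,
    "log_data_gb": 7.5,
    "dashboards": 2,
}
over = exceeded_free_tier(usage)
print(over)  # ['alarms', 'log_data_gb']
```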
Where CloudWatch Falls Short
No Cross-Service Correlation
CloudWatch treats each metric, alarm, and log group as an independent data point. A CPU spike on an EC2 instance, a latency increase on an ALB, and an error spike in a Lambda function are three separate alarms — even if they're all symptoms of the same incident. CloudWatch has no built-in concept of correlated incidents.
Third-party tools like Datadog, Grafana, and New Relic provide composite monitors, service maps, and correlation views that connect symptoms to root causes. This saves significant time during incidents when you need to understand what's happening, not just that something is wrong.
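To make the idea of correlation concrete, here is a deliberately naive sketch that groups alarms into one incident when they fire close together in time. Time proximity is only a stand-in heuristic chosen for illustration; it is not how any of the tools named above implement correlation, and the alarm names and 5-minute window are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class AlarmEvent:
    name: str
    fired_at: datetime


def group_into_incidents(
    events: List[AlarmEvent], window: timedelta = timedelta(minutes=5)
) -> List[List[AlarmEvent]]:
    """Group alarms so that each one fired within `window` of the previous one."""
    incidents: List[List[AlarmEvent]] = []
    for event in sorted(events, key=lambda e: e.fired_at):
        if incidents and event.fired_at - incidents[-1][-1].fired_at <= window:
            incidents[-1].append(event)   # close enough: same incident
        else:
            incidents.append([event])     # too far apart: start a new incident
    return incidents


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    AlarmEvent("ec2-cpu-high", t0),
    AlarmEvent("alb-latency-high", t0 + timedelta(minutes=2)),
    AlarmEvent("lambda-errors-high", t0 + timedelta(minutes=4)),
    AlarmEvent("rds-storage-low", t0 + timedelta(hours=3)),
]
incidents = group_into_incidents(events)
print([[e.name for e in incident] for incident in incidents])
```

In this example the first three alarms collapse into one incident and the storage alarm three hours later stands alone, which is the view an on-call engineer actually wants.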
No Application Performance Monitoring (APM)
CloudWatch tracks infrastructure metrics (CPU, memory, network) but doesn't trace requests through your application. It can't show you that a specific API endpoint is slow because a downstream database query takes 3 seconds, which is caused by a missing index on a table that grew from 100K to 10M rows.
If your application has multiple services communicating over HTTP/gRPC, APM is how you debug performance issues efficiently. CloudWatch metrics and logs alone don't provide this. AWS X-Ray offers basic distributed tracing, but its UX and query capabilities are significantly behind Datadog APM, New Relic, or Jaeger.
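The value of tracing is easiest to see with numbers. The sketch below takes a hand-written trace (span names and durations are invented) and computes each span's self time, meaning its duration minus the time spent in its children. It assumes child spans run sequentially and is a toy model, not the data format of any particular tracing product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]   # name of the parent span, or None for the root
    duration_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Return each span's duration minus the total duration of its direct children."""
    child_total: Dict[str, float] = {}
    for span in spans:
        if span.parent is not None:
            child_total[span.parent] = child_total.get(span.parent, 0.0) + span.duration_ms
    return {s.name: s.duration_ms - child_total.get(s.name, 0.0) for s in spans}


# Hypothetical trace of one slow API request.
trace = [
    Span("GET /orders", None, 3200.0),
    Span("auth-service.verify", "GET /orders", 40.0),
    Span("orders-db.query", "GET /orders", 3000.0),
]
times = self_times(trace)
bottleneck = max(times, key=times.get)
print(bottleneck)  # orders-db.query
```

Infrastructure metrics would show only that the endpoint is slow; the per-span breakdown points directly at the database query.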
Dashboard Limitations
CloudWatch dashboards are functional but basic. They support metric graphs, log query results, and alarm status widgets. Dashboard variables were added relatively recently, but they are more limited than the template variables, drill-down navigation, and composable, interactive dashboards that Grafana or Datadog provide. For a team that relies on dashboards for on-call triage, CloudWatch dashboards quickly become a bottleneck.
Alerting Sophistication
CloudWatch alarms support static thresholds, anomaly detection (which uses ML to adapt thresholds to patterns), and composite alarms that combine the states of other alarms with AND/OR/NOT rules (for example, "alert only when the CPU > 80% alarm AND the error rate > 5% alarm are both firing"). What they don't offer natively: alert grouping and deduplication, scheduled muting for maintenance windows, or multi-stage escalation policies. Teams that manage more than a handful of alarms quickly find themselves wanting an alert management layer on top of CloudWatch.
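An alert management layer of that kind can be sketched in a few lines. The example below drops alerts that fire inside a maintenance window and suppresses repeats of the same alert within a deduplication window. The alert keys, the 30-minute window, and the maintenance schedule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class Alert:
    key: str            # identifies "the same" alert, e.g. alarm name
    fired_at: datetime


def filter_alerts(
    alerts: List[Alert],
    maintenance: List[Tuple[datetime, datetime]],
    dedup_window: timedelta = timedelta(minutes=30),
) -> List[Alert]:
    """Return only the alerts that should actually page someone."""
    last_sent: Dict[str, datetime] = {}
    to_send: List[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        # Mute anything that fires during a scheduled maintenance window.
        if any(start <= alert.fired_at < end for start, end in maintenance):
            continue
        # Suppress repeats of an alert that was sent recently.
        previous = last_sent.get(alert.key)
        if previous is not None and alert.fired_at - previous < dedup_window:
            continue
        last_sent[alert.key] = alert.fired_at
        to_send.append(alert)
    return to_send


t0 = datetime(2024, 1, 1, 12, 0)
alerts = [
    Alert("api-errors", t0),
    Alert("api-errors", t0 + timedelta(minutes=10)),   # duplicate, suppressed
    Alert("api-errors", t0 + timedelta(minutes=45)),   # outside dedup window, sent
    Alert("db-cpu", t0 + timedelta(minutes=65)),       # inside maintenance, muted
]
maintenance = [(t0 + timedelta(hours=1), t0 + timedelta(hours=2))]
sent = filter_alerts(alerts, maintenance)
print([(a.key, a.fired_at.strftime("%H:%M")) for a in sent])
```

Dedicated tools add escalation policies and on-call schedules on top of this, which is where maintaining a homegrown layer usually stops being worth it.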
No Account-Level Health View
CloudWatch monitors your resources but not your account. It doesn't track billing anomalies, SES reputation, service quota utilization, compliance posture, or the IAM security signals that determine whether your account is healthy. These signals live in Cost Explorer, SES, Service Quotas, Config, and GuardDuty — all separate from CloudWatch.
When to Stay with CloudWatch
CloudWatch is probably enough if you're running a small number of services (under 10 hosts), you don't have a complex microservice architecture, your debugging workflow doesn't require distributed tracing, your team is comfortable with the AWS console UX, and your monitoring needs are primarily "is it up and is it fast?"
When to Add Third-Party Monitoring
Consider adding third-party tools when you're spending more time debugging incidents than fixing them (you need better correlation and APM), when your on-call engineers frequently say "I got the alert but I don't know where to look" (you need better dashboards and context), when you manage multiple AWS accounts and need a unified view (CloudWatch is account-scoped by default, and cross-account observability has to be configured explicitly), or when you need to correlate infrastructure metrics with application behavior, business metrics, or external data sources.
Choosing the Right Tool
For infrastructure monitoring + APM: Datadog ($15-46/host/month) or Grafana Cloud (generous free tier). Datadog is easier to set up; Grafana is more cost-effective at scale.
For AWS account health, security, and billing monitoring: Vigilare ($29/month for the Solo plan). This is complementary to — not a replacement for — infrastructure monitoring. Vigilare watches the account-level signals that CloudWatch and Datadog don't: billing anomalies with 5-minute detection, security posture drift, compliance status, and account suspension risk.
For full observability stack: Datadog + Vigilare. Datadog handles infrastructure, APM, and log management. Vigilare handles account health, security correlation, and billing monitoring. No overlap, no gaps.
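To put the per-host Datadog price range quoted above into monthly terms, a back-of-the-envelope calculation helps. The sketch uses only the $15-46/host/month figures from this article; the host count is an example, and real bills can include other line items that are not modeled here.

```python
from typing import Tuple


def monthly_cost_range(hosts: int, low: float = 15, high: float = 46) -> Tuple[float, float]:
    """Return the (minimum, maximum) monthly cost in dollars for a per-host price range."""
    return hosts * low, hosts * high


low_total, high_total = monthly_cost_range(25)
print(f"25 hosts: ${low_total}-{high_total} per month")  # 25 hosts: $375-1150 per month
```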
Written by Vigilare Engineering
Platform Team