Recommendations for AWS
CloudZero analyzes your AWS environment and generates recommendations that identify specific resources where you can reduce costs or improve efficiency. Each recommendation includes the affected resource, the estimated savings, and guidance on how to address it.
For details on how to work with recommendations in the CloudZero UI (search, filter, group, take action), see Recommendations.
What you need
Several AWS recommendations require one or both of the following to be enabled in your AWS account. The overview table below marks which prerequisites each recommendation requires. Recommendations marked "CloudZero" use CloudZero's own billing data analysis and require no additional AWS configuration.
| Prerequisite | What it enables | How to enable |
|---|---|---|
| Cost Optimization Hub (COH) | Savings Plans and Reserved Instance purchase recommendations, Lambda cost optimization | Enable in the AWS Cost Management console |
| AWS Compute Optimizer (CO) | Rightsizing, deletion, upgrade, and migration recommendations for EC2, EBS, RDS, Aurora, Fargate, and Lambda | Opt in through the AWS Compute Optimizer console |
Overview of AWS Recommendations
Artificial Intelligence
| Recommendation | Source | What CloudZero identifies |
|---|---|---|
| AWS Savings Plans Purchase Recommendations for Amazon SageMaker AI | COH | Savings Plans purchase opportunities for Amazon SageMaker based on your usage patterns |
Compute
| Recommendation | Source | What CloudZero identifies |
|---|---|---|
| Amazon EC2 Instance Consolidation for Microsoft SQL Server | CloudZero | Opportunities to consolidate Microsoft SQL Server licenses on EC2 instances |
| Amazon EC2 Instance Over-Provisioned for Microsoft SQL Server | CloudZero | EC2 instances running Microsoft SQL Server that have more vCPUs than needed |
| Amazon EC2 Instances Stopped | CloudZero | EC2 instances that are stopped and are candidates for termination |
| Amazon EC2 Migrate to Graviton | COH + CO | EC2 instances that can be migrated to Graviton-based instances |
| Amazon EC2 Reserved Instance Lease Expiration | CloudZero | EC2 Reserved Instances approaching lease expiration |
| Amazon EC2 Reserved Instance Optimization | CloudZero | EC2 Reserved Instance optimization opportunities |
| Amazon EC2 Rightsize Instances | COH + CO | EC2 instances that should be rightsized to optimize cost and performance |
| Amazon EC2 Stop Instances | COH + CO | EC2 instances that should be stopped to reduce costs |
| Amazon EC2 Upgrade Instances | COH + CO | EC2 instances that should be upgraded to newer generation instances |
| AWS Fargate Cost Optimization Delete Recommendations for Amazon ECS | CloudZero | Unused or idle Fargate services that should be deleted |
| AWS Fargate Cost Optimization Recommendations for Amazon ECS | COH + CO | Fargate services with over-provisioned CPU or memory allocations |
| AWS Lambda Cost Optimization Recommendations for Functions | COH + CO | Lambda functions with cost optimization opportunities |
| AWS Savings Plans Purchase Recommendations for Compute | COH | Savings Plans purchase opportunities for compute resources |
| Configure ECR Repository Lifecycle Policy to Reduce Storage Costs | CloudZero | ECR repositories without lifecycle policies configured |
| Delete EBS Snapshot Older Than 180 Days | CloudZero | EBS snapshots older than 180 days still incurring costs |
| EKS Clusters Incurring Extended Support Charges | CloudZero | EKS clusters incurring extended support charges for end-of-standard-support Kubernetes versions |
| Excessive EC2 Cross-Region Data Transfer | CloudZero | Accounts where cross-region data transfer costs exceed 10% of total EC2 data transfer costs |
| Excessive EC2/ELB Internet Traffic Bypassing CloudFront | CloudZero | Accounts using CloudFront but with significant direct internet egress from EC2/ELB |
| Fix Lambda Function with Excessive Error Rate | CloudZero | Lambda functions experiencing high error rates |
| Fix Lambda Function with Excessive Timeouts | CloudZero | Lambda functions experiencing excessive timeouts |
| Migrate EMR Serverless to ARM (Graviton) | CloudZero | EMR Serverless workloads on x86 architecture that could migrate to ARM Graviton |
| Older Generation Instances Detected | CloudZero | EC2, RDS, and ElastiCache older generation instances with at least $500 in spend |
Databases
| Recommendation | Source | What CloudZero identifies |
|---|---|---|
| Amazon Aurora Delete Clusters | COH + CO | Aurora clusters that should be deleted to reduce costs |
| Amazon Aurora Migrate to Graviton | COH + CO | Aurora clusters that can be migrated to Graviton-based instances |
| Amazon Aurora Rightsize Clusters | COH + CO | Aurora clusters that should be rightsized to optimize cost and performance |
| Amazon Aurora Upgrade Clusters | COH + CO | Aurora clusters that should be upgraded to newer generation types |
| Amazon DynamoDB Reserved Capacity Purchase Recommendations | COH | DynamoDB reserved capacity purchase opportunities |
| Amazon ElastiCache Reserved Node Purchase Recommendations | COH | ElastiCache Reserved Node purchase opportunities |
| Amazon MemoryDB Reserved Node Purchase Recommendations | COH | MemoryDB Reserved Node purchase opportunities |
| Amazon OpenSearch Service Reserved Instance Purchase Recommendations | COH | OpenSearch Service Reserved Instance purchase opportunities |
| Amazon RDS Delete Instances | COH + CO | RDS instances that should be deleted to reduce costs |
| Amazon RDS Migrate to Graviton | COH + CO | RDS instances that can be migrated to Graviton-based instances |
| Amazon RDS Reserved Instance Purchase Recommendations | COH | RDS Reserved Instance purchase opportunities |
| Amazon RDS Rightsize Instances | COH + CO | RDS instances that should be rightsized to optimize cost and performance |
| Amazon RDS Storage Delete Recommendations | COH + CO | RDS database instances with storage that can be deleted |
| Amazon RDS Storage Rightsize Recommendations | COH + CO | RDS database instances with storage that can be rightsized |
| Amazon RDS Storage Upgrade Recommendations | COH + CO | RDS database instances where storage can be upgraded to more cost-effective options |
| Amazon RDS Upgrade Instances | COH + CO | RDS instances that should be upgraded to newer generation types |
| Amazon Redshift Reserved Node Purchase Recommendations | COH | Redshift Reserved Node purchase opportunities |
| Delete Inactive DynamoDB Tables | CloudZero | DynamoDB tables incurring storage costs with no usage activity |
| Excessive RDS Backup Retention | CloudZero | RDS backups and manual snapshots retained beyond 90 days |
| RDS Clusters Incurring Extended Support Charges | CloudZero | RDS instances and clusters on outdated engine versions incurring extended support charges |
| RDS Snapshot Costs Are Higher Than Expected | CloudZero | RDS snapshot costs exceeding 10% of total RDS costs |
| Underutilized Amazon Redshift Clusters | CloudZero | Redshift clusters that are underutilized |
| Upgrade Elasticsearch to Avoid Extended Support Charges | CloudZero | Elasticsearch domains running EOL versions incurring extended support fees |
| Upgrade OpenSearch to Avoid Extended Support Charges | CloudZero | OpenSearch domains running EOL versions incurring extended support fees |
Management Tools
| Recommendation | Source | What CloudZero identifies |
|---|---|---|
| CloudWatch Costs Higher Than Expected | CloudZero | CloudWatch costs that have increased beyond expected thresholds |
| Redundant CloudTrail Usage Detected | CloudZero | Accounts being charged for CloudTrail events due to redundant instances |
Networking & Content Delivery
| Recommendation | Source | What CloudZero identifies |
|---|---|---|
| Delete Idle Load Balancer | CloudZero | Classic Load Balancers that are idle |
| Delete Inactive AWS Network Firewall | CloudZero | Network Firewalls that have processed 0 bytes in the last 30 days |
| Delete Inactive Gateway Load Balancer Endpoint | CloudZero | Gateway Load Balancer endpoints that have processed 0 bytes in the last 30 days |
| Delete Inactive VPC Interface Endpoint | CloudZero | VPC interface endpoints that have processed 0 bytes in the last 30 days |
| Inefficient AWS NAT Gateway Detected | CloudZero | NAT Gateways with hourly charges but minimal data processing |
| Managed NAT Gateway with Excessive Data Transfer | CloudZero | NAT Gateways where data transfer costs exceed 60% of total gateway costs |
| Release Idle Elastic IP Addresses | CloudZero | Elastic IP addresses allocated but not associated with running resources |
Storage
| Recommendation | Source | What CloudZero identifies |
|---|---|---|
| Amazon EBS Delete Volumes | COH + CO | EBS volumes that should be deleted to reduce costs |
| Amazon EBS Rightsize Volumes | COH + CO | EBS volumes that should be rightsized to optimize cost and performance |
| Amazon EBS Upgrade Volumes | COH + CO | EBS volumes that should be upgraded to newer generation types |
| Configure S3 Lifecycle Policy to Abort Incomplete Multipart Uploads | CloudZero | S3 buckets without lifecycle policies for incomplete multipart upload cleanup |
| Consider Intelligent-Tiering or Lifecycle Rules for S3 | CloudZero | S3 buckets with spend only on Standard Storage |
| High Data Retrieval Costs for S3 Glacier Storage | CloudZero | Data retrieval costs on S3 Glacier storage tiers exceeding $100 over 30 days |
| High Non-Standard API Requests for S3 | CloudZero | High spend on non-standard API requests (LIST, HEAD) to S3 |
| High Ratio of S3 API Cost to Storage Cost | CloudZero | S3 buckets where API request costs exceed 80% of total bucket costs |
| High S3 Administrative Fees | CloudZero | S3 buckets where administrative fees exceed 10% of total bucket cost |
| Unarchived Old EBS Snapshots | CloudZero | EBS snapshots stored in standard storage for over 90 days, candidates for archive |
Artificial Intelligence
AWS Savings Plans Purchase Recommendations for Amazon SageMaker AI
Amazon SageMaker Savings Plans offer significant savings on SageMaker usage in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a one or three year term. AWS Trusted Advisor analyzes your SageMaker usage patterns and provides recommendations for purchasing Savings Plans that could reduce your costs.
How to address this
- Review the recommended Savings Plan commitment amount and term
- Navigate to the AWS Cost Management console
- Go to Savings Plans > Purchase Savings Plans
- Select SageMaker Compute as the Savings Plans type
- Enter the recommended commitment amount
- Choose the appropriate term (1-year or 3-year)
- Select the payment option (All Upfront, Partial Upfront, or No Upfront)
- Review and complete the purchase
Additional details
- Savings Plans provide flexibility to change instance families, sizes, operating systems, and regions
- Longer commitment terms (3 years) typically offer higher savings rates
- All Upfront payment provides the highest discount
- Savings Plans automatically apply to eligible usage across your AWS account
- You can stack multiple Savings Plans to match your usage patterns
Compute
Amazon EC2 Instance Consolidation for Microsoft SQL Server
Identifies opportunities to consolidate Microsoft SQL Server licenses on Amazon EC2 instances by using instances with more vCPUs to reduce licensing costs.
This recommendation analyzes your EC2 instances running Microsoft SQL Server and flags three cases: instances running with fewer vCPUs than the minimum required for SQL Server licensing, multiple smaller instances that could be consolidated into larger instances, and SQL Server editions that could benefit from instance consolidation.
Microsoft SQL Server licensing is often based on core/vCPU counts, and there are minimum licensing requirements. By consolidating workloads onto instances with more vCPUs that meet or exceed these minimums, you can reduce the total number of SQL Server licenses needed, improve SQL Server performance through better resource allocation, and simplify management by reducing the number of instances.
How to address this
- Review the specific instances flagged, noting current instance type, vCPU count, SQL Server edition, and minimum recommended vCPU count
- Analyze whether workloads can be consolidated onto larger instance types, combined with other SQL Server instances, or migrated to instances that better match licensing tiers
- Plan the consolidation: identify target instance types, group compatible workloads, schedule during maintenance windows, and prepare rollback procedures
- Test SQL Server performance on consolidated instances, verify license compliance, and test application connectivity before production implementation
- Implement during scheduled maintenance: backup databases, follow SQL Server best practices, and update monitoring and backup configurations
- Monitor performance metrics, verify cost savings, and document the new configuration
Additional details
Cost impact calculation:
- Inefficiency Ratio = (Minimum vCPU - Current vCPU) / Minimum vCPU
- Estimated Savings = Instance Cost × Inefficiency Ratio × 0.30
The 0.30 factor estimates that SQL Server licensing comprises approximately 30% of total EC2 instance costs. For example, an instance with 1 vCPU but requiring 4 vCPU minimum (75% inefficiency) costing $100/month would show estimated savings of $22.50/month ($100 × 0.75 × 0.30). Actual savings vary based on your specific licensing agreements and instance types.
- Licensing Models: Ensure compliance with Microsoft SQL Server licensing agreements when consolidating instances
- High Availability: Consider the impact on your high availability and disaster recovery strategy
- Resource Isolation: Evaluate whether workload consolidation aligns with your security and isolation requirements
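The cost impact calculation above can be sketched in a few lines of Python. This is a minimal illustration, not CloudZero's implementation: the function name and the clamping of negative ratios to zero are assumptions, while the formula and the 0.30 licensing factor come from the text.

```python
def sql_server_consolidation_savings(
    instance_cost: float,
    current_vcpu: int,
    minimum_vcpu: int,
    licensing_share: float = 0.30,
) -> float:
    """Estimated Savings = Instance Cost x Inefficiency Ratio x licensing share."""
    if minimum_vcpu <= 0:
        raise ValueError("minimum_vcpu must be positive")
    # Inefficiency Ratio = (Minimum vCPU - Current vCPU) / Minimum vCPU.
    # Clamped at zero (an assumption) for instances already at or above the minimum.
    inefficiency_ratio = max(0.0, (minimum_vcpu - current_vcpu) / minimum_vcpu)
    return instance_cost * inefficiency_ratio * licensing_share


if __name__ == "__main__":
    # The worked example from the text: 1 vCPU, 4 vCPU minimum, $100/month.
    print(f"${sql_server_consolidation_savings(100.0, 1, 4):.2f}/month")  # $22.50/month
```

Passing a different `licensing_share` lets you model licensing agreements where SQL Server makes up more or less than 30% of the instance cost.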
Amazon EC2 Instance Over-Provisioned for Microsoft SQL Server
Identifies EC2 instances running Microsoft SQL Server that have more vCPUs than needed for SQL Server licensing, presenting opportunities to rightsize to smaller instance types and reduce costs.
Each recommendation includes the current instance type and vCPU count, maximum recommended vCPU count based on workload analysis, recommended instance type, and estimated monthly savings.
How to address this
- Review each flagged instance: current instance type, vCPU count, recommended instance type, estimated savings, and SQL Server edition
- Analyze workload patterns: review CPU utilization over time, identify peak usage, verify the recommended size can handle peak loads
- Plan the rightsizing: prioritize by savings potential, schedule during maintenance windows, prepare rollback procedures
- Test in non-production: validate SQL Server performance, verify application behavior under load
- Execute during scheduled maintenance: stop the instance, change the instance type, start and verify SQL Server, test connectivity
- Monitor CPU and memory utilization, SQL Server performance, and application response times
Additional details
- Licensing Compliance: Verify that downsizing maintains compliance with Microsoft SQL Server licensing requirements
- High Availability: Consider the impact on your HA/DR strategy when changing instance types
- Stop/Start Impact: Changing instance types requires stopping the instance, which causes downtime
- Elastic IPs: Retained when changing instance types
- Instance Store: Data is lost when stopping the instance (EBS-backed volumes are preserved)
Amazon EC2 Instances Stopped
Identifies EC2 instances that are currently stopped and are candidates for termination to reduce costs. While stopped instances do not incur compute charges, they still have associated costs from EBS volumes, Elastic IP addresses, and other resources.
How to address this
- Review stopped instances to determine if they are still needed
- Check for associated EBS volumes and other resources
- Consider terminating instances that are no longer required
- Verify no critical data will be lost before termination
Amazon EC2 Migrate to Graviton
Identifies EC2 instances that can be migrated to Graviton-based instances for cost optimization.
How to address this
- Migrate eligible instances to Graviton-based instance types
- Review application compatibility with ARM-based processors
- Test performance and functionality after migration
Amazon EC2 Reserved Instance Lease Expiration
Identifies EC2 Reserved Instances approaching lease expiration. When a Reserved Instance lease expires, the instances keep running but are billed at On-Demand rates, losing the Reserved Instance discount (up to 72%).
How to address this
- Review Reserved Instances approaching expiration
- Consider renewing leases for consistent workloads
- Evaluate if Reserved Instances still match current usage patterns
- Consider converting to Savings Plans for more flexibility
- Plan for renewal well before expiration to avoid cost spikes
Additional details
- Plan renewals 30-60 days before expiration
- Consider usage patterns and workload changes
- Evaluate if Reserved Instances still provide optimal coverage
- Review instance types and sizes for current needs
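As a rough sketch of the renewal-planning guidance above, the check below flags a Reserved Instance whose end date falls inside a review window. The function name and the 60-day default are illustrative assumptions (the text suggests planning 30-60 days ahead); the end date would come from your own Reserved Instance inventory.

```python
from datetime import date


def needs_renewal_review(end_date: date, today: date, window_days: int = 60) -> bool:
    """Return True when the lease ends within the next `window_days` days."""
    remaining_days = (end_date - today).days
    # Already-expired leases return False; they need a different follow-up.
    return 0 <= remaining_days <= window_days


if __name__ == "__main__":
    today = date(2025, 1, 15)
    print(needs_renewal_review(date(2025, 3, 1), today))  # 45 days left
    print(needs_renewal_review(date(2025, 6, 1), today))  # outside the window
```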
Amazon EC2 Reserved Instance Optimization
Identifies EC2 Reserved Instance optimization opportunities. AWS Trusted Advisor analyzes your usage patterns and recommends purchases or modifications that can reduce your compute costs.
How to address this
- Purchase new Reserved Instances for consistent workloads
- Modify existing Reserved Instances to better match your usage patterns
- Exchange Reserved Instances for different instance types or regions
- Consider Reserved Instance Marketplace for unused capacity
Additional details
Reserved Instances can provide up to 72% savings compared to On-Demand pricing for consistent workloads.
Amazon EC2 Rightsize Instances
Identifies EC2 instances that are over-provisioned or under-provisioned and should be rightsized to optimize cost and performance.
How to address this
- Rightsize instances to match actual resource utilization
- Review CPU, memory, and network utilization metrics
- Test performance after rightsizing to ensure application requirements are met
- Consider using CloudWatch metrics to validate rightsizing recommendations
Amazon EC2 Stop Instances
Identifies EC2 instances that should be stopped to reduce costs.
How to address this
- Stop instances that are not actively being used
- Review instance usage patterns before stopping
- Consider using scheduled stop/start for development instances
- Implement automated stop policies for non-production environments
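One way to reason about a scheduled stop/start policy is as a pure function of the current time. The sketch below assumes a hypothetical weekday working-hours schedule (08:00 to 18:00) for development instances; the schedule, the function name, and the idea of calling it from a scheduler are assumptions, not part of the recommendation itself.

```python
from datetime import datetime


def should_be_running(now: datetime, start_hour: int = 8, stop_hour: int = 18) -> bool:
    """Return True if a development instance should be running at `now`."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < stop_hour


if __name__ == "__main__":
    print(should_be_running(datetime(2025, 1, 6, 10, 0)))  # Monday morning
    print(should_be_running(datetime(2025, 1, 4, 10, 0)))  # Saturday morning
```

A scheduled job could evaluate this for each non-production instance and stop or start it accordingly.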
Amazon EC2 Upgrade Instances
Identifies EC2 instances that should be upgraded to newer generation instances for better price-performance.
How to address this
- Upgrade instances to newer generation types for better price-performance
- Review application compatibility with newer instance types
- Test performance and functionality after upgrade
- Consider Reserved Instances for upgraded instances to maximize savings
AWS Fargate Cost Optimization Delete Recommendations for Amazon ECS
Identifies unused or idle Fargate services that should be deleted to eliminate unnecessary costs. Fargate services with no recent task executions, idle services, and services inactive for extended periods are flagged for deletion.
How to address this
Delete the identified Fargate services to eliminate ongoing costs from idle container infrastructure including CPU, memory, and data transfer charges.
AWS Fargate Cost Optimization Recommendations for Amazon ECS
Identifies Fargate services with over-provisioned CPU or memory allocations that should be rightsized to reduce costs while maintaining performance.
How to address this
Rightsize Fargate services to more appropriate CPU and memory allocations based on actual utilization patterns.
AWS Lambda Cost Optimization Recommendations for Functions
This recommendation identifies AWS Lambda functions that have cost optimization opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor continuously monitors your Lambda functions and provides recommendations for cost optimization opportunities. This recommendation surfaces those recommendations to help you identify potential savings.
What CloudZero identifies
- Lambda functions with cost optimization opportunities
- Functions that could benefit from memory allocation adjustments
- Underutilized Lambda functions that could be optimized
- Function configuration optimizations
- Cost optimization recommendations from AWS Trusted Advisor
How it works
- Uses AWS Trusted Advisor's c1z7kmr05n check for Lambda cost optimization
- Leverages Trusted Advisor's estimated savings calculations
- Provides dynamic titles with specific recommendations and resource IDs
- Focuses on Lambda service costs and function-specific optimizations
Cost impact
The recommendation calculates potential savings based on Trusted Advisor's estimates for cost optimization opportunities in Lambda functions, including memory allocation, timeout settings, and other function-specific optimizations.
AWS Savings Plans Purchase Recommendations for Compute
This recommendation identifies AWS Savings Plans purchase opportunities for compute resources based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your compute usage patterns across Amazon EC2, AWS Fargate, and AWS Lambda to provide Savings Plans purchase recommendations. This recommendation surfaces those opportunities to help you identify potential savings through committed usage discounts.
What CloudZero identifies
- Savings Plans purchase opportunities for compute resources
- Recommended commitment amounts and terms
- Estimated monthly savings from purchasing Savings Plans
- Account-level purchase recommendations
- Cost optimization opportunities from AWS Cost Optimization Hub
How it works
- Uses AWS Trusted Advisor's c1z7kmr09n check for Savings Plans recommendations
- Leverages Trusted Advisor's estimated savings calculations
- Provides dynamic titles with specific recommendations
- Covers EC2, Fargate, and Lambda compute resources
- Account-level recommendations rather than resource-specific
Cost impact
The recommendation calculates potential savings based on Trusted Advisor's estimates for Savings Plans purchases, including recommended commitment amounts, terms, and expected monthly savings.
Configure ECR Repository Lifecycle Policy to Reduce Storage Costs
This check identifies Amazon Elastic Container Registry (ECR) repositories that do not have lifecycle policies configured. Without lifecycle policies, repositories can accumulate old, unused, and untagged container images over time, leading to unnecessary storage costs.
Additional details
ECR repositories without lifecycle policies tend to accumulate images indefinitely. This includes:
- Old image versions that are no longer deployed
- Untagged images from failed or interrupted builds
- Development and testing images that are no longer needed
- Multiple versions of images that exceed retention requirements
Implementing lifecycle policies can significantly reduce storage costs by automatically removing old or unused images based on criteria you define.
How to address this
Configure lifecycle policies for your ECR repositories to automatically clean up old and unused images. A typical lifecycle policy might:
- Keep only the last N tagged images
- Remove untagged images after a certain period (e.g., 7-14 days)
- Remove images older than a certain age
- Keep images with specific tags (like "production" or "latest")
Additional details
The estimated savings is based on your current ECR storage costs. By implementing lifecycle policies, you can typically reduce storage by 20-30% through removal of:
- Untagged images from failed builds
- Old versions of images no longer in use
- Development and testing images
Actual savings will vary based on your image retention requirements and current repository management practices.
How to address this
- Open the Amazon ECR console
- Navigate to the repository identified in the recommendation
- Click "Lifecycle Policy" in the left navigation
- Create a new lifecycle policy using the visual editor or JSON
- Define rules for image retention (e.g., keep last 10 images, remove untagged after 7 days)
- Test the policy using the "Dry run" feature before enabling
- Save and enable the lifecycle policy
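If you prefer the JSON editor, a policy matching the example rules above (remove untagged images after 7 days, keep the last 10 images) might look like the output of this sketch. The function name and defaults are illustrative; the field names follow the ECR lifecycle policy format as best understood here, so validate the result with a dry run before enabling it.

```python
import json


def build_lifecycle_policy(untagged_expiry_days: int = 7, keep_last_images: int = 10) -> str:
    """Return lifecycle policy JSON text for an ECR repository."""
    policy = {
        "rules": [
            {
                # Rule 1: expire untagged images N days after they were pushed.
                "rulePriority": 1,
                "description": f"Expire untagged images after {untagged_expiry_days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": untagged_expiry_days,
                },
                "action": {"type": "expire"},
            },
            {
                # Rule 2: keep only the most recent N images overall.
                # A rule with tagStatus "any" is given the highest priority number.
                "rulePriority": 2,
                "description": f"Keep only the last {keep_last_images} images",
                "selection": {
                    "tagStatus": "any",
                    "countType": "imageCountMoreThan",
                    "countNumber": keep_last_images,
                },
                "action": {"type": "expire"},
            },
        ]
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(build_lifecycle_policy())
```

Adjust the two parameters to match your own retention requirements before applying the policy.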
Delete EBS Snapshot Older Than 180 Days
This recommendation identifies EBS snapshots that are older than 180 days and still incurring costs. Snapshots this old are often outdated and no longer needed, so cleaning them up can save money.
Threshold: This recommendation is created when the total spend for the identified snapshots exceeds $500 in real cost. Once cleanup brings the total spend for those snapshots below $500, the recommendation closes automatically.
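The age filter and spend threshold described above can be expressed as a small sketch. The `Snapshot` type, its field names, and the way cost is supplied are assumptions for illustration; the text does not state the billing window used for the $500 figure, so the `cost` values here are whatever period you choose to measure.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Snapshot:
    snapshot_id: str  # hypothetical identifier
    created: date     # snapshot creation date
    cost: float       # real cost attributed to the snapshot


def old_snapshots(snapshots: List[Snapshot], today: date, max_age_days: int = 180) -> List[Snapshot]:
    """Snapshots strictly older than `max_age_days`."""
    return [s for s in snapshots if (today - s.created).days > max_age_days]


def recommendation_is_open(snapshots: List[Snapshot], today: date, threshold: float = 500.0) -> bool:
    """True while total spend on old snapshots exceeds the threshold."""
    total = sum(s.cost for s in old_snapshots(snapshots, today))
    return total > threshold


if __name__ == "__main__":
    inventory = [
        Snapshot("snap-a", date(2024, 1, 1), 400.0),
        Snapshot("snap-b", date(2024, 2, 1), 200.0),
        Snapshot("snap-c", date(2024, 12, 1), 900.0),  # too recent to count
    ]
    print(recommendation_is_open(inventory, date(2025, 1, 1)))
```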
EKS Clusters Incurring Extended Support Charges
This recommendation identifies Amazon EKS (Elastic Kubernetes Service) clusters that are incurring extended support charges for using Kubernetes versions that have reached end-of-standard-support.
What Are EKS Extended Support Charges?
AWS charges additional fees for EKS clusters running on Kubernetes versions that have passed their standard support end date. Extended support provides:
- Security patches and bug fixes for the Kubernetes control plane
- Continued access to Amazon EKS optimized AMIs
- Technical support for the extended version
However, these charges can be significant and are avoidable by upgrading to a supported Kubernetes version.
Cost impact
Clusters in extended support are typically billed at:
- ~$0.60/hour per cluster (~$438/month)
- This compares with $0.10/hour per cluster in standard support, so extended support adds ~$0.50/hour
- Represents a 6x increase in control plane costs
For organizations with multiple clusters, these charges can accumulate to thousands of dollars per month.
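To see how the figures above scale across a fleet, here is a minimal sketch. The hourly rates are the approximate ones quoted in this section; the 730-hour month and the function name are assumptions, and actual rates should be checked against current EKS pricing.

```python
HOURS_PER_MONTH = 730  # assumed average month length

EXTENDED_SUPPORT_RATE = 0.60  # $/hour per cluster, as quoted above
STANDARD_SUPPORT_RATE = 0.10  # $/hour per cluster, as quoted above


def monthly_control_plane_cost(hourly_rate: float, clusters: int = 1) -> float:
    """Monthly control plane cost for `clusters` clusters at `hourly_rate`."""
    return round(hourly_rate * HOURS_PER_MONTH * clusters, 2)


if __name__ == "__main__":
    for count in (1, 5, 10):
        extended = monthly_control_plane_cost(EXTENDED_SUPPORT_RATE, count)
        standard = monthly_control_plane_cost(STANDARD_SUPPORT_RATE, count)
        print(f"{count} cluster(s): ${extended:,.2f} extended vs ${standard:,.2f} standard")
```

With ten clusters, the extended support rate alone comes to several thousand dollars per month, which matches the scale described above.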
Additional details
- Cost Optimization: Eliminating extended support charges immediately reduces EKS costs
- Security: Newer Kubernetes versions include important security improvements
- Features: Access to latest Kubernetes features and improvements
- Performance: Newer versions often include performance enhancements
- Compliance: Running EOL software can violate security policies
How to address this
Upgrade your EKS clusters to a Kubernetes version that is within standard support.
Check Current Version
    aws eks describe-cluster --name <cluster-name> --query cluster.version
Upgrade Process
- Review the upgrade path: EKS only allows upgrading one minor version at a time (e.g., 1.21 → 1.22 → 1.23)
- Update control plane:
      aws eks update-cluster-version --name <cluster-name> --kubernetes-version <version>
- Update node groups:
  - Managed node groups: Update through AWS Console or CLI
  - Self-managed nodes: Update AMIs and roll out new nodes
- Update add-ons:
      aws eks update-addon --cluster-name <cluster-name> --addon-name <addon> --addon-version <version>
- Test thoroughly between each version upgrade
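Because upgrades proceed one minor version at a time, it can help to list the intermediate versions before scheduling the work. The helper below is a hypothetical planning aid, not an AWS API; it only does the version arithmetic and does not check which versions EKS currently offers.

```python
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """List each minor version to step through, from `current` up to `target`."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("major versions must match")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than current; downgrades are not supported")
    # One control plane upgrade per entry, in order.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    print(upgrade_path("1.21", "1.23"))  # ['1.22', '1.23']
```

Each entry in the returned list corresponds to one full pass through the upgrade steps: control plane, node groups, add-ons, then testing.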
Important Considerations
- Application compatibility: Test workloads with new Kubernetes API versions
- Deprecated APIs: Check for deprecated API usage in your manifests
- Add-ons: Ensure all add-ons (CNI, CoreDNS, kube-proxy) are compatible
- Helm charts: Verify Helm chart compatibility with target version
- Downtime: Plan upgrade window (control plane upgrade causes brief API disruption)
Current Support Timeline
AWS provides 14 months of standard support for each Kubernetes version. For current version support dates, see the AWS EKS documentation.
Best Practices
- Stay current: Aim to be within 2 minor versions of latest
- Upgrade regularly: Don't let versions fall too far behind
- Test in non-prod first: Always test upgrades in dev/staging
- Automate: Use GitOps tools (ArgoCD, Flux) for consistent deployments
- Monitor: Set up alerts for version EOL dates
- Plan ahead: Schedule upgrades well before standard support ends
Resources
- EKS Kubernetes Versions
- EKS Cluster Upgrade Guide
- Kubernetes Deprecation Policy
- EKS Extended Support Pricing
Excessive EC2 Cross-Region Data Transfer
This recommendation identifies AWS accounts where EC2 cross-region data transfer costs exceed 10% of total EC2 data transfer costs. Cross-region data transfer occurs when EC2 instances in one region communicate with resources in another region, incurring per-GB charges that are significantly higher than same-region transfers. These costs often indicate architectural inefficiencies.
Cost impact
Estimated savings: 75% reduction by architecting to keep data within the same region.
Cross-region data transfer is charged per GB, while same-region transfers within an availability zone are free. By consolidating resources within a single region or deploying complete regional stacks, most cross-region costs can be eliminated.
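The 10% detection threshold and the 75% savings estimate can be combined into a short sketch. The function names and inputs are illustrative assumptions; the cost figures would come from your own billing data.

```python
def is_excessive(cross_region_cost: float, total_transfer_cost: float, threshold: float = 0.10) -> bool:
    """True when cross-region cost exceeds `threshold` of total EC2 data transfer cost."""
    if total_transfer_cost <= 0:
        return False
    return cross_region_cost / total_transfer_cost > threshold


def estimated_savings(cross_region_cost: float, reduction: float = 0.75) -> float:
    """Savings estimate assuming `reduction` of cross-region cost is eliminated."""
    return cross_region_cost * reduction


if __name__ == "__main__":
    cross_region, total = 150.0, 1000.0  # example monthly figures, not real data
    if is_excessive(cross_region, total):
        print(f"Flagged: estimated savings ${estimated_savings(cross_region):.2f}/month")
```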
Additional details
- High Cost: Cross-region transfers are significantly more expensive than same-region transfers
- Performance: Added latency between regions impacts application response times
- Architectural Issues: Services and data not co-located
- Hidden Costs: Compute overhead, replication delays, retry logic
Common Causes
- Multi-region without purpose: HA deployed but never used
- Legacy migration artifacts: Partial migration between regions
- Centralized data stores: Single database/cache serving multiple regions
- VPC peering misuse: Cross-region peering for convenience
- Backup/DR traffic: Continuous replication instead of snapshots
How to address this
Step 1: Identify Traffic Sources
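Use VPC Flow Logs or Cost Explorer to identify which resources are generating cross-region traffic. In Cost Explorer, filter by inter-region usage types, which follow the pattern <source>-<destination>-AWS-Out-Bytes (for example, USE1-USW2-AWS-Out-Bytes). Note that DataTransfer-Regional-Bytes covers traffic between Availability Zones within a single region, not traffic between regions.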
Step 2: Fix Common Patterns
Application and Database Split:
- Incorrect: App in us-east-1 communicating with RDS in us-west-2
- Correct: App in us-east-1 communicating with RDS in us-east-1
Centralized Services:
- Incorrect: Services in multiple regions connecting to single Redis in us-east-1
- Correct: Each region has its own Redis instance
Cross-Region Microservices:
- Incorrect: Service A in us-east-1 calling Service B in us-west-2
- Correct: Both services in same region, or both deployed in each region
Step 3: Choose Architecture Strategy
Option A: Single-Region (Simplest)
- Deploy all resources in one region
- Best for most applications
Option B: True Multi-Region (For HA/DR)
- Deploy complete independent stacks in each region
- Use Route53 geo-routing
- NO cross-region traffic during normal operation
Option C: Active-Passive DR (Lower cost)
- Primary region with hot data
- Standby region with snapshots only
- Failover only in disasters
Step 4: Use VPC Endpoints
Replace cross-region AWS service calls with VPC endpoints and regional buckets.
Step 5: Optimize Required Cross-Region Traffic
If cross-region is unavoidable:
- Use AWS PrivateLink (lower cost)
- Batch transfers instead of real-time streaming
- Compress data before transfer
Step 6: Monitor
Set up cost alerts (for example, with AWS Budgets) on inter-region data transfer usage types to catch regressions.
When Cross-Region Is Acceptable
- Disaster recovery snapshots (periodic, not continuous)
- CloudFront with regional origins
- Regulatory compliance requirements
- True global applications with regional data isolation
Prevention Strategies
- Enforce regional deployment patterns in IaC
- Use security groups to block unexpected cross-region traffic
- Tag resources with region and monitor spending
- Review architecture for new deployments
Excessive EC2/ELB Internet Traffic Bypassing CloudFront
This recommendation identifies AWS accounts using CloudFront CDN but with significant direct internet egress from EC2/ELB. When traffic bypasses CloudFront, you pay 2-5x higher data transfer costs and miss caching, DDoS protection, and global performance benefits.
What CloudZero identifies
- Accounts with active CloudFront distributions
- EC2/ELB direct egress >10% of CloudFront egress costs
- Minimum $1,000/month direct egress
- New services bypassing existing CDN architecture
Cost impact
Savings calculation: 50% reduction in direct egress costs through CloudFront routing, caching, and Origin Shield.
Example: $75k/month in direct egress = $37,500/month savings ($450k annually)
Why 50% savings:
- CloudFront caching reduces origin bandwidth 50-90%
- Origin Shield adds additional cache layer
- Reduced compute costs from fewer origin requests
- Better compression and optimization
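The detection thresholds and the 50% savings estimate described above can be sketched as a small calculation. This is an illustrative sketch only, not CloudZero's implementation: the function names and parameters are hypothetical, and treating the 10% ratio as a strict "greater than" comparison is an assumption.

```python
def bypasses_cloudfront(direct_egress_cost, cloudfront_egress_cost,
                        min_direct_cost=1000.0, ratio_threshold=0.10):
    """Return True when monthly direct EC2/ELB egress looks excessive.

    Hypothetical helper mirroring the documented thresholds: direct egress
    above 10% of CloudFront egress cost and at least $1,000/month.
    """
    if cloudfront_egress_cost <= 0:
        return False  # no CloudFront usage, so the comparison does not apply
    if direct_egress_cost < min_direct_cost:
        return False  # below the minimum monthly spend
    return direct_egress_cost > ratio_threshold * cloudfront_egress_cost


def estimated_savings(direct_egress_cost, reduction=0.50):
    """Return (monthly, annual) savings for an assumed reduction rate."""
    monthly = direct_egress_cost * reduction
    return monthly, monthly * 12


monthly, annual = estimated_savings(75_000)
print(f"${monthly:,.0f}/month, ${annual:,.0f}/year")  # $37,500/month, $450,000/year
```

The printed figures match the example above ($75k/month in direct egress). Actual savings depend on how cacheable the traffic is.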
Additional details
1. Higher data transfer costs
- Direct is 2-5x more expensive
2. No caching benefits
- Every request hits origin servers
- Increased compute and database load
- Higher latency for global users
3. Missing security & performance
- No AWS Shield DDoS protection
- Single-region latency vs edge caching
- Increased attack surface
Common Causes
- New services deployed without CDN: Microservices/APIs bypass existing CloudFront
- "Dynamic content" misconception: CloudFront caches API responses; even 1-second cache helps
- Legacy architecture: Pre-CDN infrastructure still serving traffic
- Direct API access: Mobile apps/integrations pointing to ALB/EC2 directly
How to address this
Step 1: Identify Sources
Use AWS Cost Explorer to find high-egress resources:
Service: EC2/ELB
Usage Type: DataTransfer-Out-Bytes
Group by: Resource
Step 2: Add Origins to CloudFront
Console: CloudFront → Distributions → Origins → Create origin
- Origin domain: Your ALB DNS or EC2 endpoint
- Protocol: HTTPS only
- Enable Origin Shield for additional caching
Terraform example:
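# Fragment: these blocks belong inside an aws_cloudfront_distribution resource.
origin {
  domain_name = aws_lb.app.dns_name
  origin_id   = "ALB"
  custom_origin_config {
    http_port              = 80  # required argument; adjust to your origin
    https_port             = 443 # required argument; adjust to your origin
    origin_protocol_policy = "https-only"
    origin_ssl_protocols   = ["TLSv1.2"]
  }
  origin_shield {
    enabled              = true
    origin_shield_region = "us-east-1" # choose the region closest to your origin
  }
}
default_cache_behavior {
  target_origin_id       = "ALB"
  viewer_protocol_policy = "redirect-to-https" # required; example value
  allowed_methods        = ["GET", "HEAD"]     # required; widen for write APIs
  cached_methods         = ["GET", "HEAD"]     # required
  min_ttl                = 0
  default_ttl            = 60 # Even 1 minute helps
  max_ttl                = 3600
  forwarded_values {
    query_string = true # cache on query strings (see Step 4)
    cookies {
      forward = "none"
    }
  }
}
Step 3: Update DNS & Application Configs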
Point your domain to CloudFront instead of direct ALB/EC2 endpoints.
Step 4: Configure Caching
For dynamic content, cache based on query strings with short TTLs (30-60 seconds).
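To see why even very short TTLs help, consider a rough model: if one cached URL receives a steady stream of requests and each TTL window costs one origin fetch, the remaining requests in that window are cache hits. The sketch below works through that arithmetic. It is a simplification, not a CloudFront formula: it assumes a single cache and a constant request rate, and the function name is hypothetical. Real hit ratios are lower because each edge location caches independently.

```python
def estimated_hit_ratio(requests_per_second, ttl_seconds):
    """Approximate cache hit ratio for one URL under a steady request rate.

    Idealized model: one origin fetch per TTL window, every other request
    in the window is served from cache.
    """
    requests_per_window = requests_per_second * ttl_seconds
    if requests_per_window <= 1:
        return 0.0  # at most one request per window, so nothing is reused
    return 1.0 - 1.0 / requests_per_window


for rate, ttl in [(10, 1), (10, 30), (100, 60)]:
    ratio = estimated_hit_ratio(rate, ttl)
    print(f"{rate} req/s, TTL {ttl}s -> ~{ratio:.1%} of requests served from cache")
```

Under this model, a URL requested 10 times per second with a 1-second TTL already offloads roughly nine out of ten requests from the origin.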
Step 5: Monitor Results
- Check the CloudFront CacheHitRate metric
- Verify 50-90% reduction in origin requests
- Monitor cost savings in Cost Explorer
When Direct Egress Is Acceptable
- Database replication, backups to third-party services
- VPN connections, B2B integrations with strict IP requirements
- Streaming protocols not supported by CloudFront
Fix Lambda Function with Excessive Error Rate
This recommendation identifies AWS Lambda functions that are experiencing high error rates, which can impact reliability, user experience, and costs.
How it works
AWS Trusted Advisor monitors your Lambda functions and identifies those with elevated error rates. Functions with high error rates can indicate code issues, configuration problems, or external service dependencies that impact application reliability and increase operational costs.
What CloudZero identifies
- Lambda functions with high error rates
- Functions that need error handling improvements
- Code quality and reliability optimization opportunities
- Configuration issues that are causing failures
- Cost optimization recommendations from AWS Trusted Advisor
How it works
- Uses AWS Trusted Advisor's L4dfs2Q3C2 check for Lambda function error rate analysis
- Leverages Trusted Advisor's error metrics and recommendations
- Provides dynamic titles with specific actions
- Covers all Lambda functions across all regions
- Function-level recommendations for targeted optimization
Cost and Reliability Impact
Lambda functions with high error rates can result in:
- Increased execution costs from failed invocations
- Poor user experience from service failures
- Potential cascading failures in dependent systems
- Higher operational overhead for error handling
- Missed opportunities for reliability optimization
Error Rate Optimization Strategies
- Error handling: Implement comprehensive error handling and logging
- Code quality: Improve code robustness and error prevention
- Configuration review: Check function configuration and permissions
- External service reliability: Optimize calls to external services
- Retry logic: Implement appropriate retry mechanisms
- Monitoring and alerting: Set up proper monitoring for error detection
Common Error Causes
- Permission issues: Insufficient IAM permissions for function execution
- External service failures: Unreliable external API or service calls
- Resource constraints: Insufficient memory or timeout configurations
- Code bugs: Logic errors or unhandled exceptions
- Configuration problems: Incorrect environment variables or settings
- Network issues: Connectivity problems to external resources
Reliability Improvement Recommendations
- Implement comprehensive error handling: Catch and handle all potential errors
- Add proper logging: Use structured logging for better debugging
- Review IAM permissions: Ensure functions have appropriate permissions
- Optimize external calls: Implement timeouts and retry logic for external services
- Monitor error patterns: Use CloudWatch to track error trends
- Implement circuit breakers: Prevent cascading failures
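As one illustration of the circuit breaker recommendation above, the sketch below stops calling a failing dependency for a cool-down period after repeated failures. The class name, thresholds, and error type are hypothetical choices for this example. Note also that in Lambda, in-memory state like this only persists within a single warm execution environment, so a shared store would be needed to coordinate across concurrent invocations.

```python
import time


class CircuitBreaker:
    """Minimal in-memory circuit breaker (illustrative sketch)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open the circuit
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

A function would typically create one breaker per external dependency at module load time and wrap each outbound call with `breaker.call(...)`.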
Best Practices
- Implement proper error handling and logging in all functions
- Use CloudWatch metrics to monitor error rates and trends
- Set up alerts for error rate thresholds
- Implement retry logic with exponential backoff
- Review and test error scenarios regularly
- Use dead letter queues for failed function invocations
- Monitor external service dependencies and their reliability
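The retry guidance above (retry logic with exponential backoff) can be sketched as follows. This is a minimal example under stated assumptions: the helper name, attempt count, and delay values are hypothetical, and it uses "full jitter" (a random wait up to the capped exponential delay), which is one common variant rather than the only option.

```python
import random
import time


def call_with_backoff(operation, max_attempts=4, base_delay=0.2, max_delay=5.0,
                      retryable=(Exception,), sleep=time.sleep):
    """Call operation(), retrying retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential delay (base, 2x, 4x, ...) capped at max_delay,
            # then randomized so concurrent callers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Narrow `retryable` to transient error types (for example, timeouts or throttling) so that permanent failures such as permission errors fail fast. Keep the total retry time well below the function's configured timeout.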
Fix Lambda Function with Excessive Timeouts
This recommendation identifies AWS Lambda functions that are experiencing excessive timeouts, which can impact performance, reliability, and costs.
How it works
AWS Trusted Advisor monitors your Lambda functions and identifies those with excessive timeout occurrences. Functions that frequently timeout can indicate performance issues, inefficient code, or inappropriate timeout configurations that impact user experience and increase costs.
What CloudZero identifies
- Lambda functions with high timeout rates
- Functions that need timeout configuration adjustments
- Performance optimization opportunities
- Code efficiency improvements
- Cost optimization recommendations from AWS Trusted Advisor
How it works
- Uses AWS Trusted Advisor's L4dfs2Q3C3 check for Lambda function timeout analysis
- Leverages Trusted Advisor's performance metrics and recommendations
- Provides dynamic titles with specific actions
- Covers all Lambda functions across all regions
- Function-level recommendations for targeted optimization
Cost and Performance Impact
Lambda functions with excessive timeouts can result in:
- Increased execution costs due to longer running times
- Poor user experience from slow response times
- Potential cascading failures in dependent systems
- Higher error rates and reduced reliability
- Missed opportunities for performance optimization
Timeout Optimization Strategies
- Timeout configuration: Adjust function timeout settings appropriately
- Code optimization: Improve function efficiency and reduce execution time
- Resource allocation: Increase memory allocation for better performance
- Async processing: Use asynchronous patterns for long-running operations
- External service optimization: Optimize calls to external services
- Caching strategies: Implement caching to reduce redundant operations
Common Timeout Causes
- External API calls: Slow or unresponsive external services
- Database queries: Inefficient or slow database operations
- File processing: Large file operations without streaming
- Memory constraints: Insufficient memory allocation
- Cold starts: Initialization delays for complex functions
- Network latency: Slow network connections to external resources
Performance Optimization Recommendations
- Monitor execution times: Track function performance metrics
- Optimize code: Refactor inefficient algorithms and operations
- Use appropriate timeouts: Set realistic timeout values based on actual execution times
- Implement retry logic: Handle transient failures gracefully
- Consider async patterns: Use asynchronous processing for long operations
- Optimize dependencies: Minimize and optimize external service calls
Best Practices
- Set timeout values based on actual execution time plus buffer
- Implement proper error handling and retry mechanisms
- Use CloudWatch metrics to monitor function performance
- Consider breaking large functions into smaller, focused functions
- Implement caching strategies for frequently accessed data
- Monitor and optimize external service dependencies
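One way to apply "actual execution time plus buffer" is to derive the timeout from observed durations. The sketch below takes a high percentile of recent durations, applies a buffer, and caps the result at Lambda's 900-second maximum. The helper name, the 99th percentile, and the 1.5x buffer are assumptions for illustration, not values prescribed by AWS or CloudZero.

```python
import math

LAMBDA_MAX_TIMEOUT_SECONDS = 900  # Lambda's maximum configurable timeout


def recommended_timeout(durations_ms, percentile=0.99, buffer_factor=1.5):
    """Suggest a timeout in whole seconds from observed durations (ms)."""
    if not durations_ms:
        raise ValueError("need at least one observed duration")
    ordered = sorted(durations_ms)
    # Nearest-rank percentile: the smallest value covering the given fraction.
    rank = max(1, math.ceil(percentile * len(ordered)))
    observed_ms = ordered[rank - 1]
    seconds = math.ceil(observed_ms * buffer_factor / 1000)
    return min(LAMBDA_MAX_TIMEOUT_SECONDS, max(1, seconds))
```

The durations could come from the function's CloudWatch Duration metric or its REPORT log lines. If the suggested value sits at the 900-second cap, the workload is probably a better fit for an asynchronous or step-based design than for a larger timeout.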
Migrate EMR Serverless to ARM (Graviton)
This recommendation identifies AWS accounts running EMR Serverless workloads on x86 (Intel/AMD) architecture that could achieve significant cost savings by migrating to ARM-based Graviton processors. AWS Graviton processors offer up to 20% cost savings with equivalent or better performance for most EMR Serverless workloads.
What CloudZero identifies
- EMR Serverless applications running on x86 architecture
- Accounts with any non-ARM EMR Serverless usage
- Potential savings from migrating to ARM Graviton instances
- Both fully x86 deployments and mixed x86/ARM environments
Cost impact
The recommendation calculates potential savings based on:
- 20% cost reduction from migrating x86 workloads to ARM Graviton
- Current monthly x86 EMR Serverless spend
- No performance degradation expected (often performance improves)
Example Scenario
| Metric | Value |
|---|---|
| Monthly x86 EMR Serverless cost | $100,000 |
| ARM migration savings (20%) | $20,000/month |
| Annual savings | $240,000 |
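The identification and savings logic above can be sketched in a few lines. This is an illustration, not CloudZero's implementation. The function names are hypothetical, and the input is assumed to be a list of application summaries shaped like the output of the EMR Serverless ListApplications API (each with an `id` and an `architecture` field).

```python
def x86_applications(app_summaries):
    """Return IDs of EMR Serverless applications that are not on ARM64.

    Applications without an explicit architecture are treated as X86_64,
    which is the service default.
    """
    return [app["id"] for app in app_summaries
            if app.get("architecture", "X86_64") != "ARM64"]


def estimated_arm_savings(x86_monthly_cost, savings_rate=0.20):
    """Return (monthly, annual) savings for an assumed 20% reduction."""
    monthly = x86_monthly_cost * savings_rate
    return monthly, monthly * 12


apps = [
    {"id": "app-1", "architecture": "X86_64"},
    {"id": "app-2", "architecture": "ARM64"},
    {"id": "app-3"},  # architecture omitted, so assumed x86
]
print(x86_applications(apps))
monthly, annual = estimated_arm_savings(100_000)
print(f"${monthly:,.0f}/month, ${annual:,.0f}/year")
```

With the $100,000 monthly spend from the example scenario, the estimate matches the table: $20,000 per month and $240,000 per year.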
Additional details
- Immediate Cost Savings: 20% reduction in compute costs with minimal effort
- No Performance Trade-off: Graviton processors often provide better performance
- Simple Migration: Usually just requires changing the application's architecture setting
- Growing Support: Most Spark libraries and frameworks support ARM
- Environmental Impact: Graviton processors are more energy efficient
How to address this
Step 1: Verify Application Compatibility
Most EMR Serverless workloads are compatible with ARM. Check for any architecture-specific dependencies:
- Review custom libraries and packages
- Verify third-party integrations support ARM
- Test in a development environment first
Step 2: Update EMR Serverless Application Configuration
Via AWS Console:
- Navigate to EMR Studio → Applications
- Select your application
- Edit application settings
- Under "Architecture", select arm64 (Graviton)
- Save and restart application
Via AWS CLI:
# The application must be in the CREATED or STOPPED state before it can be updated
aws emr-serverless update-application \
  --application-id <application-id> \
  --architecture ARM64
Via Terraform:
resource "aws_emrserverless_application" "example" {
  name          = "my-application"
  release_label = "emr-6.10.0"
  type          = "Spark"
  architecture  = "ARM64" # Change from "X86_64"
}
Step 3: Monitor and Validate
- Monitor job execution times (should be equal or better)
- Verify cost reduction in billing (appears within 24-48 hours)
- Check application logs for any architecture-related issues
Additional details
Graviton Benefits
- Cost: 20% cheaper than comparable x86 instances
- Performance: Up to 40% better price-performance
- Memory: Same memory-to-vCPU ratios available
- Compatibility: Supports Spark 3.x, Python 3.7+, Java 8+
Known Limitations
- Some legacy libraries do not support ARM (rare in modern Spark)
- Custom native code requires recompilation
- Third-party connectors should be verified
Best Practices
- Start with non-production workloads
- Run parallel tests comparing x86 vs ARM performance
- Migrate incrementally (application by application)
- Update documentation to default to ARM for new applications
References
- AWS EMR Serverless Graviton Documentation
- Graviton Performance Best Practices
- EMR Serverless Pricing
Older Generation Instances Detected
Amazon EC2, RDS, and ElastiCache regularly release new instance generations. Newer generation instances deliver better performance at a lower price point. Periodically check your environments for older generation instances and upgrade them to the current generation to improve performance and reduce cost. Upgrading can generally save up to 15% of the cost of your older generation instances.
Threshold: This recommendation is created if the total real cost of the identified Amazon EC2, RDS, and ElastiCache older generation instances is at least $500. It is marked as Addressed once that spend falls below $500.
Databases
Amazon Aurora Delete Clusters
This recommendation identifies Aurora clusters that should be deleted to reduce costs.
How it works
- Identifies Aurora clusters that are candidates for deletion
- Provides estimated cost savings from deleting unused clusters
- Uses AWS Trusted Advisor recommendations to identify optimal deletion targets
How to address this
- Delete Aurora clusters that are no longer needed
- Create final snapshots before deletion if data needs to be preserved
- Ensure no applications are dependent on the clusters
- Review Aurora Serverless v1 clusters that are candidates for deletion
- Consider the impact on read replicas and other dependent resources
Amazon Aurora Migrate to Graviton
This recommendation identifies Aurora clusters that can be migrated to Graviton-based instances for cost optimization.
How it works
- Identifies Aurora clusters that are candidates for migration to Graviton processors
- Provides estimated cost savings from the migration
- Uses AWS Trusted Advisor recommendations to identify optimal migration targets
How to address this
- Migrate eligible Aurora clusters to Graviton-based instance types
- Review application compatibility with ARM-based processors
- Test database performance and functionality after migration
- Plan for maintenance windows during migration
- Consider the performance benefits and cost savings of Graviton instances
- Evaluate Aurora Serverless v2 with Graviton for variable workloads
Amazon Aurora Rightsize Clusters
This recommendation identifies Aurora clusters that should be rightsized to optimize cost and performance.
How it works
- Identifies Aurora clusters that are over-provisioned or under-provisioned
- Provides estimated cost savings from rightsizing clusters
- Uses AWS Trusted Advisor recommendations to identify optimal rightsizing targets
How to address this
- Rightsize Aurora clusters to match actual resource utilization
- Review CPU, memory, and I/O utilization metrics
- Consider performance requirements when rightsizing
- Test application performance after rightsizing to ensure requirements are met
- Monitor Aurora cluster performance metrics during and after rightsizing
- Consider Aurora Serverless v2 for variable workloads that benefit from auto-scaling
Amazon Aurora Upgrade Clusters
This recommendation identifies Aurora clusters that should be upgraded to newer generation types for cost optimization.
How it works
- Identifies Aurora clusters that are candidates for upgrading to newer generation types
- Provides estimated cost savings from upgrading to more efficient cluster types
- Uses AWS Trusted Advisor recommendations to identify optimal upgrade targets
How to address this
- Upgrade Aurora clusters to newer generation types for better price-performance
- Review application compatibility with newer cluster types
- Plan for maintenance windows during upgrades
- Test database performance and functionality after upgrade
- Consider Aurora Serverless v2 for variable workloads
- Evaluate the benefits of upgrading to newer Aurora engine versions
Amazon DynamoDB Reserved Capacity Purchase Recommendations
This recommendation identifies Amazon DynamoDB reserved capacity purchase opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your DynamoDB usage patterns to provide reserved capacity purchase recommendations. This recommendation surfaces those opportunities to help you identify potential savings through committed usage discounts for DynamoDB read and write capacity units.
What CloudZero identifies
- Reserved capacity purchase opportunities for DynamoDB tables
- Recommended reserved capacity amounts for read and write units
- Estimated monthly savings from purchasing reserved capacity
- Table-specific purchase recommendations
- Cost optimization opportunities from AWS Cost Optimization Hub
How it works
- Uses AWS Trusted Advisor's c1z7kmr15n check for DynamoDB reserved capacity recommendations
- Leverages Trusted Advisor's estimated savings calculations
- Provides dynamic titles with specific recommendations
- Covers DynamoDB read and write capacity units
- Table-level recommendations for targeted optimization
Cost impact
The recommendation calculates potential savings based on Trusted Advisor's estimates for DynamoDB reserved capacity purchases, including recommended capacity amounts, terms, and expected monthly savings compared to on-demand pricing.
Reserved Capacity Benefits
- Up to 70% savings compared to on-demand pricing
- Predictable billing for consistent workloads
- No upfront payment required (No Upfront option available)
- Flexible terms (1 or 3 years)
- Automatic application to matching tables
Amazon ElastiCache Reserved Node Purchase Recommendations
This recommendation identifies ElastiCache Reserved Node purchase opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your ElastiCache usage patterns and recommends Reserved Node purchases that can reduce your caching costs. This recommendation surfaces those recommendations to help you optimize your Reserved Node portfolio for consistent workloads.
How to address this
- Purchase Reserved Nodes for ElastiCache clusters with consistent usage patterns
- Consider 1-year or 3-year term options based on workload stability
- Evaluate payment options (All Upfront, Partial Upfront, No Upfront)
- Review node types and sizes to ensure optimal capacity planning
Cost impact
Reserved Nodes can provide significant savings compared to On-Demand pricing for consistent workloads. The cost impact represents the potential monthly savings from implementing the recommended Reserved Node purchases.
Amazon MemoryDB Reserved Node Purchase Recommendations
This recommendation identifies MemoryDB Reserved Node purchase opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your MemoryDB usage patterns and recommends Reserved Node purchases that can reduce your in-memory database costs. This recommendation surfaces those recommendations to help you optimize your Reserved Node portfolio for consistent workloads.
How to address this
- Purchase Reserved Nodes for MemoryDB clusters with consistent usage patterns
- Consider 1-year or 3-year term options based on workload stability
- Evaluate payment options (All Upfront, Partial Upfront, No Upfront)
- Review node types and sizes to ensure optimal capacity planning
Cost impact
Reserved Nodes can provide significant savings compared to On-Demand pricing for consistent workloads. The cost impact represents the potential monthly savings from implementing the recommended Reserved Node purchases.
Amazon OpenSearch Service Reserved Instance Purchase Recommendations
This recommendation identifies OpenSearch Service Reserved Instance purchase opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your Amazon OpenSearch Service usage patterns and recommends Reserved Instance purchases that can reduce your search and analytics costs. This recommendation surfaces those recommendations to help you optimize your Reserved Instance portfolio for consistent workloads.
How to address this
- Purchase Reserved Instances for OpenSearch domains with consistent usage patterns
- Consider 1-year or 3-year term options based on workload stability
- Evaluate payment options (All Upfront, Partial Upfront, No Upfront)
- Review instance types and sizes to ensure optimal capacity planning
- Consider Reserved Instance purchases for both data and master nodes
Cost impact
Reserved Instances can provide significant savings compared to On-Demand pricing for consistent workloads. The cost impact represents the potential monthly savings from implementing the recommended Reserved Instance purchases.
Amazon RDS Delete Instances
This recommendation identifies RDS instances that should be deleted to reduce costs.
How it works
- Identifies RDS instances that are candidates for deletion
- Provides estimated cost savings from deleting unused instances
- Uses AWS Trusted Advisor recommendations to identify optimal deletion targets
How to address this
- Delete RDS instances that are no longer needed
- Create final snapshots before deletion if data needs to be preserved
- Ensure no applications are dependent on the instances
- Review read replicas and other dependent resources before deletion
Amazon RDS Migrate to Graviton
This recommendation identifies RDS instances that can be migrated to Graviton-based instances for cost optimization.
How it works
- Identifies RDS instances that are candidates for migration to Graviton processors
- Provides estimated cost savings from the migration
- Uses AWS Trusted Advisor recommendations to identify optimal migration targets
How to address this
- Migrate eligible RDS instances to Graviton-based instance types
- Review application compatibility with ARM-based processors
- Test database performance and functionality after migration
- Plan for maintenance windows during migration
- Consider the performance benefits and cost savings of Graviton instances
Amazon RDS Reserved Instance Purchase Recommendations
This recommendation identifies Amazon RDS Reserved Instance purchase opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your RDS usage patterns to provide Reserved Instance purchase recommendations. This recommendation surfaces those opportunities to help you identify potential savings through committed usage discounts for RDS database instances.
What CloudZero identifies
- Reserved Instance purchase opportunities for RDS database instances
- Recommended Reserved Instance configurations (instance type, size, engine)
- Estimated monthly savings from purchasing Reserved Instances
- Database-specific purchase recommendations
- Cost optimization opportunities from AWS Cost Optimization Hub
How it works
- Uses AWS Trusted Advisor's c1z7kmr11n check for RDS Reserved Instance recommendations
- Leverages Trusted Advisor's estimated savings calculations
- Provides dynamic titles with specific recommendations and savings percentages
- Covers all RDS database engines (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server)
- Database-level recommendations for targeted optimization
Cost impact
The recommendation calculates potential savings based on Trusted Advisor's estimates for RDS Reserved Instance purchases, including:
- Up to 69% savings compared to on-demand pricing
- Recommended instance types and sizes
- Expected monthly savings from Reserved Instance commitments
- Database engine-specific optimization opportunities
Reserved Instance Benefits
- Significant cost savings: Up to 69% compared to on-demand pricing
- Predictable billing: Fixed monthly costs for database instances
- No upfront payment: No Upfront option available for flexibility
- Flexible terms: 1 or 3-year commitment options
- Engine coverage: Available for all major RDS database engines
- Multi-AZ support: Reserved Instances work with Multi-AZ deployments
Database Engines Supported
- Amazon Aurora (MySQL and PostgreSQL compatible)
- MySQL
- PostgreSQL
- MariaDB
- Oracle
- SQL Server
How to address this
- Review RDS usage patterns to identify consistent workloads
- Consider Reserved Instances for production databases with steady usage
- Evaluate different Reserved Instance terms (1 vs 3 years)
- Plan for database growth when selecting Reserved Instance sizes
- Monitor Reserved Instance coverage to maximize savings
Amazon RDS Rightsize Instances
This recommendation identifies RDS instances that should be rightsized to optimize cost and performance.
How it works
- Identifies RDS instances that are over-provisioned or under-provisioned
- Provides estimated cost savings from rightsizing instances
- Uses AWS Trusted Advisor recommendations to identify optimal rightsizing targets
How to address this
- Rightsize RDS instances to match actual resource utilization
- Review CPU, memory, and I/O utilization metrics
- Consider performance requirements when rightsizing
- Test application performance after rightsizing to ensure requirements are met
- Monitor database performance metrics during and after rightsizing
Amazon RDS Storage Delete Recommendations
This recommendation identifies Amazon RDS database instances with storage that can be deleted to reduce costs, typically for unused or redundant storage.
What CloudZero identifies
- RDS database instances with unused storage allocations
- Redundant storage that can be safely removed
- Opportunities to eliminate unnecessary storage costs
How to address this
- Review storage utilization and identify unused allocations
- Delete unused or redundant storage
- Clean up orphaned storage resources
Cost impact
- Eliminates ongoing storage costs for unused capacity
- Provides immediate cost savings
- Reduces overall RDS storage expenses
Implementation Effort
Medium - Requires careful verification that storage is truly unused and safe to delete.
Additional details
- Ensure storage is truly unused before deletion
- Verify no dependencies exist before removing storage
- Consider backup requirements before deletion
- Test deletion process in non-production environments first
Amazon RDS Storage Rightsize Recommendations
This recommendation identifies Amazon RDS database instances with storage that can be rightsized to reduce costs while maintaining performance.
What CloudZero identifies
- RDS database instances with over-provisioned storage
- Opportunities to reduce storage allocation to match actual usage
- Potential cost savings from rightsizing storage
How to address this
- Review current storage utilization patterns
- Rightsize storage allocation to match actual usage
- Monitor performance after rightsizing to ensure no impact
Cost impact
- Reduces storage costs by eliminating over-provisioned capacity
- Maintains database performance while optimizing costs
- Provides immediate cost savings on storage charges
Implementation Effort
Medium - Requires careful analysis of usage patterns and testing in non-production environments first.
Additional details
- Always test storage changes in non-production environments first
- Monitor database performance after rightsizing
- Consider future growth when determining new storage allocation
Amazon RDS Storage Upgrade Recommendations
This recommendation identifies Amazon RDS database instances where storage can be upgraded to more cost-effective options or better performance tiers.
What CloudZero identifies
- RDS database instances using outdated storage types
- Opportunities to upgrade to more cost-effective storage options
- Storage that can benefit from performance improvements
How to address this
- Review current storage type and performance requirements
- Upgrade to more cost-effective storage options
- Consider performance improvements available with newer storage types
Cost impact
- Reduces storage costs through more efficient storage types
- May improve performance while reducing costs
- Provides long-term cost optimization benefits
Implementation Effort
Medium - Requires planning for storage migration and potential downtime.
Additional details
- Plan for potential downtime during storage upgrades
- Test upgrade process in non-production environments first
- Consider performance impact of storage changes
- Verify compatibility with current database configuration
Amazon RDS Upgrade Instances
This recommendation identifies RDS instances that should be upgraded to newer generation types for cost optimization.
How it works
- Identifies RDS instances that are candidates for upgrading to newer generation types
- Provides estimated cost savings from upgrading to more efficient instance types
- Uses AWS Trusted Advisor recommendations to identify optimal upgrade targets
How to address this
- Upgrade RDS instances to newer generation types for better price-performance
- Review application compatibility with newer instance types
- Plan for maintenance windows during upgrades
- Test performance and functionality after upgrade
- Consider Reserved Instances for upgraded instances to maximize savings
Amazon Redshift Reserved Node Purchase Recommendations
This recommendation identifies Redshift Reserved Node purchase opportunities based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your Amazon Redshift usage patterns and recommends Reserved Node purchases that can reduce your data warehouse costs. This recommendation surfaces those recommendations to help you optimize your Reserved Node portfolio for consistent workloads.
How to address this
- Purchase Reserved Nodes for Redshift clusters with consistent usage patterns
- Consider 1-year or 3-year term options based on workload stability
- Evaluate payment options (All Upfront, Partial Upfront, No Upfront)
- Review node types and sizes to ensure optimal capacity planning
- Consider Reserved Node purchases for both leader and compute nodes
Cost impact
Reserved Nodes can provide significant savings compared to On-Demand pricing for consistent workloads. The cost impact represents the potential monthly savings from implementing the recommended Reserved Node purchases.
Delete Inactive DynamoDB Tables
This recommendation identifies DynamoDB tables that are incurring storage costs but show no usage activity. Inactive tables continue to accumulate charges based on data size (per GB-month) even when the tables are not being read from or written to. Unused DynamoDB tables represent pure waste. You are paying for storage without gaining any value. These tables are often remnants of:
- Completed projects or migrations
- Testing and development environments
- Deprecated features or services
- Data that should have been archived or deleted
Threshold: This recommendation is created if a DynamoDB table has no read or write activity in the past thirty days.
Recommended action: Investigate ownership and verify if the table is truly unused. If so, delete the table to eliminate ongoing storage costs. If there is any chance the table is needed later, export the data to S3.
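Before deleting a table, you may want to confirm the inactivity yourself. The sketch below applies the thirty-day rule to a series of daily consumed capacity values. The helper name and input shape are hypothetical. The values could, for example, be daily sums of the ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits CloudWatch metrics for the table.

```python
def is_inactive_table(daily_consumed_units, lookback_days=30):
    """Return True if the table shows no reads or writes in the lookback window.

    daily_consumed_units: list of (read_units, write_units) tuples, one per
    day, oldest first. Tables with less history than the lookback window are
    not flagged, because inactivity cannot be confirmed yet.
    """
    recent = daily_consumed_units[-lookback_days:]
    if len(recent) < lookback_days:
        return False
    return all(reads == 0 and writes == 0 for reads, writes in recent)
```

A table that passes this check can still be needed, for example as a rarely read archive, so confirm ownership before deleting and export the data to S3 when in doubt.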
Excessive RDS Backup Retention
CloudZero has identified Amazon RDS backups and manual snapshots retained beyond 90 days, potentially exceeding business or compliance requirements. Long-term retention of RDS snapshots accumulates significant costs over time.
How it works
This recommendation identifies RDS backups and snapshots that are:
- Older than 90 days
- Incurring ongoing storage costs
- Potentially exceeding necessary retention requirements
RDS automated backups are retained for a configurable period of up to 35 days and are cleaned up automatically. This recommendation focuses on manual snapshots, which are retained indefinitely unless explicitly deleted.
Additional details
- Cost Accumulation: Storage costs add up as snapshots accumulate ($0.095/GB-month)
- Unmanaged Snapshots: Manual snapshots created for one-time purposes often remain in place indefinitely
- Example: 50 old 500GB snapshots = ~$28,500/year in unnecessary costs
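The example figure follows directly from the per-GB price. A short worked calculation, using the prices quoted in this section (actual prices vary by region; the function name is hypothetical):

```python
def annual_snapshot_cost(snapshot_count, size_gb, price_per_gb_month=0.095):
    """Yearly storage cost for snapshots of equal size at a flat GB-month price."""
    return snapshot_count * size_gb * price_per_gb_month * 12


rds_cost = annual_snapshot_cost(50, 500)                 # standard snapshot storage
archive_cost = annual_snapshot_cost(50, 500, 0.00099)    # S3 Glacier Deep Archive price
print(f"RDS snapshots: ~${rds_cost:,.0f}/year")
print(f"Deep Archive:  ~${archive_cost:,.0f}/year")
```

Fifty 500 GB snapshots are 25,000 GB, which at $0.095/GB-month is $2,375 per month, or about $28,500 per year. The same volume at the Deep Archive price is roughly $297 per year, before export and retrieval costs.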
How to address this
- Review Snapshot Inventory:
  - Navigate to RDS → Snapshots → Filter "Manual snapshots" → Sort by date
  - Identify purpose of each snapshot (testing, compliance, migration, etc.)
  - Determine which are still needed

      aws rds describe-db-snapshots --snapshot-type manual \
        --query 'DBSnapshots[?SnapshotCreateTime<=`2023-01-01`].[DBSnapshotIdentifier,SnapshotCreateTime]'

- Establish Retention Policy:
  - Daily backups: 7-30 days
  - Weekly backups: 4-12 weeks
  - Monthly backups: 12 months
  - Yearly backups: 7 years (compliance only)
  - Document and communicate policy
- Delete Unnecessary Snapshots:

  Important: Deletion is permanent - always verify first

      aws rds delete-db-snapshot --db-snapshot-identifier mydb-snapshot-2023-01-15

- Implement Automated Lifecycle:
  - Use AWS Lambda to auto-delete based on tags and age
  - Tag snapshots: `Purpose`, `RetentionDays`, `Retain`
  - Use AWS Backup for centralized lifecycle management
- Consider Alternative Storage:
  - Export to S3 Glacier Deep Archive (~$0.00099/GB-month, 99% cheaper)
  - Use AWS Backup archive tier
  - Keep only "hot" backups in RDS format
- Monitor and Alert:
  - Set up Cost Anomaly Detection for RDS backup storage
  - Create CloudWatch dashboards for snapshot age and costs
  - Alert on snapshots exceeding retention policy
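The inventory and alerting steps above both depend on finding snapshots older than the retention limit. A minimal sketch of that filter follows. It assumes each snapshot is a dictionary shaped like an entry in the `DBSnapshots` list returned by `describe-db-snapshots`, with `SnapshotCreateTime` already parsed into a timezone-aware `datetime`. The function name and the 90-day default are illustrative.

```python
from datetime import datetime, timedelta, timezone


def snapshots_past_retention(snapshots, max_age_days=90, now=None):
    """Return snapshots created more than `max_age_days` ago, oldest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        (s for s in snapshots if s["SnapshotCreateTime"] < cutoff),
        key=lambda s: s["SnapshotCreateTime"],
    )
```

The function only reports candidates. Deletion stays a separate, manual decision because it cannot be undone.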
Cost impact
- Conservative estimate: 50% reduction (assumes some retention needed)
- Unnecessary snapshots: 100% recoverable
- Pricing: ~$0.095/GB-month (standard), ~$0.021/GB-month (Aurora excess)
Important Considerations
Retention Best Practices
Keep:
- Compliance-required snapshots
- Recent backups (30-90 days) for disaster recovery
- Pre-migration/upgrade snapshots (until validated)
Consider Deleting:
- Ad-hoc test snapshots
- Post-deployment snapshots (after validation)
- Duplicate snapshots
- Decommissioned database snapshots
Operational Risks
- Deletion is permanent (no undelete)
- Verify with database owners before deletion
- Export to S3 if uncertain
- Test that remaining snapshots are restorable
AWS Backup Alternative
Use AWS Backup for automated lifecycle management:
- Centralized management across services
- Automated retention and expiration
- Compliance reporting
- Cold storage transitions
RDS Clusters Incurring Extended Support Charges
CloudZero has identified Amazon RDS database instances and clusters that are running on outdated engine versions and incurring AWS extended support charges. These charges apply when you continue running RDS database engines beyond their standard support end date.
AWS extended support fees can add significant costs to your RDS spending, often 50-100% more than the base instance cost for older engine versions. By upgrading to a supported engine version, you can eliminate these charges entirely while also benefiting from security patches, bug fixes, and performance improvements.
How it works
This recommendation identifies RDS resources that are:
- Running database engine versions that are past their standard support period
- Incurring AWS extended support charges (typically identified by "ExtendedSupport" in usage types or line item descriptions)
- Eligible for upgrade to newer, supported engine versions without extended support fees
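If you want to locate these charges in your own billing export, the "ExtendedSupport" marker mentioned above can be matched with a simple filter. This is a sketch only: the `usage_type` and `description` field names are assumptions about your export format, not a CloudZero or AWS schema.

```python
def extended_support_items(line_items):
    """Return billing line items that mention Extended Support.

    Matches "ExtendedSupport" in the usage type, or "Extended Support"
    (with or without a space) in the description, case-insensitively.
    """
    marker = "extendedsupport"
    return [
        item for item in line_items
        if marker in item.get("usage_type", "").lower()
        or marker in item.get("description", "").lower().replace(" ", "")
    ]
```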
Common database engines affected include:
- MySQL 5.7 and earlier versions
- PostgreSQL 11 and earlier versions
- MariaDB 10.3 and earlier versions
- Oracle database versions past their support dates
- SQL Server versions past their support dates
Additional details
- Cost Savings: Extended support charges can double your RDS costs for affected instances. Upgrading eliminates these fees completely
- Security: Newer engine versions receive active security patches and vulnerability fixes
- Performance: Modern database versions include performance optimizations and new features
- Compliance: Many compliance frameworks require running currently supported software versions
- Future-Proofing: Avoiding technical debt by staying on supported versions
How to address this
- Review Affected Resources: Identify all RDS instances/clusters incurring extended support charges and their current engine versions
- Plan Upgrades: For each affected resource:
  - Check AWS documentation for the upgrade path to the latest supported version
  - Review application compatibility with newer database versions
  - Identify any deprecated features your application uses
  - Plan maintenance windows for the upgrade
- Test in Non-Production: Before upgrading production databases:
  - Restore a snapshot to a test environment
  - Upgrade the test instance to the target version
  - Run application regression tests
  - Verify query performance and compatibility
  - Test backup and restore procedures
- Perform Upgrades: Execute the upgrade during scheduled maintenance windows:
  - For minor version upgrades: can often be done with minimal downtime
  - For major version upgrades: requires more planning and testing
  - Use RDS Blue/Green deployments for zero-downtime upgrades when available
  - Take a manual snapshot before upgrading as a safety measure
- Monitor Post-Upgrade: After upgrading:
  - Verify application connectivity and functionality
  - Monitor database performance metrics
  - Check for any application errors or warnings
  - Confirm extended support charges stop appearing in billing
- Establish Upgrade Cadence: Prevent future extended support charges:
  - Track RDS engine version end-of-support dates
  - Schedule regular database upgrades before support ends
  - Test new versions early in non-production environments
  - Keep documentation of version-specific application requirements
Additional details
- Upgrade Paths: Some major version upgrades require intermediate steps (e.g., MySQL 5.7 → 8.0 requires upgrading to 5.7.latest first)
- Downtime: Plan for maintenance windows; consider using read replicas for minimal downtime migrations
- Parameter Groups: Review and update parameter groups to ensure compatibility with new versions
- Application Changes: Some applications need code changes for newer database versions
- Backup Strategy: Always take manual snapshots before major version upgrades
- Blue/Green Deployments: Use RDS Blue/Green deployments for safer, zero-downtime upgrades when available
For detailed upgrade procedures, consult the AWS RDS documentation for your specific database engine.
RDS Snapshot Costs Are Higher Than Expected
This recommendation is created when RDS snapshot costs exceed 10% of total RDS costs. In a typical organization, RDS snapshot costs represent 1% to 5% of the total cost of the RDS service. Costs above that range indicate an excessive number of snapshots. This is often due to missing or inadequate snapshot retention rules that leave a large number of automatic snapshots in place, as well as manual snapshots that are not managed by retention rules. Consider tightening snapshot retention rules and cleaning up unnecessary snapshots to save money.
Threshold: This recommendation is created if the total real cost spend for the identified snapshots exceeds 10% of the real cost for all of the RDS service and is at least $500. When the total spend for RDS snapshots falls below 10%, the Recommendation will automatically be closed.
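The two conditions in this threshold can be expressed as a small check. This is an illustrative sketch of the rule as described here, not CloudZero's implementation; the function and parameter names are assumptions.

```python
def snapshot_recommendation_open(snapshot_cost: float, total_rds_cost: float,
                                 max_share: float = 0.10,
                                 min_spend: float = 500.0) -> bool:
    """Return True when snapshot spend exceeds `max_share` of RDS spend
    and is at least `min_spend` dollars."""
    if total_rds_cost <= 0:
        return False
    share = snapshot_cost / total_rds_cost
    return share > max_share and snapshot_cost >= min_spend
```

For example, $600 of snapshot spend against $5,000 of RDS spend (12%) meets both conditions, while $400 against $2,000 (20%) does not, because it is below the $500 minimum.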
Underutilized Amazon Redshift Clusters
This recommendation identifies Amazon Redshift clusters that are underutilized and could benefit from optimization, based on AWS Trusted Advisor recommendations.
How it works
AWS Trusted Advisor analyzes your Redshift cluster usage patterns and identifies clusters that are not being fully utilized. This surfaces recommendations to help you optimize your data warehouse resources and reduce costs for underutilized infrastructure.
What CloudZero identifies
- Redshift clusters with low CPU utilization
- Clusters with minimal query activity
- Underutilized storage and compute resources
- Clusters that are candidates for downsizing or deletion
How it works
- Uses AWS Trusted Advisor's `G31sQ1E9U` check for underutilized Redshift clusters
- Leverages Trusted Advisor's estimated savings calculations
- Provides dynamic titles with specific recommended actions
Cost impact
The recommendation calculates potential monthly savings from optimizing underutilized Redshift clusters, helping you eliminate costs for unused or underutilized data warehouse capacity.
Upgrade Elasticsearch to Avoid Extended Support Charges
How it works
AWS charges additional Extended Support fees for Elasticsearch domains running end-of-life (EOL) versions. These charges can add 50-100% to your regular Elasticsearch costs and increase over time.
Additional details
- High cost: Extended support can double your Elasticsearch bill
- Escalating fees: Charges increase the longer you stay on old versions
- Security risk: EOL versions no longer receive security patches
- Performance: Newer versions offer better performance and features
Common EOL Versions with Extended Support
- Elasticsearch 6.x (all versions)
- Elasticsearch 7.0 - 7.9 (early 7.x versions)
Recommendation
Upgrade to the latest supported Elasticsearch version (7.10 or later, or migrate to OpenSearch).
Implementation Steps
Option 1: In-Place Upgrade (Recommended)
- Review compatibility: Check application compatibility with target version
- Backup domain: Create manual snapshot before upgrade
- Upgrade domain: Use AWS Console or CLI to perform rolling upgrade
- Test thoroughly: Validate all queries and integrations work
- Monitor performance: Watch cluster health and query latency
Option 2: Blue/Green Deployment
- Create new domain with target version
- Reindex data from old domain
- Update application endpoints
- Monitor and validate
- Delete old domain
Option 3: Migrate to OpenSearch
Consider migrating to OpenSearch (AWS's open-source fork) for long-term support and latest features.
Cost impact
Eliminates 100% of extended support charges immediately upon upgrade. For a typical domain, this can save $500-$5,000/month depending on cluster size.
Additional details
- Latest security patches
- Improved performance and efficiency
- Access to new features
- Better AWS support
- Lower operational risk
Upgrade OpenSearch to Avoid Extended Support Charges
How it works
AWS charges additional Extended Support fees for OpenSearch domains running end-of-life (EOL) versions. These charges can add 50-100% to your regular OpenSearch costs and increase over time.
Additional details
- High cost: Extended support can double your OpenSearch bill
- Escalating fees: Charges increase the longer you stay on old versions
- Security risk: EOL versions no longer receive security patches
- Performance: Newer versions offer better performance and features
Common EOL Versions with Extended Support
- OpenSearch 1.0 - 1.2 (early 1.x versions)
- Legacy Elasticsearch 7.10 versions migrated to OpenSearch
Recommendation
Upgrade to the latest supported OpenSearch version (2.x or later).
Implementation Steps
Option 1: In-Place Upgrade (Recommended)
- Review compatibility: Check application compatibility with target version
- Backup domain: Create manual snapshot before upgrade
- Upgrade domain: Use AWS Console or CLI to perform rolling upgrade
- Test thoroughly: Validate all queries and integrations work
- Monitor performance: Watch cluster health and query latency
Option 2: Blue/Green Deployment
- Create new domain with target version
- Reindex data from old domain
- Update application endpoints
- Monitor and validate
- Delete old domain
Cost impact
Eliminates 100% of extended support charges immediately upon upgrade. For a typical domain, this can save $500-$5,000/month depending on cluster size.
Additional details
- Latest security patches
- Improved performance and efficiency
- Access to new features (OpenSearch 2.x includes significant improvements)
- Better AWS support
- Lower operational risk
Version Compatibility
OpenSearch maintains strong backward compatibility:
- Most applications work without code changes
- Query syntax largely unchanged from Elasticsearch 7.10
- Plugin compatibility improved in 2.x
Management Tools
CloudWatch Costs Higher Than Expected
The AWS CloudWatch service should be only a small part of your cloud bill. This recommendation detects increases in CloudWatch costs, which can indicate that you are using CloudWatch more extensively than necessary and could clean up unneeded CloudWatch log groups.
Threshold: This recommendation is created if the total spend for the identified CloudWatch log groups exceeds a sliding scale cost that depends on your total 30 day real cost for all AWS services and is at least $500. The following table shows the sliding scale.
| 30 Day Spend for All AWS Services | 30 Day Spend Threshold for CloudWatch Logs |
|---|---|
| < $10,000.00 | $50.00 |
| Between $10,000.00 and $50,000.00 | $100.00 |
| Between $50,000.00 and $100,000.00 | $250.00 |
| Between $100,000.00 and $500,000.00 | $500.00 |
| Between $500,000.00 and $2,500,000.00 | $750.00 |
| > $2,500,000.00 | $1,000.00 |
When your CloudWatch cost falls below the threshold based on your AWS spend, or if it falls below $500, the Recommendation will automatically be closed.
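The sliding scale can be read as a band lookup. The sketch below encodes the table above and combines it with the $500 minimum. It is illustrative only: the names are assumptions, and because the table does not say which band an exact boundary value belongs to, this version assigns a boundary amount (for example exactly $10,000) to the higher band.

```python
import bisect

# Upper bounds of the 30-day total-spend bands, and the CloudWatch Logs
# threshold for each band (one more threshold than bounds).
BAND_UPPER_BOUNDS = [10_000, 50_000, 100_000, 500_000, 2_500_000]
CLOUDWATCH_THRESHOLDS = [50, 100, 250, 500, 750, 1_000]


def cloudwatch_threshold(total_spend: float) -> int:
    """Look up the CloudWatch Logs spend threshold for a 30-day total spend."""
    band = bisect.bisect_right(BAND_UPPER_BOUNDS, total_spend)
    return CLOUDWATCH_THRESHOLDS[band]


def cloudwatch_recommendation_open(cloudwatch_spend: float, total_spend: float,
                                   min_spend: float = 500.0) -> bool:
    """True when spend exceeds the band threshold and is at least `min_spend`."""
    return (cloudwatch_spend > cloudwatch_threshold(total_spend)
            and cloudwatch_spend >= min_spend)
```

The same lookup applies to any other sliding scale in this document if you swap in that table's threshold values.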
Redundant CloudTrail Usage Detected
The AWS CloudTrail service typically does not cost anything unless you have more than one instance in an account. This recommendation detects whether you are being charged for CloudTrail events, which indicates that you have more than one instance in an account and can clean up any redundant CloudTrail instances to eliminate unnecessary spend.
Threshold: This recommendation is created if the total spend for the identified CloudTrail events exceeds a sliding scale cost that depends on your total 30 day real cost for all AWS services, and the CloudTrail cost is at least $500. The following table shows the sliding scale.
| 30 Day Spend for All AWS Services | 30 Day Spend Threshold for CloudTrail |
|---|---|
| < $10,000.00 | $100.00 |
| Between $10,000.00 and $50,000.00 | $250.00 |
| Between $50,000.00 and $100,000.00 | $500.00 |
| Between $100,000.00 and $500,000.00 | $1,000.00 |
| Between $500,000.00 and $2,500,000.00 | $2,500.00 |
| > $2,500,000.00 | $5,000.00 |
When your CloudTrail cost falls below the threshold based on your AWS spend, or if it falls below $500, the Recommendation will be closed automatically.
Networking & Content Delivery
Delete Idle Load Balancer
This recommendation identifies AWS Classic Load Balancers (ELBs) that are idle and can be deleted to reduce costs.
How it works
This recommendation uses AWS Trusted Advisor data to identify Classic Load Balancers that have been idle for an extended period. These load balancers continue to incur charges even when not actively serving traffic.
Additional details
- Cost Savings: Idle ELBs incur hourly charges even when not in use
- Resource Cleanup: Helps maintain a clean AWS environment
- Security: Reduces attack surface by removing unused resources
How to address this
- Review: Verify the ELB is truly unused by checking application logs and monitoring
- Backup: Document the ELB configuration before deletion
- Delete: Remove the idle ELB through AWS Console or CLI
- Monitor: Ensure no applications were depending on the deleted ELB
Delete Inactive AWS Network Firewall
This recommendation identifies AWS Network Firewalls that appear to be inactive and could be deleted to reduce costs.
How it works
AWS Trusted Advisor monitors your AWS Network Firewalls and identifies firewalls that have processed 0 bytes of data in the last 30 days. Network Firewalls incur significant hourly charges even when not actively processing traffic, making inactive firewalls a major source of unnecessary spending.
What CloudZero identifies
- Network Firewalls with 0 bytes processed in the last 30 days
- Unused firewalls that are still incurring hourly charges
- Firewalls that were provisioned but never used or are no longer needed
- Opportunities to eliminate unused network security infrastructure
How to address this
- Delete Network Firewalls that have not processed any traffic in the last 30 days
- Review your VPC security architecture to ensure firewalls are still required
- Verify that the firewall is not being used for security inspection or filtering
- Consider consolidating multiple firewalls if possible
- Confirm with security and network teams before deletion to avoid creating security gaps
How it works
- Uses AWS Trusted Advisor's `c2vlfg0bfw` check for inactive Network Firewalls
- Identifies firewalls with zero data transfer over 30 days
- Provides Network Firewall ARNs for easy identification
- Focuses on reducing unnecessary hourly firewall charges
Cost impact
AWS Network Firewalls have substantial hourly charges that accumulate continuously. Each inactive Network Firewall represents significant ongoing waste that can be eliminated immediately. Deleting inactive Network Firewalls provides immediate and substantial cost savings.
Delete Inactive Gateway Load Balancer Endpoint
This recommendation identifies Gateway Load Balancer endpoints that appear to be inactive and could be deleted to reduce costs.
How it works
AWS Trusted Advisor monitors your Gateway Load Balancer (GWLB) endpoints and identifies endpoints that have processed 0 bytes of data in the last 30 days. Gateway Load Balancer endpoints incur hourly charges even when not actively processing traffic, making inactive endpoints a source of unnecessary spending.
What CloudZero identifies
- Gateway Load Balancer endpoints with 0 bytes processed in the last 30 days
- Unused GWLB endpoints that are still incurring hourly charges
- Endpoints that were created for testing or temporary use
- Opportunities to clean up unused network infrastructure
How to address this
- Delete Gateway Load Balancer endpoints that have not been used in the last 30 days
- Review your network architecture to ensure endpoints are still needed
- Verify that security appliances or inspection services no longer require the endpoint
- Confirm with application teams before deletion to avoid service disruption
How it works
- Uses AWS Trusted Advisor's `c2vlfg0k35` check for inactive GWLB endpoints
- Identifies endpoints with zero data transfer over 30 days
- Provides endpoint IDs and ARNs for easy identification
- Focuses on reducing unnecessary hourly endpoint charges
Cost impact
Gateway Load Balancer endpoints incur hourly charges that accumulate over time. Deleting inactive endpoints eliminates ongoing hourly charges and helps maintain a clean, cost-effective network architecture. While individual endpoint costs are modest, multiple inactive endpoints can represent significant unnecessary spending.
Delete Inactive VPC Interface Endpoint
This recommendation identifies VPC interface endpoints that appear to be inactive and could be deleted to reduce costs.
How it works
AWS Trusted Advisor monitors your VPC interface endpoints and identifies endpoints that have processed 0 bytes of data in the last 30 days. VPC interface endpoints incur hourly charges even when they process no traffic (data processing charges apply only to traffic that flows through them), making inactive endpoints a source of unnecessary spending.
What CloudZero identifies
- VPC interface endpoints with 0 bytes processed in the last 30 days
- Unused PrivateLink connections that are still incurring hourly charges
- Endpoints that were created for testing or temporary use
- Opportunities to consolidate endpoints using centralized architectures
How to address this
- Delete VPC interface endpoints that have not been used in the last 30 days
- Review your architecture to ensure endpoints are still needed
- Consider deploying VPC interface endpoints in a centralized architecture using Transit Gateway to reduce hourly charges on inactive endpoints
- Verify that applications no longer require the endpoint before deletion
How it works
- Uses AWS Trusted Advisor's `c2vlfg0jp6` check for inactive VPC endpoints
- Identifies endpoints with zero data transfer over 30 days
- Provides endpoint IDs, VPC IDs, and subnet information for easy identification
- Focuses on reducing unnecessary hourly endpoint charges
Cost impact
While individual VPC interface endpoints have modest hourly costs, these charges accumulate over time and across multiple endpoints. Deleting inactive endpoints eliminates ongoing hourly charges and helps maintain a clean, cost-effective network architecture.
Inefficient AWS NAT Gateway Detected
The AWS VPC service provides NAT Gateways so that resources in private subnets can access resources outside your VPC. When using NAT Gateways, you are charged per NAT Gateway-Hour (rounded up to the hour) and per GB Data Processed.
This recommendation detects NAT Gateways that have hourly charges without appreciable corresponding data processing charges. This indicates unused NAT Gateways that you can clean up.
Threshold: This recommendation is created if the total real cost spend for the identified NAT Gateways with low data processing charges is at least $500 and will be marked as Addressed when the spend falls below $500.
Managed NAT Gateway with Excessive Data Transfer
CloudZero has identified AWS NAT Gateways where data transfer costs represent an unusually high percentage of total gateway costs. While NAT Gateways include both hourly charges and data processing fees, excessive data transfer costs often indicate opportunities to optimize network architecture and reduce unnecessary cross-AZ or internet-bound traffic.
How it works
This recommendation identifies NAT Gateways where:
- Data transfer costs exceed 60% of total NAT Gateway costs
High data transfer ratios can indicate:
- Unnecessary cross-Availability Zone traffic
- Inefficient application architectures routing excessive traffic through NAT
- Missing VPC endpoints for AWS services (S3, DynamoDB, etc.)
- Applications that could benefit from VPC peering or PrivateLink
- Workloads that would be better served by alternative connectivity solutions
Additional details
- Cost Optimization: NAT Gateway data processing fees are expensive and can add up quickly with high-volume workloads
- Architecture Efficiency: High data transfer often signals architectural issues that impact both cost and performance
- Service Availability: Reducing NAT Gateway dependency can improve resilience and reduce single points of failure
- Performance: Alternative solutions like VPC endpoints can provide lower latency and higher throughput
How to address this
- Analyze Traffic Patterns:
  - Use VPC Flow Logs to identify sources and destinations of NAT Gateway traffic
  - Determine which applications or services are generating the most traffic
  - Identify whether traffic is internet-bound or AWS service traffic
  - Check for cross-AZ traffic that could be optimized
- Implement VPC Endpoints for AWS Services:
  - Create Gateway VPC Endpoints for S3 and DynamoDB (no additional cost)
  - Deploy Interface VPC Endpoints for services like:
    - ECR (Elastic Container Registry)
    - ECS (Elastic Container Service)
    - Systems Manager
    - CloudWatch Logs
    - Secrets Manager
    - KMS
  - VPC endpoints eliminate NAT Gateway traffic for these services entirely
- Optimize Cross-AZ Traffic:
  - Review application architectures that route traffic between Availability Zones through NAT
  - Consider deploying NAT Gateways in each AZ to keep traffic local
  - Evaluate whether cross-AZ traffic is necessary or can be redesigned
- Consider VPC Peering or PrivateLink:
  - For inter-VPC communication, use VPC peering instead of routing through NAT and internet
  - For service-to-service communication, consider AWS PrivateLink
  - These alternatives avoid both NAT Gateway costs and internet egress charges
- Evaluate Alternative Connectivity:
  - For large data transfers to the internet, consider using:
    - Direct Connect for consistent high-volume workloads
    - S3 Transfer Acceleration for uploads
    - CloudFront for content delivery
  - For outbound-only instances, consider NAT instances for very high throughput scenarios (though less managed)
- Right-size NAT Gateway Deployment:
  - Review whether you need NAT Gateways in all Availability Zones
  - Consider consolidating in lower-traffic environments (dev/test)
  - Balance high availability needs with cost optimization
- Monitor and Set Alerts:
  - Configure CloudWatch alarms for NAT Gateway data processing
  - Track data transfer trends over time
  - Set up cost anomaly detection for unexpected spikes
Cost Impact Calculation
The cost impact represents the excessive portion of data transfer costs:
- Baseline: Normal NAT Gateway usage typically has data transfer costs around 40-60% of total costs
- Threshold: This recommendation flags gateways where data transfer exceeds 60%
- Savings: Cost impact = (Data Transfer Ratio - 0.60) × Total NAT Gateway Cost
For example, a NAT Gateway with:
- $100/month total cost
- 80% data transfer costs
- Cost impact = (0.80 - 0.60) × $100 = $20/month potential savings
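The formula above translates directly into a few lines of code. The sketch below is a worked version of that calculation with illustrative names. It returns zero for gateways at or below the 60% threshold, which is an assumption about how non-flagged gateways should be treated.

```python
def nat_cost_impact(total_cost: float, data_transfer_cost: float,
                    threshold: float = 0.60) -> float:
    """Return the portion of NAT Gateway cost attributed to data transfer
    above `threshold`, using (ratio - threshold) x total cost."""
    if total_cost <= 0:
        return 0.0
    ratio = data_transfer_cost / total_cost
    return round(max(0.0, ratio - threshold) * total_cost, 2)


# The example above: $100/month total, $80 of it data transfer (80%).
print(nat_cost_impact(100, 80))
```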
Additional details
- High Availability: When implementing changes, maintain redundancy across Availability Zones for production workloads
- Compliance: Some regulatory requirements mandate specific network architectures
- Migration Planning: Moving to VPC endpoints or alternative solutions requires application testing and validation
- Performance Impact: Always test performance after architectural changes
- Incremental Optimization: Start with high-impact services (S3, ECR) before optimizing smaller traffic sources
Release Idle Elastic IP Addresses
Elastic IP addresses (EIPs) that are allocated but not associated with running resources incur hourly charges. This recommendation identifies idle EIPs that can be released to reduce costs.
What Are Elastic IPs?
Static IPv4 addresses for AWS resources that allow you to:
- Maintain consistent public IPs across instance replacements
- Quickly remap IPs to different instances
- Mask availability zone failures
Common Causes
- Terminated Instances: EIP not released when EC2 instance deleted
- Testing/Development: Allocated for testing and not released afterward
- Infrastructure Changes: Old IPs from decommissioned services
- Deleted Resources: EIPs from removed NAT Gateways or Load Balancers
Detection Method
Uses AmazonVPC billing data for idle addresses:
- Service: `AmazonVPC`
- Usage Type: `PublicIPv4:IdleAddress`
- Criteria: Idle 7+ days
Cost impact
| Idle EIPs | Monthly | Annual |
|---|---|---|
| 5 | $18 | $216 |
| 10 | $36 | $432 |
| 50 | $180 | $2,160 |
| 100 | $360 | $4,320 |
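The figures in this table follow from the public IPv4 address rate of $0.005 per hour. The sketch below reproduces them, assuming a 720-hour (30-day) month, which is what the table's $3.60 per address implies. The names are illustrative.

```python
HOURLY_RATE = 0.005      # dollars per idle public IPv4 address per hour
HOURS_PER_MONTH = 720    # assumption: 30-day month, matching the table above


def idle_eip_cost(count: int) -> tuple[float, float]:
    """Return (monthly, annual) cost in dollars for `count` idle EIPs."""
    monthly = round(count * HOURLY_RATE * HOURS_PER_MONTH, 2)
    return monthly, round(monthly * 12, 2)


for n in (5, 10, 50, 100):
    monthly, annual = idle_eip_cost(n)
    print(f"{n:>3} idle EIPs: ${monthly:,.0f}/month, ${annual:,.0f}/year")
```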
How to address this
1. Verify EIP Status
    aws ec2 describe-addresses --allocation-ids eipalloc-xxxxxxxxx

Check output:
- `InstanceId: null` → Safe to release
- `InstanceId: i-xxxxx` → Still in use, don't release
- `NetworkInterfaceId: eni-xxxxx` → Check if ENI is attached
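When you audit many addresses at once, the same three-way check can be scripted. The sketch below assumes each input is one entry from the `Addresses` list in the `describe-addresses` JSON output. The function name and return labels are illustrative, and it checks the network interface before declaring an address safe, since an EIP can be attached to an ENI without an instance.

```python
def classify_address(address: dict) -> str:
    """Classify one `Addresses` entry from describe-addresses output."""
    if address.get("InstanceId"):
        return "in use"            # associated with an EC2 instance
    if address.get("NetworkInterfaceId") or address.get("AssociationId"):
        return "check ENI"         # attached to a network interface; inspect it
    return "safe to release"       # no association of any kind
```

Unassociated addresses omit the `InstanceId`, `NetworkInterfaceId`, and `AssociationId` keys entirely, which is why the function uses `get` rather than direct indexing.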
2. Check Dependencies
Before releasing, verify the EIP is NOT referenced in:
- DNS A records
- Firewall allowlist rules
- Application configurations
- Documentation
3. Release the EIP
AWS Console:
- EC2 → Elastic IPs
- Select unassociated EIP
- Actions → Release Elastic IP addresses
AWS CLI:
    aws ec2 release-address --allocation-id eipalloc-xxxxxxxxx

4. Update References
After release:
- Update DNS records (if applicable)
- Remove from firewall rules
- Update documentation
Important Considerations
Do NOT release if:
- Referenced in DNS (update DNS first)
- In firewall allowlists (update rules first)
- Reserved for disaster recovery
- Actively used (verify association status)
Recovery: You cannot recover the same IP once released. You must allocate a new one and update all references.
Best Practices
- Tag all EIPs:

      aws ec2 create-tags --resources eipalloc-xxx --tags \
        Key=Name,Value="Production API" \
        Key=Owner,Value=team-name

- Regular audits: Review idle EIPs monthly in all regions
- Automation: Set up Lambda to alert on idle EIPs detected
- Use alternatives when possible:
  - Auto-assigned public IPs (released automatically when the instance is stopped or terminated, so they never sit idle; still billed as in-use public IPv4 addresses while attached)
  - Application Load Balancer (AWS-managed IPs)
  - CloudFront (global edge network)
Cost Optimization
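- Associated EIPs: Since February 1, 2024, AWS bills every public IPv4 address at $0.005 per hour (about $3.60/month), so an EIP is no longer free while it is associated with a running instance
- NAT Gateways: The Elastic IP attached to a public NAT Gateway is billed at the same public IPv4 rate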
- Load Balancers: ALB/NLB don't require EIPs, often cheaper at scale
Storage
Amazon EBS Delete Volumes
This recommendation identifies EBS volumes that should be deleted to reduce costs.
How it works
- Identifies EBS volumes that are candidates for deletion
- Provides estimated cost savings from deleting unused volumes
- Uses AWS Trusted Advisor recommendations to identify optimal deletion targets
How to address this
- Delete EBS volumes that are no longer needed
- Review volume snapshots before deletion
- Ensure volumes are not attached to running instances
- Consider creating snapshots for important data before deletion
Amazon EBS Rightsize Volumes
This recommendation identifies EBS volumes that should be rightsized to optimize cost and performance.
How it works
- Identifies EBS volumes that are over-provisioned or under-provisioned
- Provides estimated cost savings from rightsizing volumes
- Uses AWS Trusted Advisor recommendations to identify optimal rightsizing targets
How to address this
- Rightsize EBS volumes to match actual storage requirements
- Review volume utilization metrics and I/O patterns
- Consider performance requirements when rightsizing
- Test application performance after rightsizing to ensure requirements are met
Amazon EBS Upgrade Volumes
This recommendation identifies EBS volumes that should be upgraded to newer generation types for cost optimization.
How it works
- Identifies EBS volumes that are candidates for upgrading to newer generation types
- Provides estimated cost savings from upgrading to more efficient volume types
- Uses AWS Trusted Advisor recommendations to identify optimal upgrade targets
How to address this
- Upgrade EBS volumes to newer generation types (e.g., gp3 instead of gp2)
- Review performance requirements before upgrading
- Test application performance after upgrade
- Consider the trade-offs between cost and performance
Configure S3 Lifecycle Policy to Abort Incomplete Multipart Uploads
This recommendation identifies Amazon S3 buckets that do not have lifecycle policies configured to automatically abort incomplete multipart uploads, which can lead to unnecessary storage costs.
How it works
AWS Trusted Advisor monitors your S3 buckets and identifies those without lifecycle policies configured to abort incomplete multipart uploads. Incomplete multipart uploads continue to incur storage costs until they are explicitly aborted or automatically cleaned up by lifecycle policies.
What CloudZero identifies
- S3 buckets without lifecycle policies for incomplete multipart upload cleanup
- Opportunities to implement lifecycle policies for multipart upload management
- Buckets that are accumulating costs from incomplete uploads
- Recommendations for appropriate lifecycle policy configurations
- Cost optimization opportunities from AWS Trusted Advisor
How it works
- Uses AWS Trusted Advisor's `c1cj39rr6v` check for incomplete multipart upload abort configuration
- Leverages Trusted Advisor's cost estimates and recommendations
- Provides dynamic titles with specific actions
- Covers all S3 buckets across all regions
- Bucket-level recommendations for targeted optimization
Cost impact
Buckets without incomplete multipart upload abort policies can result in:
- Accumulation of incomplete multipart upload parts over time
- Continued storage costs for failed or abandoned uploads
- Wasted storage space from incomplete upload fragments
- Missed opportunities for cost optimization through automated cleanup
Multipart Upload Lifecycle Policy Benefits
- Automated cleanup: Abort incomplete multipart uploads automatically
- Cost control: Eliminate storage costs from failed uploads
- Storage optimization: Free up storage space from abandoned uploads
- Predictable costs: Better control over multipart upload-related storage costs
- Simplified management: No manual intervention required for cleanup
Common Multipart Upload Lifecycle Configurations
- Immediate cleanup: Abort incomplete multipart uploads after 1 day
- Standard cleanup: Abort incomplete multipart uploads after 7 days
- Extended cleanup: Abort incomplete multipart uploads after 30 days
- Comprehensive policy: Combine with other lifecycle rules for complete bucket management
Multipart Upload Considerations
- Upload timeouts: Incomplete uploads can occur due to network issues or application failures
- Storage costs: Each part of an incomplete multipart upload incurs storage charges
- Cleanup timing: Balance between allowing retry attempts and cost control
- Application integration: Ensure applications handle multipart upload failures gracefully
How to address this
- Review buckets without incomplete multipart upload abort policies
- Implement lifecycle policies specifically for multipart upload cleanup
- Consider application retry patterns when setting abort timing
- Monitor incomplete multipart upload accumulation
- Use lifecycle policies to automate multipart upload cleanup
- Regularly review and adjust abort policies based on usage patterns
Consider Intelligent-Tiering or Lifecycle Rules for S3
This recommendation is created when there are S3 buckets with spend only on Standard Storage, indicating that use of Intelligent-Tiering or Lifecycle policies could be applied to reduce cost.
Threshold: This recommendation is created if 10% of the total spend on S3 buckets that use Standard storage only is greater than $500.
Standard storage is the default storage class for objects in S3 and is the most expensive. It is best suited to data that is accessed frequently and needs the fastest retrieval times.
Consider the following when determining whether S3 Intelligent-Tiering or S3 Lifecycle could be applied to the listed S3 resources to save up to 10% on storage costs.
S3 Intelligent-Tiering:
- Amazon S3 Intelligent-Tiering is an Amazon S3 storage class designed to optimize storage costs by automatically moving data to the most cost-effective access tier when access patterns change, without performance impact or operational overhead.
- S3 Intelligent-Tiering automatically stores objects in three access tiers:
- Frequent Access tier: The default access tier that any object created or transitioned to S3 Intelligent-Tiering begins its lifecycle in. An object remains in this tier as long as it is being accessed. If objects in other tiers are accessed later, S3 Intelligent-Tiering automatically moves the objects back to this tier.
- Infrequent Access tier: If an object is not accessed for 30 consecutive days, the object moves to the Infrequent Access tier with savings up to 40%.
- Archive Instant Access tier: If an object is not accessed for 90 consecutive days, the object moves to the Archive Instant Access tier with savings up to 68%.
- When to use Intelligent-Tiering: Ideal for data with unknown, changing, or unpredictable access patterns, independent of object size or retention period. This includes data for new applications, data analytics, user-generated content, and data lakes.
S3 Lifecycle Rules:
- S3 Lifecycle helps you store objects cost-effectively throughout their lifecycle by transitioning them to lower-cost storage classes or deleting expired objects on your behalf.
- Lifecycle rules apply to both existing and future objects in an S3 bucket, and can be scoped to a subset of objects with a prefix or tag filter.
- When to use Lifecycle policies: Use them when your data has a well-defined access pattern. They are ideal for data that needs to be accessed for a specific period and then archived in a cheaper storage tier.
Object monitoring and automation for Intelligent-Tiering incurs a small monthly charge. Learn more about S3 pricing and the additional costs associated with S3 in this blog post.
Amazon S3 Lifecycle can be used to transition new objects that are programmatically uploaded to the S3 Intelligent-Tiering storage class.
The resource table shows the list of buckets with spend only on Standard storage.
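A Lifecycle rule that moves objects into Intelligent-Tiering might look like the sketch below. It assumes you want the rule to cover the whole bucket; the function name, rule ID, and bucket name are illustrative placeholders.

```shell
#!/bin/sh
# Print a lifecycle configuration that transitions objects to the
# S3 Intelligent-Tiering storage class N days after creation.
# The rule ID "move-to-intelligent-tiering" is a hypothetical name.
intelligent_tiering_lifecycle_json() {
  days="${1:-0}"   # 0 transitions objects as soon as the rule is evaluated
  cat <<EOF
{
  "Rules": [
    {
      "ID": "move-to-intelligent-tiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": ${days}, "StorageClass": "INTELLIGENT_TIERING" }
      ]
    }
  ]
}
EOF
}

# Example (not run here; "amzn-s3-demo-bucket" is a placeholder):
#   intelligent_tiering_lifecycle_json 0 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket amzn-s3-demo-bucket \
#     --lifecycle-configuration file://lifecycle.json
```

Because applying a lifecycle configuration overwrites the rules already on the bucket, combine this rule with any existing ones before running the `put-bucket-lifecycle-configuration` command.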
High Data Retrieval Costs for S3 Glacier Storage
This recommendation identifies data retrieval costs for an S3 bucket that occur on an S3 Glacier storage tier. High data retrieval costs indicate frequently accessed data that could be moved to a more cost-effective storage class.
Threshold: This recommendation is created if data retrieval costs for data stored in long-term or archival storage on any S3 bucket exceed $100 over the last 30 days. When the cost impact from all S3 buckets drops back to $100 or below, the Recommendation will resolve.
AWS charges for storing objects in your S3 buckets, and for certain tiers, data retrieval per gigabyte. Amazon S3 provides the following S3 Glacier storage classes:
- S3 Glacier Instant Retrieval (GLACIER_IR): Use for long-term data that is rarely accessed and requires milliseconds for retrieval. Data in this storage class is available for real-time access.
- S3 Glacier Flexible Retrieval (GLACIER): Use for archives where portions of the data need to be retrieved in minutes. Data in this storage class is archived, and not available for real-time access.
- S3 Glacier Deep Archive (DEEP_ARCHIVE): Use for archiving data that rarely needs to be accessed. Data in this storage class is archived, and not available for real-time access.
While objects stored in these tiers have lower storage costs, there is a cost for retrieving data. Look at these buckets to determine why data retrieval is needed, and consider moving frequently accessed data to the Standard storage tier, which does not charge for data retrieval.
A small per-request fee applies when objects are transitioned between storage classes. Learn more about S3 pricing and the additional costs associated with S3 in this blog post.
Learn more about how to change the storage class for existing objects in the AWS documentation.
High Non-Standard API Requests for S3
This recommendation identifies high spend on non-standard API requests to S3. This high spend indicates excess overhead operations on your objects in S3.
Threshold: This recommendation is created if reducing non-standard S3 API calls will save at least $500 based on a 95% savings rate. When reducing non-standard S3 API calls results in savings less than $500, the Recommendation will automatically be closed.
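To make the threshold arithmetic concrete, the sketch below applies the 95% savings rate to a given amount of non-standard API spend and checks it against the $500 threshold. The function names are hypothetical, and the input is assumed to be the spend on LIST and HEAD type requests in dollars.

```shell
#!/bin/sh
# Estimated savings = non-standard API spend x 0.95 (rate from the threshold above).
nonstandard_api_savings() {
  awk -v spend="$1" 'BEGIN { printf "%.2f\n", spend * 0.95 }'
}

# Exit status 0 when the estimated savings reach the $500 threshold.
meets_api_threshold() {
  awk -v spend="$1" 'BEGIN { exit !(spend * 0.95 >= 500) }'
}

# Example:
#   nonstandard_api_savings 600                       # prints 570.00
#   meets_api_threshold 600 && echo "recommendation would be created"
```

Under these assumptions, roughly $527 of non-standard API spend is the smallest amount that would reach the threshold.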
Non-standard API requests for S3 include operations like LIST and HEAD. The LIST operation is used for retrieving various configuration information for S3 buckets and the HEAD operation is used for retrieving metadata about an object without retrieving the object itself. These operations are categorized as overhead costs, while all other request types, such as GET and PUT, are considered operational costs.
High spend on these overhead operations in comparison to operational costs indicates these operations are adding disproportionate cost. This is normal if you are serving private objects, since HEAD requests for private objects cannot be cached due to the need to generate a signed URL. Public objects can be cached because they do not require a signed URL. For publicly served objects, consider caching these requests with CloudFront to reduce costs.
If object metadata changes frequently, you need to set shorter cache expiration times to ensure your application is receiving the latest information.
High Ratio of S3 API Cost to Storage Cost
This recommendation is created when spend on API requests to an S3 bucket represents greater than 80% of costs for that bucket. This high ratio of requests to storage cost indicates frequently accessed data that could be moved to a different storage class.
Threshold: This recommendation is created if reducing non-standard S3 API calls will save at least $500 based on a 95% savings rate. When reducing non-standard S3 API calls results in savings less than $500, the Recommendation will automatically be closed.
When API request costs are high, the data being accessed is often in an Infrequent Access tier. Infrequent Access tiers have lower storage costs, but every gigabyte of data retrieved incurs a charge that is billed as an API request.
Consider moving frequently accessed data to Standard storage tier, which does not charge for data retrieval, to save up to 50% on S3 spend.
High S3 Administrative Fees
Typically, administrative fees and other miscellaneous costs for a single S3 bucket should not exceed 10% of the total cost of the bucket. Fees related to AWS StorageLens and StorageAnalytics are not included in this check. When administrative fees and miscellaneous costs exceed the 10% threshold, the excess usually points to inefficient use of the S3 bucket or to potentially unused buckets. The cost impact for this Recommendation is calculated by subtracting the per-bucket fee threshold (10% of the total 30-day bucket cost) from the total administrative fees for the specified S3 buckets.
Threshold: This recommendation is created if the total cost impact exceeds $500 in real cost for the last 30 days. When the cost impact drops back below $500, the Recommendation will be resolved.
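The cost impact calculation described above can be sketched per bucket as follows. The function name is hypothetical, and flooring the result at zero for buckets under the 10% threshold is an assumption of this example rather than documented CloudZero behavior.

```shell
#!/bin/sh
# Cost impact for one bucket = administrative fees - 10% of the total 30-day bucket cost.
# Arguments: total 30-day bucket cost, administrative fees (both in dollars).
admin_fee_impact() {
  awk -v total="$1" -v fees="$2" 'BEGIN {
    impact = fees - 0.10 * total
    if (impact < 0) impact = 0   # assumption: buckets under the threshold contribute nothing
    printf "%.2f\n", impact
  }'
}

# Example: a bucket costing $4,000 over 30 days with $1,000 in administrative fees
#   admin_fee_impact 4000 1000   # prints 600.00
```

In this example the bucket on its own would exceed the $500 threshold, since $1,000 in fees is $600 above the $400 allowance.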
You can view the fees by grouping by Service Detail. They include:
| Fee | Description |
|---|---|
| DeleteObject (Early Delete) | Some storage tiers are meant for infrequent access and have a minimum storage duration (for example, 30 days). Objects deleted, moved, or overwritten before the minimum storage duration has passed incur the normal storage charge plus a pro-rated fee for the remaining days. These fees represent the pro-rated cost. Check that you are using the appropriate storage tiers and services based on your access patterns. |
| SmObjects (Small Objects) | Some storage tiers and services have a minimum billable object size of 128 KB. Objects smaller than 128 KB are charged as if they were 128 KB. These fees represent the difference between the actual storage used and the minimum billable object size. Check that you are using the appropriate storage tiers and services based on your object sizes. |
| Inventory | Amazon S3 Inventory is a service that generates reports on the content of your S3 buckets. These reports are generated for your own management or auditing purposes, or for use in conjunction with other AWS services, such as Intelligent-Tiering. These fees are associated with the generation and storage of these reports. Check your usage of the inventory services and determine if they are necessary. |
Unarchived Old EBS Snapshots
CloudZero has identified Amazon EBS snapshots that have been stored for an extended period in standard snapshot storage. These long-term snapshots are excellent candidates for EBS Snapshot Archive, which can reduce storage costs by up to 75% for snapshots that are rarely accessed.
How it works
This recommendation identifies EBS snapshots that are:
- Stored in standard EBS snapshot storage (not archived)
- Older than 90 days
- Incurring ongoing standard snapshot storage costs
- Good candidates for migration to EBS Snapshot Archive tier
EBS Snapshot Archive is designed for long-term retention of snapshots that are accessed infrequently, such as compliance archives, disaster recovery backups, or historical reference snapshots.
Additional details
- Cost Savings: Snapshot Archive storage costs ~75% less than standard snapshot storage ($0.0125/GB-month vs $0.05/GB-month)
- Compliance: Maintain required long-term backups while dramatically reducing costs
- No Data Loss: Archives preserve complete snapshot data in a lower-cost tier
- Scalability: As snapshot storage grows over time, these savings compound
For example, a 1TB snapshot stored for a year:
- Standard storage: $600/year
- Archive storage: $150/year
- Savings: $450/year per TB
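The arithmetic behind this example can be reproduced with a short sketch. It assumes the per-GB rates quoted above ($0.05 standard, $0.0125 archive), treats 1 TB as 1,000 GB to match the figures, and assumes the billed size is the same in both tiers; the function name is hypothetical and actual rates vary by region.

```shell
#!/bin/sh
# Compare standard and archive snapshot storage cost for a given size and duration.
# Arguments: size in GB, number of months (defaults to 12).
snapshot_archive_savings() {
  awk -v gb="$1" -v months="${2:-12}" 'BEGIN {
    standard = gb * 0.05 * months     # standard tier, $/GB-month from the text
    archive  = gb * 0.0125 * months   # archive tier, $/GB-month from the text
    printf "standard=%.2f archive=%.2f savings=%.2f\n", standard, archive, standard - archive
  }'
}

# Example: the 1 TB, one-year case
#   snapshot_archive_savings 1000
#   # prints: standard=600.00 archive=150.00 savings=450.00
```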
How to address this
- Review Snapshot Usage Patterns:
  - Identify snapshots that are retained for compliance or disaster recovery
  - Determine which snapshots are rarely or never restored
  - Confirm snapshots older than 90 days are good archival candidates
  - Verify that longer restore times (24-72 hours) are acceptable
- Archive Eligible Snapshots:
  Via AWS Console:
  - Navigate to EC2 → Snapshots
  - Select snapshot(s) to archive
  - Actions → Archive snapshot
  Via AWS CLI:

      aws ec2 modify-snapshot-tier \
        --snapshot-id snap-1234567890abcdef0 \
        --storage-tier archive

  Bulk Archive via CLI:

      # List snapshots started on or before the cutoff date, then archive each one.
      # --output text prints the IDs tab-separated on one line, so split them first.
      aws ec2 describe-snapshots \
        --owner-ids self \
        --query 'Snapshots[?StartTime<=`2023-01-01`].SnapshotId' \
        --output text | tr '\t' '\n' | \
      while read -r snap; do
        aws ec2 modify-snapshot-tier \
          --snapshot-id "$snap" \
          --storage-tier archive
      done

- Implement Automated Archival Policies:
  - Use AWS Data Lifecycle Manager (DLM) to automatically archive snapshots based on age
  - Create lifecycle policies that:
    - Move snapshots to archive tier after 90 days
    - Delete archived snapshots after retention period expires
    - Apply to specific volumes by tags
- Set Up Monitoring:
  - Track snapshot storage costs over time
  - Monitor archive vs standard storage distribution
  - Set CloudWatch alarms for unexpected snapshot growth
  - Review archived snapshots quarterly to confirm retention needs
- Document Restore Process:
  - Document that archived snapshots take 24-72 hours to restore
  - Update disaster recovery runbooks with new restore timelines
  - Communicate changes to teams that need to restore snapshots
  - Test restore process from archive to verify procedures
- Review Retention Policies:
  - Evaluate whether all snapshots need to be retained
  - Delete snapshots that are no longer needed for compliance or recovery
  - Consider tiered retention: recent snapshots → archive → deletion
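For the bulk archive step, a rolling cutoff and a dry run can be safer than a hard-coded date and immediate execution. The sketch below is one way to do that; the helper names `cutoff_date` and `print_archive_commands` are hypothetical, and the 90-day default follows the age used in this recommendation.

```shell
#!/bin/sh
# Print the UTC date N days ago as YYYY-MM-DD.
# Tries GNU date first, then falls back to BSD/macOS date syntax.
cutoff_date() {
  days="${1:-90}"
  date -u -d "${days} days ago" +%Y-%m-%d 2>/dev/null ||
    date -u -v-"${days}"d +%Y-%m-%d
}

# Read whitespace-separated snapshot IDs on stdin and print one
# modify-snapshot-tier command per ID instead of running it (dry run).
print_archive_commands() {
  for snap in $(cat); do
    echo "aws ec2 modify-snapshot-tier --snapshot-id $snap --storage-tier archive"
  done
}

# Example pipeline (not run here; review the printed commands before executing them):
#   aws ec2 describe-snapshots --owner-ids self \
#     --query "Snapshots[?StartTime<='$(cutoff_date 90)'].SnapshotId" \
#     --output text | print_archive_commands
```

Printing the commands first gives you a chance to exclude snapshots that still need fast restores before anything is moved to the archive tier.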
Cost Impact Calculation
The cost impact represents potential savings from archiving:
- Standard Storage: ~$0.05 per GB-month (varies by region)
- Archive Storage: ~$0.0125 per GB-month (75% cheaper)
- Savings: 75% of current standard snapshot storage costs
For a snapshot older than 90 days with $100/month in storage costs:
- Moving to archive saves: $75/month or $900/year
Important Considerations
Restore Times
- Standard snapshots: Instant availability for volume creation
- Archived snapshots: 24-72 hours to restore to standard tier before use
- Only archive snapshots where slow restore is acceptable
Use Cases for Archive
Good candidates:
- Compliance/regulatory retention backups
- Long-term disaster recovery snapshots
- Historical reference snapshots
- End-of-month/quarter/year snapshots
- Snapshots of decommissioned resources
Poor candidates:
- Snapshots for active disaster recovery (need fast restore)
- Recent snapshots (< 90 days old)
- Snapshots used for frequent testing or development
- Snapshots that need to be available quickly
Pricing Considerations
- Archive storage: $0.0125/GB-month (~$12.80/TB-month)
- Restore from archive: $0.03/GB retrieval charge (one-time when restoring)
- Standard storage: $0.05/GB-month (~$51.20/TB-month)
If you need to restore an archived snapshot frequently, the retrieval charges can offset savings.
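To see how quickly retrieval charges eat into the savings, the sketch below nets the two against each other. It uses only the rates listed above ($0.05 and $0.0125 per GB-month for storage, $0.03 per GB per restore) and ignores any other charges; the function name is hypothetical.

```shell
#!/bin/sh
# Net savings from archiving = storage savings - retrieval charges.
# Arguments: size in GB, months archived, number of full restores in that period.
net_archive_savings() {
  awk -v gb="$1" -v months="$2" -v restores="$3" 'BEGIN {
    saved     = gb * (0.05 - 0.0125) * months   # $0.0375 saved per GB-month
    retrieval = gb * 0.03 * restores            # $0.03 per GB for each restore
    printf "%.2f\n", saved - retrieval
  }'
}

# Examples:
#   net_archive_savings 1000 12 2   # prints 390.00  (two restores in a year)
#   net_archive_savings 100 1 2     # prints -2.25   (two restores in one month)
```

At these rates each full restore costs about 0.8 months of storage savings, so a snapshot restored more than roughly once a month is unlikely to benefit from archiving.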
Operational Impact
- No changes to snapshot permissions or sharing
- Snapshot IDs remain the same
- Tags and metadata are preserved
- Can restore to standard tier at any time (with 24-72 hour delay)
Best Practices
- Age-Based Policy: Archive snapshots automatically after 90-180 days
- Tag-Based Archival: Use tags to identify archive candidates (e.g., Archivable=true)
- Test Restore Process: Periodically test restoring from archive to verify procedures
- Lifecycle Management: Use DLM for automated archival and eventual deletion
- Cost Tracking: Monitor savings from archival using Cost Explorer tags
- Document Exceptions: Clearly identify snapshots that should never be archived
Have questions or feedback? Reach out to your account manager.
