Amazon Bedrock Cost Optimization Guide
Service Overview
What is Amazon Bedrock?
- Fully managed service for building and scaling generative AI applications
- Access to foundation models (FMs) from leading AI providers (Anthropic, Meta, Mistral AI, Amazon)
- Multiple inference options: On-Demand, Provisioned Throughput, and Batch Inference
- Built-in capabilities: model customization, agents, guardrails, and knowledge bases
- Pay-per-token pricing model based on input and output tokens processed
Why Cost Optimization Matters
- Generative AI workloads can represent 40-60% of AI/ML costs in organizations
- Token-based pricing can lead to unexpected costs without proper optimization
- Multiple model options with vastly different pricing (Nova Micro input: $0.000035/1K tokens vs. Claude Opus output: $0.075/1K tokens)
- Common cost surprises include over-prompting, inappropriate model selection, and inefficient inference patterns
---
Cost Analysis & Monitoring
Key Cost Metrics to Track
Primary Cost Drivers:
- **Input Tokens** - Text provided to the model ($0.000035-$0.015 per 1K tokens)
- **Output Tokens** - Text generated by the model ($0.00014-$0.075 per 1K tokens)
- **Model Selection** - Different models have vastly different pricing structures
- **Provisioned Throughput** - Reserved capacity for consistent workloads ($35-$230+ per hour)
- **Guardrails** - Content filtering and safety features ($0.10-$0.15 per 1K text units)
Token Economics:
- **Rule of thumb:** 750 words ≈ 1,000 tokens (multiply words by 1.3)
- **Cost range:** $0.000035 (Nova Micro input) to $0.075 (Claude Opus output) per 1K tokens
- **Individual requests:** Nearly negligible in isolation (a short request on a low-cost model can run around $0.000015); costs come from volume
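The rule of thumb above is easy to apply in code (a minimal sketch):

```python
# Word-count -> token-count estimate using the ~1.3 tokens-per-word rule above.
def words_to_tokens(words: int) -> int:
    return round(words * 1.3)

estimate = words_to_tokens(750)   # 975, i.e. roughly the "750 words ≈ 1,000 tokens" rule
```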
Cost Allocation Tags:
- Application/UseCase for cost attribution to specific AI initiatives
- Environment (dev, staging, prod) for lifecycle management
- Team/Department for organizational cost tracking
- ModelType (chat, summarization, analysis) for workload categorization
Using This Power's Tools
Get Bedrock costs by model:
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "MONTHLY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"SERVICE\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Analyze Bedrock usage patterns by account:
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "DAILY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"LINKED_ACCOUNT\"}]",
"metrics": "[\"UsageQuantity\", \"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Get Bedrock pricing information:
usePower("aws-cost-optimization", "awslabs.aws-pricing-mcp-server", "get_pricing", {
"service_code": "AmazonBedrock",
"region": ["us-east-1", "us-west-2"],
"filters": [
{"Field": "productFamily", "Value": "Machine Learning", "Type": "EQUALS"}
]
})
Monitor Bedrock token utilization:
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "InputTokenCount",
"dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Sum", "Average"]
})
Create cost-per-request metrics:
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_data", {
  "metric_data_queries": [
    {
      "id": "invocations",
      "metric_stat": {
        "metric": {
          "namespace": "AWS/Bedrock",
          "metric_name": "Invocations",
          "dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}]
        },
        "period": 3600,
        "stat": "Sum"
      }
    },
    {
      "id": "input_tokens",
      "metric_stat": {
        "metric": {
          "namespace": "AWS/Bedrock",
          "metric_name": "InputTokenCount",
          "dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}]
        },
        "period": 3600,
        "stat": "Sum"
      }
    },
    {
      "id": "avg_tokens_per_request",
      "expression": "input_tokens / invocations"
    }
  ],
  "start_time": "2024-11-01T00:00:00Z",
  "end_time": "2024-12-01T00:00:00Z"
})
---
Optimization Strategies
1. Model Selection Optimization
Cost-Performance Model Comparison:
Ultra-Low Cost Models:
- **Amazon Nova Micro:** Input $0.000035/1K, Output $0.00014/1K - Best for simple tasks
- **Amazon Nova Lite:** Input $0.00006/1K, Output $0.00024/1K - Balanced cost-performance
Mid-Range Models:
- **Claude 3.5 Haiku:** Input $0.0008/1K, Output $0.004/1K - Fast, cost-effective
- **Claude Instant:** Input $0.0008/1K, Output $0.0024/1K - Good for most use cases
High-Performance Models:
- **Claude 3.5 Sonnet:** Input $0.003/1K, Output $0.015/1K - Advanced reasoning
- **Claude Opus:** Input $0.015/1K, Output $0.075/1K - Highest capability, highest cost
Model Selection Strategy:
// Compare model costs for your use case
// Example: 20M input tokens, 1M output tokens
// Nova Micro: $0.84 total
// Claude Opus: $375.00 total (446x more expensive)
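These figures check out arithmetically (a quick verification using the prices listed in this guide):

```python
# Cost comparison for 20M input + 1M output tokens; prices in $ per 1K tokens.
def total_cost(input_tokens, output_tokens, in_price_per_1k, out_price_per_1k):
    return (input_tokens / 1000) * in_price_per_1k \
         + (output_tokens / 1000) * out_price_per_1k

nova_micro = total_cost(20_000_000, 1_000_000, 0.000035, 0.00014)   # 0.84
claude_opus = total_cost(20_000_000, 1_000_000, 0.015, 0.075)       # 375.0
ratio = claude_opus / nova_micro                                    # ~446x
```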
Implementation Steps:
1. Start with cost-effective models and evaluate performance
2. Use model comparison in Bedrock Playground to test multiple models
3. Implement A/B testing to validate model performance vs cost
4. Consider model distillation for up to 75% cost reduction with <2% accuracy loss
2. Prompt Engineering Optimization
Token Optimization Principles:
- **Be concise and clear** - Avoid essay-like prompts
- **Use specific directives** instead of verbose explanations
- **Implement output limits** to control generation costs
- **Preprocess input data** to reduce token count
Real-World Optimization Examples:
Example 1: Text Generation
- **Bad approach:** Verbose prompting (113 input + 352 output tokens) = $9.35/1K runs
- **Good approach:** Concise prompting with output limits = $2.84/1K runs
- **Savings:** 70% cost reduction through prompt optimization
Example 2: Data Processing
- **Bad approach:** Raw HTML table processing (16,161 input tokens) = $130.36/1K runs
- **Better approach:** Pre-processed CSV input (1,709 input tokens) = $17.54/1K runs
- **Best approach:** Optimized CSV preprocessing (824 input tokens) = $10.84/1K runs
- **Savings:** 92% cost reduction through data preprocessing
Prompt Engineering Best Practices:
- Include context and examples for better results
- Use stop sequences to control output length
- Break complex tasks into smaller, focused prompts
- Experiment with different prompt structures in Bedrock Playground
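Output limits and stop sequences map directly to inference parameters on the request. A boto3 `Converse`-style payload sketch (the model ID reuses the one from the monitoring examples above; this only builds the payload and sends nothing to AWS):

```python
# Capping generation cost: maxTokens bounds billable output tokens, and
# stopSequences ends generation early at a known delimiter. Sending this
# request requires boto3, AWS credentials, and model access.
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
        {"role": "user",
         "content": [{"text": "Summarize the transcript in 3 bullet points."}]}
    ],
    "inferenceConfig": {
        "maxTokens": 300,           # hard ceiling on output tokens billed
        "stopSequences": ["---"],   # stop at a delimiter instead of running long
        "temperature": 0.2,
    },
}
# bedrock = boto3.client("bedrock-runtime"); bedrock.converse(**request)
```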
3. Inference Pricing Plan Optimization
On-Demand Pricing:
- **Best for:** Prototyping, POCs, variable workloads
- **Characteristics:** Pay-per-token, no commitment, subject to requests-per-minute (RPM) and tokens-per-minute (TPM) quotas
- **Cost structure:** Higher per-token cost but no fixed costs
Provisioned Throughput:
- **Best for:** Consistent production workloads, custom models
- **Characteristics:** Reserved capacity, guaranteed throughput, 1-6 month commitments
- **Cost structure:** Fixed hourly rate, discounted for longer commitments
Batch Inference:
- **Best for:** Large-scale offline processing, model evaluation
- **Characteristics:** Up to 50% savings vs on-demand, asynchronous processing
- **Cost structure:** Same token-based pricing but at 50% discount
Decision Matrix Example:
// Scenario: 1M chat sessions/month, 5 messages each, 3K input + 500 output tokens
// On-Demand: Variable cost based on actual usage
// Provisioned Throughput: $230K/month for guaranteed performance
// Batch: 50% savings for offline processing
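Choosing a plan usually comes down to a break-even comparison like this (a sketch with illustrative numbers: Claude 3.5 Haiku on-demand prices from the table above and a hypothetical $230/hour model unit):

```python
# On-Demand vs. Provisioned Throughput break-even sketch.
def on_demand_monthly(requests, in_tokens, out_tokens, in_price, out_price):
    """Monthly on-demand cost; prices in $ per 1K tokens."""
    per_request = (in_tokens / 1000) * in_price + (out_tokens / 1000) * out_price
    return requests * per_request

def provisioned_monthly(hourly_rate, hours=730):
    """Fixed cost of one provisioned model unit running all month."""
    return hourly_rate * hours

# 1M sessions x 5 messages, 3K input + 500 output tokens per message
od = on_demand_monthly(5_000_000, 3_000, 500, 0.0008, 0.004)   # 22,000.0
pt = provisioned_monthly(230)                                  # 167,900
# On-demand wins here; provisioned pays off only at much higher steady volume.
```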
4. Advanced Cost Optimization Features
Amazon Bedrock Prompt Caching:
- **Up to 90% cost reduction** for repetitive context
- **Up to 85% latency reduction** for cached prompts
- **Use cases:** RAG applications, long system prompts, repetitive context
Amazon Bedrock Intelligent Prompt Routing:
- **Up to 30% cost reduction** without accuracy loss
- **Automatic routing** to optimal models based on prompt characteristics
- **Single endpoint** for multiple model access
Model Distillation:
- **Up to 500% faster inference** with 75% cost reduction
- **Less than 2% accuracy loss** for most use cases
- **Teacher-student model approach** for cost-efficient deployment
Implementation:
// Monitor caching effectiveness via cached-input token counts:
// CacheReadInputTokenCount counts input tokens served from the cache;
// compare its Sum against InputTokenCount to gauge the hit rate
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "CacheReadInputTokenCount",
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Sum"]
})
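Whichever cache counters you pull, the effective hit rate is simply cached input tokens over total input tokens (a minimal sketch):

```python
# Share of input tokens served from the prompt cache.
def cache_hit_rate(cached_input_tokens: int, total_input_tokens: int) -> float:
    if total_input_tokens == 0:
        return 0.0
    return cached_input_tokens / total_input_tokens

rate = cache_hit_rate(9_000, 10_000)   # 0.9 -> 90% of input tokens were cached
```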
5. Guardrails Cost Management
Guardrails Pricing Structure:
- **Content filters:** $0.15 per 1K text units
- **Denied topics:** $0.15 per 1K text units
- **Contextual grounding:** $0.10 per 1K text units
- **PII filters:** $0.10 per 1K text units
- **Word filters:** Free
- **Regular expressions:** Free
Cost Optimization:
- Use free filters (word filters, regex) when possible
- Implement client-side filtering for simple cases
- Optimize text unit usage (1 text unit = up to 1,000 characters)
- Consider selective guardrail application based on use case
Example Cost Calculation:
// Chatbot: 500 requests/hour, 10 hours/day, 21 days/month
// Request: 1,500 chars (2 text units), Response: 400 chars (1 text unit)
// With content filters + denied topics + PII: $115.50/month
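The $115.50 figure works out as follows, assuming content filters and denied topics evaluate both the request and the response while PII filters evaluate the request only (that per-policy scoping is an assumption, not stated above):

```python
# Reproducing the guardrails example. Billing is per text unit of up to
# 1,000 characters; per-policy scoping below is an illustrative assumption.
TEXT_UNIT_CHARS = 1000

def text_units(chars: int) -> int:
    """Characters are billed in whole text units of up to 1,000 chars each."""
    return -(-chars // TEXT_UNIT_CHARS)  # ceiling division

requests_per_month = 500 * 10 * 21      # 500/hr x 10 hr/day x 21 days = 105,000
req_tu = text_units(1500)               # 2 text units per request
resp_tu = text_units(400)               # 1 text unit per response

both = requests_per_month * (req_tu + resp_tu)   # TU seen by content + denied-topic filters
input_only = requests_per_month * req_tu         # TU seen by PII filters

monthly = (both / 1000) * 0.15 \
        + (both / 1000) * 0.15 \
        + (input_only / 1000) * 0.10    # -> 115.50
```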
---
Common Cost Pitfalls & Solutions
Pitfall 1: Over-Prompting and Verbose Context
Problem Description:
- Including unnecessary context or verbose instructions
- Not preprocessing input data to reduce token count
- Using entire documents as context when summaries would suffice
Detection:
// Monitor average token usage per request
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "InputTokenCount",
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Average", "Maximum"]
})
Solution:
- Implement prompt templates with optimized structure
- Use preprocessing to clean and summarize input data
- Set up monitoring for token usage patterns
- Regular prompt engineering reviews and optimization
Pitfall 2: Inappropriate Model Selection
Problem Description:
- Using high-cost models (Claude Opus) for simple tasks
- Not evaluating cost-performance trade-offs
- Defaulting to most capable model without testing alternatives
Detection & Solution:
- Use Bedrock Playground for model comparison
- Implement cost tracking by model type
- A/B test different models for your specific use cases
- Consider model distillation for production workloads
Pitfall 3: Inefficient Inference Pattern Selection
Problem Description:
- Using On-Demand for consistent high-volume workloads
- Not leveraging Batch Inference for offline processing
- Over-provisioning Provisioned Throughput capacity
Detection & Solution:
- Analyze usage patterns and traffic consistency
- Calculate break-even points for different pricing models
- Use CloudWatch metrics to optimize provisioned capacity
- Implement hybrid approaches for different workload types
---
Real-World Scenarios
Scenario 1: Customer Service Chatbot Optimization
Situation:
- Insurance company with customer service chatbot
- Processing 50 calls/minute during 8-hour workdays
- 25K input tokens + 1K output tokens per call
- Using Claude 3.5 Haiku for call summarization
Analysis Approach:
// Step 1: Analyze current Bedrock costs
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "MONTHLY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"USAGE_TYPE\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
// Step 2: Monitor token usage patterns
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "InputTokenCount",
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Sum", "Average"]
})
Solution Implementation:
- **Model optimization:** Tested Nova Lite for 95% cost reduction with acceptable quality
- **Prompt caching:** Implemented for system prompts, reducing costs by 85%
- **Batch processing:** Moved non-urgent summaries to batch inference for 50% savings
- **Preprocessing:** Cleaned call transcripts to reduce average input tokens by 40%
Results:
- **Monthly cost:** $17,280 → $3,456 (80% reduction)
- **Maintained quality:** Customer satisfaction scores unchanged
- **Improved performance:** 85% latency reduction through caching
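The baseline figure is consistent with the stated traffic (a quick check assuming a 30-day month and the Claude 3.5 Haiku prices listed earlier):

```python
# Chatbot scenario baseline: 50 calls/min x 8 hr/day, 25K in + 1K out per call.
calls_per_day = 50 * 60 * 8                                          # 24,000
cost_per_call = (25_000 / 1000) * 0.0008 + (1_000 / 1000) * 0.004    # $0.024
baseline_monthly = calls_per_day * 30 * cost_per_call                # 17,280.0
optimized_monthly = baseline_monthly * 0.20                          # 3,456.0 after the 80% cut
```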
Scenario 2: Content Generation Platform
Situation:
- Media company generating articles and summaries
- Variable workload with peak periods during news events
- Multiple content types requiring different model capabilities
- High costs from using Claude Opus for all tasks
Analysis Approach:
// Analyze usage patterns by content type
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "HOURLY",
"group_by": "[{\"Type\": \"TAG\", \"Key\": \"ContentType\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Solution Implementation:
- **Intelligent routing:** Implemented automatic model selection based on content complexity
- **Model tiering:** Nova Lite for simple summaries, Claude Haiku for articles, Sonnet for analysis
- **Batch processing:** Moved non-urgent content generation to batch inference
- **Prompt optimization:** Reduced average prompt length by 60% through better templates
Results:
- **65% cost reduction** through intelligent model routing
- **Improved throughput** handling peak loads more efficiently
- **Better resource utilization** matching model capability to task complexity
---
Integration with Other Services
Cost Impact of Service Integrations
Common Integration Patterns:
- **Bedrock + S3:** Document storage and retrieval for RAG applications
- **Bedrock + Lambda:** Serverless AI processing with automatic scaling
- **Bedrock + API Gateway:** RESTful APIs for AI services
- **Bedrock + Knowledge Bases:** Vector search and retrieval augmentation
Cross-Service Optimization:
- **Regional co-location:** Minimize data transfer costs between services
- **S3 lifecycle policies:** Optimize storage costs for training data and embeddings
- **Lambda optimization:** Right-size functions for AI workload processing
Analysis Commands:
// Analyze cross-service AI costs
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "MONTHLY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"SERVICE\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\", \"AWS Lambda\", \"Amazon Simple Storage Service\", \"Amazon API Gateway\"]}}"
})
---
Monitoring & Alerting
Key Metrics to Monitor
Cost Metrics:
- Daily AI spend by model and application
- Cost per request and cost per token metrics
- Token utilization efficiency ratios
Usage Metrics:
- Input/output token counts by model and use case
- Request rates and response latencies
- Cache hit rates for prompt caching
Operational Metrics (via CloudWatch):
- Model invocation success rates and error patterns
- Guardrails activation rates and policy effectiveness
- Provisioned throughput utilization rates
Recommended Alerts
Budget Alerts:
// Monitor Bedrock-specific budget performance
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "budgets", {
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Anomaly Detection:
// Set up anomaly monitoring for Bedrock services
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_anomaly", {
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Token Usage Alerts:
// Monitor token usage patterns
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "describe_alarms", {
"alarm_name_prefix": "Bedrock-TokenUsage",
"state_value": "ALARM"
})
Cost Allocation and Tracking
Inference Profiles:
- Create logical groupings for different applications/teams
- Enable granular cost tracking across departments
- Separate billing for different use cases or tenants
API-Level Tracking:
- InvokeModel APIs return requestID, modelID, inputTokenCount, outputTokenCount
- Build custom metering systems using token count data
- Implement real-time cost tracking and alerting
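A minimal metering layer can be a pricing lookup applied to the token counts each response reports (the short model keys here are illustrative shorthand, not real Bedrock model IDs; prices come from the tables in this guide):

```python
# Per-request cost meter driven by the token counts InvokeModel/Converse
# responses return. Prices in $ per 1K tokens; extend the map for the
# models you actually use.
PRICES = {
    "nova-micro":        (0.000035, 0.00014),
    "nova-lite":         (0.00006, 0.00024),
    "claude-3.5-haiku":  (0.0008, 0.004),
    "claude-3.5-sonnet": (0.003, 0.015),
    "claude-opus":       (0.015, 0.075),
}

def invocation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

# Example: one call-summarization request
cost = invocation_cost("claude-3.5-haiku", 25_000, 1_000)   # 0.024
```

Feeding these per-request costs into a counter keyed by application or tenant gives the real-time tracking described above.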
---
Best Practices Summary
✅ Do:
- **Start with cost-effective models** - Test Nova Lite/Micro before upgrading to premium models
- **Optimize prompts aggressively** - Preprocess data and use concise, clear instructions
- **Implement prompt caching** - Up to 90% savings for repetitive context
- **Use appropriate inference plans** - Match pricing model to usage patterns
- **Monitor token usage closely** - Track cost per request and optimize continuously
❌ Don't:
- **Over-prompt with verbose context** - Be concise and preprocess input data
- **Use premium models for simple tasks** - Match model capability to task complexity
- **Ignore batch inference opportunities** - Use for offline processing with 50% savings
- **Forget about model distillation** - Consider for production workloads needing cost efficiency
- **Neglect guardrails optimization** - Use free filters when possible
🔄 Regular Review Cycle:
- **Daily:** Monitor token usage and cost per request metrics
- **Weekly:** Review model performance vs cost trade-offs
- **Monthly:** Analyze usage patterns and optimize inference plans
- **Quarterly:** Evaluate new models and features for cost optimization opportunities
---
Additional Resources
AWS Documentation
- [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
- [Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/)
- [Prompt Engineering Guidelines](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html)
Tools & Calculators
- [Bedrock Playground](https://console.aws.amazon.com/bedrock/) for model comparison and testing
- [AWS Pricing Calculator](https://calculator.aws/) for Bedrock cost modeling
- [Token Counter Tools](https://platform.openai.com/tokenizer) for rough prompt-length estimation (tokenizers vary across model families)
Related Power Guidance
- SageMaker Cost Optimization for traditional ML workloads
- Lambda Cost Optimization for serverless AI processing
- S3 Cost Optimization for AI data storage and retrieval
---
Service Code: AmazonBedrock
Last Updated: January 2026
Review Cycle: Quarterly