Amazon Bedrock Cost Optimization Guide
Service Overview
What is Amazon Bedrock?
- Fully managed service for building and scaling generative AI applications
- Access to foundation models (FMs) from leading AI providers (Anthropic, Meta, Mistral AI, Amazon)
- Multiple inference options: On-Demand, Provisioned Throughput, and Batch Inference
- Built-in capabilities: model customization, agents, guardrails, and knowledge bases
- Pay-per-token pricing model based on input and output tokens processed
Why Cost Optimization Matters
- Generative AI workloads can represent 40-60% of AI/ML costs in organizations
- Token-based pricing can lead to unexpected costs without proper optimization
- Multiple model options with vastly different pricing (Nova Micro input: $0.000035/1K tokens vs. Claude Opus output: $0.075/1K tokens)
- Common cost surprises include over-prompting, inappropriate model selection, and inefficient inference patterns
---
Cost Analysis & Monitoring
Key Cost Metrics to Track
Primary Cost Drivers:
- **Input Tokens** - Text provided to the model ($0.000035-$0.015 per 1K tokens)
- **Output Tokens** - Text generated by the model ($0.00014-$0.075 per 1K tokens)
- **Model Selection** - Different models have vastly different pricing structures
- **Provisioned Throughput** - Reserved capacity for consistent workloads ($35-$230+ per hour)
- **Guardrails** - Content filtering and safety features ($0.10-$0.15 per 1K text units)
Token Economics:
- **Rule of thumb:** 750 words ≈ 1,000 tokens (multiply words by 1.3)
- **Cost range:** $0.000035 (Nova Micro input) to $0.075 (Claude Opus output) per 1K tokens
- **Individual requests:** Nearly negligible in isolation (a short request on a low-cost model can run around $0.000015); costs come from volume
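The rule of thumb above is easy to apply in code (a minimal sketch):

```python
# Word-count -> token-count estimate using the ~1.3 tokens-per-word rule above.
def words_to_tokens(words: int) -> int:
    return round(words * 1.3)

estimate = words_to_tokens(750)   # 975, i.e. roughly the "750 words ≈ 1,000 tokens" rule
```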
Cost Allocation Tags:
- Application/UseCase for cost attribution to specific AI initiatives
- Environment (dev, staging, prod) for lifecycle management
- Team/Department for organizational cost tracking
- ModelType (chat, summarization, analysis) for workload categorization
Using This Power's Tools
Get Bedrock costs by model:
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "MONTHLY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"SERVICE\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Analyze Bedrock usage patterns by account:
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "DAILY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"LINKED_ACCOUNT\"}]",
"metrics": "[\"UsageQuantity\", \"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Get Bedrock pricing information:
usePower("aws-cost-optimization", "awslabs.aws-pricing-mcp-server", "get_pricing", {
"service_code": "AmazonBedrock",
"region": ["us-east-1", "us-west-2"],
"filters": [
{"Field": "productFamily", "Value": "Machine Learning", "Type": "EQUALS"}
]
})
Monitor Bedrock token utilization:
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "InputTokenCount",
"dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}],
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Sum", "Average"]
})
Create cost-per-request metrics:
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_data", {
  "metric_data_queries": [
    {
      "id": "invocations",
      "metric_stat": {
        "metric": {
          "namespace": "AWS/Bedrock",
          "metric_name": "Invocations",
          "dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}]
        },
        "period": 3600,
        "stat": "Sum"
      }
    },
    {
      "id": "input_tokens",
      "metric_stat": {
        "metric": {
          "namespace": "AWS/Bedrock",
          "metric_name": "InputTokenCount",
          "dimensions": [{"Name": "ModelId", "Value": "anthropic.claude-3-haiku-20240307-v1:0"}]
        },
        "period": 3600,
        "stat": "Sum"
      }
    },
    {
      "id": "avg_tokens_per_request",
      "expression": "input_tokens / invocations"
    }
  ],
  "start_time": "2024-11-01T00:00:00Z",
  "end_time": "2024-12-01T00:00:00Z"
})
---
Optimization Strategies
1. Model Selection Optimization
Cost-Performance Model Comparison:
Ultra-Low Cost Models:
- **Amazon Nova Micro:** Input $0.000035/1K, Output $0.00014/1K - Best for simple tasks
- **Amazon Nova Lite:** Input $0.00006/1K, Output $0.00024/1K - Balanced cost-performance
Mid-Range Models:
- **Claude 3.5 Haiku:** Input $0.0008/1K, Output $0.004/1K - Fast, cost-effective
- **Claude Instant:** Input $0.0008/1K, Output $0.0024/1K - Good for most use cases
High-Performance Models:
- **Claude 3.5 Sonnet:** Input $0.003/1K, Output $0.015/1K - Advanced reasoning
- **Claude Opus:** Input $0.015/1K, Output $0.075/1K - Highest capability, highest cost
Model Selection Strategy:
// Compare model costs for your use case
// Example: 20M input tokens, 1M output tokens
// Nova Micro: $0.84 total
// Claude Opus: $375.00 total (446x more expensive)
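These figures check out arithmetically (a quick verification using the prices listed in this guide):

```python
# Cost comparison for 20M input + 1M output tokens; prices in $ per 1K tokens.
def total_cost(input_tokens, output_tokens, in_price_per_1k, out_price_per_1k):
    return (input_tokens / 1000) * in_price_per_1k \
         + (output_tokens / 1000) * out_price_per_1k

nova_micro = total_cost(20_000_000, 1_000_000, 0.000035, 0.00014)   # 0.84
claude_opus = total_cost(20_000_000, 1_000_000, 0.015, 0.075)       # 375.0
ratio = claude_opus / nova_micro                                    # ~446x
```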
Implementation Steps:
1. Start with cost-effective models and evaluate performance
2. Use model comparison in Bedrock Playground to test multiple models
3. Implement A/B testing to validate model performance vs cost
4. Consider model distillation for up to 75% cost reduction with <2% accuracy loss
2. Prompt Engineering Optimization
Token Optimization Principles:
- **Be concise and clear** - Avoid essay-like prompts
- **Use specific directives** instead of verbose explanations
- **Implement output limits** to control generation costs
- **Preprocess input data** to reduce token count
Real-World Optimization Examples:
Example 1: Text Generation
- **Bad approach:** Verbose prompting (113 input + 352 output tokens) = $9.35/1K runs
- **Good approach:** Concise prompting with output limits = $2.84/1K runs
- **Savings:** 70% cost reduction through prompt optimization
Example 2: Data Processing
- **Bad approach:** Raw HTML table processing (16,161 input tokens) = $130.36/1K runs
- **Better approach:** Pre-processed CSV input (1,709 input tokens) = $17.54/1K runs
- **Best approach:** Optimized CSV preprocessing (824 input tokens) = $10.84/1K runs
- **Savings:** 92% cost reduction through data preprocessing
Prompt Engineering Best Practices:
- Include context and examples for better results
- Use stop sequences to control output length
- Break complex tasks into smaller, focused prompts
- Experiment with different prompt structures in Bedrock Playground
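Output limits and stop sequences map directly to inference parameters on the request. A boto3 `Converse`-style payload sketch (the model ID reuses the one from the monitoring examples above; this only builds the payload and sends nothing to AWS):

```python
# Capping generation cost: maxTokens bounds billable output tokens, and
# stopSequences ends generation early at a known delimiter. Sending this
# request requires boto3, AWS credentials, and model access.
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
    "messages": [
        {"role": "user",
         "content": [{"text": "Summarize the transcript in 3 bullet points."}]}
    ],
    "inferenceConfig": {
        "maxTokens": 300,           # hard ceiling on output tokens billed
        "stopSequences": ["---"],   # stop at a delimiter instead of running long
        "temperature": 0.2,
    },
}
# bedrock = boto3.client("bedrock-runtime"); bedrock.converse(**request)
```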
3. Inference Pricing Plan Optimization
On-Demand Pricing:
- **Best for:** Prototyping, POCs, variable workloads
- **Characteristics:** Pay-per-token, no commitment, subject to requests-per-minute (RPM) and tokens-per-minute (TPM) quotas
- **Cost structure:** Higher per-token cost but no fixed costs
Provisioned Throughput:
- **Best for:** Consistent production workloads, custom models
- **Characteristics:** Reserved capacity, guaranteed throughput, 1-6 month commitments
- **Cost structure:** Fixed hourly rate, discounted for longer commitments
Batch Inference:
- **Best for:** Large-scale offline processing, model evaluation
- **Characteristics:** Up to 50% savings vs on-demand, asynchronous processing
- **Cost structure:** Same token-based pricing but at 50% discount
Decision Matrix Example:
// Scenario: 1M chat sessions/month, 5 messages each, 3K input + 500 output tokens
// On-Demand: Variable cost based on actual usage
// Provisioned Throughput: $230K/month for guaranteed performance
// Batch: 50% savings for offline processing
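Choosing a plan usually comes down to a break-even comparison like this (a sketch with illustrative numbers: Claude 3.5 Haiku on-demand prices from the table above and a hypothetical $230/hour model unit):

```python
# On-Demand vs. Provisioned Throughput break-even sketch.
def on_demand_monthly(requests, in_tokens, out_tokens, in_price, out_price):
    """Monthly on-demand cost; prices in $ per 1K tokens."""
    per_request = (in_tokens / 1000) * in_price + (out_tokens / 1000) * out_price
    return requests * per_request

def provisioned_monthly(hourly_rate, hours=730):
    """Fixed cost of one provisioned model unit running all month."""
    return hourly_rate * hours

# 1M sessions x 5 messages, 3K input + 500 output tokens per message
od = on_demand_monthly(5_000_000, 3_000, 500, 0.0008, 0.004)   # 22,000.0
pt = provisioned_monthly(230)                                  # 167,900
# On-demand wins here; provisioned pays off only at much higher steady volume.
```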
4. Advanced Cost Optimization Features
Amazon Bedrock Prompt Caching:
- **Up to 90% cost reduction** for repetitive context
- **Up to 85% latency reduction** for cached prompts
- **Use cases:** RAG applications, long system prompts, repetitive context
Amazon Bedrock Intelligent Prompt Routing:
- **Up to 30% cost reduction** without accuracy loss
- **Automatic routing** to optimal models based on prompt characteristics
- **Single endpoint** for multiple model access
Model Distillation:
- **Up to 500% faster inference** with 75% cost reduction
- **Less than 2% accuracy loss** for most use cases
- **Teacher-student model approach** for cost-efficient deployment
Implementation:
// Monitor caching effectiveness via cached-input token counts:
// CacheReadInputTokenCount counts input tokens served from the cache;
// compare its Sum against InputTokenCount to gauge the hit rate
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "CacheReadInputTokenCount",
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Sum"]
})
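Whichever cache counters you pull, the effective hit rate is simply cached input tokens over total input tokens (a minimal sketch):

```python
# Share of input tokens served from the prompt cache.
def cache_hit_rate(cached_input_tokens: int, total_input_tokens: int) -> float:
    if total_input_tokens == 0:
        return 0.0
    return cached_input_tokens / total_input_tokens

rate = cache_hit_rate(9_000, 10_000)   # 0.9 -> 90% of input tokens were cached
```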
5. Guardrails Cost Management
Guardrails Pricing Structure:
- **Content filters:** $0.15 per 1K text units
- **Denied topics:** $0.15 per 1K text units
- **Contextual grounding:** $0.10 per 1K text units
- **PII filters:** $0.10 per 1K text units
- **Word filters:** Free
- **Regular expressions:** Free
Cost Optimization:
- Use free filters (word filters, regex) when possible
- Implement client-side filtering for simple cases
- Optimize text unit usage (1 text unit = up to 1,000 characters)
- Consider selective guardrail application based on use case
Example Cost Calculation:
// Chatbot: 500 requests/hour, 10 hours/day, 21 days/month
// Request: 1,500 chars (2 text units), Response: 400 chars (1 text unit)
// With content filters + denied topics + PII: $115.50/month
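The $115.50 figure works out as follows, assuming content filters and denied topics evaluate both the request and the response while PII filters evaluate the request only (that per-policy scoping is an assumption, not stated above):

```python
# Reproducing the guardrails example. Billing is per text unit of up to
# 1,000 characters; per-policy scoping below is an illustrative assumption.
TEXT_UNIT_CHARS = 1000

def text_units(chars: int) -> int:
    """Characters are billed in whole text units of up to 1,000 chars each."""
    return -(-chars // TEXT_UNIT_CHARS)  # ceiling division

requests_per_month = 500 * 10 * 21      # 500/hr x 10 hr/day x 21 days = 105,000
req_tu = text_units(1500)               # 2 text units per request
resp_tu = text_units(400)               # 1 text unit per response

both = requests_per_month * (req_tu + resp_tu)   # TU seen by content + denied-topic filters
input_only = requests_per_month * req_tu         # TU seen by PII filters

monthly = (both / 1000) * 0.15 \
        + (both / 1000) * 0.15 \
        + (input_only / 1000) * 0.10    # -> 115.50
```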
---
Common Cost Pitfalls & Solutions
Pitfall 1: Over-Prompting and Verbose Context
Problem Description:
- Including unnecessary context or verbose instructions
- Not preprocessing input data to reduce token count
- Using entire documents as context when summaries would suffice
Detection:
// Monitor average token usage per request
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "InputTokenCount",
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Average", "Maximum"]
})
Solution:
- Implement prompt templates with optimized structure
- Use preprocessing to clean and summarize input data
- Set up monitoring for token usage patterns
- Regular prompt engineering reviews and optimization
Pitfall 2: Inappropriate Model Selection
Problem Description:
- Using high-cost models (Claude Opus) for simple tasks
- Not evaluating cost-performance trade-offs
- Defaulting to most capable model without testing alternatives
Detection & Solution:
- Use Bedrock Playground for model comparison
- Implement cost tracking by model type
- A/B test different models for your specific use cases
- Consider model distillation for production workloads
Pitfall 3: Inefficient Inference Pattern Selection
Problem Description:
- Using On-Demand for consistent high-volume workloads
- Not leveraging Batch Inference for offline processing
- Over-provisioning Provisioned Throughput capacity
Detection & Solution:
- Analyze usage patterns and traffic consistency
- Calculate break-even points for different pricing models
- Use CloudWatch metrics to optimize provisioned capacity
- Implement hybrid approaches for different workload types
---
Real-World Scenarios
Scenario 1: Customer Service Chatbot Optimization
Situation:
- Insurance company with customer service chatbot
- Processing 50 calls/minute during 8-hour workdays
- 25K input tokens + 1K output tokens per call
- Using Claude 3.5 Haiku for call summarization
Analysis Approach:
// Step 1: Analyze current Bedrock costs
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "MONTHLY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"USAGE_TYPE\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
// Step 2: Monitor token usage patterns
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "get_metric_statistics", {
"namespace": "AWS/Bedrock",
"metric_name": "InputTokenCount",
"start_time": "2024-11-01T00:00:00Z",
"end_time": "2024-12-01T00:00:00Z",
"period": 3600,
"statistics": ["Sum", "Average"]
})
Solution Implementation:
- **Model optimization:** Tested Nova Lite for 95% cost reduction with acceptable quality
- **Prompt caching:** Implemented for system prompts, reducing costs by 85%
- **Batch processing:** Moved non-urgent summaries to batch inference for 50% savings
- **Preprocessing:** Cleaned call transcripts to reduce average input tokens by 40%
Results:
- **Monthly cost:** $17,280 → $3,456 (80% reduction)
- **Maintained quality:** Customer satisfaction scores unchanged
- **Improved performance:** 85% latency reduction through caching
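The baseline figure is consistent with the stated traffic (a quick check assuming a 30-day month and the Claude 3.5 Haiku prices listed earlier):

```python
# Chatbot scenario baseline: 50 calls/min x 8 hr/day, 25K in + 1K out per call.
calls_per_day = 50 * 60 * 8                                          # 24,000
cost_per_call = (25_000 / 1000) * 0.0008 + (1_000 / 1000) * 0.004    # $0.024
baseline_monthly = calls_per_day * 30 * cost_per_call                # 17,280.0
optimized_monthly = baseline_monthly * 0.20                          # 3,456.0 after the 80% cut
```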
Scenario 2: Content Generation Platform
Situation:
- Media company generating articles and summaries
- Variable workload with peak periods during news events
- Multiple content types requiring different model capabilities
- High costs from using Claude Opus for all tasks
Analysis Approach:
// Analyze usage patterns by content type
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "HOURLY",
"group_by": "[{\"Type\": \"TAG\", \"Key\": \"ContentType\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Solution Implementation:
- **Intelligent routing:** Implemented automatic model selection based on content complexity
- **Model tiering:** Nova Lite for simple summaries, Claude Haiku for articles, Sonnet for analysis
- **Batch processing:** Moved non-urgent content generation to batch inference
- **Prompt optimization:** Reduced average prompt length by 60% through better templates
Results:
- **65% cost reduction** through intelligent model routing
- **Improved throughput** handling peak loads more efficiently
- **Better resource utilization** matching model capability to task complexity
---
Integration with Other Services
Cost Impact of Service Integrations
Common Integration Patterns:
- **Bedrock + S3:** Document storage and retrieval for RAG applications
- **Bedrock + Lambda:** Serverless AI processing with automatic scaling
- **Bedrock + API Gateway:** RESTful APIs for AI services
- **Bedrock + Knowledge Bases:** Vector search and retrieval augmentation
Cross-Service Optimization:
- **Regional co-location:** Minimize data transfer costs between services
- **S3 lifecycle policies:** Optimize storage costs for training data and embeddings
- **Lambda optimization:** Right-size functions for AI workload processing
Analysis Commands:
// Analyze cross-service AI costs
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_explorer", {
"operation": "getCostAndUsage",
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"granularity": "MONTHLY",
"group_by": "[{\"Type\": \"DIMENSION\", \"Key\": \"SERVICE\"}]",
"metrics": "[\"UnblendedCost\"]",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\", \"AWS Lambda\", \"Amazon Simple Storage Service\", \"Amazon API Gateway\"]}}"
})
---
Monitoring & Alerting
Key Metrics to Monitor
Cost Metrics:
- Daily AI spend by model and application
- Cost per request and cost per token metrics
- Token utilization efficiency ratios
Usage Metrics:
- Input/output token counts by model and use case
- Request rates and response latencies
- Cache hit rates for prompt caching
Operational Metrics (via CloudWatch):
- Model invocation success rates and error patterns
- Guardrails activation rates and policy effectiveness
- Provisioned throughput utilization rates
Recommended Alerts
Budget Alerts:
// Monitor Bedrock-specific budget performance
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "budgets", {
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Anomaly Detection:
// Set up anomaly monitoring for Bedrock services
usePower("aws-cost-optimization", "awslabs.billing-cost-management-mcp-server", "cost_anomaly", {
"start_date": "2024-11-01",
"end_date": "2024-12-01",
"filters": "{\"Dimensions\": {\"Key\": \"SERVICE\", \"Values\": [\"Amazon Bedrock\"]}}"
})
Token Usage Alerts:
// Monitor token usage patterns
usePower("aws-cost-optimization", "awslabs.cloudwatch-mcp-server", "describe_alarms", {
"alarm_name_prefix": "Bedrock-TokenUsage",
"state_value": "ALARM"
})
Cost Allocation and Tracking
Inference Profiles:
- Create logical groupings for different applications/teams
- Enable granular cost tracking across departments
- Separate billing for different use cases or tenants
API-Level Tracking:
- InvokeModel APIs return requestID, modelID, inputTokenCount, outputTokenCount
- Build custom metering systems using token count data
- Implement real-time cost tracking and alerting
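A minimal metering layer can be a pricing lookup applied to the token counts each response reports (the short model keys here are illustrative shorthand, not real Bedrock model IDs; prices come from the tables in this guide):

```python
# Per-request cost meter driven by the token counts InvokeModel/Converse
# responses return. Prices in $ per 1K tokens; extend the map for the
# models you actually use.
PRICES = {
    "nova-micro":        (0.000035, 0.00014),
    "nova-lite":         (0.00006, 0.00024),
    "claude-3.5-haiku":  (0.0008, 0.004),
    "claude-3.5-sonnet": (0.003, 0.015),
    "claude-opus":       (0.015, 0.075),
}

def invocation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

# Example: one call-summarization request
cost = invocation_cost("claude-3.5-haiku", 25_000, 1_000)   # 0.024
```

Feeding these per-request costs into a counter keyed by application or tenant gives the real-time tracking described above.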
---
Best Practices Summary
✅ Do:
- **Start with cost-effective models** - Test Nova Lite/Micro before upgrading to premium models
- **Optimize prompts aggressively** - Preprocess data and use concise, clear instructions
- **Implement prompt caching** - Up to 90% savings for repetitive context
- **Use appropriate inference plans** - Match pricing model to usage patterns
- **Monitor token usage closely** - Track cost per request and optimize continuously
❌ Don't:
- **Over-prompt with verbose context** - Be concise and preprocess input data
- **Use premium models for simple tasks** - Match model capability to task complexity
- **Ignore batch inference opportunities** - Use for offline processing with 50% savings
- **Forget about model distillation** - Consider for production workloads needing cost efficiency
- **Neglect guardrails optimization** - Use free filters when possible
🔄 Regular Review Cycle:
- **Daily:** Monitor token usage and cost per request metrics
- **Weekly:** Review model performance vs cost trade-offs
- **Monthly:** Analyze usage patterns and optimize inference plans
- **Quarterly:** Evaluate new models and features for cost optimization opportunities
---
Additional Resources
AWS Documentation
- [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
- [Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/)
- [Prompt Engineering Guidelines](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html)
Tools & Calculators
- [Bedrock Playground](https://console.aws.amazon.com/bedrock/) for model comparison and testing
- [AWS Pricing Calculator](https://calculator.aws/) for Bedrock cost modeling
- [Token Counter Tools](https://platform.openai.com/tokenizer) for rough prompt-length estimation (tokenizers vary across model families)
Related Power Guidance
- SageMaker Cost Optimization for traditional ML workloads
- Lambda Cost Optimization for serverless AI processing
- S3 Cost Optimization for AI data storage and retrieval
---
Service Code: AmazonBedrock
Last Updated: January 2026
Review Cycle: Quarterly