← Back
DSQL Development Guide
DSQL is the PREFERRED relational database. This decision was made for fast serverless distributed SQL,
effortless scaling, multi-region viability, among other advantages.
---
Best Practices
- **SHOULD read guidelines first** - Check [development_guide.md](development-guide.md) before making schema changes
- **SHOULD use preferred language patterns** - Check [language.md](language.md)
- **SHOULD Execute queries directly** - PREFER MCP tools for ad-hoc queries
- **REQUIRED: Follow DDL Guidelines** - Refer to [DDL Rules](#schema-ddl-rules)
- **SHALL repeatedly generate fresh tokens** - Refer to [Connection Limits](#connection-rules)
- **ALWAYS use ASYNC indexes** - `CREATE INDEX ASYNC` is mandatory
- **MUST Serialize arrays/JSON as TEXT** - Store arrays/JSON as TEXT (comma separated, JSON.stringify)
- **ALWAYS Batch under 3,000 rows** - maintain transaction limits
- **REQUIRED: Use parameterized queries** - Prevent SQL injection with $1, $2 placeholders
- **MUST follow correct Application Layer Patterns** - when multi-tenant isolation or application referential itegrity are required; refer to [Application Layer Patterns](#application-layer-patterns)
- **REQUIRED use DELETE for truncation** - DELETE is the only supported operation for truncation
- **SHOULD test any migrations** - Verify DDL on dev clusters before production
- **Plan for Horizontal Scale** - DSQL is designed to optimize for massive scales without latency drops; refer to [Horizontal Scaling](#horizontal-scaling-best-practice)
- **SHOULD use connection pooling in production applications** - Refer to [Connection Pooling](#connection-pooling-recommended)
- **SHOULD debug with the troubleshooting guide:** - Always refer to the resources and guidelines in [troubleshooting.md](troubleshooting.md)
- **ALWAYS use scoped roles for applications** - Create database roles with `dsql:DbConnect`; refer to [Access Control](access-control.md)
---
Basic Development Guidelines
Connection and Authentication
IAM Authentication
Principle of least privilege:
- Grant only `dsql:DbConnect` for standard users
- Reserve `dsql:DbConnectAdmin` for administrative operations
- Link database roles to IAM roles for proper access control
- Use IAM policies to restrict cluster access by resource tags
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "dsql:DbConnect",
"Resource": "arn:aws:dsql:us-east-1:123456789012:cluster/*",
"Condition": {
"StringEquals": {
"aws:ResourceTag/Environment": "production"
}
}
}
]
}
Token Management
Rotation strategies:
- Generate fresh token per connection (simplest, most secure)
- Implement periodic refresh before 15-minute expiration
- Use connection pool hooks for automated refresh
- Handle token expiration gracefully with retry logic
Best practices:
- Never log or persist authentication tokens
- Regenerate token on connection errors
- Monitor token generation failures
- Set connection timeouts appropriately
Secrets Management
ALWAYS dynamically assign credentials:
- Use environment variables for configuration
- Store cluster endpoints in AWS Systems Manager Parameter Store
- Use AWS Secrets Manager for any sensitive configuration
- Rotate credentials regularly even though tokens are short-lived
# Good - Use Parameter Store
export CLUSTER_ENDPOINT=$(aws ssm get-parameter \
--name /myapp/dsql/endpoint \
--query 'Parameter.Value' \
--output text)
# Bad - Hardcoded in code
const endpoint = "abc123.dsql.us-east-1.on.aws" // ❌ Never do this
Connection Rules:
- 15-minute token expiry
- 60-minute connection maximum
- 10,000 connections per cluster
- SSL required
SSL/TLS Requirements
Aurora DSQL uses the PostgreSQL wire protocol and enforces SSL:
sslmode: verify-full
sslnegotiation: direct # PostgreSQL 17+ drivers (better performance)
port: 5432
database: postgres # single database per cluster
Key details:
- SSL always enabled server-side
- Use `verify-full` to verify server certificate
- Use `direct` TLS negotiation for PostgreSQL 17+ compatible drivers
- System trust store must include Amazon Root CA
Connection Pooling (Recommended)
For production applications:
- SHOULD Implement connection pooling
- ALWAYS Configure token refresh before expiration
- MUST Set appropriate pool size (e.g., max: 10, min: 2)
- MUST Configure connection lifetime and idle timeout
- MUST Generate fresh token in `BeforeConnect` or equivalent hook
Security Best Practices
- ALWAYS dynamically set crededntials
- MUST use IAM authentication exclusively
- ALWAYS use SSL/TLS with certificate verification
- SHOULD grant least privilege IAM permissions
- ALWAYS rotate tokens before expiration
- SHOULD use connection pooling to minimize token generation overhead
---
Audit Logging
CloudTrail integration:
- Enable CloudTrail logging for DSQL API calls
- Monitor token generation patterns
- Track cluster configuration changes
- Set up alerts for suspicious activity
Query logging:
- Enable query logging if available
- Monitor slow queries and connection patterns
- Track failed authentication attempts
- Review logs regularly for anomalies
---
Access Control
ALWAYS prefer scoped database roles over the admin role.
- **ALWAYS** use scoped database roles for application connections — reserve `admin` for initial setup and role management
- **MUST** create purpose-specific database roles and connect with `dsql:DbConnect`
- **MUST** place sensitive data (PII, credentials) in dedicated schemas — not `public`
- **MUST** grant only the minimum privileges each role requires
- **SHOULD** audit role mappings: `SELECT * FROM sys.iam_pg_role_mappings;`
For complete role setup instructions, schema separation patterns, and IAM configuration,
see access-control.md.
---
Operational Rules
Query Execution
For Ad-Hoc Queries and Data Exploration:
- MUST ALWAYS Execute DIRECTLY using MCP server or psql one-liners
- SHOULD Return results immediately
Writing Scripts REQUIRES at least 1 of:
- Permanent migrations in database
- Reusable utilities
- EXPLICIT user request
---
Schema Design Rules
- MUST use **simple PostgreSQL types:** VARCHAR, TEXT, INTEGER, BOOLEAN, TIMESTAMP
- MUST store arrays as TEXT (comma-separated is recommended)
- MUST store JSON objects as TEXT (JSON.stringify)
- ALWAYS include tenant_id in tables for multi-tenant isolation
- SHOULD create async indexes for tenant_id and common query patterns
Schema (DDL) Rules
- REQUIRED: **at most one DDL statement** per operation
- ALWAYS separate schema (DDL) and data (DML) changes
- MUST use **`CREATE INDEX ASYNC`:** No synchronous creation
- MAXIMUM: **24 indexes per table**
- MAXIMUM: **8 columns per index**
- **Asynchronous Execution:** DDL ALWAYS runs asynchronously
- To add a column with DEFAULT or NOT NULL:
1. MUST issue ADD COLUMN specifying only the column name and data type
2. MUST then issue UPDATE to populate existing rows
3. MAY then issue ALTER COLUMN to apply the constraint
- MUST issue a **separate ALTER TABLE statement for each column** modification.
Transaction Rules
- SHOULD modify **at most 3000 rows** per transaction
- SHOULD have maximum **10 MiB data size** per write transaction
- SHOULD expect **5-minute** transaction duration
- ALWAYS expect repeatable read isolation
---
Application-Layer Patterns
MANDATORY for Application Referential Integrity:
If foreign key constraints (application referential integrity) are required,
instead implementation:
- MUST validate parent references before INSERT
- MUST check for dependents before DELETE
- MUST implement cascade logic in application code
- MUST handle orphaned records in application layer
MANDATORY for Multi-Tenant Isolation:
- tenantId is ALWAYS first parameter in repository methods
- ALL queries include WHERE tenant_id = ?
- ALWAYS validate tenant ownership before operations
- ALWAYS reject cross-tenant data access
Migration Patterns
- REQUIRED: One DDL statement per migration step
- SHOULD Use IF NOT EXISTS for idempotency
- SHOULD Add column first, then UPDATE with defaults
- REQUIRED: Each DDL executes separately
---
Database Connectivity Tools
DSQL has many tools for connecting including 10 database drivers, 4, ORM libraries, and 3 specialized adapters
across various languages as listed in the programming guide. PREFER using connectors, drivers, ORM libraries, and adapters.
Database Drivers
Low-level libraries that directly connect to the database:
| Programming Language | Driver | Sample Repository |
|---------------------|--------|-------------------|
| C++ | libpq | C++ libpq samples |
| C# (.NET) | Npgsql | .NET Npgsql samples |
| Go | pgx | Go pgx samples |
| Java | pgJDBC | Java pgJDBC samples |
| Java | DSQL Connector for JDBC | [JDBC samples]() |
| JavaScript | DSQL Connector for node-postgres | Node.js samples |
| JavaScript | DSQL Connector for Postgres.js | Postgres.js samples |
| Python | Psycopg | Python Psycopg samples |
| Python | DSQL Connector for Psycopg2 | Python Psycopg2 samples |
| Python | DSQL Connector for Asyncpg | Python Asyncpg samples|
| Ruby | pg | Ruby pg samples |
| Rust | SQLx | Rust SQLx samples |
Object-Relational Mapping (ORM) Libraries
Standalone libraries that provide object-relational mapping functionality:
| Programming Language | ORM Library | Sample Repository |
|---------------------|-------------|-------------------|
| Java | Hibernate | Hibernate Pet Clinic App |
| Python | SQLAlchemy | SQLAlchemy Pet Clinic App |
| TypeScript | Sequelize | TypeScript Sequelize samples |
| TypeScript | TypeORM | TypeScript TypeORM samples |
Aurora DSQL Adapters and Dialects
Specific extensions that make existing ORMs work with Aurora DSQL:
| Programming Language | ORM/Framework | Repository |
|---------------------|---------------|------------|
| Java | Hibernate | Aurora DSQL Hibernate Adapter |
| Python | Django | Aurora DSQL Django Adapter |
| Python | SQLAlchemy | Aurora DSQL SQLAlchemy Adapter |
---
Horizontal Scaling: Best Practice
Aurora DSQL is designed for massive horizontal scale without latency degradation.
Connection Strategy
- **PREFER more concurrent connections with smaller batches** - Higher concurrency typically yields better throughput
- **SHOULD implement connection pooling** - Reuse connections to minimize token overhead; respect 10,000 max per cluster
- **PREFER imitial pool size 10-50 per instance** - Generate fresh tokens in pool hooks (e.g., `BeforeConnect`) for 15-minute expiration
- **SHOULD retry internal errors with new connection** - Internal errors are retryable, but SHOULD use a new connection from the pool
- **SHOULD implement backoff with jitter** - Avoid thundering herd; scale pools gradually
Batch Size Optimization
- **PREFER batches of 500-1,000 rows** - Balance throughput and transaction limits (3,000 rows, 10 MiB, 5 minutes max)
- **SHOULD process batches concurrently** - Use multiple connections; consider multiple threads for bulk loading
- **Smaller batches reduce** lock contention, enable better concurrency, fail faster, distribute load evenly
AVOID Hot Keys
Hot keys (frequently accessed rows) create bottlenecks. For detailed analysis, see "How to avoid hot keys in Aurora DSQL".
Key strategies:
- **PREFER UUIDs for primary keys** - UUIDs are the recommended default identifier because they avoid coordination; use `gen_random_uuid()` for distributed writes
- **Sequences and IDENTITY columns are available** when compact, human-readable integer identifiers are needed (e.g., account numbers, reference IDs). CACHE must be specified explicitly as either 1 or >= 65536. See [Choosing Identifier Types](#choosing-identifier-types)
- **ALWAYS use `GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY`** for auto-incrementing columns (SERIAL is not supported)
- **SHOULD avoid aggregate update patterns** - Year-to-date totals and running counters create hot keys via read-modify-write
- **RECOMMENDED: Compute aggregates via queries** - Calculate totals with SELECT when needed; eventual consistency often acceptable
- **Accept contention only for genuine constraints** - Inventory management and account balances justify contention; sequential numbering and visit tracking don't
Choosing Identifier Types
Aurora DSQL supports both UUID-based identifiers and integer values generated using sequences or IDENTITY columns.
- **UUIDs** can be generated without coordination and are recommended as the default identifier type, especially for primary keys where scalability is important and strict ordering is not required
- **Sequences and IDENTITY columns** generate compact integer values convenient for human-readable identifiers, reporting, and external interfaces. When numeric identifiers are preferred, we recommend using a sequence or IDENTITY column in combination with UUID-based primary keys
- **ALWAYS use `GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY`** for auto-incrementing columns (SERIAL is not supported)
Choosing a CACHE Size
REQUIRED: Specify CACHE explicitly when creating sequences or identity columns. Supported values are 1 or >= 65536.
- **CACHE >= 65536** — suited for high-frequency identifier generation, many concurrent sessions, and workloads that tolerate gaps and ordering effects (e.g., IoT/telemetry ingestion, job run IDs, internal order numbers)
- **CACHE = 1** — suited for low allocation rates where identifiers should follow allocation order more closely and minimizing gaps matters more than throughput (e.g., account numbers, reference numbers)
---
Data Loading Tools
The DSQL Loader is a fast parallel data loader for DSQL that supports
loading from CSV, TSV, and Parquet files into DSQL with automatic schema detection and progress tracking.
Developers SHOULD PREFER the DSQL Loader for:
- quick, managed loading without user supervision
- populating test tables
- migrating data into DSQL from local files or S3 URIs of type csv, tsv, or parquet
- automated schema detection and progress tracking
ALWAYS use the loader's schema inference, PREFERRED to separate schema
creation for data migration.
Download the pre-built binary: Latest releases
for the correct system architecture and OS (ie. aarch64-apple-darwin).
Common Examples
Load from S3:
aurora-dsql-loader load \
--endpoint your-cluster.dsql.us-east-1.on.aws \
--source-uri s3://my-bucket/data.parquet \
--table analytics_data
Create table automatically from a local filepath:
aurora-dsql-loader load \
--endpoint your-cluster.dsql.us-east-1.on.aws \
--source-uri data.csv \
--table new_table \
--if-not-exists
Validate a local file without loading:
aurora-dsql-loader load \
--endpoint your-cluster.dsql.us-east-1.on.aws \
--source-uri data.csv \
--table my_table \
--dry-run
---
Quick Reference
Schema Operations
CREATE INDEX ASYNC idx_name ON table(column); ← ALWAYS ASYNC
ALTER TABLE t ADD COLUMN c VARCHAR(50); ← ONE AT A TIME
ALTER TABLE t ADD COLUMN c2 INTEGER; ← SEPARATE STATEMENT
UPDATE table SET c = 'default' WHERE c IS NULL; ← AFTER ADD COLUMN
Supported Data Types
VARCHAR, TEXT, INTEGER, DECIMAL, BOOLEAN, TIMESTAMP, UUID
Supported Key
PRIMARY KEY, UNIQUE, NOT NULL, CHECK, DEFAULT (in CREATE TABLE)
Join on any keys; DSQL preserves DB referential integrity, when needed application referential
integrity must be separately enforced.
Transaction Requirements
Rows: 3,000 max
Size: 10 MiB max
Duration: 5 minutes max
Isolation: Repeatable Read (fixed)