Multicloud DB SDK - Developer Guide¶
Comprehensive reference for the Multicloud DB SDK's key management, CRUD operations, query DSL, and provider-specific storage behavior. For a quick-start tutorial and architecture overview, see the Getting Started guide. For the portable API surface and error mapping reference, see Compatibility.
Table of Contents¶
- Key & Partition Key
- The Key Model
- When to Use Each Key Form
- Provider Storage Mapping
- Partition Key Strategy Guide
- Why MulticloudDbKey Is an Explicit Parameter
- Resource Addressing
- Provider Physical Mapping
- Database-per-Tenant Pattern
- Provisioning Resources with provisionSchema()
- CRUD Semantics
- create - Insert Only
- read - Point Read
- update - Replace Existing
- upsert - Create or Replace
- delete - Idempotent Delete
- Document Field Injection
- Query DSL
- Portable Expressions
- Parameter Binding
- Portable Functions
- Pagination (Continuation Tokens)
- Full Scan (No Filter)
- Expression-Based Filtering
- Multi-Condition Queries
- Native Expression Passthrough
- Translation Pipeline
- Querying with Partition Keys
- Why Partition Keys Matter for Queries
- Combining Expression Filters with Key Design
- Partition-Scoped Queries with QueryRequest.partitionKey()
- Multi-Tenant Architecture Patterns
- Error Handling
- Result Set Control
- Checking Capabilities
- Limiting Results
- Ordering Results
- Combining with Native Expressions
- Document TTL (Time-to-Live)
- Prerequisite: Enable Container-Level TTL
- Writing with TTL
- Checking TTL Support
- Document Metadata
- Reading Metadata
- Provider Metadata Availability
- System Property Stripping
- Document Size Enforcement
- Provider Diagnostics
- Query Diagnostics
- Provider Field Availability
- Structured Logging
- Customizing the User-Agent
- Why Set a Suffix?
- Configuring the Suffix
- Resulting Header Format
- Validation Rules
Key & Partition Key¶
The Key Model¶
Every document stored through Multicloud DB is identified by a MulticloudDbKey - a portable
representation of the minimum key material needed to uniquely identify a record
across all three supported providers.
import com.multiclouddb.api.MulticloudDbKey;
// 1. Simple partition-key-only key
MulticloudDbKey simple = MulticloudDbKey.of("order-123");
// 2. Partition key + sort key
MulticloudDbKey partitioned = MulticloudDbKey.of("customer-456", "order-123");
// 3. Partition + sort + composite components (future extensibility)
MulticloudDbKey composite = MulticloudDbKey.of("customer-456", "order-123",
Map.of("region", "us-east", "year", "2025"));
The three fields:
| Field | Type | Required | Purpose |
|---|---|---|---|
partitionKey |
String |
Yes (must be non-empty) | Distribution/hash key that determines which physical partition stores the item |
sortKey |
String |
No (nullable) | Item identifier within a partition; maps to DynamoDB sort/range key, Cosmos DB id, or Spanner's second primary-key column |
components |
Map<String,String> |
No (empty by default) | Additional composite key dimensions (extensibility) |
When to Use Each Key Form¶
MulticloudDbKey.of(partitionKey) - Simple Key¶
Use when each document is independent and you don't need to group or co-locate related documents.
// A standalone configuration document
MulticloudDbKey configKey = MulticloudDbKey.of("app-settings");
client.upsert(address, configKey, settingsDoc);
Provider behavior:
- Cosmos DB: Uses partitionKey as both the document id and the partition key
(effectively a single-partition-per-document layout).
- DynamoDB: Uses partitionKey as the hash key only - no sort key is written.
- Spanner: Uses partitionKey as both the partitionKey column and the sortKey column.
Best for: Lookup tables, configuration documents, singleton records.
MulticloudDbKey.of(partitionKey, sortKey) - Partition Key + Sort Key¶
Use when documents naturally belong to a group and you want to co-locate them
for efficient querying. The partitionKey defines the co-location group;
the sortKey uniquely identifies the document within the partition.
// Positions within a portfolio - portfolioId is the partition key
MulticloudDbKey positionKey = MulticloudDbKey.of("portfolio-alpha", "position-001");
// Orders within a customer - customerId is the partition key
MulticloudDbKey orderKey = MulticloudDbKey.of("customer-456", "order-789");
// Events within a session - sessionId is the partition key
MulticloudDbKey eventKey = MulticloudDbKey.of("session-abc", "event-42");
Provider behavior:
- Cosmos DB: sortKey → document's id field; partitionKey → stored as the
partitionKey field and used as the Cosmos DB partition key value. Documents
sharing the same partitionKey value are co-located in the same logical partition.
- DynamoDB: partitionKey → hash key attribute ("partitionKey"); sortKey → sort key
attribute ("sortKey"). Together they form the composite primary key.
- Spanner: partitionKey → partitionKey column; sortKey → sortKey column.
Both form the composite primary key.
Best for: Parent-child relationships, multi-tenant data, event streams, any scenario where you frequently query "all items within group X."
MulticloudDbKey.of(pk, pk) - Partition Key Equals Sort Key¶
A common pattern where the document's sort key equals its partition key. Each document occupies its own partition. This is the simplest layout and is appropriate when you don't need to group documents.
// Each portfolio is its own partition
MulticloudDbKey portfolioKey = MulticloudDbKey.of("portfolio-alpha", "portfolio-alpha");
Provider behavior: Identical to MulticloudDbKey.of(partitionKey, sortKey) with an explicit
sort key value, but every document is in its own partition (no co-location).
Best for: Top-level entities (tenants, portfolios, users) where you access documents primarily by their ID rather than querying within a group.
Provider Storage Mapping¶
The MulticloudDbKey type is portable - you write MulticloudDbKey.of(partitionKey, sortKey) and each
provider maps it to its native key model. Here's exactly what happens in each:
Azure Cosmos DB¶
MulticloudDbKey.of("portfolio-alpha", "pos-1")
│ │
▼ ▼
┌──────────────────────────────┐
│ Cosmos DB Container │
│ │
│ Document JSON: │
│ { │
│ "id": "pos-1", │ ← key.sortKey()
│ "partitionKey": "portfolio-alpha", │ ← key.partitionKey()
│ ... other fields ... │
│ } │
│ │
│ Point read uses: │
│ container.readItem( │
│ "pos-1", │ ← sortKey (document id)
│ new PartitionKey( │
│ "portfolio-alpha") │ ← partitionKey value
│ ) │
└──────────────────────────────┘
If key.sortKey() is null (i.e., MulticloudDbKey.of(partitionKey) was used), Cosmos uses
key.partitionKey() as both the document id and the partition key. This means every
document gets its own logical partition.
Note: The container must be created with
/partitionKeyas the partition key path. The SDK merges a"partitionKey"field into the JSON document automatically before upsert.
Amazon DynamoDB¶
MulticloudDbKey.of("portfolio-alpha", "pos-1")
│ │
▼ ▼
┌──────────────────────────────┐
│ DynamoDB Table │
│ (database__collection) │
│ │
│ Item attributes: │
│ "partitionKey": "portfolio-alpha" │ ← Hash key (partition key)
│ "sortKey": "pos-1" │ ← Sort key (range key)
│ ... other attributes ... │
│ │
│ Point read uses: │
│ GetItem with: │
│ { "partitionKey": "portfolio-alpha", │
│ "sortKey": "pos-1" } │
└──────────────────────────────┘
If key.sortKey() is null, no "sortKey" attribute is written or used in
reads. The table must be created with only a "partitionKey" hash key.
Note: DynamoDB has no native "database" concept. The
ResourceAddressdatabase and collection are composed into a single table name using the patterndatabase__collection(double underscore separator).
Google Cloud Spanner¶
MulticloudDbKey.of("portfolio-alpha", "pos-1")
│ │
▼ ▼
┌──────────────────────────────┐
│ Spanner Table │
│ │
│ Row columns: │
│ partitionKey: "portfolio-alpha" │ ← Primary key col 1
│ sortKey: "pos-1" │ ← Primary key col 2
│ ... other columns ... │
│ │
│ Point read uses: │
│ SELECT * FROM table │
│ WHERE partitionKey = @pk │
│ AND sortKey = @sk │
└──────────────────────────────┘
If key.sortKey() is null, sortKey defaults to key.partitionKey() - giving a
composite primary key of (pk, pk). Spanner always requires both columns
because the table DDL defines PRIMARY KEY (partitionKey, sortKey).
Partition Key Strategy Guide¶
Choosing the right partition key is the most important design decision for performance and scalability. Here are common strategies:
| Strategy | Key Pattern | When to Use | Example |
|---|---|---|---|
| Entity-per-partition | MulticloudDbKey.of(pk, pk) |
Simple lookups, no parent-child queries | Tenants, configuration docs |
| Parent-child grouping | MulticloudDbKey.of(parentId, childId) |
Query all items within a parent | Positions within a portfolio |
| Tenant isolation | MulticloudDbKey.of(tenantId, docId) |
Multi-tenant with per-tenant queries | SaaS applications |
| Time-based grouping | MulticloudDbKey.of(dateKey, eventId) |
Time-range queries, event streams | Logs partitioned by 2025-03 |
| Category grouping | MulticloudDbKey.of(category, itemId) |
Filtered queries within a category | Products by department |
| Composite | MulticloudDbKey.of(compositeValue, id) |
Multiple dimensions in partition | region#category |
Choosing Between Entity-per-Partition and Grouped Partitions¶
Entity-per-partition (MulticloudDbKey.of(pk, pk)):
- Every document gets its own partition
- Optimal for point reads - always a single-partition operation
- Queries for "all items of type X" require a cross-partition scan
- Works well when you don't need to query within groups
Grouped partitions (MulticloudDbKey.of(parentId, childId)):
- Related documents share a partition
- Point reads still work (the SDK resolves the full key)
- Queries within the group can be partition-scoped - much more efficient
- Requires knowing the partition key value at query time
Rule of thumb: If you frequently query "give me all X within Y", use Y as the partition key. If you only read documents by their ID, use the entity-per-partition pattern.
Why MulticloudDbKey Is an Explicit Parameter¶
Every CRUD operation requires an explicit MulticloudDbKey parameter - the SDK never
extracts key material from the document body. This is a deliberate design
choice.
Why not extract from the document?
Some database SDKs (notably the Azure Cosmos DB Java SDK) can extract the
partition key and document ID automatically because the container's metadata
tells the SDK which JSON path holds the partition key, and id is a fixed
Cosmos convention. Multicloud DB cannot replicate this approach portably because the
key field names differ across providers.
When a provider writes a document, it injects key fields using different names:
| Provider | MulticloudDbKey.partitionKey() stored as |
MulticloudDbKey.sortKey() stored as |
|---|---|---|
| Cosmos DB | partitionKey (custom field) |
id (built-in Cosmos field) |
| DynamoDB | partitionKey (hash key attribute) |
sortKey (range key attribute) |
| Spanner | partitionKey (primary key column) |
sortKey (primary key column) |
This means a document read back from Cosmos DB looks like:
while the same document read back from DynamoDB looks like:
A convention-based overload - upsert(address, document) - would need to look
for sortKey on DynamoDB/Spanner but id on Cosmos DB. That requires
provider-aware extraction logic in what is supposed to be a provider-agnostic
interface, which defeats the purpose of a portable abstraction.
Additional reasons for the explicit MulticloudDbKey design:
| Concern | Explicit MulticloudDbKey | Extracted from Document |
|---|---|---|
read() / delete() |
Works - no document needed | Impossible - no document to extract from |
| Consistency | All 5 operations use the same signature pattern | Writes differ from reads/deletes |
| Compile-time safety | Missing key = compiler error | Missing field = runtime error in provider |
| Key ≠ document | Key can differ from document fields (remapping, replication) | Key must match document content |
| Source of truth | Key is authoritative; providers overwrite document fields | Ambiguous when key fields and document disagree |
The explicit MulticloudDbKey keeps the API uniform (same pattern for all operations), safe (compiler-enforced), and portable (no provider-specific field name assumptions). Each provider maps the MulticloudDbKey to its native key representation internally - application code never needs to know which field name is used on which backend.
Resource Addressing¶
A ResourceAddress identifies the target database and collection for any
operation:
import com.multiclouddb.api.ResourceAddress;
ResourceAddress addr = new ResourceAddress("my-database", "my-collection");
Both database and collection are required (non-empty strings). The portable
address maps to physical storage differently per provider.
Provider Physical Mapping¶
| Part | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
database |
Cosmos database name | Encoded into table name | N/A (database set in config) |
collection |
Cosmos container name | Encoded into table name | Spanner table name |
| Physical | database / container |
Table: database__collection |
Table within the configured DB |
Cosmos DB¶
Each ResourceAddress maps to a separate Cosmos DB database and
container. This provides true database-level isolation between tenants.
ResourceAddress("acme-risk-db", "portfolios")
→ Cosmos DB database: "acme-risk-db"
→ Cosmos container: "portfolios"
DynamoDB¶
Since DynamoDB has no native database concept, the database dimension is encoded into the table name using a double-underscore separator:
Spanner¶
Spanner uses the database configured at client creation time. The database
dimension in ResourceAddress is not used for physical addressing (Spanner has
a single database per client). Only the collection maps to a table name.
ResourceAddress("acme-risk-db", "portfolios")
→ Spanner table: "portfolios" (within the configured database)
Database-per-Tenant Pattern¶
A common architecture for multi-tenant SaaS applications is to give each tenant
its own logical "database" using ResourceAddress:
public ResourceAddress addressFor(String tenantId, String collection) {
String database = tenantId + "-risk-db";
return new ResourceAddress(database, collection);
}
// Tenant "acme-capital" accessing their portfolios:
ResourceAddress acmePortfolios = addressFor("acme-capital", "portfolios");
// → Cosmos: database="acme-capital-risk-db", container="portfolios"
// → DynamoDB: table="acme-capital-risk-db__portfolios"
// → Spanner: table="portfolios" (single database)
This pattern provides complete data isolation at the database level for Cosmos DB and at the table level for DynamoDB - without any provider-specific code.
Provisioning Resources with provisionSchema()¶
When your application starts, it typically needs to ensure that all required
databases and containers/tables exist. The SDK provides provisionSchema() to
do this efficiently in a single call - with internal parallelism so that
application code does not need to manage threading.
import java.util.*;
// Define the full schema: database name → list of collections
Map<String, List<String>> schema = new LinkedHashMap<>();
schema.put("admin-db", List.of("tenants"));
schema.put("acme-risk-db", List.of("portfolios", "positions", "risk_metrics", "alerts"));
schema.put("shared-db", List.of("market_data"));
// Provision everything - databases then containers, both phases in parallel
client.provisionSchema(schema);
How it works internally:
- Phase 1 - Databases: All databases are created concurrently using a bounded thread pool (max 10 threads). The SDK waits for all database creations to complete before proceeding.
- Phase 2 - Containers: All containers/tables are created concurrently using the same thread pool.
This two-phase approach ensures that databases exist before their containers are created, while maximising throughput within each phase.
Provider behavior:
| Provider | Database Phase | Container/Table Phase |
|---|---|---|
| Cosmos DB (cloud) | Creates via Azure Resource Manager SDK (parallel) | Creates via data-plane createContainerIfNotExists (parallel) |
| Cosmos DB (emulator) | Creates via data-plane createDatabaseIfNotExists (parallel) |
Creates via data-plane createContainerIfNotExists (parallel) |
| DynamoDB | No-op (no native database concept) | Creates tables named database__collection (parallel), waits for ACTIVE |
| Spanner | No-op (database configured at client construction) | Creates tables with DDL (parallel) |
When to use provisionSchema() vs individual calls:
| Approach | Use When |
|---|---|
provisionSchema(schema) |
Provisioning multiple databases/containers at startup - the SDK handles all parallelism |
ensureDatabase(name) |
Creating a single database on demand (e.g., new tenant onboarding) |
ensureContainer(address) |
Creating a single container on demand |
Providers may override provisionSchema() with provider-specific optimisations,
but the default SPI implementation works correctly for all providers.
CRUD Semantics¶
create - Insert Only¶
create() inserts a new document. If a document with the same key already
exists, the operation fails with a conflict error. Use this when you need
insert-only semantics with duplicate detection.
ResourceAddress addr = new ResourceAddress("mydb", "orders");
MulticloudDbKey key = MulticloudDbKey.of("customer-456", "order-123");
Map<String, Object> doc = new LinkedHashMap<>();
doc.put("total", 99.95);
doc.put("status", "pending");
client.create(addr, key, doc); // Fails if document already exists
Key behavior across providers:
| Behavior | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
| Operation | createItem() |
putItem() with attribute_not_exists condition |
newInsertBuilder mutation |
| If exists | Throws CONFLICT | Throws CONFLICT | Throws CONFLICT |
| If not exists | Create | Create | Create |
| Return value | None (void) | None (void) | None (void) |
read - Point Read¶
read() performs a single-document read using the full key. Returns a
DocumentResult wrapping the document as an ObjectNode, or null if not found.
DocumentResult result = client.read(addr, MulticloudDbKey.of("customer-456", "order-123"));
if (result != null) {
String status = result.document().path("status").asText();
}
Key behavior across providers:
| Behavior | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
| Operation | readItem(id, partitionKey) |
getItem(keyMap) |
SELECT * WHERE partitionKey=@pk AND sortKey=@sk |
| Not found | Returns null |
Returns null |
Returns null |
| Consistency | Session (default) | Eventual (default) | Strong |
Point reads are always efficient because the full key is provided. The provider can go directly to the partition/item without scanning.
update - Replace Existing¶
update() replaces an existing document. If the document does not exist,
the operation fails with a not-found error. Use this when you need strict
update-only semantics.
Map<String, Object> updated = new LinkedHashMap<>();
updated.put("total", 109.95);
updated.put("status", "shipped");
client.update(addr, MulticloudDbKey.of("customer-456", "order-123"), updated); // Fails if not exists
Key behavior across providers:
| Behavior | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
| Operation | replaceItem() |
putItem() with attribute_exists condition |
newUpdateBuilder mutation |
| If exists | Replace | Replace | Replace |
| If not exists | Throws NOT_FOUND | Throws NOT_FOUND | Throws NOT_FOUND |
| Return value | None (void) | None (void) | None (void) |
upsert - Create or Replace¶
upsert() creates a new document or overwrites an existing document with
the same key. It is an insert-or-update in all providers.
ResourceAddress addr = new ResourceAddress("mydb", "orders");
MulticloudDbKey key = MulticloudDbKey.of("customer-456", "order-123");
Map<String, Object> doc = new LinkedHashMap<>();
doc.put("total", 99.95);
doc.put("status", "pending");
client.upsert(addr, key, doc); // Creates or replaces the document
Key behavior across providers:
| Behavior | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
| Operation | upsertItem() |
putItem() (no condition) |
insertOrUpdate mutation |
| If exists | Replace | Replace | Replace |
| If not exists | Create | Create | Create |
| Return value | None (void) | None (void) | None (void) |
delete - Idempotent Delete¶
delete() removes a document by key. If the document doesn't exist, the
operation succeeds silently (no error). This makes delete idempotent and
safe for retry.
client.delete(addr, MulticloudDbKey.of("customer-456", "order-123"));
// Succeeds whether or not the document exists
Key behavior across providers:
| Behavior | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
| Operation | deleteItem(id, partitionKey) |
deleteItem(keyMap) |
Mutation.delete(key) |
| Not found | Silent success (HTTP 404 suppressed) | Silent success | Silent success |
| Return value | None (void) | None (void) | None (void) |
Document Field Injection¶
When you call create(), update(), or upsert(), the SDK injects key
fields into the document automatically. You don't need to manually set "id"
or "partitionKey" in your JSON - the provider handles this:
Map<String, Object> doc = new LinkedHashMap<>();
doc.put("name", "Alpha Fund");
doc.put("type", "EQUITY");
// No need to set "id" or "partitionKey" in doc - injected by provider
client.upsert(addr, MulticloudDbKey.of("acme", "port-1"), doc);
What gets injected per provider:
| Provider | Injected Fields | Source |
|---|---|---|
| Cosmos DB | "id" = key.sortKey() or key.partitionKey(), "partitionKey" = key.partitionKey() |
Merged into document JSON before write |
| DynamoDB | "partitionKey" = key.partitionKey(), "sortKey" = key.sortKey() |
Added as DynamoDB item attributes |
| Spanner | "partitionKey" = key.partitionKey(), "sortKey" = key.sortKey() or key.partitionKey() |
Written as Spanner row columns |
Important: If your document JSON already contains an
"id"field, it will be overwritten by the key parameter. The key is always the source of truth for the document's identity.
Query DSL¶
Portable Expressions¶
The SDK provides a portable query expression language - write a WHERE-clause filter once and it's automatically translated to each provider's native query syntax (Cosmos SQL, DynamoDB PartiQL, Spanner GoogleSQL).
QueryRequest query = QueryRequest.builder()
.expression("status = @status AND priority > @minPriority")
.parameter("status", "active")
.parameter("minPriority", 3)
.maxPageSize(25)
.build();
QueryPage page = client.query(addr, query);
for (Map<String, Object> item : page.items()) {
System.out.println(item);
}
This produces the following native queries:
| Provider | Generated Query |
|---|---|
| Cosmos DB | SELECT * FROM c WHERE (c.status = @status AND c.priority > @minPriority) |
| DynamoDB | SELECT * FROM "table" WHERE (status = ? AND priority > ?) |
| Spanner | SELECT * FROM table WHERE (status = @status AND priority > @minPriority) |
Parameter Binding¶
Parameters use @name syntax in expressions and are passed as a
Map<String, Object>:
// Using .parameter() builder method (recommended)
QueryRequest query = QueryRequest.builder()
.expression("portfolioId = @pid")
.parameter("pid", "portfolio-alpha")
.maxPageSize(50)
.build();
// Using .parameters() with a pre-built map
QueryRequest query2 = QueryRequest.builder()
.expression("sector = @sector AND marketValue > @minValue")
.parameters(Map.of(
"sector", "Technology",
"minValue", 100000.0
))
.maxPageSize(25)
.build();
Supported parameter types: String, Integer, Long, Double, Float,
Boolean. Parameters are bound safely - no SQL injection risk.
Portable Functions¶
Five functions are supported across all providers:
| Function | Syntax | Cosmos SQL | DynamoDB PartiQL | Spanner GoogleSQL |
|---|---|---|---|---|
| STARTS_WITH | STARTS_WITH(field, @val) |
STARTSWITH(c.field, @val) |
begins_with(field, ?) |
STARTS_WITH(field, @val) |
| CONTAINS | CONTAINS(field, @val) |
CONTAINS(c.field, @val) |
contains(field, ?) |
STRPOS(field, @val) > 0 |
| FIELD_EXISTS | FIELD_EXISTS(field) |
IS_DEFINED(c.field) |
attribute_exists(field) |
field IS NOT NULL |
| STRING_LENGTH | STRING_LENGTH(field) > @n |
LENGTH(c.field) > @n |
size(field) > ? |
CHAR_LENGTH(field) > @n |
| COLLECTION_SIZE | COLLECTION_SIZE(field) > @n |
ARRAY_LENGTH(c.field) > @n |
size(field) > ? |
ARRAY_LENGTH(field) > @n |
Function Examples¶
// Find portfolios whose name starts with "Alpha"
QueryRequest q1 = QueryRequest.builder()
.expression("STARTS_WITH(name, @prefix)")
.parameter("prefix", "Alpha")
.build();
// Find positions containing "Tech" in the sector field
QueryRequest q2 = QueryRequest.builder()
.expression("CONTAINS(sector, @term)")
.parameter("term", "Tech")
.build();
// Find documents that have a "metadata" field
QueryRequest q3 = QueryRequest.builder()
.expression("FIELD_EXISTS(metadata)")
.build();
// Find names longer than 20 characters
QueryRequest q4 = QueryRequest.builder()
.expression("STRING_LENGTH(name) > @minLen")
.parameter("minLen", 20)
.build();
Pagination (Continuation Tokens)¶
All queries support cursor-based pagination using continuation tokens. Request a page, then pass the continuation token to get the next page:
// First page
QueryRequest firstPage = QueryRequest.builder()
.expression("status = @status")
.parameter("status", "active")
.maxPageSize(25)
.build();
QueryPage page = client.query(addr, firstPage);
processItems(page.items());
// Subsequent pages
while (page.continuationToken() != null) {
QueryRequest nextPage = QueryRequest.builder()
.expression("status = @status")
.parameter("status", "active")
.maxPageSize(25)
.continuationToken(page.continuationToken())
.build();
page = client.query(addr, nextPage);
processItems(page.items());
}
Continuation tokens are opaque strings - their format differs by provider (Cosmos uses its own JSON token, DynamoDB uses an encoded last-evaluated-key, Spanner uses a numeric offset). Never parse or construct them manually.
Full Scan (No Filter)¶
To retrieve all documents in a collection (a full scan), omit the expression:
QueryRequest scanAll = QueryRequest.builder()
.maxPageSize(100)
.build();
QueryPage page = client.query(addr, scanAll);
Warning: Full scans read every document in the collection. For large datasets, this is expensive (cross-partition in Cosmos, full table scan in DynamoDB and Spanner). Always prefer expression-based filtering when possible.
Expression-Based Filtering¶
Filter documents using field comparisons, logical operators, and functions:
// Simple equality
QueryRequest q1 = QueryRequest.builder()
.expression("portfolioId = @pid")
.parameter("pid", "portfolio-alpha")
.build();
// Range query
QueryRequest q2 = QueryRequest.builder()
.expression("marketValue BETWEEN @low AND @high")
.parameter("low", 50000.0)
.parameter("high", 200000.0)
.build();
// Set membership
QueryRequest q3 = QueryRequest.builder()
.expression("severity IN (@s1, @s2)")
.parameter("s1", "HIGH")
.parameter("s2", "CRITICAL")
.build();
// Negation
QueryRequest q4 = QueryRequest.builder()
.expression("NOT status = @excluded")
.parameter("excluded", "CLOSED")
.build();
Multi-Condition Queries¶
Combine multiple conditions with AND and OR, using parentheses for grouping:
// All active high-value tech positions
QueryRequest query = QueryRequest.builder()
.expression(
"portfolioId = @pid AND sector = @sector AND marketValue > @minValue"
)
.parameter("pid", "portfolio-alpha")
.parameter("sector", "Technology")
.parameter("minValue", 100000.0)
.maxPageSize(50)
.build();
// Complex boolean logic with grouping
QueryRequest complex = QueryRequest.builder()
.expression(
"(severity = @high OR severity = @critical) AND status = @open"
)
.parameter("high", "HIGH")
.parameter("critical", "CRITICAL")
.parameter("open", "OPEN")
.build();
Native Expression Passthrough¶
When you need provider-specific features not available in the portable DSL
(e.g., LIKE, ORDER BY, aggregate functions), bypass the portable layer
with nativeExpression():
// Cosmos DB - full SQL capability
QueryRequest cosmosQuery = QueryRequest.builder()
.nativeExpression(
"SELECT c.symbol, c.marketValue FROM c " +
"WHERE c.portfolioId = @pid ORDER BY c.marketValue DESC"
)
.parameter("pid", "portfolio-alpha")
.maxPageSize(10)
.build();
// DynamoDB - PartiQL syntax
QueryRequest dynamoQuery = QueryRequest.builder()
.nativeExpression(
"SELECT * FROM \"acme-risk-db__positions\" " +
"WHERE begins_with(symbol, 'AA')"
)
.maxPageSize(10)
.build();
// Spanner - GoogleSQL syntax
QueryRequest spannerQuery = QueryRequest.builder()
.nativeExpression(
"SELECT * FROM positions WHERE STARTS_WITH(symbol, 'AA') " +
"ORDER BY marketValue DESC LIMIT 10"
)
.build();
Warning:
expressionandnativeExpressionare mutually exclusive - setting both throws an error. Native expressions break portability: switching providers requires rewriting the query.
Translation Pipeline¶
When you use a portable expression, the SDK processes it through a four-stage
pipeline before execution:
┌───────────────────────────────────────────────────────────────────┐
│ 1. PARSE ExpressionParser converts the string to a typed │
│ AST (Abstract Syntax Tree) │
│ │
│ 2. VALIDATE ExpressionValidator checks that all @parameters │
│ are bound and function signatures are correct │
│ │
│ 3. TRANSLATE Provider-specific ExpressionTranslator converts │
│ the AST to native query syntax │
│ │
│ 4. EXECUTE Provider runs the translated query against the │
│ database with bound parameters │
└───────────────────────────────────────────────────────────────────┘
This is fully transparent - your application code only provides the portable expression string and parameters.
Querying with Partition Keys¶
Why Partition Keys Matter for Queries¶
In distributed databases, the partition key determines where data lives physically. This has a direct impact on query performance:
- Partition-scoped query: When the query filter includes the partition key value, the database can go directly to the relevant partition. This is O(partition size), not O(total data).
- Cross-partition query (fan-out): When the partition key is not included in the filter, the database must scan all partitions. This is expensive and may not be supported in all providers.
Cosmos DB¶
Cosmos DB partitions data by the /partitionKey path. When a query includes
partitionKey = @value, Cosmos reads only that partition. Without it, the query
fans out across all partitions (requires the cosmos.crossPartitionQuery
feature flag).
DynamoDB¶
DynamoDB uses the hash key (partitionKey) as the partition key. Queries within
a partition (using the hash key) are efficient. Without it, a full table Scan
is required.
Spanner¶
Spanner distributes data across splits based on the primary key prefix. The
(partitionKey, sortKey) composite primary key provides locality for documents
sharing the same partitionKey prefix.
Combining Expression Filters with Key Design¶
The most efficient queries combine good key design with expression filtering. Example: positions within a portfolio.
Approach 1: Entity-per-partition + Full Scan + Client-Side Filter¶
// Key: MulticloudDbKey.of(positionId, positionId) - each position in its own partition
// Query: fetch ALL positions, filter in Java
List<Map<String, Object>> allPositions = client.query(addr,
QueryRequest.builder().maxPageSize(200).build()
).items();
List<Map<String, Object>> filtered = allPositions.stream()
.filter(p -> "portfolio-alpha".equals(p.get("portfolioId")))
.toList();
Problem: Reads every position document, performs a cross-partition scan, then discards most results. O(N) where N = total positions across all portfolios.
Approach 2: Entity-per-partition + Expression Filter (Better)¶
// Key: MulticloudDbKey.of(positionId, positionId) - still entity-per-partition
// Query: let the database filter - only matching documents are returned
QueryRequest query = QueryRequest.builder()
.expression("portfolioId = @pid")
.parameter("pid", "portfolio-alpha")
.maxPageSize(50)
.build();
List<Map<String, Object>> positions = client.query(addr, query).items();
Better: The database applies the filter - you only receive matching documents. But this is still a cross-partition scan (the database still reads all partitions, it just doesn't return non-matching documents).
Approach 3: Partition-per-parent + Expression Filter (Best)¶
// Key: MulticloudDbKey.of(portfolioId, positionId) - positions grouped by portfolio
// Query: the database knows to read only the "portfolio-alpha" partition
QueryRequest query = QueryRequest.builder()
.expression("portfolioId = @pid")
.parameter("pid", "portfolio-alpha")
.maxPageSize(50)
.build();
List<Map<String, Object>> positions = client.query(addr, query).items();
Best: By using portfolioId as the partition key (first argument to
MulticloudDbKey.of), the database can scope the query to a single partition. Combined with
the expression filter, this is a single-partition read - the most efficient
pattern possible.
Note on DynamoDB: When documents use
MulticloudDbKey.of(portfolioId, positionId),portfolioIdbecomes the hash key ("partitionKey"attribute) andpositionIdbecomes the sort key ("sortKey"attribute). This gives you efficient single-partition queries byportfolioIdwith range queries onpositionIdwithin each partition.
Partition-Scoped Queries with QueryRequest.partitionKey()¶
The SDK provides a portable partition-scoping mechanism via
QueryRequest.partitionKey(). Instead of relying on the query expression to
include the partition key implicitly, you can set it explicitly on the query
request. Each provider maps this to its native partition-scoping mechanism:
| Provider | What .partitionKey(value) Does |
|---|---|
| Cosmos DB | Sets CosmosQueryRequestOptions.setPartitionKey(new PartitionKey(value)) - query reads only that logical partition |
| DynamoDB | Adds a partitionKey = :pk equality condition to the scan filter - restricts results to items sharing the given hash key value |
| Spanner | Adds a partitionKey = @pk equality condition to the SQL WHERE clause |
Basic Usage¶
// Scope query to a specific partition (e.g., a portfolio)
QueryRequest query = QueryRequest.builder()
.expression("sector = @sector")
.parameter("sector", "Technology")
.partitionKey("portfolio-alpha")
.maxPageSize(50)
.build();
// Only returns positions within "portfolio-alpha" that match sector = Technology
QueryPage page = client.query(positionsAddr, query);
Full Scan Within a Partition¶
Omit the expression to retrieve all documents in a single partition:
// Get ALL positions in portfolio-alpha (no expression filter)
QueryRequest allInPartition = QueryRequest.builder()
.partitionKey("portfolio-alpha")
.maxPageSize(100)
.build();
QueryPage page = client.query(positionsAddr, allInPartition);
This is far more efficient than a full cross-partition scan - it reads only
the documents belonging to "portfolio-alpha".
Cross-Partition Query (Default)¶
When partitionKey is not set (or set to null), the query fans out across
all partitions - the same behavior as before:
// No partitionKey → cross-partition scan
QueryRequest crossPartition = QueryRequest.builder()
.expression("severity = @sev")
.parameter("sev", "CRITICAL")
.maxPageSize(25)
.build();
Combining with Native Expressions¶
partitionKey() works with both portable and native expressions:
// Portable expression + partition scoping
QueryRequest portable = QueryRequest.builder()
.expression("marketValue > @min")
.parameter("min", 100000.0)
.partitionKey("portfolio-alpha")
.build();
// Native Cosmos SQL + partition scoping
QueryRequest native = QueryRequest.builder()
.nativeExpression("SELECT c.symbol, c.marketValue FROM c ORDER BY c.marketValue DESC")
.partitionKey("portfolio-alpha")
.build();
Performance Impact¶
| Query Type | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
With .partitionKey() |
Single-partition read | Filtered by hash key | Filtered by partition key |
Without .partitionKey() |
Cross-partition fan-out | Full table scan | Full table scan |
| Cost difference | 1 partition vs. N partitions | 1 filter vs. full scan | 1 filter vs. full scan |
Best practice: Always set
.partitionKey()when you know the partition value at query time. This is especially important for Cosmos DB, where cross-partition queries consume significantly more RU/s.
Multi-Tenant Architecture Patterns¶
The SDK supports multi-tenant architectures through the combination of
ResourceAddress database isolation and partition key design:
┌──────────────────────────────────────────────────┐
│ Risk Platform │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ TenantManager.addressFor("acme", "portfolios")│
│ │ → ResourceAddress("acme-risk-db", "portfolios") │
│ └────────────────────┬────────────────────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Cosmos │ │ DynamoDB │ │ Spanner │ │
│ │ DB: │ │ Table: │ │ Table: │ │
│ │ acme- │ │ acme- │ │ portfolios│ │
│ │ risk-db │ │ risk-db__ │ │ │ │
│ │ /portfolios│ │ portfolios │ │ │ │
│ └─────────┘ └───────────┘ └───────────┘ │
└──────────────────────────────────────────────────┘
Each tenant's data is completely isolated at the storage level:
- Cosmos DB: Separate database per tenant
- DynamoDB: Separate tables per tenant (encoded as tenantDb__collection)
- Spanner: Shares a physical database, but collection names provide logical separation
Error Handling¶
All provider exceptions are caught and wrapped in MulticloudDbException with a
portable error category:
try {
client.upsert(addr, key, doc);
} catch (MulticloudDbException e) {
MulticloudDbErrorCategory cat = e.error().category();
if (MulticloudDbErrorCategory.THROTTLED.equals(cat)) {
retryWithBackoff();
} else if (MulticloudDbErrorCategory.NOT_FOUND.equals(cat)) {
createResource();
} else if (MulticloudDbErrorCategory.CONFLICT.equals(cat)) {
resolveConflict();
} else if (MulticloudDbErrorCategory.AUTHENTICATION_FAILED.equals(cat)) {
refreshCredentials();
} else {
log.error("Unexpected error", e);
}
}
See compatibility.md for the full error category mapping table across all providers.
Result Set Control¶
The SDK supports portable limit (Top N) and orderBy on providers that declare the respective capabilities. Both are capability-gated: callers should check provider capabilities before using them. Unsupported options are not guaranteed to be silently ignored; providers may fail fast with UNSUPPORTED_CAPABILITY (for example, DynamoDB when orderBy is specified). Also note that DynamoDB's result_limit support is per-page, not a global Top N across the full logical result set.
Checking Capabilities¶
CapabilitySet caps = client.capabilities();
boolean canLimit = caps.isSupported(Capability.RESULT_LIMIT);
boolean canOrder = caps.isSupported(Capability.ORDER_BY);
| Capability | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
result_limit (Top N) |
✓ (SELECT TOP N) |
✓ (per-page) | ✓ (LIMIT) |
order_by |
✓ | ✗ | ✓ |
Note: DynamoDB's
result_limit=truecaps the current scan page, not the total result set. Use continuation tokens to iterate all matching items.
Limiting Results¶
QueryRequest q = QueryRequest.builder()
.expression("score >= @min")
.parameter("min", 80)
.limit(10) // top 10 results
.build();
QueryPage page = client.query(address, q);
System.out.println("Got " + page.items().size() + " items");
Ordering Results¶
QueryRequest q = QueryRequest.builder()
.expression("portfolioId = @pid")
.parameter("pid", "portfolio-42")
.orderBy("marketValue", SortDirection.DESC) // sort descending
.orderBy("symbol", SortDirection.ASC) // secondary sort
.limit(20)
.build();
SortOrder validates field names against [A-Za-z_][A-Za-z0-9_.]* at construction time, rejecting any character that could be used for injection.
Combining with Native Expressions¶
Handling of orderBy and limit with nativeExpression is provider-specific. For Cosmos DB and Spanner native passthrough, these fields are ignored and the native query string is sent as provided. For DynamoDB ExecuteStatement, limit is still applied, and setting orderBy causes an error. Use provider-native syntax for ordering/limiting when you need exact control over native query execution:
// Cosmos DB native - full SQL control
QueryRequest q = QueryRequest.builder()
.nativeExpression(
"SELECT TOP 10 c.symbol, c.marketValue FROM c " +
"ORDER BY c.marketValue DESC")
.build();
Document TTL (Time-to-Live)¶
Set a per-document expiry at write time using OperationOptions.ttlSeconds(). Supported on create(), upsert(), and update().
Prerequisite: Enable Container-Level TTL¶
TTL is enforced at the provider level and requires the collection to be configured with TTL support before writing TTL-bearing documents.
| Provider | Required Setup |
|---|---|
| Cosmos DB | Enable "Default Time to Live" on the container (set to -1 for item-level control) |
| DynamoDB | Enable TTL on the table specifying ttlExpiry as the TTL attribute |
| Spanner | No native row-level TTL - ROW_LEVEL_TTL=false; ttlSeconds is silently ignored |
Writing with TTL¶
OperationOptions opts = OperationOptions.builder()
.ttlSeconds(3_600) // expire in 1 hour
.build();
// TTL on create
client.create(address, key, doc, opts);
// TTL on upsert (create-or-replace)
client.upsert(address, key, doc, opts);
// TTL on update - carries TTL forward through the full-replace write
client.update(address, key, updatedDoc, opts);
Checking TTL Support¶
if (client.capabilities().isSupported(Capability.ROW_LEVEL_TTL)) {
client.create(address, key, doc, OperationOptions.builder()
.ttlSeconds(86_400)
.build());
} else {
// Provider will ignore ttlSeconds; write without TTL
client.create(address, key, doc);
}
Important: If you omit
ttlSecondson anupdate()call, the previously stored TTL attribute will be overwritten with no TTL (documents become permanent). Always pass the sameOperationOptionson every write if you want TTL to persist.
Document Metadata¶
Read write-metadata (last-modified timestamp, TTL expiry, version/ETag) by setting includeMetadata(true) on read operations. Metadata is null by default to avoid unnecessary overhead.
Reading Metadata¶
OperationOptions opts = OperationOptions.builder()
.includeMetadata(true)
.build();
DocumentResult result = client.read(address, key, opts);
DocumentMetadata meta = result.metadata();
if (meta != null) {
System.out.println("Last modified : " + meta.lastModified());
System.out.println("Expires at : " + meta.ttlExpiry()); // null if no TTL
System.out.println("Version/ETag : " + meta.version());
}
Provider Metadata Availability¶
| Field | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
lastModified |
✓ (from _ts) |
✗ | ✗ |
ttlExpiry |
✗ | ✓ (stored attribute) | ✗ |
version |
✓ (ETag) | ✗ | ✗ |
Fields the provider cannot supply are returned as null. Use
Capability.WRITE_TIMESTAMP to check before accessing metadata().
if (client.capabilities().isSupported(Capability.WRITE_TIMESTAMP)) {
DocumentResult r = client.read(address, key,
OperationOptions.builder().includeMetadata(true).build());
System.out.println(r.metadata().lastModified());
}
System Property Stripping¶
The document() field of DocumentResult is always stripped of provider
system properties (_ts, _etag, _rid, _self, _attachments, partitionKey) before being returned, ensuring the same document shape regardless of which provider it was read from.
Document Size Enforcement¶
The SDK enforces a 399 KB (408,576 bytes) maximum document size before any
data leaves the client. This limit applies to create(), upsert(), and
update() on all providers.
Why 399 KB, not 400 KB?¶
Providers inject additional fields (partitionKey, sortKey, id, ttlExpiry) before writing. DynamoDB measures its 400 KB limit against the internal wire format, which can be larger than raw JSON bytes. The 1 KB safety margin prevents valid-looking documents from exceeding the wire limit after field injection.
Validation Behaviour¶
try {
client.create(address, key, hugeDoc);
} catch (MulticloudDbException e) {
if (MulticloudDbErrorCategory.INVALID_REQUEST.equals(e.error().category())) {
// Document rejected before reaching any provider
System.out.println("Document too large: " + e.getMessage());
}
}
Oversized documents are rejected at the SDK layer - no network call is made.
The error category is always INVALID_REQUEST.
Provider Diagnostics¶
Every operation can return structured diagnostics via OperationDiagnostics. Diagnostics are available on QueryPage and can be accessed after read/write operations through provider-specific logging (SLF4J).
Query Diagnostics¶
QueryRequest q = QueryRequest.builder()
.expression("status = @s")
.parameter("s", "active")
.build();
QueryPage page = client.query(address, q);
OperationDiagnostics diag = page.diagnostics(); // may be null
if (diag != null) {
System.out.printf("Provider : %s%n", diag.provider().id());
System.out.printf("Operation : %s%n", diag.operation());
System.out.printf("Latency : %s ms%n", diag.duration().toMillis());
System.out.printf("Request ID: %s%n", diag.requestId());
System.out.printf("RU charge : %.2f%n", diag.requestCharge());
System.out.printf("Item count: %d%n", diag.itemCount());
System.out.printf("ETag : %s%n", diag.etag());
}
Provider Field Availability¶
| Field | Cosmos DB | DynamoDB | Spanner |
|---|---|---|---|
requestId |
activityId | x-amzn-RequestId | - |
statusCode |
HTTP status | HTTP status | gRPC code |
requestCharge |
RU cost | consumed capacity | - |
etag |
_etag |
- | commit timestamp |
sessionToken |
session token | - | - |
itemCount |
✓ | ✓ | ✓ |
duration |
✓ | ✓ | ✓ |
Structured Logging¶
All providers emit structured SLF4J log lines at ERROR (failures) and
DEBUG (slow operations) levels with a consistent key=value format:
cosmos.error op=read db=mydb col=users statusCode=408 subStatus=20001 activityId=abc message=...
dynamo.error op=upsert table=mydb__users requestId=xyz awsErrorCode=ProvisionedThroughputExceededException
This format is grep-friendly and compatible with log aggregation systems (e.g., Azure Monitor, CloudWatch, Google Cloud Logging).
Customizing the User-Agent¶
All provider clients (Cosmos, DynamoDB, Spanner) automatically stamp outgoing requests with a canonical SDK user-agent token of the form:
You can append your own identifier to this token via the configuration builder. This is useful for downstream identification of applications, frameworks, or tenants in provider-side telemetry.
Why Set a Suffix?¶
- Distinguish traffic from different applications sharing the same SDK
version (e.g.,
my-app/1.2.3) - Identify a framework or library that wraps this SDK (e.g.,
acme-data-layer/4.0) - Tag traffic from a specific tenant or environment for support and diagnostics
Configuring the Suffix¶
MulticloudDbClientConfig config = MulticloudDbClientConfig.builder()
.provider(ProviderId.COSMOS)
.connection("endpoint", "https://my-account.documents.azure.com:443/")
.connection("key", "...")
.userAgentSuffix("my-app/1.2.3")
.build();
MulticloudDbClient client = MulticloudDbClientFactory.create(config);
The suffix is optional. Pass null to clear a previously-set value when
deriving a new builder from an existing config.
Resulting Header Format¶
When a suffix is configured, providers append it to the canonical token, separated by a single space:
Provider-specific notes:
- Cosmos - sent as the
User-AgentHTTP header. - DynamoDB - contributed via
ClientOverrideConfiguration#putAdvancedOptionwithSdkAdvancedClientOption.USER_AGENT_SUFFIX. The AWS SDK appends this to its ownaws-sdk-java/...token. - Spanner - contributed via gax
FixedHeaderProviderfor the gRPCuser-agentmetadata. The gax channel preserves and merges with the underlying gRPC default user-agent.
Validation Rules¶
The setter enforces strict rules to protect the user-agent header from injection of CR/LF or other control characters that could break framing in HTTP/2 or HTTP/1.1 transports:
| Rule | Limit |
|---|---|
| Maximum length | 256 characters |
| Allowed code points | Printable US-ASCII (0x20–0x7E) + horizontal tab (0x09) |
| Disallowed | CR (0x0D), LF (0x0A), NUL, other control chars |
Violations throw IllegalArgumentException from the builder setter, before
any client is constructed:
MulticloudDbClientConfig.Builder builder = MulticloudDbClientConfig.builder();
// Throws IllegalArgumentException - contains newline
builder.userAgentSuffix("my-app\nv1.0");
// Throws IllegalArgumentException - too long
builder.userAgentSuffix("x".repeat(257));
If a null value is passed, the suffix is cleared and providers send only
the canonical SDK token.