Multicloud DB SDK - Developer Guide

Comprehensive reference for the Multicloud DB SDK's key management, CRUD operations, query DSL, and provider-specific storage behavior. For a quick-start tutorial and architecture overview, see the Getting Started guide. For the portable API surface and error mapping reference, see Compatibility.

Table of Contents

Key & Partition Key
The Key Model
When to Use Each Key Form
Provider Storage Mapping
- Azure Cosmos DB
- Amazon DynamoDB
- Google Cloud Spanner
Partition Key Strategy Guide
Why MulticloudDbKey Is an Explicit Parameter
Resource Addressing
Provider Physical Mapping
Database-per-Tenant Pattern
Provisioning Resources with provisionSchema()
CRUD Semantics
create - Insert Only
read - Point Read
update - Replace Existing
upsert - Create or Replace
delete - Idempotent Delete
Document Field Injection
Query DSL
Portable Expressions
Parameter Binding
Portable Functions
Pagination (Continuation Tokens)
Full Scan (No Filter)
Expression-Based Filtering
Multi-Condition Queries
Native Expression Passthrough
Translation Pipeline
Querying with Partition Keys
Why Partition Keys Matter for Queries
Combining Expression Filters with Key Design
Partition-Scoped Queries with QueryRequest.partitionKey()
Multi-Tenant Architecture Patterns
Error Handling
Result Set Control
Checking Capabilities
Limiting Results
Ordering Results
Combining with Native Expressions
Document TTL (Time-to-Live)
Prerequisite: Enable Container-Level TTL
Writing with TTL
Checking TTL Support
Document Metadata
Reading Metadata
Provider Metadata Availability
System Property Stripping
Document Size Enforcement
Provider Diagnostics
Query Diagnostics
Provider Field Availability
Structured Logging
Customizing the User-Agent
Why Set a Suffix?
Configuring the Suffix
Resulting Header Format
Validation Rules

Key & Partition Key

The Key Model

Every document stored through Multicloud DB is identified by a MulticloudDbKey - a portable representation of the minimum key material needed to uniquely identify a record across all three supported providers.

import com.multiclouddb.api.MulticloudDbKey;
import java.util.Map;

// 1. Simple partition-key-only key
MulticloudDbKey simple = MulticloudDbKey.of("order-123");

// 2. Partition key + sort key
MulticloudDbKey partitioned = MulticloudDbKey.of("customer-456", "order-123");

// 3. Partition + sort + composite components (future extensibility)
MulticloudDbKey composite = MulticloudDbKey.of("customer-456", "order-123",
    Map.of("region", "us-east", "year", "2025"));

The three fields:

Field	Type	Required	Purpose
`partitionKey`	`String`	Yes (must be non-empty)	Distribution/hash key that determines which physical partition stores the item
`sortKey`	`String`	No (nullable)	Item identifier within a partition; maps to DynamoDB sort/range key, Cosmos DB `id`, or Spanner's second primary-key column
`components`	`Map<String,String>`	No (empty by default)	Additional composite key dimensions (extensibility)
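The field rules in the table can be sketched as a small value type. This is a minimal sketch and not the SDK's source: `PortableKey` is a hypothetical stand-in for MulticloudDbKey, and the choice of exception types for a missing or empty partition key is an assumption.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for MulticloudDbKey, used only to illustrate the three fields. */
record PortableKey(String partitionKey, String sortKey, Map<String, String> components) {

    PortableKey {
        // partitionKey is required and must be non-empty (exception types are an assumption).
        Objects.requireNonNull(partitionKey, "partitionKey");
        if (partitionKey.isEmpty()) {
            throw new IllegalArgumentException("partitionKey must be non-empty");
        }
        // sortKey may stay null; components default to an empty, immutable map.
        components = components == null ? Map.of() : Map.copyOf(components);
    }

    static PortableKey of(String partitionKey) {
        return new PortableKey(partitionKey, null, Map.of());
    }

    static PortableKey of(String partitionKey, String sortKey) {
        return new PortableKey(partitionKey, sortKey, Map.of());
    }

    static PortableKey of(String partitionKey, String sortKey, Map<String, String> components) {
        return new PortableKey(partitionKey, sortKey, components);
    }
}
```

With this shape, `PortableKey.of("order-123")` carries a null sort key and no components, which is the case each provider has to resolve in its own way.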

When to Use Each Key Form

`MulticloudDbKey.of(partitionKey)` - Simple Key

Use when each document is independent and you don't need to group or co-locate related documents.

// A standalone configuration document
MulticloudDbKey configKey = MulticloudDbKey.of("app-settings");
client.upsert(address, configKey, settingsDoc);

Provider behavior:

- Cosmos DB: Uses partitionKey as both the document id and the partition key (effectively a single-partition-per-document layout).
- DynamoDB: Uses partitionKey as the hash key only; no sort key is written.
- Spanner: Uses partitionKey as both the partitionKey column and the sortKey column.

Best for: Lookup tables, configuration documents, singleton records.

`MulticloudDbKey.of(partitionKey, sortKey)` - Partition Key + Sort Key

Use when documents naturally belong to a group and you want to co-locate them for efficient querying. The partitionKey defines the co-location group; the sortKey uniquely identifies the document within the partition.

// Positions within a portfolio - portfolioId is the partition key
MulticloudDbKey positionKey = MulticloudDbKey.of("portfolio-alpha", "position-001");

// Orders within a customer - customerId is the partition key
MulticloudDbKey orderKey = MulticloudDbKey.of("customer-456", "order-789");

// Events within a session - sessionId is the partition key
MulticloudDbKey eventKey = MulticloudDbKey.of("session-abc", "event-42");

Provider behavior:

- Cosmos DB: sortKey → document's id field; partitionKey → stored as the partitionKey field and used as the Cosmos DB partition key value. Documents sharing the same partitionKey value are co-located in the same logical partition.
- DynamoDB: partitionKey → hash key attribute ("partitionKey"); sortKey → sort key attribute ("sortKey"). Together they form the composite primary key.
- Spanner: partitionKey → partitionKey column; sortKey → sortKey column. Both form the composite primary key.

Best for: Parent-child relationships, multi-tenant data, event streams, any scenario where you frequently query "all items within group X."

`MulticloudDbKey.of(pk, pk)` - Partition Key Equals Sort Key

A common pattern sets the document's sort key equal to its partition key, so each document occupies its own partition. This is the simplest layout and is appropriate when you don't need to group documents.

// Each portfolio is its own partition
MulticloudDbKey portfolioKey = MulticloudDbKey.of("portfolio-alpha", "portfolio-alpha");

Provider behavior: Identical to MulticloudDbKey.of(partitionKey, sortKey) with an explicit sort key value, but every document is in its own partition (no co-location).

Best for: Top-level entities (tenants, portfolios, users) where you access documents primarily by their ID rather than querying within a group.

Provider Storage Mapping

The MulticloudDbKey type is portable - you write MulticloudDbKey.of(partitionKey, sortKey) and each provider maps it to its native key model. Here's exactly what happens in each:

Azure Cosmos DB

MulticloudDbKey.of("portfolio-alpha", "pos-1")
         │                  │
         ▼                  ▼
   ┌────────────────────────────────────────────┐
   │  Cosmos DB Container                       │
   │                                            │
   │  Document JSON:                            │
   │  {                                         │
   │    "id": "pos-1",                          │  ← key.sortKey()
   │    "partitionKey": "portfolio-alpha",      │  ← key.partitionKey()
   │    ... other fields ...                    │
   │  }                                         │
   │                                            │
   │  Point read uses:                          │
   │    container.readItem(                     │
   │      "pos-1",                              │  ← sortKey (document id)
   │      new PartitionKey(                     │
   │        "portfolio-alpha")                  │  ← partitionKey value
   │    )                                       │
   └────────────────────────────────────────────┘

If key.sortKey() is null (i.e., MulticloudDbKey.of(partitionKey) was used), Cosmos uses key.partitionKey() as both the document id and the partition key. This means every document gets its own logical partition.

Note: The container must be created with /partitionKey as the partition key path. The SDK merges a "partitionKey" field into the JSON document automatically before upsert.

Amazon DynamoDB

MulticloudDbKey.of("portfolio-alpha", "pos-1")
         │                  │
         ▼                  ▼
   ┌────────────────────────────────────────────┐
   │  DynamoDB Table                            │
   │  (database__collection)                    │
   │                                            │
   │  Item attributes:                          │
   │    "partitionKey": "portfolio-alpha"       │  ← Hash key (partition key)
   │    "sortKey":      "pos-1"                 │  ← Sort key (range key)
   │    ... other attributes ...                │
   │                                            │
   │  Point read uses:                          │
   │    GetItem with:                           │
   │      { "partitionKey": "portfolio-alpha",  │
   │        "sortKey": "pos-1" }                │
   └────────────────────────────────────────────┘

If key.sortKey() is null, no "sortKey" attribute is written or used in reads. In that case the table must be created with a "partitionKey" hash key and no sort key.

Note: DynamoDB has no native "database" concept. The ResourceAddress database and collection are composed into a single table name using the pattern database__collection (double underscore separator).

Google Cloud Spanner

MulticloudDbKey.of("portfolio-alpha", "pos-1")
         │                  │
         ▼                  ▼
   ┌────────────────────────────────────────────┐
   │  Spanner Table                             │
   │                                            │
   │  Row columns:                              │
   │    partitionKey: "portfolio-alpha"         │  ← Primary key col 1
   │    sortKey:      "pos-1"                   │  ← Primary key col 2
   │    ... other columns ...                   │
   │                                            │
   │  Point read uses:                          │
   │    SELECT * FROM table                     │
   │    WHERE partitionKey = @pk                │
   │      AND sortKey = @sk                     │
   └────────────────────────────────────────────┘

If key.sortKey() is null, sortKey defaults to key.partitionKey() - giving a composite primary key of (pk, pk). Spanner always requires both columns because the table DDL defines PRIMARY KEY (partitionKey, sortKey).
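The three mappings, including the null sort key cases, can be summarized in code. The sketch below only restates the rules described in this section. `NativeKeyMapping` and its methods are hypothetical names, and the real provider modules presumably build native SDK requests rather than plain maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: mirrors the mapping rules described above, not the SDK's provider code. */
final class NativeKeyMapping {

    private NativeKeyMapping() {}

    /** Cosmos DB: the sort key becomes "id"; a null sort key falls back to the partition key. */
    static Map<String, String> cosmos(String partitionKey, String sortKey) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", sortKey != null ? sortKey : partitionKey);
        fields.put("partitionKey", partitionKey);
        return fields;
    }

    /** DynamoDB: the hash key is always written; the sort key attribute is omitted when null. */
    static Map<String, String> dynamo(String partitionKey, String sortKey) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("partitionKey", partitionKey);
        if (sortKey != null) {
            attributes.put("sortKey", sortKey);
        }
        return attributes;
    }

    /** Spanner: both primary-key columns are required, so a null sort key repeats the partition key. */
    static Map<String, String> spanner(String partitionKey, String sortKey) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("partitionKey", partitionKey);
        columns.put("sortKey", sortKey != null ? sortKey : partitionKey);
        return columns;
    }
}
```

Calling `NativeKeyMapping.dynamo("app-settings", null)` yields a single "partitionKey" attribute, while the Cosmos DB and Spanner variants both reuse "app-settings" for the second key field.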

Partition Key Strategy Guide

Choosing the right partition key is the most important design decision for performance and scalability. Here are common strategies:

Strategy	Key Pattern	When to Use	Example
Entity-per-partition	`MulticloudDbKey.of(pk, pk)`	Simple lookups, no parent-child queries	Tenants, configuration docs
Parent-child grouping	`MulticloudDbKey.of(parentId, childId)`	Query all items within a parent	Positions within a portfolio
Tenant isolation	`MulticloudDbKey.of(tenantId, docId)`	Multi-tenant with per-tenant queries	SaaS applications
Time-based grouping	`MulticloudDbKey.of(dateKey, eventId)`	Time-range queries, event streams	Logs partitioned by `2025-03`
Category grouping	`MulticloudDbKey.of(category, itemId)`	Filtered queries within a category	Products by department
Composite	`MulticloudDbKey.of(compositeValue, id)`	Multiple dimensions in partition	`region#category`

Choosing Between Entity-per-Partition and Grouped Partitions

Entity-per-partition (MulticloudDbKey.of(pk, pk)):

- Every document gets its own partition
- Optimal for point reads: always a single-partition operation
- Queries for "all items of type X" require a cross-partition scan
- Works well when you don't need to query within groups

Grouped partitions (MulticloudDbKey.of(parentId, childId)):

- Related documents share a partition
- Point reads still work (the SDK resolves the full key)
- Queries within the group can be partition-scoped, which is much more efficient
- Requires knowing the partition key value at query time

Rule of thumb: If you frequently query "give me all X within Y", use Y as the partition key. If you only read documents by their ID, use the entity-per-partition pattern.
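For the time-based and composite strategies in the table, the partition key value usually has to be derived from other data. The helpers below are a hedged sketch: `PartitionValues`, `composite`, and `monthBucket` are hypothetical names that are not part of the SDK, and the `#` separator simply follows the `region#category` example above.

```java
import java.time.LocalDate;
import java.time.YearMonth;

/** Hypothetical helpers for building partition key values; not part of the SDK. */
final class PartitionValues {

    private PartitionValues() {}

    /** Joins dimensions with '#', e.g. ("us-east", "electronics") gives "us-east#electronics". */
    static String composite(String... parts) {
        if (parts.length == 0) {
            throw new IllegalArgumentException("at least one part is required");
        }
        for (String part : parts) {
            // Reject values that would make the joined string ambiguous.
            if (part == null || part.isEmpty() || part.contains("#")) {
                throw new IllegalArgumentException("invalid partition key part: " + part);
            }
        }
        return String.join("#", parts);
    }

    /** Month bucket for time-based grouping, formatted as yyyy-MM (for example "2025-03"). */
    static String monthBucket(LocalDate date) {
        return YearMonth.from(date).toString();
    }
}
```

The resulting strings would then be passed as the first argument of MulticloudDbKey.of(partitionKey, sortKey), with the event or item ID as the sort key.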

Why MulticloudDbKey Is an Explicit Parameter

Every CRUD operation requires an explicit MulticloudDbKey parameter - the SDK never extracts key material from the document body. This is a deliberate design choice.

Why not extract from the document?

Some database SDKs (notably the Azure Cosmos DB Java SDK) can extract the partition key and document ID automatically because the container's metadata tells the SDK which JSON path holds the partition key, and id is a fixed Cosmos convention. Multicloud DB cannot replicate this approach portably because the key field names differ across providers.

When a provider writes a document, it injects key fields using different names:

Provider	`MulticloudDbKey.partitionKey()` stored as	`MulticloudDbKey.sortKey()` stored as
Cosmos DB	`partitionKey` (custom field)	`id` (built-in Cosmos field)
DynamoDB	`partitionKey` (hash key attribute)	`sortKey` (range key attribute)
Spanner	`partitionKey` (primary key column)	`sortKey` (primary key column)

This means a document read back from Cosmos DB looks like:

{"id": "pos-42", "partitionKey": "tenant-1", "name": "Alpha Fund"}

while the same document read back from DynamoDB looks like:

{"sortKey": "pos-42", "partitionKey": "tenant-1", "name": "Alpha Fund"}

A convention-based overload - upsert(address, document) - would need to look for sortKey on DynamoDB/Spanner but id on Cosmos DB. That requires provider-aware extraction logic in what is supposed to be a provider-agnostic interface, which defeats the purpose of a portable abstraction.
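To make the problem concrete, here is roughly what such extraction logic would have to look like. This is a sketch of the approach the SDK deliberately avoids, not code from the SDK. The `Provider` enum and the method names are hypothetical.

```java
import java.util.Map;

/** Sketch of the provider-aware extraction that an implicit-key overload would need. */
final class KeyExtractionSketch {

    enum Provider { COSMOS, DYNAMO, SPANNER }

    private KeyExtractionSketch() {}

    /** The sort key is stored under "id" on Cosmos DB but under "sortKey" elsewhere. */
    static String sortKeyField(Provider provider) {
        return provider == Provider.COSMOS ? "id" : "sortKey";
    }

    /** Extraction only works if the caller already knows which backend produced the document. */
    static String extractSortKey(Provider provider, Map<String, Object> document) {
        String field = sortKeyField(provider);
        Object value = document.get(field);
        if (value == null) {
            throw new IllegalArgumentException("document has no '" + field + "' field");
        }
        return value.toString();
    }
}
```

A document read back from Cosmos DB fails this extraction when it is treated as a DynamoDB document, because it has an "id" field but no "sortKey" field. Passing the key explicitly removes that dependency on the backend.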

Additional reasons for the explicit MulticloudDbKey design:

Concern	Explicit MulticloudDbKey	Extracted from Document
`read()` / `delete()`	Works - no document needed	Impossible - no document to extract from
Consistency	All 5 operations use the same signature pattern	Writes differ from reads/deletes
Compile-time safety	Missing key = compiler error	Missing field = runtime error in provider
Key ≠ document	Key can differ from document fields (remapping, replication)	Key must match document content
Source of truth	Key is authoritative; providers overwrite document fields	Ambiguous when key fields and document disagree

The explicit MulticloudDbKey keeps the API uniform (same pattern for all operations), safe (compiler-enforced), and portable (no provider-specific field name assumptions). Each provider maps the MulticloudDbKey to its native key representation internally - application code never needs to know which field name is used on which backend.

Resource Addressing

A ResourceAddress identifies the target database and collection for any operation:

import com.multiclouddb.api.ResourceAddress;

ResourceAddress addr = new ResourceAddress("my-database", "my-collection");

Both database and collection are required (non-empty strings). The portable address maps to physical storage differently per provider.

Provider Physical Mapping

Part	Cosmos DB	DynamoDB	Spanner
`database`	Cosmos database name	Encoded into table name	N/A (database set in config)
`collection`	Cosmos container name	Encoded into table name	Spanner table name
Physical	`database / container`	Table: `database__collection`	Table within the configured DB

Cosmos DB

Each ResourceAddress maps to a Cosmos DB database and a container inside it. Giving each tenant its own database name therefore provides database-level isolation between tenants.

ResourceAddress("acme-risk-db", "portfolios")
  → Cosmos DB database: "acme-risk-db"
  → Cosmos container:   "portfolios"

DynamoDB

Since DynamoDB has no native database concept, the database dimension is encoded into the table name using a double-underscore separator:

ResourceAddress("acme-risk-db", "portfolios")
  → DynamoDB table: "acme-risk-db__portfolios"

Spanner

Spanner uses the database configured at client creation time. The database dimension in ResourceAddress is not used for physical addressing, because each client is bound to a single Spanner database. Only the collection maps to a table name.

ResourceAddress("acme-risk-db", "portfolios")
  → Spanner table: "portfolios" (within the configured database)
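The naming rules for all three providers fit in a few lines. The following is a minimal sketch of those rules. `PhysicalNames` and its methods are hypothetical and only return display strings. The SDK's actual resolution code is not shown in this guide.

```java
/** Illustrative naming rules only; class and method names here are hypothetical. */
final class PhysicalNames {

    private PhysicalNames() {}

    /** Cosmos DB: the address maps one-to-one to a database and a container. */
    static String cosmos(String database, String collection) {
        return requireNonEmpty(database, "database") + " / " + requireNonEmpty(collection, "collection");
    }

    /** DynamoDB: both parts are folded into one table name with a double underscore. */
    static String dynamoTable(String database, String collection) {
        return requireNonEmpty(database, "database") + "__" + requireNonEmpty(collection, "collection");
    }

    /** Spanner: the database part is validated but ignored; only the collection names the table. */
    static String spannerTable(String database, String collection) {
        requireNonEmpty(database, "database");
        return requireNonEmpty(collection, "collection");
    }

    private static String requireNonEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must be a non-empty string");
        }
        return value;
    }
}
```

One consequence worth noticing: two addresses that differ only in their database part produce different DynamoDB tables but the same Spanner table.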

Database-per-Tenant Pattern

A common architecture for multi-tenant SaaS applications is to give each tenant its own logical "database" using ResourceAddress:

public ResourceAddress addressFor(String tenantId, String collection) {
    String database = tenantId + "-risk-db";
    return new ResourceAddress(database, collection);
}

// Tenant "acme-capital" accessing their portfolios:
ResourceAddress acmePortfolios = addressFor("acme-capital", "portfolios");
// → Cosmos: database="acme-capital-risk-db", container="portfolios"
// → DynamoDB: table="acme-capital-risk-db__portfolios"
// → Spanner: table="portfolios" (single database)

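This pattern provides complete data isolation at the database level for Cosmos DB and at the table level for DynamoDB, without any provider-specific code. On Spanner the database part of the address is not used for physical addressing, so the pattern by itself does not separate tenants there: every tenant's "portfolios" address resolves to the same table in the configured database.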

Provisioning Resources with `provisionSchema()`

When your application starts, it typically needs to ensure that all required databases and containers/tables exist. The SDK provides provisionSchema() to do this efficiently in a single call - with internal parallelism so that application code does not need to manage threading.

import java.util.*;

// Define the full schema: database name → list of collections
Map<String, List<String>> schema = new LinkedHashMap<>();
schema.put("admin-db",    List.of("tenants"));
schema.put("acme-risk-db", List.of("portfolios", "positions", "risk_metrics", "alerts"));
schema.put("shared-db",    List.of("market_data"));

// Provision everything: databases first, then containers; work within each phase runs in parallel
client.provisionSchema(schema);

How it works internally:

Phase 1 - Databases: All databases are created concurrently using a bounded thread pool (max 10 threads). The SDK waits for all database creations to complete before proceeding.
Phase 2 - Containers: All containers/tables are created concurrently using the same thread pool.

This two-phase approach ensures that databases exist before their containers are created, while maximising throughput within each phase.
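The two phases can be sketched with a standard `ExecutorService`. This is an illustration of the technique under stated assumptions, not the SDK's implementation: the `Backend` interface is a hypothetical stand-in for the provider's create calls, and the fixed pool of 10 threads mirrors the bound mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class TwoPhaseProvisioner {

    private static final int MAX_THREADS = 10;

    /** Hypothetical callback interface standing in for the provider's create calls. */
    interface Backend {
        void ensureDatabase(String database) throws Exception;
        void ensureContainer(String database, String collection) throws Exception;
    }

    private TwoPhaseProvisioner() {}

    static void provision(Map<String, List<String>> schema, Backend backend) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_THREADS);
        try {
            // Phase 1: all databases concurrently; invokeAll returns once every task is done.
            List<Callable<Void>> databaseTasks = new ArrayList<>();
            for (String database : schema.keySet()) {
                databaseTasks.add(() -> { backend.ensureDatabase(database); return null; });
            }
            awaitAll(pool.invokeAll(databaseTasks));

            // Phase 2: all containers concurrently, only after every database exists.
            List<Callable<Void>> containerTasks = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : schema.entrySet()) {
                String database = entry.getKey();
                for (String collection : entry.getValue()) {
                    containerTasks.add(() -> { backend.ensureContainer(database, collection); return null; });
                }
            }
            awaitAll(pool.invokeAll(containerTasks));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("provisioning interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("provisioning failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Surfaces the first task failure, if any. */
    private static void awaitAll(List<Future<Void>> futures)
            throws InterruptedException, ExecutionException {
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}
```

Because `invokeAll` blocks until every submitted task has completed, no container task starts before the last database task finishes, which is the ordering guarantee described above.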

Provider behavior:

Provider	Database Phase	Container/Table Phase
Cosmos DB (cloud)	Creates via Azure Resource Manager SDK (parallel)	Creates via data-plane `createContainerIfNotExists` (parallel)
Cosmos DB (emulator)	Creates via data-plane `createDatabaseIfNotExists` (parallel)	Creates via data-plane `createContainerIfNotExists` (parallel)
DynamoDB	No-op (no native database concept)	Creates tables named `database__collection` (parallel), waits for ACTIVE
Spanner	No-op (database configured at client construction)	Creates tables with DDL (parallel)

When to use provisionSchema() vs individual calls:

Approach	Use When
`provisionSchema(schema)`	Provisioning multiple databases/containers at startup - the SDK handles all parallelism
`ensureDatabase(name)`	Creating a single database on demand (e.g., new tenant onboarding)
`ensureContainer(address)`	Creating a single container on demand

Providers may override provisionSchema() with provider-specific optimisations, but the default SPI implementation works correctly for all providers.

CRUD Semantics

create - Insert Only

create() inserts a new document. If a document with the same key already exists, the operation fails with a conflict error. Use this when you need insert-only semantics with duplicate detection.

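import com.multiclouddb.api.MulticloudDbKey;
import com.multiclouddb.api.ResourceAddress;
import java.util.LinkedHashMap;
import java.util.Map;

ResourceAddress addr = new ResourceAddress("mydb", "orders");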
MulticloudDbKey key = MulticloudDbKey.of("customer-456", "order-123");
Map<String, Object> doc = new LinkedHashMap<>();
doc.put("total", 99.95);
doc.put("status", "pending");

client.create(addr, key, doc);   // Fails if document already exists
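The insert-only contract can be modeled with an in-memory map. The sketch below is a toy model of the semantics, not of any provider: `InsertOnlyStore` and `ConflictException` are hypothetical names, and the SDK's real conflict error type is not shown in this section.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory model of insert-only semantics; names are hypothetical. */
final class InsertOnlyStore {

    /** Stand-in for the SDK's conflict error, whose real type is not shown here. */
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    private final Map<String, Map<String, Object>> items = new ConcurrentHashMap<>();

    /** Inserts the document, or throws if the (partitionKey, sortKey) pair is already taken. */
    void create(String partitionKey, String sortKey, Map<String, Object> document) {
        // A NUL separator keeps ("a", "bc") and ("ab", "c") from colliding.
        String compositeKey = partitionKey + '\u0000' + sortKey;
        // putIfAbsent is atomic, so two concurrent creates cannot both succeed.
        Map<String, Object> existing = items.putIfAbsent(compositeKey, Map.copyOf(document));
        if (existing != null) {
            throw new ConflictException("document already exists: " + partitionKey + "/" + sortKey);
        }
    }

    int size() {
        return items.size();
    }
}
```

The important property is that the existence check and the write happen as one atomic step. A separate read followed by a write would let two callers both believe they created the document.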