Durable Flows

bloge-durable turns the BLOGE execution model into a runtime that can survive process restarts, support long-running waits, and scale into more demanding production topologies.

What durability adds

BLOGE's durable module persists execution identity and runtime state through focused stores.

| Store | Responsibility |
| --- | --- |
| ExecutionStore | Execution lifecycle, status transitions, leases, and business-key lookup |
| ExecutionCheckpointStore | Node outputs, suspend context, loop snapshots, and session checkpoints |
| WaitStore | Pending waits and timeout metadata |
| WorkItemStore | Retry, timer, event, and task work dispatch |
| LeaseStore | Generic fenced lease management |
| GraphRegistryStore | Persisted graph definitions for cold recovery |
| RoutingStore | Shard and tenant route bindings |
| TaskInboxStore | Human task assignment and inbox data |
| EventMatcherStore | Runtime event correlation state |

Additional services include ArchiveService for data lifecycle management and DurableControlPlaneService for governance and operational queries.
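
To make the "fenced lease" idea concrete, here is a minimal in-memory sketch of fencing tokens. It is not the bloge-durable LeaseStore API: the class FencedLeaseSketch, its methods, and the Lease record are hypothetical names chosen for illustration, and the real store presumably persists this state in the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of fenced leases; NOT the bloge-durable LeaseStore API.
final class FencedLeaseSketch {
    /** A granted lease: holder, fencing token, and expiry time. */
    record Lease(String owner, long token, long expiresAtMillis) {}

    private final Map<String, Lease> leases = new HashMap<>();
    private long nextToken = 1;

    /** Grants the lease if it is free or expired; every grant gets a strictly larger token. */
    synchronized Lease tryAcquire(String resource, String owner, long nowMillis, long ttlMillis) {
        Lease current = leases.get(resource);
        if (current != null && current.expiresAtMillis() > nowMillis) {
            return null; // still held and not yet expired
        }
        Lease granted = new Lease(owner, nextToken++, nowMillis + ttlMillis);
        leases.put(resource, granted);
        return granted;
    }

    /** A write should be accepted only if it carries the token of the latest grant. */
    synchronized boolean isCurrent(String resource, long token) {
        Lease current = leases.get(resource);
        return current != null && current.token() == token;
    }
}
```

The token is what makes the lease "fenced": a worker that paused past its expiry still holds an old token, so a store that checks isCurrent before writing can reject its stale update.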

Wiring durable stores into the engine

```java
// Create the durable stores, backed by the primary and replica data sources.
DurableStoreFactory.RuntimeStores stores = DurableStoreFactory
    .builder(primaryDataSource, replicaDataSource)
    .migrateSchema()
    .runtimeStores();

// Register each durable store with the engine builder.
GraphEngine engine = GraphEngine.builder()
    .registry(registry)
    .executionStore(stores.executionStore())
    .executionCheckpointStore(stores.executionCheckpointStore())
    .waitStore(stores.waitStore())
    .workItemStore(stores.workItemStore())
    .graphRegistryStore(stores.graphRegistryStore())
    .routingStore(stores.routingStore())
    .build();
```

This keeps the same programming model while adding persistence-backed recovery.

In-memory path for local development

Durable flows do not require a database during early development. The module provides InMemory*Store implementations for tests, demos, and local scenarios.
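
As a rough picture of what an in-memory store can look like, the sketch below keeps node outputs in concurrent maps. The NodeOutputStore interface and InMemoryNodeOutputStore class are hypothetical and only for illustration; the actual ExecutionCheckpointStore contract and the shipped InMemory*Store classes may look quite different.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface for illustration; the real checkpoint store contract may differ.
interface NodeOutputStore {
    void saveOutput(String executionId, String nodeId, String output);

    Optional<String> loadOutput(String executionId, String nodeId);
}

// In-memory implementation: state lives only as long as the JVM does.
final class InMemoryNodeOutputStore implements NodeOutputStore {
    // executionId -> (nodeId -> output)
    private final Map<String, Map<String, String>> outputs = new ConcurrentHashMap<>();

    @Override
    public void saveOutput(String executionId, String nodeId, String output) {
        outputs.computeIfAbsent(executionId, id -> new ConcurrentHashMap<>()).put(nodeId, output);
    }

    @Override
    public Optional<String> loadOutput(String executionId, String nodeId) {
        Map<String, String> byNode = outputs.get(executionId);
        return byNode == null ? Optional.empty() : Optional.ofNullable(byNode.get(nodeId));
    }
}
```

Coding against an interface like this is what lets a test swap the in-memory variant for a database-backed one without touching the flow itself.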

Recovery model

Recovery in durable BLOGE flows covers:

  • process restarts
  • timer-based waits
  • external event resumes
  • session and round checkpoints
  • persisted graph lookup for cold-start rebuilds

The key idea is that long-running state moves into durable runtime stores rather than into ad hoc application-side tables and compensating code.
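
The recovery idea can be sketched as a replay loop that reuses persisted node outputs instead of re-running the nodes. This is a simplified assumption about how checkpoint-based recovery works in general, not the engine's actual algorithm; CheckpointReplaySketch and its fields are hypothetical, and a plain Map stands in for the durable checkpoint store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of checkpoint-based replay; not the BLOGE engine's implementation.
final class CheckpointReplaySketch {
    /** Node outputs keyed by node id; stands in for a durable checkpoint store. */
    private final Map<String, String> checkpoints;
    /** Ids of the nodes that were actually executed during this run. */
    final List<String> executed = new ArrayList<>();

    CheckpointReplaySketch(Map<String, String> checkpoints) {
        this.checkpoints = checkpoints;
    }

    /** Runs nodes in insertion order, skipping any node whose output is already checkpointed. */
    String run(LinkedHashMap<String, Function<String, String>> nodes, String input) {
        String value = input;
        for (Map.Entry<String, Function<String, String>> node : nodes.entrySet()) {
            String saved = checkpoints.get(node.getKey());
            if (saved != null) {
                value = saved; // recovered: reuse the persisted output
                continue;
            }
            value = node.getValue().apply(value);
            executed.add(node.getKey());
            checkpoints.put(node.getKey(), value); // persist before moving to the next node
        }
        return value;
    }
}
```

If the process dies after the first node is checkpointed, a fresh instance given the same checkpoint map resumes at the second node, so side effects of the first node are not repeated.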

Routing, shards, and multi-tenant setups

Durable routing is built around execution identity and route keys. Available router strategies include:

  • SingleDataSourceRouter
  • HashShardRouter
  • TenantShardRouter

Tenant isolation strategies include shared tables, separate schemas, and separate databases. Read/write routing uses ScopedValue scopes so nested operations remain consistent across primary and replica data sources.
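
For intuition about hash-based shard routing, the sketch below maps a route key to one of a fixed list of shards. It only illustrates the general technique; HashShardRouterSketch is a hypothetical class, and the actual HashShardRouter may use a different hash function, key format, or rebalancing scheme.

```java
import java.util.List;

// Hypothetical sketch of hash-based shard selection; not the bloge-durable HashShardRouter.
final class HashShardRouterSketch {
    private final List<String> shardNames;

    HashShardRouterSketch(List<String> shardNames) {
        if (shardNames.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardNames = List.copyOf(shardNames);
    }

    /** Maps a route key to a shard; the same key always lands on the same shard. */
    String shardFor(String routeKey) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(routeKey.hashCode(), shardNames.size());
        return shardNames.get(index);
    }
}
```

Because the mapping depends on the shard count, changing the number of shards in a scheme like this remaps most keys, which is one reason persisted route bindings are useful.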

Archive and governance

Durable systems need more than checkpoint persistence. bloge-durable also includes:

  • hot-to-archive lifecycle policies (MOVE_TO_ARCHIVE, SOFT_DELETE, DELETE)
  • archive job tracking
  • transition logs
  • dead-letter inspection
  • control-plane queries for executions, tasks, and work items

This operational layer helps teams keep orchestration running safely over time, beyond getting the first execution to succeed.
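
One way to read the three lifecycle policies is sketched below. The action names come from the list above, but the semantics shown (move the row, mark it, or drop it) are an assumption made for illustration; ArchivePolicySketch and its fields are hypothetical and are not the ArchiveService API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of archive policies; the semantics here are assumed, not documented behavior.
final class ArchivePolicySketch {
    enum Action { MOVE_TO_ARCHIVE, SOFT_DELETE, DELETE }

    final Map<String, String> hot = new HashMap<>();      // stands in for live runtime tables
    final Map<String, String> archive = new HashMap<>();  // stands in for archive tables
    final Set<String> softDeleted = new HashSet<>();      // ids marked deleted but kept in place

    void apply(String executionId, Action action) {
        switch (action) {
            case MOVE_TO_ARCHIVE -> {
                String row = hot.remove(executionId);
                if (row != null) {
                    archive.put(executionId, row); // the row leaves hot storage but is retained
                }
            }
            case SOFT_DELETE -> {
                if (hot.containsKey(executionId)) {
                    softDeleted.add(executionId); // the row stays in place, flagged as deleted
                }
            }
            case DELETE -> hot.remove(executionId); // the row is removed outright
        }
    }
}
```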

Session and task support

Durable support also underpins:

  • session checkpoint persistence
  • task inbox and human task flows
  • event correlation adapters
  • resumable wait/await patterns

These are especially important for approval workflows and conversational flows that extend across multiple user interactions.
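
A resumable wait can be pictured as a record parked under a correlation key until a matching event or a timeout arrives. The sketch below is an assumption about the general pattern only; WaitCorrelationSketch, PendingWait, and the method names are hypothetical and do not reflect the WaitStore or EventMatcherStore APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of wait/resume correlation; not the bloge-durable wait or matcher stores.
final class WaitCorrelationSketch {
    /** What must be remembered to resume: which execution, at which node, and until when. */
    record PendingWait(String executionId, String nodeId, long timeoutAtMillis) {}

    private final Map<String, PendingWait> waits = new ConcurrentHashMap<>();

    /** Parks an execution under a correlation key, for example an approval or order id. */
    void suspend(String correlationKey, PendingWait wait) {
        waits.put(correlationKey, wait);
    }

    /** Delivers an event; the wait is removed so the execution resumes at most once. */
    Optional<PendingWait> onEvent(String correlationKey) {
        return Optional.ofNullable(waits.remove(correlationKey));
    }

    /** Removes and returns every wait whose timeout has passed. */
    List<PendingWait> expire(long nowMillis) {
        List<PendingWait> expired = new ArrayList<>();
        waits.entrySet().removeIf(entry -> {
            if (entry.getValue().timeoutAtMillis() <= nowMillis) {
                expired.add(entry.getValue());
                return true;
            }
            return false;
        });
        return expired;
    }
}
```

In an approval workflow, the correlation key would typically be the task or request id, so a later user action can find the suspended execution even after a restart, provided the waits are persisted rather than held in memory as they are here.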

Observability integration

When bloge-metrics-otel is present, durable stores can emit metrics such as:

  • bloge.checkpoint.write.duration
  • bloge.checkpoint.recovery.duration
  • bloge.workitem.queue.depth
  • bloge.lease.renewal
  • bloge.lease.renewal.duration

That complements the graph- and node-level metrics emitted by the core engine.

Supported databases and migrations

Durable migrations currently target:

  • H2
  • PostgreSQL
  • MySQL

Flyway migrations provision runtime, session, archive, and governance tables through versioned migration sets.

When to adopt durability

Add bloge-durable when your flow needs one or more of the following:

  • resume after restart
  • wait on timers or external signals
  • user-task or approval inboxes
  • durable routing and sharding
  • operational archive and governance controls

If all your graphs are short-lived and can run entirely in memory, the core runtime alone may be sufficient.
