Observability (OTel)
bloge-metrics-otel adds production-facing observability integrations to BLOGE. It emits metrics, traces, and structured logs that line up with the graph execution model, including retries, timeouts, and fallback behavior.
Components
| Component | Role |
|---|---|
| MetricsExecutionListener | Emits graph, node, retry, timeout, fallback, and stream metrics |
| TracingOperatorInterceptor | Creates graph-level and node-level spans |
| LoggingExecutionListener | Writes structured lifecycle logs with BLOGE-specific MDC keys |
| OtelContextCarrier | Propagates OpenTelemetry context into engine virtual threads |
| MdcContextCarrier | Propagates SLF4J MDC values such as traceId and requestId |
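
The two carriers exist because thread-bound state does not follow work onto another thread by itself. The following is a minimal, stdlib-only sketch of the capture-and-restore idea. It makes no claim about BLOGE's actual carrier interface: `ContextCarrierSketch`, `REQUEST_ID`, `wrap`, and `observeOnNewThread` are hypothetical names, and a plain `ThreadLocal` stands in for MDC or OpenTelemetry context.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Stdlib-only illustration of capture-and-restore context propagation (hypothetical, not BLOGE's API). */
final class ContextCarrierSketch {

    // Stand-in for thread-bound state such as an MDC entry or the current trace context.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    /** Captures the caller's value now and restores it around the task on whichever thread runs it. */
    static Runnable wrap(Runnable task) {
        final String captured = REQUEST_ID.get();      // read on the submitting thread
        return () -> {
            String previous = REQUEST_ID.get();
            REQUEST_ID.set(captured);                  // install on the worker thread
            try {
                task.run();
            } finally {
                // Put back whatever the worker had so state never leaks between tasks.
                if (previous == null) {
                    REQUEST_ID.remove();
                } else {
                    REQUEST_ID.set(previous);
                }
            }
        };
    }

    /**
     * Runs a task on a fresh thread and returns the value that thread observed.
     * A platform thread is used so the sketch runs on older JDKs; thread-locals
     * are not inherited by virtual threads either.
     */
    static String observeOnNewThread(boolean propagate) {
        AtomicReference<String> seen = new AtomicReference<>();
        Runnable task = () -> seen.set(REQUEST_ID.get());
        Thread worker = new Thread(propagate ? wrap(task) : task);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        REQUEST_ID.set("req-42");
        System.out.println("without carrier: " + observeOnNewThread(false)); // null
        System.out.println("with carrier:    " + observeOnNewThread(true));  // req-42
    }
}
```

Without the wrapper the worker sees no value, which is the symptom a missing carrier produces in practice: node logs without a request id, or node spans detached from the caller's trace.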
Manual wiring

    TracingOperatorInterceptor tracing = new TracingOperatorInterceptor(tracer);
    GraphEngine engine = GraphEngine.builder()
        .registry(registry)
        .interceptors(List.of(tracing))
        .listeners(List.of(
            tracing,
            new MetricsExecutionListener(meterRegistry, "bloge"),
            new LoggingExecutionListener(false, false)
        ))
        .contextCarriers(List.of(
            new OtelContextCarrier(),
            new MdcContextCarrier()
        ))
        .build();

The same tracing component is registered as both an interceptor and a listener so node spans are nested under the active graph span.
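
To see why one shared instance matters, consider a simplified model in which the listener callbacks open and close the graph span and the interceptor callback wraps each node call. This is a sketch under assumptions, not BLOGE's or OpenTelemetry's API: `SpanNestingSketch`, its callback names, and the `Span` record are hypothetical, and real spans would come from the OpenTelemetry `Tracer`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Hypothetical model: one object that is both lifecycle listener and node interceptor. */
final class SpanNestingSketch {

    /** Minimal span: parent is null for a root span. */
    record Span(String name, String parent) {}

    // State shared between the listener callbacks and the interceptor callback.
    private final Map<String, Span> activeGraphSpans = new ConcurrentHashMap<>();
    private final List<Span> finished = new CopyOnWriteArrayList<>();

    /** Listener role: open the graph span when an execution starts. */
    void onGraphStart(String executionId, String graphName) {
        activeGraphSpans.put(executionId, new Span(graphName, null));
    }

    /** Interceptor role: wrap a node call in a span parented to the active graph span. */
    <T> T aroundNode(String executionId, String nodeName, Supplier<T> invocation) {
        Span graphSpan = activeGraphSpans.get(executionId);
        Span nodeSpan = new Span(nodeName, graphSpan == null ? null : graphSpan.name());
        try {
            return invocation.get();
        } finally {
            finished.add(nodeSpan); // recorded even if the node throws
        }
    }

    /** Listener role: close the graph span when the execution ends. */
    void onGraphEnd(String executionId) {
        Span graphSpan = activeGraphSpans.remove(executionId);
        if (graphSpan != null) {
            finished.add(graphSpan);
        }
    }

    /** Returns an immutable snapshot of the spans finished so far. */
    List<Span> finished() {
        return List.copyOf(finished);
    }

    public static void main(String[] args) {
        SpanNestingSketch tracing = new SpanNestingSketch();
        tracing.onGraphStart("exec-1", "checkout");
        tracing.aroundNode("exec-1", "price", () -> "ok");
        tracing.onGraphEnd("exec-1");
        tracing.finished().forEach(System.out::println);
    }
}
```

In this model, an instance that only ever receives `aroundNode` calls never learns about the graph span, so its node spans come out as roots. Registering a single instance in both roles gives the interceptor side access to the span the listener side opened.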
Metrics emitted
MetricsExecutionListener can emit:
- bloge.graph.duration
- bloge.node.duration
- bloge.node.errors
- bloge.node.retries
- bloge.node.timeouts
- bloge.node.fallbacks
- bloge.node.skipped
- bloge.stream.chunk.count
- bloge.stream.duration
- bloge.stream.errors
Durable integrations can add checkpoint, work-item, and lease metrics on top of these signals.
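
The counter-style metrics in this list can be pictured as lifecycle callbacks that increment a named, tagged counter. The sketch below uses only the JDK and is not the real listener: the callback names, the `graph` and `node` tags, and the `RetryCounterSketch` class are assumptions for illustration, as is the reading of the second `MetricsExecutionListener` constructor argument as the metric name prefix.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a metrics listener; callback and tag names are assumptions. */
final class RetryCounterSketch {

    private final String prefix;
    // One counter per metric name and tag combination.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    RetryCounterSketch(String prefix) {
        this.prefix = prefix;
    }

    /** Called once for every retry attempt of a node. */
    void onNodeRetry(String graph, String node) {
        increment("node.retries", graph, node);
    }

    /** Called when a node result was produced by its fallback. */
    void onNodeFallback(String graph, String node) {
        increment("node.fallbacks", graph, node);
    }

    /** Returns the current value of a counter, or 0 if it was never incremented. */
    long count(String metric, String graph, String node) {
        LongAdder adder = counters.get(key(metric, graph, node));
        return adder == null ? 0L : adder.sum();
    }

    private void increment(String metric, String graph, String node) {
        counters.computeIfAbsent(key(metric, graph, node), k -> new LongAdder()).increment();
    }

    // Builds a key such as "bloge.node.retries{graph=checkout,node=inventory}".
    private String key(String metric, String graph, String node) {
        return prefix + "." + metric + "{graph=" + graph + ",node=" + node + "}";
    }

    public static void main(String[] args) {
        RetryCounterSketch metrics = new RetryCounterSketch("bloge");
        metrics.onNodeRetry("checkout", "inventory");
        metrics.onNodeRetry("checkout", "inventory");
        metrics.onNodeFallback("checkout", "pricing");
        System.out.println(metrics.count("node.retries", "checkout", "inventory")); // 2
    }
}
```

Keeping the node as a tag rather than part of the metric name is what makes it possible to group or filter the same counter by graph or by node later.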
Spring Boot properties
When used with bloge-spring, observability can be configured through familiar properties:

    spring:
      bloge:
        observability:
          metrics:
            enabled: true
            prefix: bloge
          tracing:
            enabled: true
          logging:
            enabled: true
          context-propagation: true
          mdc-propagation: true

Production dashboards
The example project ships a Grafana dashboard with queries such as:
- graph duration p95
- retry count by graph
- fallback count by graph
These align well with the questions operators ask in production:
- is this graph slower than usual?
- are we succeeding only because fallbacks are masking an unhealthy primary service?
- which node is accumulating retries?
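
The numbers behind the first two questions are simple to state. The sketch below shows a nearest-rank p95 over raw duration samples and a fallback ratio over execution counts. It is illustrative arithmetic only: a metrics backend normally estimates percentiles from histogram buckets rather than raw samples, and `DashboardMath` and its methods are hypothetical helpers, not part of BLOGE or the example dashboard.

```java
import java.util.Arrays;

/** Illustrative arithmetic behind two dashboard panels; not how a metrics backend computes them. */
final class DashboardMath {

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMillis, int p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        if (p < 1 || p > 100) {
            throw new IllegalArgumentException("p must be between 1 and 100");
        }
        long[] sorted = samplesMillis.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        // ceil(p * n / 100) computed in integer arithmetic to avoid rounding surprises.
        int rank = (int) (((long) p * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** Share of executions that completed through a fallback; 0.0 when nothing ran. */
    static double fallbackRatio(long fallbacks, long executions) {
        return executions == 0 ? 0.0 : (double) fallbacks / executions;
    }

    public static void main(String[] args) {
        long[] durations = {120, 80, 95, 400, 101, 99, 87, 110, 93, 105};
        System.out.println("p50 = " + percentile(durations, 50) + " ms"); // p50 = 99 ms
        System.out.println("p95 = " + percentile(durations, 95) + " ms"); // p95 = 400 ms
        System.out.println("fallback ratio = " + fallbackRatio(25, 1000));
    }
}
```

With these ten samples the median is 99 ms while p95 is 400 ms, which is why a p95 panel surfaces a slow tail that an average or median would hide.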
Logging and data sensitivity
LoggingExecutionListener can optionally include node input and output payloads. Leave both disabled unless you have reviewed the payloads for sensitive business or personal data.
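
If a review concludes that payloads are needed in logs, one common mitigation is to mask known sensitive fields before anything is written. The sketch below is a generic illustration of that idea and not a BLOGE feature: `PayloadRedactor`, the map-shaped payload, and the key list are all assumptions, and real payloads may be nested objects that need a deeper traversal.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that masks sensitive top-level fields before a payload is logged. */
final class PayloadRedactor {

    static final String MASK = "[REDACTED]";

    /**
     * Returns a copy of the payload with values of sensitive keys replaced by a mask.
     * Keys are compared case-insensitively; sensitiveKeys must be given in lower case.
     */
    static Map<String, Object> redact(Map<String, ?> payload, Set<String> sensitiveKeys) {
        Map<String, Object> safe = new LinkedHashMap<>();   // keeps the original field order
        for (Map.Entry<String, ?> entry : payload.entrySet()) {
            boolean sensitive = sensitiveKeys.contains(entry.getKey().toLowerCase(Locale.ROOT));
            safe.put(entry.getKey(), sensitive ? MASK : entry.getValue());
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("orderId", "A-1001");
        payload.put("Email", "jane@example.com");
        System.out.println(redact(payload, Set.of("email", "cardnumber")));
        // {orderId=A-1001, Email=[REDACTED]}
    }
}
```

A deny-list like this only protects fields someone thought to list, so it complements the payload review rather than replacing it.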
Why telemetry matters in BLOGE
BLOGE's graph model makes it possible to instrument execution at the orchestration boundary instead of at arbitrary call sites. That means telemetry answers graph-level questions directly:
- which branch was taken?
- which node timed out?
- how often are we degrading through fallback?
- how much latency is added by a specific fan-out or retrying dependency?
Recommended rollout pattern
- enable metrics first
- add tracing where graph executions must appear in distributed traces
- enable structured logs for lifecycle visibility
- propagate MDC and OTel context when your platform already uses them elsewhere
Next steps
- Wire telemetry automatically through Spring Boot
- Add durable metrics with Durable Flows
- Explore runtime examples in Example Catalog