Observability (OTel)

`bloge-metrics-otel` adds production observability integrations to BLOGE. It emits metrics, traces, and structured logs that map onto the graph execution model, including retries, timeouts, and fallbacks.

Components

| Component | Role |
| --- | --- |
| `MetricsExecutionListener` | Emits graph, node, retry, timeout, fallback, and stream metrics |
| `TracingOperatorInterceptor` | Creates graph-level and node-level spans |
| `LoggingExecutionListener` | Writes structured lifecycle logs with BLOGE-specific MDC keys |
| `OtelContextCarrier` | Propagates OpenTelemetry context into engine virtual threads |
| `MdcContextCarrier` | Propagates SLF4J MDC values such as `traceId` and `requestId` |
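
To make the two carrier rows concrete, here is a minimal sketch of the capture-and-restore pattern that context propagation generally relies on. It uses only the JDK. `ThreadContextSketch` and its methods are hypothetical stand-ins for an MDC-like store, not the `MdcContextCarrier` or `OtelContextCarrier` API, and a plain `Thread` stands in for the engine's virtual threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an MDC-like, thread-bound key/value context. */
final class ThreadContextSketch {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    /** Copies the calling thread's context so it can be handed to another thread. */
    static Map<String, String> capture() {
        return new HashMap<>(CONTEXT.get());
    }

    /** Wraps a task so the snapshot is installed before it runs and the old state restored after. */
    static Runnable wrap(Map<String, String> snapshot, Runnable task) {
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(new HashMap<>(snapshot));
            try {
                task.run();
            } finally {
                CONTEXT.set(previous);
            }
        };
    }

    /** Looks up a key on a fresh worker thread, with or without propagation. */
    static String readOnWorker(String key, boolean propagate) {
        AtomicReference<String> seen = new AtomicReference<>();
        Runnable lookup = () -> seen.set(get(key));
        // capture() runs here, on the calling thread, before the worker starts.
        Runnable task = propagate ? wrap(capture(), lookup) : lookup;
        Thread worker = new Thread(task); // assumption: the engine would use a virtual thread
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }
}
```

Without propagation the worker sees `null` for a key set by the caller, because thread-bound state does not follow work onto a new thread. With the snapshot installed, the worker sees the caller's value.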

Manual wiring

```java
// `tracer`, `registry`, and `meterRegistry` are assumed to be created elsewhere.
TracingOperatorInterceptor tracing = new TracingOperatorInterceptor(tracer);

GraphEngine engine = GraphEngine.builder()
    .registry(registry)
    .interceptors(List.of(tracing))
    .listeners(List.of(
        tracing, // same instance as the interceptor above
        new MetricsExecutionListener(meterRegistry, "bloge"),
        new LoggingExecutionListener(false, false)
    ))
    .contextCarriers(List.of(
        new OtelContextCarrier(),
        new MdcContextCarrier()
    ))
    .build();
```

The same tracing component is registered as both an interceptor and a listener so node spans are nested under the active graph span.
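
One plausible way to picture why a single shared instance matters is the toy model below. It is a single-threaded, JDK-only sketch, not the `TracingOperatorInterceptor` implementation: `SpanNestingSketch`, `ToySpan`, and the callback names are all hypothetical. The listener-style callbacks open and close the graph span, and the interceptor-style wrapper reads that same state to find a parent for each node span.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/** Toy model of span nesting with shared state between two roles. */
final class SpanNestingSketch {
    /** A span with a link to its parent (null for a root span). */
    static final class ToySpan {
        final String name;
        final ToySpan parent;

        ToySpan(String name, ToySpan parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    private final Deque<ToySpan> active = new ArrayDeque<>();
    private final List<ToySpan> started = new ArrayList<>();

    // Listener role: graph lifecycle callbacks open and close the graph span.
    void onGraphStart(String graphName) {
        push("graph:" + graphName);
    }

    void onGraphEnd() {
        active.pop();
    }

    // Interceptor role: wraps a node invocation in a span whose parent is the active span.
    <T> T aroundNode(String nodeName, Supplier<T> node) {
        push("node:" + nodeName);
        try {
            return node.get();
        } finally {
            active.pop();
        }
    }

    private void push(String name) {
        ToySpan span = new ToySpan(name, active.peek()); // peek() is null when nothing is active
        active.push(span);
        started.add(span);
    }

    /** All spans in the order they were started. */
    List<ToySpan> started() {
        return started;
    }
}
```

If the listener and the interceptor were two separate instances of this toy class, `aroundNode` would find no active graph span and every node span would become a root span.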

Metrics emitted

`MetricsExecutionListener` can emit:

- `bloge.graph.duration`
- `bloge.node.duration`
- `bloge.node.errors`
- `bloge.node.retries`
- `bloge.node.timeouts`
- `bloge.node.fallbacks`
- `bloge.node.skipped`
- `bloge.stream.chunk.count`
- `bloge.stream.duration`
- `bloge.stream.errors`

Durable integrations can add checkpoint, work-item, and lease metrics on top of these signals.
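
The metric names above all share the `bloge` prefix. As a rough illustration of how a prefix and a set of tags can combine into countable series, here is a JDK-only sketch. `PrefixedCountersSketch` is a hypothetical in-memory stand-in for a meter registry, and the `graph` and `node` tag names are assumptions, not confirmed tags of `MetricsExecutionListener`.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a meter registry; illustrative only. */
final class PrefixedCountersSketch {
    private final String prefix;
    private final Map<String, Long> counts = new TreeMap<>();

    PrefixedCountersSketch(String prefix) {
        this.prefix = prefix;
    }

    /** Builds a full metric name such as "bloge.node.retries". */
    String metricName(String suffix) {
        return prefix + "." + suffix;
    }

    /** Increments the counter identified by metric name plus graph and node tags. */
    void increment(String suffix, String graph, String node) {
        counts.merge(key(suffix, graph, node), 1L, Long::sum);
    }

    /** Returns the current count, or 0 if the series was never incremented. */
    long count(String suffix, String graph, String node) {
        return counts.getOrDefault(key(suffix, graph, node), 0L);
    }

    // Assumed tag names: "graph" and "node".
    private String key(String suffix, String graph, String node) {
        return metricName(suffix) + "{graph=" + graph + ",node=" + node + "}";
    }
}
```

Keeping the prefix in one place means a different prefix renames every series consistently, which is worth remembering when dashboards hard-code metric names.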

Spring Boot properties

When used with `bloge-spring`, observability can be configured through Spring Boot properties:

```yaml
spring:
  bloge:
    observability:
      metrics:
        enabled: true
        prefix: bloge
      tracing:
        enabled: true
      logging:
        enabled: true
      context-propagation: true
      mdc-propagation: true
```

Production dashboards

The example project ships a Grafana dashboard with queries such as:

- Graph duration p95
- Retry count by graph
- Fallback count by graph

These align well with the questions operators ask in production:

- Is this graph slower than usual?
- Are we succeeding only because fallbacks are masking an unhealthy primary service?
- Which node is accumulating retries?
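
For readers unfamiliar with the arithmetic behind these panels, the sketch below works through two of them on raw numbers: a nearest-rank p95 over recorded durations, and the share of executions that completed through fallback. `DashboardMathSketch` is a hypothetical helper. A real dashboard backend would usually estimate percentiles from histogram data rather than raw samples, so treat this as a simplification.

```java
import java.util.Arrays;

/** Worked arithmetic for two dashboard questions; illustrative only. */
final class DashboardMathSketch {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samples, int p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p < 1 || p > 100) {
            throw new IllegalArgumentException("p must be in 1..100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank = ceil(p / 100 * n), computed with integer arithmetic.
        int rank = (int) ((p * (long) sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** Share of executions that completed via fallback, in the range 0.0 to 1.0. */
    static double fallbackShare(long fallbacks, long executions) {
        return executions == 0 ? 0.0 : (double) fallbacks / executions;
    }
}
```

With ten samples, the p95 is the slowest sample, which is why a single outlier can dominate the panel at low traffic. A rising fallback share alongside a flat error rate is the pattern behind the second question above.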

Logging and data sensitivity

`LoggingExecutionListener` can optionally include node input and output payloads. Leave both disabled unless you have reviewed the payloads for sensitive business or personal data.

Why telemetry matters in BLOGE

BLOGE's graph model makes it possible to instrument execution at the orchestration boundary instead of at arbitrary call sites. That means telemetry answers graph-level questions directly:

- Which branch was taken?
- Which node timed out?
- How often are we degrading through fallback?
- How much latency is added by a specific fan-out or retrying dependency?

Recommended rollout pattern

1. Enable metrics first.
2. Add tracing where graph executions must appear in distributed traces.
3. Enable structured logs for lifecycle visibility.
4. Propagate MDC and OTel context when your platform already uses them elsewhere.

Next steps

- Wire telemetry automatically through Spring Boot
- Add durable metrics with Durable Flows
- Explore runtime examples in Example Catalog
