Schema & I/O

BLOGE treats operator inputs and outputs as contracts that should be visible to both humans and tooling. The schema system exists to remove guesswork from graph authoring, validation, testing, and visualization.

Why explicit schemas matter

Without schemas, downstream consumers must read operator source code to discover output shapes. That makes DSL authoring error-prone, runtime mismatches harder to diagnose, and visual tooling less useful.

With explicit or inferred schemas, BLOGE can:

  • validate field paths during graph build and DSL compilation
  • document operator inputs and outputs automatically
  • surface shape information in Studio and metadata JSON
  • detect mismatches earlier, before production traffic hits a broken path

Core schema model

BLOGE's schema layer centers on SchemaDescriptor, which takes one of three forms:

  • StructuredSchema: a field-level schema with nested structure
  • TypedSchema: a simple type reference when field expansion is unnecessary
  • OpaqueSchema: an escape hatch when shape is unknown or intentionally undeclared

A structured schema is composed of FieldDescriptor records describing field name, type, required-ness, optional nested schema, and documentation.
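The model above can be sketched as a small sealed hierarchy. This is an illustrative reconstruction from the descriptions in this page, not BLOGE's actual source; the record component names (for example `nested` and `doc`) are assumptions.

```java
// Hypothetical sketch of BLOGE's schema model; the real SchemaDescriptor
// API may differ in names and shape.
import java.util.List;

sealed interface SchemaDescriptor permits StructuredSchema, TypedSchema, OpaqueSchema {}

// Field-level schema with nested structure.
record StructuredSchema(List<FieldDescriptor> fields) implements SchemaDescriptor {}

// Simple type reference when field expansion is unnecessary.
record TypedSchema(String typeName) implements SchemaDescriptor {}

// Escape hatch when shape is unknown or intentionally undeclared.
record OpaqueSchema() implements SchemaDescriptor {}

// One field: name, type, required-ness, optional nested schema, documentation.
record FieldDescriptor(
        String name,
        String type,
        boolean required,
        SchemaDescriptor nested,   // null when the field is a leaf
        String doc) {}
```

Modeling the three variants as a sealed hierarchy lets consumers switch exhaustively over the schema kind.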

Where schemas come from

BLOGE can discover schema from several sources.

Java-side inference

For Java operators, the runtime can infer schema from the Operator<I, O> generic types.

Typical behavior:

  • Java records -> recursive field extraction
  • POJOs -> getter-based inspection
  • Map<String, Object> -> usually degrades to OpaqueSchema
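The record branch of that behavior can be approximated with standard reflection. This is a minimal sketch, not BLOGE's actual inference code; real inference also handles nesting, POJO getters, and annotations.

```java
// Minimal sketch of record-based schema inference via reflection.
// Only extracts top-level component names and types from a record class.
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

record UserView(String id, String name, String email) {}

final class RecordSchemaInference {
    // Returns field-name -> type-name in declaration order; empty for
    // non-records (which would fall back to getter inspection or OpaqueSchema).
    static Map<String, String> inferFields(Class<?> type) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (!type.isRecord()) {
            return fields;
        }
        for (RecordComponent rc : type.getRecordComponents()) {
            fields.put(rc.getName(), rc.getType().getSimpleName());
        }
        return fields;
    }
}
```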

Operators can override inference by implementing SchemaAware and returning explicit schemas.
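An override might look like the following. SchemaAware is named in this page, but the method names (`inputSchema`/`outputSchema`) and the surrounding types are assumptions for illustration.

```java
// Hedged sketch of overriding inference via SchemaAware; method names
// and types here are assumed, not the actual BLOGE API.
interface SchemaDescriptor { }

record TypedSchema(String typeName) implements SchemaDescriptor {}

interface SchemaAware {
    SchemaDescriptor inputSchema();
    SchemaDescriptor outputSchema();
}

final class LegacyMapOperator implements SchemaAware {
    // A Map-based operator whose shape would otherwise degrade to
    // OpaqueSchema declares its contract explicitly instead.
    @Override public SchemaDescriptor inputSchema()  { return new TypedSchema("UserQuery"); }
    @Override public SchemaDescriptor outputSchema() { return new TypedSchema("UserView"); }
}
```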

DSL-side declarations

The DSL can declare schemas inline or through reusable schema blocks.

```bloge
schema UserOutput {
  id: Int
  name: String
  email: String?
}

node fetchUser {
  operator: "FetchUserOperator"
  output: UserOutput
}
```

Inline declarations are especially useful when a graph is defined externally and the compiler should validate field paths before runtime.

Transform inference

transform fields also participate in the schema system. Their types can come from:

  1. explicit type annotations
  2. expression type inference
  3. an Unknown fallback when inference is incomplete
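The three sources can appear side by side in one transform block. The sketch below is illustrative only: this page shows schema and node declaration syntax, but the transform expression and comment syntax here are assumptions.

```bloge
node enrichUser {
  operator: "EnrichUserOperator"
  transform {
    displayName: String = input.name   // 1. explicit annotation: declared type wins
    userId = input.id                  // 2. type inferred from the expression
    extras = input.metadata            // 3. uninferable shape falls back to Unknown
  }
}
```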

Validation stages

BLOGE can validate schemas at three stages.

  • Graph build time: compatibility across edges and referenced fields
  • DSL compile time: path validity, declared schema references, and binding compatibility
  • Runtime: actual values versus declared required fields and types

Validation strictness is controlled by SchemaValidationLevel:

  • OFF
  • WARN
  • ERROR

This lets teams start with warnings and tighten enforcement as contracts stabilize.
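The strictness levels imply behavior roughly like the following. The enum values come from this page; the surrounding validator API is a hypothetical sketch, not BLOGE's implementation.

```java
// Sketch of how SchemaValidationLevel could gate enforcement:
// OFF ignores mismatches, WARN collects them, ERROR fails fast.
import java.util.ArrayList;
import java.util.List;

enum SchemaValidationLevel { OFF, WARN, ERROR }

final class SchemaValidator {
    private final SchemaValidationLevel level;
    final List<String> warnings = new ArrayList<>();

    SchemaValidator(SchemaValidationLevel level) { this.level = level; }

    // Report one mismatch according to the configured strictness.
    void report(String mismatch) {
        switch (level) {
            case OFF -> { /* validation disabled: ignore */ }
            case WARN -> warnings.add(mismatch);
            case ERROR -> throw new IllegalStateException(mismatch);
        }
    }
}
```

Starting at WARN surfaces contract drift in logs without breaking traffic; flipping to ERROR later locks the contract in.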

Example: inferred Java schema

```java
public record UserQuery(String userId) {}
public record UserView(String id, String name, String email) {}

public final class FetchUserOperator implements Operator<UserQuery, UserView> {
    private final UserService userService;

    public FetchUserOperator(UserService userService) {
        this.userService = userService;
    }

    @Override
    public UserView execute(UserQuery input, OperatorContext ctx) {
        return userService.find(input.userId());
    }
}
```

Here BLOGE can infer both input and output schema automatically from record components.

Example: metadata export

The Maven plugin can export operator schema into operator-metadata.json for visual tooling:

```json
{
  "name": "FetchUserOperator",
  "inputSchema": {
    "kind": "structured",
    "fields": [{ "name": "userId", "type": "String", "required": true }]
  },
  "outputSchema": {
    "kind": "structured",
    "fields": [
      { "name": "id", "type": "String", "required": true },
      { "name": "name", "type": "String", "required": true },
      { "name": "email", "type": "String", "required": false }
    ]
  }
}
```

Studio can then use this data for operator catalogs, field completion, and data-flow visualization.

Practical guidance

  • Prefer Java records for clean inference whenever possible.
  • Use explicit schemas when a graph is authored in DSL and path validation matters.
  • Treat OpaqueSchema as a temporary escape hatch, not the default destination.
  • Version breaking output changes intentionally instead of mutating contracts invisibly.
  • Keep transform outputs typed when they become shared downstream dependencies.

What schema validation is not

Schema validation does not replace domain validation. It answers questions like:

  • does fetchUser.output.address.city exist?
  • is this field optional or required?
  • does the downstream node expect a compatible shape?

It does not decide whether a value is semantically correct for the business domain.
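The first of those questions, path existence, reduces to walking a declared schema along a dotted path. The nested-map representation below is a simplification for illustration, not BLOGE's internal model.

```java
// Sketch of a structural field-path check: does "address.city" exist
// in a declared output schema? Each schema node maps a field name to
// either a nested schema (Map) or a leaf type name (String).
import java.util.Map;

final class FieldPathCheck {
    static boolean exists(Map<String, Object> schema, String path) {
        Object current = schema;
        for (String part : path.split("\\.")) {
            // A leaf (type name) has no sub-fields, so any further
            // path segment is invalid.
            if (!(current instanceof Map<?, ?> m) || !m.containsKey(part)) {
                return false;
            }
            current = m.get(part);
        }
        return true;
    }
}
```

Note that the check says nothing about the value stored at `address.city`; deciding whether that value is a plausible city name remains domain validation.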

Next steps