Remote Workers

Not every node belongs in the orchestrator process. GPU inference, native binaries, code in another language, or work that must run inside a specific network zone can all be delegated to an external worker while BLOGE keeps owning the graph, durability, and resilience.

Marking a node as remote

Add execution_mode = remote and a worker_topic to any node:

```bloge
node gpuInference : InferenceOperator {
  depends_on     = [prepareBatch]
  execution_mode = remote
  worker_topic   = "gpu-inference"
  timeout        = 5m
  retry = { attempts: 3, backoff: 2s, strategy: exponential }
}
```
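The retry line above asks for three attempts with a 2s exponential backoff. The sketch below shows the resulting wait before each retry. It assumes that `attempts` counts total executions and that each delay doubles the previous one; `exponential_backoff` is a hypothetical helper, not a BLOGE API.

```python
from datetime import timedelta

def exponential_backoff(attempts: int, base: timedelta) -> list[timedelta]:
    """Delay before each retry of a node (hypothetical helper, not a BLOGE API).

    Assumes `attempts` counts total executions, so there are attempts - 1 retries,
    and that each delay doubles the previous one starting from `base`.
    """
    return [base * (2 ** retry) for retry in range(attempts - 1)]

# retry = { attempts: 3, backoff: 2s, strategy: exponential }
schedule = exponential_backoff(3, timedelta(seconds=2))
print([d.total_seconds() for d in schedule])  # [2.0, 4.0]
```

If BLOGE instead counts `attempts` as retries on top of the first run, the same schedule comes from `exponential_backoff(attempts + 1, base)`.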

At compile time, BLOGE materializes an embedded RemoteWorkerOperator that:

Serializes a JSON payload envelope (operatorRef, workerTopic, input, retry policy, and execution metadata grouped under RemoteWorkerEnvelope.ExecutionContext) into a durable EXECUTE_NODE work item.
Suspends the node until the worker reports completion.
Uses the node's resilience timeout as the suspend deadline.
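A minimal sketch of the envelope as a worker might decode it. Only operatorRef, workerTopic, and input are named in this page; the keys retryPolicy and executionContext, and every field inside them, are assumptions for illustration, so the real RemoteWorkerEnvelope schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any

# Field names mirror the (assumed) camelCase JSON keys of the envelope.

@dataclass
class RetryPolicy:
    attempts: int
    backoffMs: int
    strategy: str

@dataclass
class ExecutionContext:
    # Hypothetical grouping of execution metadata; the real fields may differ.
    runId: str
    nodeId: str
    attempt: int
    timeoutMs: int  # the node's resilience timeout, reused as the suspend deadline

@dataclass
class RemoteWorkerEnvelope:
    operatorRef: str
    workerTopic: str
    input: dict[str, Any]
    retryPolicy: RetryPolicy
    executionContext: ExecutionContext

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses, producing plain JSON types.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "RemoteWorkerEnvelope":
        data = json.loads(raw)
        return cls(
            operatorRef=data["operatorRef"],
            workerTopic=data["workerTopic"],
            input=data["input"],
            retryPolicy=RetryPolicy(**data["retryPolicy"]),
            executionContext=ExecutionContext(**data["executionContext"]),
        )

# The gpuInference node from the example, as it might look on the wire.
envelope = RemoteWorkerEnvelope(
    operatorRef="InferenceOperator",
    workerTopic="gpu-inference",
    input={"batchId": "b-42"},
    retryPolicy=RetryPolicy(attempts=3, backoffMs=2000, strategy="exponential"),
    executionContext=ExecutionContext(runId="run-1", nodeId="gpuInference",
                                      attempt=1, timeoutMs=300_000),
)
payload = envelope.to_json()
```

Keeping the payload plain JSON is what lets a worker in any language decode it without the orchestrator's classes.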

The business operatorRef is preserved on NodeSpec, and local operator-registry validation is skipped for remote-only operators (you do not need the operator class on the orchestrator's classpath). Sub-graph nodes cannot be combined with remote execution.
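The compile-time rules in this paragraph can be sketched as a small validation pass. NodeDecl, validate_node, and the plain set used as the operator registry are hypothetical stand-ins rather than BLOGE's compiler API, and rejecting a remote node without a worker_topic is an assumption drawn from the setup instructions above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeDecl:
    # Hypothetical, simplified view of a node declaration.
    name: str
    operator_ref: str
    execution_mode: str = "local"
    worker_topic: Optional[str] = None
    is_subgraph: bool = False

def validate_node(node: NodeDecl, local_registry: set[str]) -> list[str]:
    """Return validation errors for one node declaration (hypothetical rules)."""
    errors = []
    if node.execution_mode == "remote":
        if node.is_subgraph:
            errors.append(f"{node.name}: sub-graph nodes cannot run remotely")
        if not node.worker_topic:
            errors.append(f"{node.name}: remote nodes need a worker_topic")
        # No registry lookup: the operator class lives on the worker, not here.
    elif node.operator_ref not in local_registry:
        errors.append(f"{node.name}: unknown operator {node.operator_ref}")
    return errors
```

With an empty registry, the remote gpuInference node validates cleanly, while the same operator declared as a local node would be rejected.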

The control plane

The standalone bloge-examples-graph-engine/ project hosts bloge-graph-engine-service and bloge-graph-engine-server, which expose a stateless worker control plane under /api/v1/remote-workers:

| Flow | Purpose |
| --- | --- |
| register | A worker announces itself |
| poll + claim | Atomically claim a durable `EXECUTE_NODE` work item |
| heartbeat | Keep a claim alive while long work runs |
| complete | Report a result and resume the suspended node |
| fail | Report failure; the item transitions to `RETRY_WAIT` or `DEAD_LETTER` |
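On the worker side, each flow becomes an HTTP call under /api/v1/remote-workers. The sketch below builds those requests with the standard library without sending them. The sub-paths (/register, /poll, and so on), the POST method, the host and port, and the JSON body fields are all assumptions; check bloge-graph-engine-server for the actual routes.

```python
import json
import urllib.request

BASE = "http://localhost:8080/api/v1/remote-workers"  # host and port are assumptions

# Hypothetical sub-path per control-plane flow ("poll" covers poll + claim).
FLOWS = {"register", "poll", "heartbeat", "complete", "fail"}

def build_request(flow: str, body: dict, base: str = BASE) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one control-plane flow."""
    if flow not in FLOWS:
        raise ValueError(f"unknown flow: {flow}")
    return urllib.request.Request(
        url=f"{base}/{flow}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("poll", {"workerId": "gpu-worker-1", "workerTopic": "gpu-inference"})
# Sending it would be: urllib.request.urlopen(req)
```

Because the control plane is stateless, every call carries the identifiers it needs (worker, topic, item) instead of relying on a session.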

Work items are sharded by worker_topic, so a pool of GPU workers polling gpu-inference only ever claims gpu-inference work.
Failure callbacks respect the node's retry budget: retryable failures move the item to RETRY_WAIT, and exhausted budgets move it to DEAD_LETTER.
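Topic sharding and the retry budget can be sketched with an in-memory queue. InMemoryWorkQueue and WorkItem are hypothetical stand-ins for the durable store. Sending non-retryable failures straight to DEAD_LETTER is an assumption, and a real engine would wait out the backoff before an item in RETRY_WAIT becomes claimable again.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkItem:
    item_id: str
    topic: str
    max_attempts: int
    attempt: int = 1
    state: str = "PENDING"

class InMemoryWorkQueue:
    """Toy control plane: claims are sharded by topic, failures respect the retry budget."""

    def __init__(self) -> None:
        self._pending: dict[str, deque] = defaultdict(deque)
        self.dead_letters: list[WorkItem] = []

    def enqueue(self, item: WorkItem) -> None:
        self._pending[item.topic].append(item)

    def claim(self, topic: str) -> Optional[WorkItem]:
        # A worker only ever sees items for the topic it polls.
        queue = self._pending[topic]
        if not queue:
            return None
        item = queue.popleft()
        item.state = "CLAIMED"
        return item

    def fail(self, item: WorkItem, retryable: bool = True) -> None:
        if retryable and item.attempt < item.max_attempts:
            item.state = "RETRY_WAIT"
            item.attempt += 1
            # Simplification: re-queued immediately instead of after the backoff delay.
            self._pending[item.topic].append(item)
        else:
            item.state = "DEAD_LETTER"
            self.dead_letters.append(item)

q = InMemoryWorkQueue()
q.enqueue(WorkItem("a", "gpu-inference", max_attempts=2))
q.enqueue(WorkItem("b", "cpu-batch", max_attempts=2))
```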

A worker's lifecycle

Figure: Sequence diagram of a remote worker's lifecycle. The orchestrator enqueues an EXECUTE_NODE work item and suspends the node; a worker polls the worker topic and claims the item, runs the business logic while sending heartbeats, then reports complete(result) so the orchestrator can resume the node and continue.
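The worker's half of this lifecycle can be sketched as one poll, run, report step. ControlPlane, run_once, serve, RecordingPlane, and the item shape ({"id", "topic", "input"}) are hypothetical. A real worker would call the HTTP control plane; the in-memory RecordingPlane only lets the example run on its own.

```python
import time
from typing import Callable, Optional, Protocol

class ControlPlane(Protocol):
    # Hypothetical client interface mirroring the control-plane flows.
    def claim(self, topic: str) -> Optional[dict]: ...
    def heartbeat(self, item_id: str) -> None: ...
    def complete(self, item_id: str, result: dict) -> None: ...
    def fail(self, item_id: str, error: str) -> None: ...

Handler = Callable[[dict, Callable[[], None]], dict]

def run_once(plane: ControlPlane, topic: str, handler: Handler) -> bool:
    """Claim one item, run it, report the outcome. Returns False if nothing was claimed."""
    item = plane.claim(topic)
    if item is None:
        return False

    def beat() -> None:
        plane.heartbeat(item["id"])

    try:
        result = handler(item["input"], beat)
    except Exception as exc:  # report every failure; the server applies the retry budget
        plane.fail(item["id"], repr(exc))
    else:
        plane.complete(item["id"], result)
    return True

def serve(plane: ControlPlane, topic: str, handler: Handler, idle_sleep: float = 1.0) -> None:
    while True:
        if not run_once(plane, topic, handler):
            time.sleep(idle_sleep)  # back off while the topic is empty

def infer(payload: dict, heartbeat: Callable[[], None]) -> dict:
    total = 0
    for chunk in payload["chunks"]:
        total += sum(chunk)  # stand-in for long-running GPU work
        heartbeat()          # keep the claim alive between chunks
    return {"total": total}

class RecordingPlane:
    """In-memory stand-in for the control plane that records every callback."""

    def __init__(self, items: list[dict]) -> None:
        self.items = list(items)
        self.calls: list[tuple] = []

    def claim(self, topic: str) -> Optional[dict]:
        for i, item in enumerate(self.items):
            if item["topic"] == topic:
                return self.items.pop(i)
        return None

    def heartbeat(self, item_id: str) -> None:
        self.calls.append(("heartbeat", item_id))

    def complete(self, item_id: str, result: dict) -> None:
        self.calls.append(("complete", item_id, result))

    def fail(self, item_id: str, error: str) -> None:
        self.calls.append(("fail", item_id, error))
```

Heartbeating between units of work, rather than on a timer, keeps the sketch single-threaded; a long, indivisible call would need a background heartbeat instead.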

If the worker never reports back before the suspend deadline, the node times out and follows its normal resilience path (retry, then fallback or failure).
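The timeout path described here can be sketched as a decision function. on_deadline and its parameters are hypothetical, and the order (retry first, then fallback or failure) follows the sentence above rather than a documented BLOGE algorithm.

```python
from datetime import datetime, timedelta
from typing import Optional

def on_deadline(suspended_at: datetime, timeout: timedelta, now: datetime,
                attempt: int, max_attempts: int, has_fallback: bool) -> Optional[str]:
    """Decide what happens to a suspended node at time `now` (hypothetical sketch)."""
    if now < suspended_at + timeout:
        return None  # still inside the suspend deadline: keep waiting for the worker
    if attempt < max_attempts:
        return "RETRY"
    return "FALLBACK" if has_fallback else "FAIL"
```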

When to use remote workers

Heterogeneous compute — a node needs a GPU, a native library, or a non-JVM runtime.
Network isolation — a step must run inside a different security zone.
Independent scaling — scale the worker pool for one expensive step without scaling the orchestrator.

Dead-lettered items can be inspected and replayed from the embedded Ops Console.

Next steps

Persist suspended nodes with Durable Flows.
Replay dead letters from the Event Journal & Ops Console.
Run agent tools remotely — see AI Agents & LLM Operators.
