Remote Workers
Not every node belongs in the orchestrator process. GPU inference, native binaries, code in another language, or work that must run inside a specific network zone can all be delegated to an external worker while BLOGE continues to own the graph, durability, and resilience.
Marking a node as remote
Add execution_mode = remote and a worker_topic to any node:
```
node gpuInference : InferenceOperator {
    depends_on = [prepareBatch]
    execution_mode = remote
    worker_topic = "gpu-inference"
    timeout = 5m
    retry = { attempts: 3, backoff: 2s, strategy: exponential }
}
```
At compile time, BLOGE materializes an embedded RemoteWorkerOperator that:
- Serializes a JSON payload envelope (operatorRef, workerTopic, input, the retry policy, and execution metadata grouped in RemoteWorkerEnvelope.ExecutionContext) into a durable EXECUTE_NODE work item.
- Suspends the node until the worker reports completion.
- Uses the node's resilience timeout as the suspend deadline.
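The envelope described in the list above can be pictured as a set of Java records. This is a minimal sketch, assuming Java 16+. The record names, the ExecutionContext fields (graphRunId, nodeId, suspendDeadline), and the use of InferenceOperator as the operatorRef are illustrative assumptions, not BLOGE's actual RemoteWorkerEnvelope. JSON serialization is left to whatever JSON library the runtime uses. The sketch also shows the suspend deadline computed as the suspension time plus the node's timeout = 5m.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

class EnvelopeSketch {

    // Mirrors the DSL block: retry = { attempts: 3, backoff: 2s, strategy: exponential }
    record RetryPolicy(int attempts, Duration backoff, String strategy) {}

    // Hypothetical execution metadata; the real ExecutionContext fields may differ.
    record ExecutionContext(String graphRunId, String nodeId, Instant suspendDeadline) {}

    // Hypothetical stand-in for RemoteWorkerEnvelope (serialized to JSON in the real system).
    record Envelope(String operatorRef, String workerTopic, Map<String, Object> input,
                    RetryPolicy retry, ExecutionContext context) {}

    // Suspend deadline = moment the node was suspended + the node's resilience timeout.
    static Envelope build(String runId, Instant suspendedAt, Map<String, Object> input) {
        Duration timeout = Duration.ofMinutes(5); // timeout = 5m
        RetryPolicy retry = new RetryPolicy(3, Duration.ofSeconds(2), "exponential");
        ExecutionContext ctx = new ExecutionContext(runId, "gpuInference", suspendedAt.plus(timeout));
        // Assumption: the operator named in the node declaration is the operatorRef.
        return new Envelope("InferenceOperator", "gpu-inference", input, retry, ctx);
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        Envelope env = build("run-42", t0, Map.of("batchId", "b-7"));
        System.out.println(env.workerTopic() + " deadline=" + env.context().suspendDeadline());
    }
}
```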
The business operatorRef is preserved on NodeSpec, and local operator-registry validation is skipped for remote-only operators (you do not need the operator class on the orchestrator's classpath). Sub-graph nodes cannot be combined with remote execution.
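The compile-time rules in this paragraph can be sketched as a validation pass over node declarations. The NodeDecl record, its fields, the error messages, and the requirement that a remote node declare a non-blank worker_topic are hypothetical illustrations, not BLOGE's NodeSpec or validator. The sketch only encodes what the paragraph states: remote nodes skip the local registry lookup, and sub-graph nodes cannot run remotely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RemoteNodeValidation {

    // Hypothetical, simplified node declaration (not BLOGE's NodeSpec).
    record NodeDecl(String name, String operatorRef, boolean remote, String workerTopic, boolean subGraph) {}

    // Returns validation errors; an empty list means the node is accepted.
    static List<String> validate(NodeDecl node, Set<String> localRegistry) {
        List<String> errors = new ArrayList<>();
        if (node.remote()) {
            if (node.subGraph()) {
                errors.add(node.name() + ": sub-graph nodes cannot use remote execution");
            }
            if (node.workerTopic() == null || node.workerTopic().isBlank()) {
                errors.add(node.name() + ": remote nodes need a worker_topic"); // assumed rule
            }
            // No registry lookup: the operator class may exist only on the worker side.
        } else if (!localRegistry.contains(node.operatorRef())) {
            errors.add(node.name() + ": unknown operator " + node.operatorRef());
        }
        return errors;
    }

    public static void main(String[] args) {
        Set<String> registry = Set.of("PrepareBatchOperator");
        NodeDecl gpu = new NodeDecl("gpuInference", "InferenceOperator", true, "gpu-inference", false);
        System.out.println(validate(gpu, registry)); // [] although InferenceOperator is not registered
    }
}
```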
The control plane
The standalone bloge-examples-graph-engine/ project hosts bloge-graph-engine-service and bloge-graph-engine-server, which expose a stateless worker control plane under /api/v1/remote-workers:
| Flow | Purpose |
|---|---|
| register | A worker announces itself |
| poll + claim | Atomically claim a durable EXECUTE_NODE work item |
| heartbeat | Keep a claim alive while long work runs |
| complete | Report a result and resume the suspended node |
| fail | Report failure; the item transitions to RETRY_WAIT or DEAD_LETTER |
- Polling claims work items sharded by worker_topic, so a pool of GPU workers only pulls gpu-inference work.
- Failure callbacks respect the node's retry budget: retryable failures move the item to RETRY_WAIT, and exhausted budgets move it to DEAD_LETTER.
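The claim and failure rules above describe a small state machine per work item. The following in-memory Java sketch assumes a single process, with synchronized methods standing in for the control plane's atomic claim. The READY, CLAIMED, and DONE states, the reading of attempts: 3 as total executions, and the backoff formula backoff × 2^(attempt − 1) are assumptions. Only the topic sharding and the RETRY_WAIT / DEAD_LETTER outcomes come from the text.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class WorkQueueSketch {

    enum State { READY, CLAIMED, RETRY_WAIT, DONE, DEAD_LETTER }

    static final class WorkItem {
        final String id;
        final String topic;
        final int maxAttempts;       // from retry.attempts (assumed to count total executions)
        final Duration baseBackoff;  // from retry.backoff
        State state = State.READY;
        int attempt = 0;             // executions started so far
        Instant notBefore = Instant.MIN;

        WorkItem(String id, String topic, int maxAttempts, Duration baseBackoff) {
            this.id = id; this.topic = topic; this.maxAttempts = maxAttempts; this.baseBackoff = baseBackoff;
        }
    }

    private final List<WorkItem> items = new ArrayList<>();

    synchronized void enqueue(WorkItem item) { items.add(item); }

    // Claims the first eligible item for this topic only; synchronized makes the claim atomic here.
    synchronized Optional<WorkItem> claim(String topic, Instant now) {
        for (WorkItem item : items) {
            boolean eligible = item.state == State.READY
                    || (item.state == State.RETRY_WAIT && !now.isBefore(item.notBefore));
            if (item.topic.equals(topic) && eligible) {
                item.state = State.CLAIMED;
                item.attempt++;
                return Optional.of(item);
            }
        }
        return Optional.empty();
    }

    synchronized void complete(WorkItem item) { item.state = State.DONE; }

    // Retryable failure with budget left -> RETRY_WAIT with exponential delay; otherwise DEAD_LETTER.
    synchronized void fail(WorkItem item, boolean retryable, Instant now) {
        if (retryable && item.attempt < item.maxAttempts) {
            long factor = 1L << (item.attempt - 1); // 1, 2, 4, ...
            item.notBefore = now.plus(item.baseBackoff.multipliedBy(factor));
            item.state = State.RETRY_WAIT;
        } else {
            item.state = State.DEAD_LETTER;
        }
    }

    public static void main(String[] args) {
        WorkQueueSketch q = new WorkQueueSketch();
        q.enqueue(new WorkItem("w1", "gpu-inference", 3, Duration.ofSeconds(2)));
        Instant t = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(q.claim("cpu-batch", t).isPresent()); // false: wrong topic
        WorkItem w = q.claim("gpu-inference", t).orElseThrow();
        q.fail(w, true, t);
        System.out.println(w.state + " until " + w.notBefore);
    }
}
```

Under this reading, attempts: 3 with backoff: 2s gives retry delays of 2 s and then 4 s, and the third failure dead-letters the item.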
A worker's lifecycle
If the worker never reports back before the suspend deadline, the node times out and follows its normal resilience path (retry, then fallback or failure).
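From the worker's side, the flows in the control-plane table form a loop: register once, then poll and claim, heartbeat while the work runs, and finish with complete or fail. This Java sketch programs against a hypothetical ControlPlane interface instead of the real HTTP endpoints under /api/v1/remote-workers, whose paths and payloads are not documented here. The interface methods, the 10-second heartbeat interval, the choice to treat every exception as retryable, and the in-memory fake in main are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class WorkerLoopSketch {

    record Claim(String itemId, String input) {}

    // Hypothetical client for the register / poll+claim / heartbeat / complete / fail flows.
    interface ControlPlane {
        void register(String workerId, String topic);
        Optional<Claim> pollAndClaim(String workerId, String topic);
        void heartbeat(String workerId, String itemId);
        void complete(String workerId, String itemId, String result);
        void fail(String workerId, String itemId, String reason, boolean retryable);
    }

    // Drains the topic once; a long-lived worker would keep polling with a pause between empty polls.
    static void run(ControlPlane cp, String workerId, String topic, Function<String, String> work) {
        ScheduledExecutorService heartbeats = Executors.newSingleThreadScheduledExecutor();
        try {
            cp.register(workerId, topic);
            Optional<Claim> next;
            while ((next = cp.pollAndClaim(workerId, topic)).isPresent()) {
                Claim claim = next.get();
                // Heartbeat every 10 s (hypothetical interval) to keep the claim alive during long work.
                ScheduledFuture<?> hb = heartbeats.scheduleAtFixedRate(
                        () -> cp.heartbeat(workerId, claim.itemId()), 10, 10, TimeUnit.SECONDS);
                String result;
                try {
                    result = work.apply(claim.input());
                } catch (RuntimeException e) {
                    // Treats every exception as retryable (an assumption); the control plane
                    // then chooses RETRY_WAIT or DEAD_LETTER based on the retry budget.
                    cp.fail(workerId, claim.itemId(), e.getMessage(), true);
                    continue;
                } finally {
                    hb.cancel(false);
                }
                cp.complete(workerId, claim.itemId(), result);
            }
        } finally {
            heartbeats.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Deque<Claim> queue = new ArrayDeque<>(List.of(new Claim("w1", "batch-1"), new Claim("w2", "bad")));
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ControlPlane fake = new ControlPlane() {
            public void register(String w, String t) { log.add("register " + t); }
            public Optional<Claim> pollAndClaim(String w, String t) { return Optional.ofNullable(queue.poll()); }
            public void heartbeat(String w, String id) { log.add("heartbeat " + id); }
            public void complete(String w, String id, String r) { log.add("complete " + id + " " + r); }
            public void fail(String w, String id, String why, boolean retry) { log.add("fail " + id + " " + why); }
        };
        run(fake, "worker-1", "gpu-inference", in -> {
            if (in.equals("bad")) throw new IllegalArgumentException("cannot process " + in);
            return "scored(" + in + ")";
        });
        log.forEach(System.out::println);
    }
}
```

If a worker crashes mid-task, it stops heartbeating and never calls complete; that is the case covered by the paragraph above, where the orchestrator's suspend deadline eventually fires.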
When to use remote workers
- Heterogeneous compute — a node needs a GPU, a native library, or a non-JVM runtime.
- Network isolation — a step must run inside a different security zone.
- Independent scaling — scale the worker pool for one expensive step without scaling the orchestrator.
Dead-lettered items can be inspected and replayed from the embedded Ops Console.
Next steps
- Persist suspended nodes with Durable Flows.
- Replay dead letters from the Event Journal & Ops Console.
- Run agent tools remotely — see AI Agents & LLM Operators.