How to coordinate distributed services reliably? (Part 2)

In Part 1 of this article, we discussed how to coordinate services in a distributed system reliably and the differences between Orchestration and Choreography. In Part 2, we look at how you can hand-craft your own Orchestrator without using any orchestration tools.
Orchestration example - Happy path
Suppose you want to build your own Orchestrator. Let's see how an asynchronous Orchestrator might handle the happy path of an Order workflow, where each step completes successfully.
The Orchestrator coordinates multiple services by recording intent for each step. For every action, it writes two things in the same transaction:
- A record in the Saga table to track the workflow’s progress.
- A record in the Outbox table with state PENDING to signal work for a service (worker).
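The "same transaction" requirement above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: it assumes SQLite as the datastore, and the table layout, the `record_intent` helper and the `order-1` values are hypothetical names chosen for the example.

```python
import json
import sqlite3


def record_intent(conn, order_id, saga_state, action, payload):
    """Write the saga state and a PENDING outbox row in one transaction."""
    with conn:  # commits if the block succeeds, rolls back if anything raises
        conn.execute(
            "INSERT OR REPLACE INTO saga (order_id, state) VALUES (?, ?)",
            (order_id, saga_state),
        )
        conn.execute(
            "INSERT INTO outbox (order_id, action, payload, status) "
            "VALUES (?, ?, ?, 'PENDING')",
            (order_id, action, json.dumps(payload)),
        )


# Hypothetical schema, kept deliberately small for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE saga (order_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox (outbox_id INTEGER PRIMARY KEY, order_id TEXT, "
    "action TEXT, payload TEXT, status TEXT)"
)

record_intent(conn, "order-1", "payment_intent", "charge_payment", {"amount": 100})
```

Because both writes share one transaction, a crash between them cannot leave a saga row without its outbox row, or the reverse.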
Workers process the events in the Outbox and perform the following actions.
- Update the state of the saga to record the action that was just completed.
- Update the outbox record's status to DONE after the action is performed.
The use of the Outbox table makes this Orchestrator inherently asynchronous.
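A minimal sketch of the worker side, again assuming SQLite and a hypothetical schema: `process_pending` and the `handlers` mapping are illustrative names, and each handler stands in for a call to a real service.

```python
import sqlite3


def process_pending(conn, handlers):
    """Run each PENDING outbox row through its handler, then record the result."""
    rows = conn.execute(
        "SELECT outbox_id, order_id, action FROM outbox "
        "WHERE status = 'PENDING' ORDER BY outbox_id"
    ).fetchall()
    for outbox_id, order_id, action in rows:
        new_state = handlers[action](order_id)  # perform the side effect first
        with conn:  # the saga update and the outbox update commit together
            conn.execute(
                "UPDATE saga SET state = ? WHERE order_id = ?", (new_state, order_id)
            )
            conn.execute(
                "UPDATE outbox SET status = 'DONE' WHERE outbox_id = ?", (outbox_id,)
            )
    return len(rows)


# Hypothetical schema and seed data for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE saga (order_id TEXT PRIMARY KEY, state TEXT)")
conn.execute(
    "CREATE TABLE outbox (outbox_id INTEGER PRIMARY KEY, order_id TEXT, "
    "action TEXT, status TEXT)"
)
with conn:
    conn.execute("INSERT INTO saga VALUES ('order-1', 'payment_intent')")
    conn.execute(
        "INSERT INTO outbox (order_id, action, status) "
        "VALUES ('order-1', 'charge_payment', 'PENDING')"
    )

processed = process_pending(conn, {"charge_payment": lambda order_id: "payment_success"})
```

If the worker crashes after the handler runs but before the transaction commits, the row stays PENDING and is processed again, which is why the handlers need to be idempotent.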
Here's a step-by-step overview of how the orchestrator and outbox workers interact. The saga and outbox datastores don't have to be relational databases; they're shown this way here for easier visualisation. Some might refer to this pattern as Orchestrated Choreography.

Orchestration example - Failure state
How would the saga and outbox tables look when there is a shipping failure?

Orchestrated Choreography
With the Outbox pattern as demonstrated above, even though we are calling it Orchestration, the line between Orchestration and Choreography becomes blurry. The Orchestrator still drives the workflow by recording intent and publishing events, but from the perspective of the workers consuming and processing those events, it looks just like Choreography. Some refer to this as Orchestrated Choreography.
With Orchestrated Choreography, workers are still reacting to events (“reserve inventory,” “process payment,” etc.), just as they would in Choreography. The difference is who decides what comes next:
- Orchestration → a central Orchestrator decides the next step.
- Choreography → services collectively drive the flow.
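One way to picture the "central decision maker" is as a single lookup table that only the Orchestrator owns. The sketch below is illustrative: the state and action names are taken from the pseudocode later in this article, while `NEXT_STEP` and `next_step` are hypothetical names.

```python
# Saga state reported so far -> (action to enqueue, saga state recorded with it)
NEXT_STEP = {
    "started": ("charge_payment", "payment_intent"),
    "payment_success": ("reserve_inventory", "inventory_intent"),
    "inventory_reserved": ("schedule_shipping", "shipping_intent"),
}


def next_step(saga_state):
    """Return (action, new_state) for the Orchestrator to record, or None."""
    return NEXT_STEP.get(saga_state)
```

Workers never consult this table; they only react to the action they are handed. In pure Choreography there is no such table, and the same ordering is implied by which events each service subscribes to.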
Practical Considerations
Bonus: Orchestrator Saga Pseudocode
For the Order workflow above, here's what the pseudocode might look like.
# Orchestrator
orchestrator process_order(order_id, amount):
    saga = fetch_saga(order_id)
    if saga is null:
        saga = create_saga(order_id, state = started, version = 0)
    # step 1: payment intent
    if saga.state == started:
        payment_outbox = create_outbox_entry(
            order_id = order_id,
            action = charge_payment,
            payload = {amount: amount, idempotency_key: order_id}
        )
        persist_atomic(saga_update = {state: payment_intent}, outbox_entry = payment_outbox)
        return
    # step 2: inventory intent
    if saga.state == payment_success:
        inventory_outbox = create_outbox_entry(
            order_id = order_id,
            action = reserve_inventory,
            payload = {idempotency_key: order_id}
        )
        persist_atomic(saga_update = {state: inventory_intent}, outbox_entry = inventory_outbox)
        return
    # step 3: shipping intent
    if saga.state == inventory_reserved:
        shipping_outbox = create_outbox_entry(
            order_id = order_id,
            action = schedule_shipping,
            payload = {idempotency_key: order_id}
        )
        persist_atomic(saga_update = {state: shipping_intent}, outbox_entry = shipping_outbox)
        return
    # final step: mark completed if shipping succeeded
    if saga.state == shipping_scheduled:
        persist_saga(saga_update = {state: completed})
        notify_customer(order_id, success = true)
        return
# global compensation function
compensate_order(order_id):
    saga = fetch_saga(order_id)
    # only trigger refund/compensation if a downstream step failed
    if saga.state in [inventory_failed, shipping_failed]:
        payment_id = saga.metadata.payment_id
        refund_outbox = create_outbox_entry(
            order_id = order_id,
            action = refund_payment,
            payload = {payment_id: payment_id, idempotency_key: "refund-" + order_id}
        )
        persist_atomic(saga_update = {state: refund_intent}, outbox_entry = refund_outbox)
# Payment worker
worker paymentworker:
    loop forever:
        # stale_after_minutes matches the parameter name of fetch_and_lock_outbox_rows
        batch = fetch_and_lock_outbox_rows(action="charge_payment", limit=100, stale_after_minutes=5)
        if batch is empty:
            sleep(1s)
            continue
        for each entry in batch:
            process_payment_entry(entry)
process_payment_entry(entry):
    try:
        # call external payment service (idempotent)
        result = payment_service.charge(entry.payload)
        if result.success:
            -- transaction begin
            update saga
                set state = payment_success,
                    metadata.payment_id = result.payment_id,
                    version = version + 1
                where order_id = entry.order_id
            update outbox
                set status = done,
                    attempts = attempts + 1,
                    locked_at = null,
                    next_try = null,
                    last_error = null
                where outbox_id = entry.outbox_id
            -- transaction commit
            return
        else:
            raise external_transient_error # treat as retryable failure
    catch external_transient_error:
        entry.attempts += 1
        if entry.attempts >= entry.max_attempts:
            -- transaction begin
            update saga
                set state = manual_intervention_required,
                    version = version + 1
                where order_id = entry.order_id
            update outbox
                set status = failed,
                    attempts = entry.attempts,
                    locked_at = null
                where outbox_id = entry.outbox_id
            -- transaction commit
            alert_ops(entry.order_id)
        else:
            backoff = compute_backoff(entry.attempts)
            -- transaction begin
            update outbox
                set status = pending,
                    next_try = now() + backoff,
                    attempts = entry.attempts,
                    locked_at = null
                where outbox_id = entry.outbox_id
            -- transaction commit
    catch external_permanent_error:
        -- transaction begin
        update saga
            set state = manual_intervention_required,
                version = version + 1
            where order_id = entry.order_id
        update outbox
            set status = failed,
                attempts = entry.attempts + 1,
                locked_at = null
            where outbox_id = entry.outbox_id
        -- transaction commit
        alert_ops(entry.order_id)
# Used by workers to lock and process entries in the outbox table
# (:name marks a value bound from the function's arguments or locals)
function fetch_and_lock_outbox_rows(action, limit=100, stale_after_minutes=5):
    now = current_timestamp()
    -- start transaction
    rows = select *
        from outbox
        where outbox.action = :action
        and (
            -- pending rows are eligible only once their backoff (next_try) has elapsed
            (status = 'pending' and (next_try is null or next_try <= :now))
            or (status = 'in_progress' and locked_at < :now - :stale_after_minutes * interval '1 minute')
        )
        order by created_at asc
        limit :limit
        for update skip locked
    if rows is empty:
        -- nothing to process, commit and return empty list
        commit
        return []
    -- mark rows as locked/in_progress
    for each row in rows:
        update outbox
            set status = 'in_progress',
                locked_at = :now
            where outbox_id = row.outbox_id
    commit
    return rows
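The payment worker above calls compute_backoff without defining it. One common choice is exponential backoff with full jitter, sketched below as an assumption rather than as the method this article prescribes. The parameter names and defaults (`base_seconds`, `cap_seconds`) are hypothetical.

```python
import random


def compute_backoff(attempts, base_seconds=1.0, cap_seconds=300.0):
    """Return a delay in seconds: random in [0, min(cap, base * 2**attempts)]."""
    exponent = min(attempts, 30)  # keep 2**exponent small for very high attempt counts
    ceiling = min(cap_seconds, base_seconds * (2 ** exponent))
    # Full jitter spreads retries out so failed entries do not all retry at once.
    return random.uniform(0.0, ceiling)
```

A worker would add the returned number of seconds to the current time to get next_try. The cap keeps a long-failing entry from being pushed hours into the future before max_attempts is reached.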