Reset a stuck provisioning

Reset is the heavier escape valve for a provisioning workflow that won't move and won't fail. It clears whatever workflow state is holding the tenant in place, then leaves the tenant in Pending so you can retry cleanly. It differs from retry, which restarts the workflow, and from cancel, which aborts an in-flight workflow.

Three scenarios fit reset:

  • Workflow is stuck mid-saga with no progress. The audit feed shows the last step completed minutes ago and nothing since. Retry won't start because the workflow engine still considers the old workflow to be running. Reset clears the lock.
  • A previous attempt recorded an error but the tenant didn't cleanly transition to Failed. The detail page shows an error in the timeline but state is still Pending or Provisioning. Reset bumps the row to a clean Pending so retry works.
  • Workflow id on the row is wrong or orphaned. This is rare, but if the row references a workflow id that doesn't exist in the workflow engine (after a cluster reset or manual cleanup, for example), reset gives you a clean slate.

Reset is NOT the right move when:

  • The workflow is genuinely making progress, just slow. Look at heartbeat timings before you reach for reset — see Provisioning timeouts + heartbeats.
  • The tenant is in Failed state with a clear failed_activity. Retry is the path there; reset would only buy you the same retry with extra steps.
  • You want to abort the provision. Cancel is the right tool there (see Cancel provisioning) because it runs the compensation chain. Reset doesn't roll back; it only clears the workflow state and releases nothing that has been allocated.

From the tenant detail page, when the platform detects a reset-worthy state (Pending or Provisioning with a recorded error), the Reset provisioning button appears in the Actions section. Click it.
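
The visibility rule for the button can be read as a simple predicate. The sketch below is illustrative only: the `Tenant` class, its field names, and `is_reset_worthy` are hypothetical and not part of the platform's API. It assumes only what the paragraph above states, namely that the state is Pending or Provisioning and an error has been recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tenant:
    # Hypothetical shape of a tenant row; field names are assumptions.
    state: str                     # e.g. "Pending", "Provisioning", "Failed"
    recorded_error: Optional[str]  # None when no error has been recorded


def is_reset_worthy(tenant: Tenant) -> bool:
    """True when the tenant is Pending or Provisioning AND has a recorded error."""
    return (
        tenant.state in {"Pending", "Provisioning"}
        and tenant.recorded_error is not None
    )
```

Under this reading, a tenant in Failed state never shows the button, which matches the guidance that retry is the path for a clean failure.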

The platform:

  • Marks the current workflow handle as orphaned (no longer tracked).
  • Clears the recorded error.
  • Leaves any already-allocated resources in place (they'll be reused or released on the next attempt).
  • Flips the tenant to Pending state.
  • Emits tenant.provisioning.reset_started and, when the workflow handle is cleared, tenant.provisioning.reset_completed to the audit feed. If reset itself runs into trouble you'll see tenant.provisioning.reset_failed instead.
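
As a rough mental model, the steps above can be sketched as a small state change plus audit events. Everything here except the three event names is hypothetical: `TenantRow`, `reset_provisioning`, and the `orphan_handle` callback stand in for platform internals that this page does not describe. The ordering of the steps, and the choice to leave the row untouched when reset fails, are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TenantRow:
    # Hypothetical tenant row; field names are assumptions.
    state: str
    workflow_id: Optional[str]
    recorded_error: Optional[str]
    allocations: List[str] = field(default_factory=list)


def reset_provisioning(
    row: TenantRow,
    orphan_handle: Callable[[str], None],  # hypothetical call into the workflow engine
    audit: List[str],
) -> None:
    audit.append("tenant.provisioning.reset_started")
    try:
        # Stop tracking the current workflow handle, if the row has one.
        if row.workflow_id is not None:
            orphan_handle(row.workflow_id)
    except Exception:
        # Assumption: on failure the row is left as it was.
        audit.append("tenant.provisioning.reset_failed")
        raise
    row.workflow_id = None
    row.recorded_error = None
    # row.allocations is deliberately not touched: reset releases nothing.
    row.state = "Pending"
    audit.append("tenant.provisioning.reset_completed")
```

The point to notice is the line that is absent: nothing in the function modifies `allocations`, which is the difference between reset and cancel.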

Once the reset completes, the tenant detail page reflects the new state. From there, click Retry provisioning to start a fresh attempt.

The important nuance is what reset leaves behind. Cancel runs a compensation chain that releases allocations. Reset does not.

If the previous workflow had allocated storage and identity for the tenant before getting stuck, those allocations stay. They'll get reused when the next attempt starts, or released by the compensation chain if the next attempt fails too.

This is by design: reset is the "I want to keep what's been done so far and try again" tool. If you want a fully clean slate (no leftover allocations), the path is decommission instead — see Decommission.

A quick decision table:

Situation                                     Action
--------------------------------------------  -----------------
In-flight workflow making progress            Wait
In-flight workflow you no longer want         Cancel
Workflow stuck, no progress, no clean fail    Reset, then retry
Workflow cleanly failed (state = Failed)      Retry
Workflow cleanly failed, you've given up      Decommission
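
The same table can be written as a small lookup function, which may help if you are scripting a triage checklist. This is a sketch rather than platform code: `recommended_action` and its three boolean inputs are hypothetical. The table does not say what to do with a wedged workflow you no longer want; the sketch assumes the "no longer want" row applies to any in-flight workflow.

```python
def recommended_action(
    failed_cleanly: bool,   # tenant state is Failed
    making_progress: bool,  # heartbeats or steps are still advancing
    still_wanted: bool,     # you still want this tenant provisioned
) -> str:
    """Map the decision-table rows to an action name."""
    if failed_cleanly:
        return "Retry" if still_wanted else "Decommission"
    # From here on the workflow counts as in-flight (not cleanly failed).
    if not still_wanted:
        # Assumption: applies whether or not the workflow is progressing.
        return "Cancel"
    if making_progress:
        return "Wait"
    return "Reset, then retry"
```

For example, `recommended_action(failed_cleanly=False, making_progress=False, still_wanted=True)` returns `"Reset, then retry"`, the only row where reset appears.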

Reset is rarely the right answer on a first pass; usually you want either Cancel (running) or Retry (failed). Reach for Reset only when the state machine itself is wedged.