
Retries and the dead-letter queue

Webhook delivery and retry: six attempts (the original plus five retries) over roughly 14.5 hours, then the dead-letter queue.

Webhook delivery is at-least-once with bounded retries. If your receiver fails or responds too slowly, the platform retries with backoff. After the final attempt fails, the event lands in the dead-letter queue (DLQ), where it can be replayed or dropped; entries left untouched expire after 7 days.

This page is the precise contract: what triggers a retry, how often retries happen, when the platform gives up on an event, and what to do with whatever is left over.

A delivery is considered successful when both of these hold:

  • The receiver returns an HTTP status in the range 200-299.
  • The response arrives within 10 seconds of connection start.

The response body is not parsed, so any 2xx counts regardless of what it contains; the body is only logged for debugging.

Everything else triggers a retry. Specifically:

  • Any non-2xx status, including 401, 403, and 404. These usually mean the receiver is misconfigured; the platform keeps retrying on the assumption that you'll fix the receiver within the retry window.
  • Timeout: no response within 10 seconds of connection start.
  • DNS resolution failure.
  • TLS handshake failure.
  • Connection reset / refused.
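
The success and retry rules above can be condensed into one classification function. This is a minimal sketch of the contract as seen from the sender's side, not the platform's code: the `AttemptOutcome` shape and the names `classifyAttempt` and `DELIVERY_TIMEOUT_MS` are hypothetical, and the transport failures (DNS, TLS, reset/refused) are collapsed into a single `transportError` flag.

```typescript
// Hypothetical view of one delivery attempt from the sender's side.
interface AttemptOutcome {
  status?: number; // HTTP status, if any response arrived
  elapsedMs: number; // time from connection start to response
  transportError?: boolean; // DNS, TLS, or connection reset/refused
}

type Verdict = "success" | "retry";

// The documented limit: the response must arrive within 10 seconds.
const DELIVERY_TIMEOUT_MS = 10000;

function classifyAttempt(o: AttemptOutcome): Verdict {
  // Network-level failures, or no response at all, always retry.
  if (o.transportError || o.status === undefined) return "retry";
  // A late response counts as a timeout, even if its status was 2xx.
  if (o.elapsedMs > DELIVERY_TIMEOUT_MS) return "retry";
  // Only 200-299 succeeds; the body is never inspected. 3xx, 4xx
  // (including 401, 403, 404) and 5xx all retry.
  return o.status >= 200 && o.status <= 299 ? "success" : "retry";
}
```

Note that a fast 404 and a slow 200 are treated the same way: both come back as a retry.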

The retry schedule:

Attempt   Delay after previous attempt
1         Immediate (the original delivery)
2         1 minute
3         5 minutes
4         30 minutes
5         2 hours
6         12 hours

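If attempt 6 fails, the event is dead-lettered. The base delays add up to 14h 36m (1 min + 5 min + 30 min + 2 h + 12 h), so an event is retried for roughly 14.5 hours before it reaches the DLQ, give or take jitter.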

The base delays are fixed, but each one is jittered by up to ±10% so that retries don't stampede when many receivers fail at the same time (e.g., during a CDN outage).
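
The schedule and jitter can be sketched as a small function. This is an illustration under stated assumptions: `BASE_DELAYS_MIN` and `retryDelayMs` are hypothetical names, and the jitter is assumed to be uniform across the ±10% band, since the distribution isn't specified here. The random source is a parameter so the behavior can be checked deterministically.

```typescript
// Base delay before attempts 2 through 6, in minutes.
const BASE_DELAYS_MIN: readonly number[] = [1, 5, 30, 120, 720];

// Each delay is jittered by up to +/-10% (uniform jitter is an assumption).
const JITTER = 0.1;

// Delay in milliseconds before the given attempt (1-6). `rand` must return
// a value in [0, 1); it is a parameter so the sketch can be tested.
function retryDelayMs(attempt: number, rand: () => number = Math.random): number {
  if (attempt === 1) return 0; // the original delivery goes out immediately
  const base = BASE_DELAYS_MIN[attempt - 2];
  if (base === undefined) throw new RangeError(`no attempt ${attempt} in the schedule`);
  // Map rand() onto a multiplier in [0.9, 1.1).
  const factor = 1 - JITTER + 2 * JITTER * rand();
  return base * 60 * 1000 * factor;
}

// Unjittered total time between the first and the last attempt.
const totalMin = BASE_DELAYS_MIN.reduce((sum, d) => sum + d, 0);
console.log(`${Math.floor(totalMin / 60)}h ${totalMin % 60}m`); // 14h 36m
```

Because every delay moves independently, the whole window can shrink or stretch by up to 10% as well, which is why the total is best read as approximate.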

Every delivery carries an X-IntelliAuth-Timestamp header. Your receiver should reject any delivery whose timestamp is more than about 5 minutes old. The platform refreshes the timestamp on each retry, so a genuine platform retry never arrives with a stale timestamp.

If you DO see a stale timestamp, it is one of:

  • A network device buffered the delivery for too long (rare but real on some corporate edge proxies).
  • A replay attack — someone captured a signed delivery and is replaying it.

Either way the rejection is correct.
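
A freshness check along these lines is sketched below. The header's value format isn't specified on this page, so the sketch assumes it holds Unix epoch seconds as a decimal string; `isFreshTimestamp` and `MAX_SKEW_SECONDS` are hypothetical names. The check only defends against replays if the timestamp is covered by the delivery's signature, and this sketch does not verify signatures. It also rejects timestamps more than 5 minutes in the future, a common precaution against clock skew that is our own addition rather than something the platform states.

```typescript
// Maximum accepted skew between the header and the local clock (~5 minutes).
const MAX_SKEW_SECONDS = 5 * 60;

// ASSUMPTION: X-IntelliAuth-Timestamp carries Unix epoch seconds as a
// decimal string. Check the platform's signing docs for the real format.
function isFreshTimestamp(headerValue: string | undefined, nowMs: number = Date.now()): boolean {
  if (headerValue === undefined || !/^\d+$/.test(headerValue)) return false;
  const ageSeconds = nowMs / 1000 - Number(headerValue);
  // Too old: possible replay or a buffering proxy. Too far in the future:
  // clock skew or tampering (this future-side check is our own addition).
  return Math.abs(ageSeconds) <= MAX_SKEW_SECONDS;
}
```

When the check fails, respond with a non-2xx status; if the delivery really came from the platform, its retry will carry a fresh timestamp.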

Because delivery is at-least-once, your receiver may see the same event more than once. Dedupe by event_id:

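// Assumes a Prisma-style client `db` whose processedEvents model uses the event id as its primary key.
async function processEvent(event: Webhook) {
  // Fast path: this event was already handled, so return and let the caller respond 2xx.
  const seen = await db.processedEvents.findUnique({ where: { id: event.event_id } })
  if (seen) {
    return
  }
  await db.$transaction(async (tx) => {
    await applyChanges(tx, event)
    // Record the event in the same transaction as its side effects. If two
    // copies race past the check above, the second insert violates the
    // primary key and its whole transaction, including applyChanges, rolls back.
    await tx.processedEvents.create({ data: { id: event.event_id, processed_at: new Date() } })
  })
}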

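The dedupe table can be tiny: id plus a timestamp, indexed by event_id so lookups are fast. A 14-day TTL comfortably covers the longest span over which a duplicate can arrive: the ~14.5-hour retry window, up to 7 days in the DLQ, and a fresh retry cycle after a replay.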
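
For a single-process receiver or a test harness, the same table can be modeled in memory. This is a sketch only: `DedupeStore` and its methods are hypothetical, and because it is neither shared across receiver instances nor transactional, a production receiver still needs the database table described above, written in the same transaction as the event's changes.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical in-memory stand-in for the processed-events table:
// event_id -> time it was recorded (ms since epoch).
class DedupeStore {
  private readonly seen = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 14 * DAY_MS) {
    this.ttlMs = ttlMs;
  }

  // True if this id was recorded within the TTL: acknowledge with 2xx and skip.
  isDuplicate(eventId: string, nowMs: number = Date.now()): boolean {
    const recordedAt = this.seen.get(eventId);
    return recordedAt !== undefined && nowMs - recordedAt < this.ttlMs;
  }

  // Call only after the event's changes were applied successfully, so a
  // failed attempt is not mistaken for a processed one when it is retried.
  record(eventId: string, nowMs: number = Date.now()): void {
    this.seen.set(eventId, nowMs);
  }

  // Deletes expired records, like a periodic TTL cleanup job.
  prune(nowMs: number = Date.now()): void {
    this.seen.forEach((recordedAt, id) => {
      if (nowMs - recordedAt >= this.ttlMs) this.seen.delete(id);
    });
  }

  get size(): number {
    return this.seen.size;
  }
}
```

Splitting the check from the write mirrors the transactional version: the record exists only once processing has succeeded.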

When attempt 6 fails, the event lands in the subscription's DLQ. To find it in the tenant admin console, open Webhooks → your subscription → Dead-letter queue.

The DLQ shows:

  • The event payload (full body).
  • The last response your receiver returned (status, headers, body).
  • The timestamps of every attempt.

From here you can:

  • Replay: the console sends the event again, starting a fresh cycle of up to 6 attempts.
  • Inspect: copy the payload and process it manually if replay isn't viable.
  • Drop: explicitly mark the event as one that will not be processed.

DLQ entries that aren't replayed or dropped within 7 days are permanently deleted. The platform keeps no copy afterwards, so that window is your last chance to recover them.

To automate a job such as "every morning, drain yesterday's DLQ", use the list endpoint:

GET /api/v1/webhooks/subscriptions/{subscription_id}/dlq
Authorization: Bearer <access-token>
Required scope: webhooks:read
Query parameters: cursor, limit

Each entry includes the full event payload and its attempt history.
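
A paginated listing against this endpoint might look like the sketch below. Only the path, the Bearer header, and the cursor and limit parameters come from this page; the response shape (`entries`, `next_cursor`), the entry field names, and the page size of 50 are hypothetical. The HTTP call is injected so the sketch has no dependencies; on Node 18+ you could pass `(url, headers) => fetch(url, { headers })`.

```typescript
// Field names below are HYPOTHETICAL: the docs only say each entry
// includes the full event and its attempt history.
interface DlqEntry {
  id: string;
  event: unknown;
  attempts: unknown[];
}

interface DlqPage {
  entries: DlqEntry[];
  next_cursor: string | null; // hypothetical; null or absent on the last page
}

// Minimal HTTP GET signature, injected so the sketch has no dependencies.
type HttpGet = (
  url: string,
  headers: Record<string, string>,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the documented list URL with the cursor and limit query parameters.
function dlqListUrl(baseUrl: string, subscriptionId: string, limit: number, cursor?: string): string {
  let url = `${baseUrl}/api/v1/webhooks/subscriptions/${encodeURIComponent(subscriptionId)}/dlq?limit=${limit}`;
  if (cursor) url += `&cursor=${encodeURIComponent(cursor)}`;
  return url;
}

// Follows cursors until the last page and returns every entry.
async function listAllDlqEntries(
  get: HttpGet,
  baseUrl: string,
  subscriptionId: string,
  token: string, // needs the webhooks:read scope
  pageSize = 50, // hypothetical; the maximum allowed limit is not documented here
): Promise<DlqEntry[]> {
  const all: DlqEntry[] = [];
  let cursor: string | undefined;
  do {
    const res = await get(dlqListUrl(baseUrl, subscriptionId, pageSize, cursor), {
      Authorization: `Bearer ${token}`,
    });
    if (!res.ok) throw new Error(`DLQ list failed with HTTP ${res.status}`);
    const page = (await res.json()) as DlqPage;
    all.push(...page.entries);
    cursor = page.next_cursor ?? undefined;
  } while (cursor);
  return all;
}
```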

To replay programmatically:

POST /api/v1/webhooks/dlq-entries/{entry_id}/replay
Required scope: webhooks:write

Returns immediately; the platform queues a fresh delivery. Watch the subscription's delivery feed to see the result.

To explicitly drop:

POST /api/v1/webhooks/dlq-entries/{entry_id}/drop
Required scope: webhooks:write

The entry is marked as dropped and won't be replayed by future automation. The audit log records who dropped it.
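
The replay and drop calls can be wrapped as follows. This sketch assumes both endpoints accept the same Bearer header as the list endpoint (with a token holding webhooks:write) and need no request body; neither point is stated above. `actOnDlqEntry` and `replayAll` are hypothetical helpers, and the HTTP call is injected; on Node 18+ you could pass `(url, headers) => fetch(url, { method: "POST", headers })`.

```typescript
// Minimal HTTP POST signature, injected so the sketch has no dependencies.
type HttpPost = (
  url: string,
  headers: Record<string, string>,
) => Promise<{ ok: boolean; status: number }>;

type DlqAction = "replay" | "drop";

// POSTs to .../dlq-entries/{entry_id}/replay or .../drop.
// ASSUMPTION: Bearer auth as on the list endpoint, and no request body.
async function actOnDlqEntry(
  post: HttpPost,
  baseUrl: string,
  token: string, // needs the webhooks:write scope
  entryId: string,
  action: DlqAction,
): Promise<void> {
  const url = `${baseUrl}/api/v1/webhooks/dlq-entries/${encodeURIComponent(entryId)}/${action}`;
  const res = await post(url, { Authorization: `Bearer ${token}` });
  if (!res.ok) throw new Error(`${action} of ${entryId} failed with HTTP ${res.status}`);
  // A successful replay only queues a delivery; its outcome shows up in
  // the subscription's delivery feed, not in this response.
}

// Replays each entry in turn and returns the ids that could not be queued.
async function replayAll(post: HttpPost, baseUrl: string, token: string, entryIds: string[]): Promise<string[]> {
  const failed: string[] = [];
  for (const id of entryIds) {
    try {
      await actOnDlqEntry(post, baseUrl, token, id, "replay");
    } catch {
      failed.push(id);
    }
  }
  return failed;
}
```

Replaying sequentially keeps a morning drain job from flooding a receiver that may have only just recovered.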

Some cases where your receiver should deliberately return non-2xx and let the retries run, and one where it should not:

  • Your receiver is overloaded. Return 503 Service Unavailable with a Retry-After header. The platform honors Retry-After up to a cap of 24 hours.
  • The event references something your receiver can't see yet. For example, you're behind on syncing user records and the webhook arrives before the user shows up in your replica. Return 409 Conflict; the retries give your sync time to catch up.
  • You've identified the event as malformed. Don't ask for a retry: return 2xx and log the event. Every retry redelivers the same payload, so retrying cannot fix a payload your code rejects on principle.

The general rule is: 2xx means "I'm not going to ask for this again", non-2xx means "please try this later". Use them with that intent.
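
The three cases map onto response codes as in this sketch. `HandlingResult`, `ReceiverResponse`, and `toResponse` are hypothetical names; how your receiver detects overload or a missing dependency is left out. The 24-hour clamp on Retry-After reflects the platform's documented cap, beyond which a larger value gains nothing.

```typescript
// Hypothetical summary of what the receiver found when handling an event.
type HandlingResult =
  | { kind: "processed" }
  | { kind: "overloaded"; retryAfterSeconds: number }
  | { kind: "dependencyMissing" } // e.g. the user is not yet in the local replica
  | { kind: "malformed"; reason: string };

interface ReceiverResponse {
  status: number;
  headers: Record<string, string>;
}

// 2xx = "don't send this again"; non-2xx = "please try this later".
function toResponse(r: HandlingResult, log: (msg: string) => void = console.warn): ReceiverResponse {
  switch (r.kind) {
    case "processed":
      return { status: 200, headers: {} };
    case "overloaded": {
      // The platform honors Retry-After up to 24 hours, so clamp to that.
      const seconds = Math.min(Math.ceil(r.retryAfterSeconds), 24 * 3600);
      return { status: 503, headers: { "Retry-After": String(seconds) } };
    }
    case "dependencyMissing":
      // Retries give the local sync time to catch up.
      return { status: 409, headers: {} };
    case "malformed":
      // A retry would redeliver the same payload: acknowledge and keep a record.
      log(`dropping malformed event: ${r.reason}`);
      return { status: 200, headers: {} };
  }
}
```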

A healthy subscription has a steady-state retry rate below 1%. A jump indicates one of:

  • Your receiver is unhealthy (deploy regression, dependency outage).
  • Your receiver's response time is creeping up — eventually crossing 10 seconds and timing out.
  • A new event type your receiver doesn't handle is now being sent.

Alert on retry rate. Investigate when it spikes. Don't let the DLQ silently fill.
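
One way to turn this into an alert is sketched below, assuming you can export attempt records with an attempt number, for example from the subscription's delivery feed. The record shape, the definition of retry rate as retried attempts divided by all attempts in a window, and the 200-attempt minimum sample are all assumptions; `retryRate` and `shouldAlert` are hypothetical names.

```typescript
// Hypothetical record of one delivery attempt; the field names are assumptions.
interface AttemptRecord {
  eventId: string;
  attempt: number; // 1 = original delivery, 2-6 = retries
}

// ASSUMPTION: retry rate = retried attempts / all attempts in the window.
function retryRate(attempts: AttemptRecord[]): number {
  if (attempts.length === 0) return 0;
  const retries = attempts.filter((a) => a.attempt > 1).length;
  return retries / attempts.length;
}

// Alert when the rate exceeds the healthy baseline (1% by default), but
// only once the window holds enough attempts for the ratio to mean something.
function shouldAlert(attempts: AttemptRecord[], threshold = 0.01, minSample = 200): boolean {
  return attempts.length >= minSample && retryRate(attempts) > threshold;
}
```

The minimum sample keeps a single failure on a quiet subscription from paging anyone; tune both numbers to your own traffic.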