Build a production-grade webhook handler

A toy webhook receiver is ten lines. A production one is a small system: signature verification, idempotency, asynchronous processing, retry handling, observability. This tutorial walks the whole thing — start from "blank Node project", end at "I'd run this in production." Each piece is short; the assembly is what matters.

The shape we're building:

HTTPS endpoint receives the POST.
Validates HMAC + anti-replay timestamp.
Dedupes by event_id.
Enqueues onto a Redis-backed queue (BullMQ).
A worker drains the queue, processes events, with its own retry on failure.
Metrics + structured logs throughout.

We use Node + Express + BullMQ. The shape transfers to any stack.

Before you begin

Node 18+, pnpm, a Redis instance (Docker is fine: docker run -p 6379:6379 redis:7)
An IntelliAuth tenant + admin permissions to create a webhook subscription
A test public hostname (ngrok, Cloudflare Tunnel, or similar)

Scaffold

mkdir webhook-handler && cd webhook-handler
pnpm init
pnpm add express bullmq ioredis pino
pnpm add -D typescript @types/express @types/node ts-node

tsconfig.json:

{
  "compilerOptions": {
    "target": "es2022",
    "module": "node16",
    "moduleResolution": "node16",
    "strict": true,
    "esModuleInterop": true,
    "outDir": "dist"
  }
}

The receiver

src/receiver.ts:

import express from 'express'
import { createHmac, timingSafeEqual } from 'node:crypto'
import { Queue } from 'bullmq'
import Redis from 'ioredis'
import pino from 'pino'

const log = pino({ name: 'webhook-receiver' })

const redis = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null })
const queue = new Queue('intelliauth-events', { connection: redis })

const SECRET = process.env.INTELLIAUTH_WEBHOOK_SECRET!
const REPLAY_WINDOW_S = 5 * 60

function verify(rawBody: Buffer, signature: string, timestamp: string): boolean {
  const ts = Number(timestamp)
  if (!Number.isFinite(ts)) return false
  const now = Math.floor(Date.now() / 1000)
  if (Math.abs(now - ts) > REPLAY_WINDOW_S) return false

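  // Assumes the signature covers the raw body only; if your provider also signs
  // the timestamp, feed it into the HMAC too.
  const expected = Buffer.from(createHmac('sha256', SECRET).update(rawBody).digest('hex'))
  const received = Buffer.from(signature)
  // Compare byte lengths, not string lengths: timingSafeEqual throws on a mismatch.
  if (received.length !== expected.length) return false
  return timingSafeEqual(received, expected)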
}

const app = express()

app.post(
  '/webhooks/intelliauth',
  express.raw({ type: 'application/json', limit: '1mb' }),
  async (req, res) => {
    const signature = req.header('X-IntelliAuth-Signature') ?? ''
    const timestamp = req.header('X-IntelliAuth-Timestamp') ?? ''
    const eventId   = req.header('X-IntelliAuth-Event-Id')  ?? ''
    const body      = req.body as Buffer

    if (!verify(body, signature, timestamp)) {
      log.warn({ eventId }, 'signature verification failed')
      return res.status(401).json({ error: 'invalid_signature' })
    }

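    if (!eventId) return res.status(400).json({ error: 'missing_event_id' })

    // Parse before claiming the dedupe key, so a malformed body fails cleanly.
    let event: { event_type: string }
    try {
      event = JSON.parse(body.toString('utf8'))
    } catch {
      return res.status(400).json({ error: 'invalid_json' })
    }

    // Idempotency: atomic claim on event_id. SET ... NX returns null if the key exists.
    const dedupeKey = `dedupe:${eventId}`
    const inserted = await redis.set(dedupeKey, '1', 'EX', 14 * 24 * 3600, 'NX')
    if (inserted === null) {
      log.info({ eventId }, 'duplicate; skipping')
      return res.status(200).end()
    }

    try {
      await queue.add(event.event_type, event, {
        jobId:            eventId,                  // BullMQ dedupes too
        removeOnComplete: { count: 1000 },          // keep last 1000 for debugging
        removeOnFail:     false,                    // keep failed jobs for inspection
        attempts:         5,
        backoff:          { type: 'exponential', delay: 5_000 },
      })
    } catch (err) {
      // Release the claim so the sender's retry is not swallowed as a duplicate.
      await redis.del(dedupeKey).catch(() => {})
      log.error({ eventId, err }, 'enqueue failed')
      return res.status(500).json({ error: 'enqueue_failed' })
    }

    log.info({ eventId, eventType: event.event_type }, 'enqueued')
    res.status(202).end()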
  },
)

const port = Number(process.env.PORT ?? 4000)
app.listen(port, () => log.info({ port }, 'receiver listening'))

A few choices worth calling out:

express.raw — get the body as bytes, BEFORE any JSON parsing. Required for the signature to verify.
redis.set ... NX — atomic dedup. Two simultaneous deliveries of the same event_id race here; one wins.
res.status(202) — explicit "accepted for processing" rather than 200. Cosmetic but documents intent.
removeOnFail: false — failed jobs stay in BullMQ for inspection rather than disappearing.
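
To exercise the receiver without waiting for a real delivery, you can sign a payload yourself. A minimal sketch, assuming (as verify does above) that the signature is the hex HMAC-SHA256 of the raw body. signPayload, matches, and the sample values are illustrative names, not part of any IntelliAuth SDK.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical helper: produce the header values a sender would attach.
function signPayload(secret: string, rawBody: Buffer, nowMs: number = Date.now()) {
  return {
    signature: createHmac('sha256', secret).update(rawBody).digest('hex'),
    timestamp: String(Math.floor(nowMs / 1000)), // seconds, as the receiver expects
  }
}

// Same comparison the receiver performs, with the secret passed in explicitly.
function matches(secret: string, rawBody: Buffer, signature: string): boolean {
  const expected = Buffer.from(createHmac('sha256', secret).update(rawBody).digest('hex'))
  const received = Buffer.from(signature)
  return received.length === expected.length && timingSafeEqual(received, expected)
}

const sampleBody = Buffer.from(JSON.stringify({ event_id: 'evt_1', event_type: 'test.ping' }))
const signed = signPayload('local-secret', sampleBody)
console.log({ ...signed, ok: matches('local-secret', sampleBody, signed.signature) })
```

Send the same bytes as the request body, with the two values in the X-IntelliAuth-Signature and X-IntelliAuth-Timestamp headers. Any re-serialization of the JSON between signing and sending changes the bytes and breaks the signature.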

The worker

src/worker.ts:

import { Worker, Job } from 'bullmq'
import Redis from 'ioredis'
import pino from 'pino'

const log = pino({ name: 'webhook-worker' })

const redis = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null })

interface Event {
  event_id:    string
  event_type:  string
  occurred_at: string
  tenant:      { id: string; slug: string }
  data:        Record<string, unknown>
  delivery?:   { attempt: number }
}

const worker = new Worker<Event>(
  'intelliauth-events',
  async (job: Job<Event>) => {
    const event = job.data
    log.info({ jobId: job.id, eventType: event.event_type, attempt: job.attemptsMade + 1 }, 'processing')

    switch (event.event_type) {
      case 'user.signed_up':
        await onUserSignedUp(event)
        break
      case 'user.deleted':
        await onUserDeleted(event)
        break
      case 'security.brute_force_detected':
        await onBruteForceDetected(event)
        break
      default:
        // Unknown event type. Log and acknowledge. New event types in the future
        // shouldn't fail the worker — they should be picked up by an explicit
        // case once handling lands.
        log.info({ eventType: event.event_type }, 'unhandled event type; skipping')
    }
  },
  { connection: redis, concurrency: 16 },
)

worker.on('failed', (job, err) => {
  log.error({ jobId: job?.id, err: err.message }, 'job failed')
})

async function onUserSignedUp(event: Event) {
  const user = (event.data as { user: { id: string; email: string; name?: string } }).user
  // Post to your CRM / send a welcome email / whatever you do.
  log.info({ userId: user.id, email: user.email }, 'welcomed user')
}

async function onUserDeleted(event: Event) {
  const user = (event.data as { user: { id: string } }).user
  // Tear down the user's downstream records.
  log.info({ userId: user.id }, 'cleaned up user')
}

async function onBruteForceDetected(event: Event) {
  // Alert your incident channel.
  const data = event.data as { target_email_or_user: string; attempt_count: number; ips: string[] }
  log.warn({ target: data.target_email_or_user, count: data.attempt_count }, 'brute force detected')
  // postToSlack(...)
}

Run both:

# terminal 1
REDIS_URL=redis://localhost:6379 INTELLIAUTH_WEBHOOK_SECRET=<your-secret> pnpm ts-node src/receiver.ts

# terminal 2
REDIS_URL=redis://localhost:6379 pnpm ts-node src/worker.ts

The receiver listens; the worker drains. Both can be horizontally scaled independently in production — multiple receivers behind a load balancer, multiple workers consuming the same queue.

Expose your local receiver to IntelliAuth

The receiver is on localhost:4000. IntelliAuth needs to reach it over HTTPS. Use a tunnel:

ngrok http 4000
# or
cloudflared tunnel --url http://localhost:4000

Take the HTTPS URL the tunnel gives you (e.g., https://abc123.ngrok.app).

Create the subscription

In the tenant admin: Authentication → Webhooks → New subscription.

URL — <your-tunnel-url>/webhooks/intelliauth.
Events — user.signed_up, user.deleted, security.brute_force_detected. (Or * to receive everything.)
Description — "Local test handler."

Copy the signing secret into your INTELLIAUTH_WEBHOOK_SECRET env var, then restart the receiver so it picks up the new value.

Test it

The console has a "Test event" button. Click it. Both terminals should show activity — receiver acknowledges, worker logs unhandled event type; skipping (the test event type is test.ping).

Now trigger a real event: sign up a new user in the tenant. The user.signed_up event flows through.

Observability

Three things worth wiring before this hits production.

Metrics

The receiver should emit these (instrument with Prometheus / StatsD / OpenTelemetry):

webhook.received{event_type} — counter
webhook.signature_failed — counter
webhook.duplicate — counter
webhook.enqueued{event_type} — counter

The worker:

webhook.processed{event_type} — counter
webhook.failed{event_type, attempt} — counter
webhook.processing_seconds{event_type} — histogram
webhook.queue_depth — gauge (read from BullMQ)

Alert on three conditions: webhook.signature_failed > 0 (a spike points to a signing-secret mismatch or an attack), webhook.failed{attempt>3} > N/min (workers are struggling), and webhook.queue_depth growing without bound.
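
One way to record the worker metrics is to wrap each handler call. A sketch, with MetricsSink as a hypothetical interface standing in for whichever Prometheus, StatsD, or OpenTelemetry client you use; instrumented is likewise an illustrative name.

```typescript
// Hypothetical minimal interface; adapt it to your metrics client.
interface MetricsSink {
  increment(name: string, labels?: Record<string, string>): void
  observe(name: string, value: number, labels?: Record<string, string>): void
}

// Runs a handler and records the processed / failed counters plus the duration histogram.
async function instrumented<T>(
  metrics: MetricsSink,
  eventType: string,
  attempt: number,
  handler: () => Promise<T>,
): Promise<T> {
  const startMs = performance.now()
  try {
    const result = await handler()
    metrics.increment('webhook.processed', { event_type: eventType })
    return result
  } catch (err) {
    metrics.increment('webhook.failed', { event_type: eventType, attempt: String(attempt) })
    throw err // rethrow so the queue still sees the failure and schedules a retry
  } finally {
    const seconds = (performance.now() - startMs) / 1000
    metrics.observe('webhook.processing_seconds', seconds, { event_type: eventType })
  }
}
```

In the worker, each case in the switch would then become something like instrumented(metrics, event.event_type, job.attemptsMade + 1, () => onUserSignedUp(event)).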

Structured logs

Every log line should carry event_id, event_type, tenant_id, and request_id (the one from the IntelliAuth-side delivery). Match on these across the system to trace one event's full path.

Tracing

OpenTelemetry spans across receiver → queue → worker make incident postmortems faster. The handoff via Redis isn't natively traced; pass the trace context through the BullMQ job's data and reattach on the worker side.
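
A sketch of that handoff at the data level, assuming the W3C traceparent format as the carrier. The _trace field and the helper names are hypothetical; with the OpenTelemetry SDK installed, its propagation API would normally fill and read the carrier instead of these hand-rolled functions.

```typescript
import { randomBytes } from 'node:crypto'

// W3C Trace Context value: version-traceId-parentId-flags.
interface TraceCarrier { traceparent: string }

// Hypothetical helper: mint a traceparent when no tracer is wired up yet.
function newTraceparent(): string {
  const traceId = randomBytes(16).toString('hex') // 32 hex chars
  const spanId = randomBytes(8).toString('hex')   // 16 hex chars
  return `00-${traceId}-${spanId}-01`             // flags 01 = sampled
}

// Receiver side: attach the carrier to the event before queue.add(...).
function withTrace<T extends object>(event: T, traceparent: string): T & { _trace: TraceCarrier } {
  return { ...event, _trace: { traceparent } }
}

// Worker side: read the carrier back from job.data; null if absent or malformed.
function readTrace(data: unknown): { traceId: string; parentId: string; sampled: boolean } | null {
  const tp = (data as { _trace?: TraceCarrier } | null)?._trace?.traceparent
  const m = typeof tp === 'string' ? /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(tp) : null
  if (!m) return null
  return { traceId: m[1], parentId: m[2], sampled: (parseInt(m[3], 16) & 1) === 1 }
}
```

The worker can then start its span as a child of parentId within traceId, and add traceId to its log lines so receiver and worker logs join on one key.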

What about the dead-letter queue

After 5 attempts (the first try plus 4 retries), BullMQ marks a job failed and stops retrying. The job stays in the queue's failed set (because we set removeOnFail: false). The bull-board package provides a dashboard that shows the failed jobs and lets you inspect or retry them manually.
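
The receiver's attempts: 5 counts the first try, so a failing job gets four retries. A quick sketch of the resulting schedule, assuming BullMQ's built-in exponential strategy doubles the base delay on each retry (check the backoff documentation for your BullMQ version). retryDelays is an illustrative helper, not a BullMQ API.

```typescript
// Delay before each retry, given the total attempt count and the base delay.
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = []
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(2 ** (attemptsMade - 1) * baseDelayMs))
  }
  return delays
}

const schedule = retryDelays(5, 5_000)
const totalMs = schedule.reduce((sum, d) => sum + d, 0)
console.log(schedule, totalMs) // [ 5000, 10000, 20000, 40000 ] 75000
```

Under that assumption a job is retried after 5 s, 10 s, 20 s, and 40 s, so roughly 75 seconds of waiting (plus processing time) pass between the first attempt and the last. After the last attempt the job moves to the failed set.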

This is the worker-side DLQ. There's a separate dead-letter on the IntelliAuth side, for deliveries that never reached you successfully — see retries and dead-letter.

In production, monitor both:

IntelliAuth's DLQ for your subscription — events that never got delivered (your receiver was down or returning errors).
Your worker's failed-jobs queue — events that were delivered but processing failed.

The first points to signature or connectivity problems; the second points to your business logic.

Common production gotchas

The signature secret is on multiple instances. Make sure your secret rollout deploys to every receiver instance before the subscription's secret rotates. The two-secret pattern (verify against either old or new) carries you through.
Replay on cold start. When the worker starts, it processes whatever's in the queue. Make sure your worker code is idempotent for each event type — if a processing crash interrupted a half-applied change, the retry should produce the same final state.
Queue depth blowing up. Workers crash; queue grows; alerts fire; recovery. Run multiple workers so one going down isn't a full halt; alert on queue depth growth rate.
Test events leaking into production. The console's test event flows through the same path as real events. If your test sends a fake user.signed_up, you may end up with a fake user in your CRM. Filter on data.test === true and skip when set.
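
The two-secret pattern from the first gotcha can be sketched as follows, assuming the same hex HMAC-SHA256 over the raw body that the receiver uses (the timestamp check is omitted here for brevity). verifyWithAnySecret and the INTELLIAUTH_WEBHOOK_SECRET_PREVIOUS variable are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Accept a signature produced with any currently valid secret.
function verifyWithAnySecret(secrets: string[], rawBody: Buffer, signature: string): boolean {
  const received = Buffer.from(signature)
  let ok = false
  for (const secret of secrets) {
    const expected = Buffer.from(createHmac('sha256', secret).update(rawBody).digest('hex'))
    // Check every secret rather than returning early, so timing does not reveal which one matched.
    if (received.length === expected.length && timingSafeEqual(received, expected)) ok = true
  }
  return ok
}

// Current secret first, previous secret second; unset values are dropped.
const activeSecrets = [
  process.env.INTELLIAUTH_WEBHOOK_SECRET,
  process.env.INTELLIAUTH_WEBHOOK_SECRET_PREVIOUS, // hypothetical variable name
].filter((s): s is string => Boolean(s))
```

During a rotation, deploy the new secret as the current one and keep the old one as the previous value on every receiver; once no deliveries arrive signed with the old secret, remove it.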

Where to go from here

Add per-event-type rate limiting in the worker if some events fan out to expensive downstream calls.
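Add a deadline per job so a stuck job doesn't hold a worker forever. BullMQ's lifo option only controls queue ordering, and BullMQ does not enforce a per-job timeout for you, so apply the deadline inside the processor (for example with Promise.race or an AbortController).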
Build a small admin page that surfaces "events processed today, by type" and "currently failing events" — much faster than digging through logs every time.
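
For the per-job deadline, one option is to enforce it inside the processor itself. A minimal sketch; withDeadline is a hypothetical helper, and it can only stop work that actually observes the AbortSignal it hands out.

```typescript
// Rejects if `work` has not settled within `ms`, and signals the work to stop.
async function withDeadline<T>(ms: number, work: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const controller = new AbortController()
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort() // lets cooperative work (fetch, DB clients) cancel itself
      reject(new Error(`job exceeded ${ms} ms deadline`))
    }, ms)
  })
  try {
    return await Promise.race([work(controller.signal), deadline])
  } finally {
    clearTimeout(timer) // avoid a dangling timer when the work finishes first
  }
}
```

Inside the worker this would wrap a handler call, for example withDeadline(30_000, () => onUserSignedUp(event)). The thrown error counts as a failed attempt, so the job follows the normal retry and backoff path.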