Build a production-grade webhook handler

A toy webhook receiver is ten lines. A production one is a small system: signature verification, idempotency, asynchronous processing, retry handling, observability. This tutorial walks the whole thing — start from "blank Node project", end at "I'd run this in production." Each piece is short; the assembly is what matters.

The shape we're building:

  • HTTPS endpoint receives the POST.
  • Validates HMAC + anti-replay timestamp.
  • Dedupes by event_id.
  • Enqueues onto a Redis-backed queue (BullMQ).
  • A worker drains the queue, processes events, with its own retry on failure.
  • Metrics + structured logs throughout.

We use Node + Express + BullMQ. The shape transfers to any stack.

Before you begin
  • Node 18+, pnpm, a Redis instance (Docker is fine: docker run -p 6379:6379 redis:7)
  • An IntelliAuth tenant + admin permissions to create a webhook subscription
  • A test public hostname (ngrok, Cloudflare Tunnel, or similar)
mkdir webhook-handler && cd webhook-handler
pnpm init -y
pnpm add express bullmq ioredis pino
pnpm add -D typescript @types/express @types/node ts-node

tsconfig.json:

{
  "compilerOptions": {
    "target": "es2022",
    "module": "node16",
    "moduleResolution": "node16",
    "strict": true,
    "esModuleInterop": true,
    "outDir": "dist"
  }
}

src/receiver.ts:

import express from 'express'
import { createHmac, timingSafeEqual } from 'node:crypto'
import { Queue } from 'bullmq'
import Redis from 'ioredis'
import pino from 'pino'

const log = pino({ name: 'webhook-receiver' })
const redis = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null })
const queue = new Queue('intelliauth-events', { connection: redis })
const SECRET = process.env.INTELLIAUTH_WEBHOOK_SECRET!
const REPLAY_WINDOW_S = 5 * 60
const DEDUPE_TTL_S = 14 * 24 * 3600

function verify(rawBody: Buffer, signature: string, timestamp: string): boolean {
  // NOTE: the timestamp check only prevents replay if the provider's signature
  // also covers the timestamp; confirm the signed-payload format in the IntelliAuth docs.
  const ts = Number(timestamp)
  if (!Number.isFinite(ts)) return false
  const now = Math.floor(Date.now() / 1000)
  if (Math.abs(now - ts) > REPLAY_WINDOW_S) return false
  const expected = Buffer.from(createHmac('sha256', SECRET).update(rawBody).digest('hex'))
  const received = Buffer.from(signature)
  // timingSafeEqual throws on unequal byte lengths, so compare buffer lengths first.
  if (received.length !== expected.length) return false
  return timingSafeEqual(received, expected)
}

const app = express()
app.post(
  '/webhooks/intelliauth',
  express.raw({ type: 'application/json', limit: '1mb' }),
  async (req, res) => {
    const signature = req.header('X-IntelliAuth-Signature') ?? ''
    const timestamp = req.header('X-IntelliAuth-Timestamp') ?? ''
    const eventId = req.header('X-IntelliAuth-Event-Id') ?? ''
    // req.body is only a Buffer when the content type matched express.raw.
    const body: Buffer = Buffer.isBuffer(req.body) ? req.body : Buffer.alloc(0)
    if (!verify(body, signature, timestamp)) {
      log.warn({ eventId }, 'signature verification failed')
      res.status(401).json({ error: 'invalid_signature' })
      return
    }
    if (!eventId) {
      res.status(400).json({ error: 'missing_event_id' })
      return
    }
    // Parse before claiming the event_id so a malformed body never consumes the dedupe key.
    let event: { event_type: string }
    try {
      event = JSON.parse(body.toString('utf8'))
    } catch {
      res.status(400).json({ error: 'invalid_json' })
      return
    }
    // Idempotency: one atomic SET NX claims the event_id.
    const dedupeKey = `dedupe:${eventId}`
    const inserted = await redis.set(dedupeKey, '1', 'EX', DEDUPE_TTL_S, 'NX')
    if (inserted === null) {
      log.info({ eventId }, 'duplicate; skipping')
      res.status(200).end()
      return
    }
    try {
      await queue.add(event.event_type, event, {
        jobId: eventId, // BullMQ dedupes too
        removeOnComplete: { count: 1000 }, // keep last 1000 for debugging
        removeOnFail: false, // keep failed jobs for inspection
        attempts: 5,
        backoff: { type: 'exponential', delay: 5_000 },
      })
    } catch (err) {
      // Release the claim so the sender's retry is not mistaken for a duplicate.
      await redis.del(dedupeKey).catch(() => undefined)
      log.error({ eventId, err: (err as Error).message }, 'enqueue failed')
      res.status(500).end()
      return
    }
    log.info({ eventId, eventType: event.event_type }, 'enqueued')
    res.status(202).end()
  },
)

const port = Number(process.env.PORT ?? 4000)
app.listen(port, () => log.info({ port }, 'receiver listening'))

A few choices worth calling out:

  • express.raw — get the body as bytes, BEFORE any JSON parsing. Required for the signature to verify.
  • redis.set ... NX — atomic dedup. Two simultaneous deliveries of the same event_id race here; one wins.
  • res.status(202) — explicit "accepted for processing" rather than 200. Cosmetic but documents intent.
  • removeOnFail: false — failed jobs stay in BullMQ for inspection rather than disappearing.
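
To see why the raw bytes matter, here is a small sketch that signs the same payload twice: once over the bytes as they arrived, and once over the result of parsing and re-serializing them. The secret and payload are made up for illustration; only the HMAC-SHA256-over-body scheme is taken from the receiver above.

```typescript
import { createHmac } from 'node:crypto'

// Hypothetical secret and payload, for illustration only.
const demoSecret = 'whsec_example'
// Bytes exactly as a sender might put them on the wire (note the spacing and "1.0").
const rawBody = Buffer.from('{"event_type": "user.signed_up",  "data": {"n": 1.0}}', 'utf8')

function sign(body: Buffer): string {
  return createHmac('sha256', demoSecret).update(body).digest('hex')
}

// What you would be left with after a JSON body parser: an object, re-serialized.
const reserialized = Buffer.from(JSON.stringify(JSON.parse(rawBody.toString('utf8'))), 'utf8')

const signatureOverRaw = sign(rawBody)
const signatureOverReserialized = sign(reserialized)

console.log(rawBody.equals(reserialized))                   // false
console.log(signatureOverRaw === signatureOverReserialized) // false
```

The parsed object is semantically identical, but whitespace and number formatting change on the way back to a string, so the signature over the re-serialized bytes no longer matches what the sender computed.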

src/worker.ts:

import { Worker, Job } from 'bullmq'
import Redis from 'ioredis'
import pino from 'pino'

const log = pino({ name: 'webhook-worker' })
const redis = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null })

interface Event {
  event_id: string
  event_type: string
  occurred_at: string
  tenant: { id: string; slug: string }
  data: Record<string, unknown>
  delivery?: { attempt: number }
}

const worker = new Worker<Event>(
  'intelliauth-events',
  async (job: Job<Event>) => {
    const event = job.data
    log.info({ jobId: job.id, eventType: event.event_type, attempt: job.attemptsMade + 1 }, 'processing')
    switch (event.event_type) {
      case 'user.signed_up':
        await onUserSignedUp(event)
        break
      case 'user.deleted':
        await onUserDeleted(event)
        break
      case 'security.brute_force_detected':
        await onBruteForceDetected(event)
        break
      default:
        // Unknown event type. Log and acknowledge. New event types in the future
        // shouldn't fail the worker; they should be picked up by an explicit
        // case once handling lands.
        log.info({ eventType: event.event_type }, 'unhandled event type; skipping')
    }
  },
  { connection: redis, concurrency: 16 },
)

worker.on('failed', (job, err) => {
  log.error({ jobId: job?.id, err: err.message }, 'job failed')
})

async function onUserSignedUp(event: Event) {
  const user = (event.data as { user: { id: string; email: string; name?: string } }).user
  // Post to your CRM / send a welcome email / whatever you do.
  log.info({ userId: user.id, email: user.email }, 'welcomed user')
}

async function onUserDeleted(event: Event) {
  const user = (event.data as { user: { id: string } }).user
  // Tear down the user's downstream records.
  log.info({ userId: user.id }, 'cleaned up user')
}

async function onBruteForceDetected(event: Event) {
  // Alert your incident channel.
  const data = event.data as { target_email_or_user: string; attempt_count: number; ips: string[] }
  log.warn({ target: data.target_email_or_user, count: data.attempt_count }, 'brute force detected')
  // postToSlack(...)
}

Run both:

# terminal 1
REDIS_URL=redis://localhost:6379 INTELLIAUTH_WEBHOOK_SECRET=<your-secret> pnpm ts-node src/receiver.ts
# terminal 2
REDIS_URL=redis://localhost:6379 pnpm ts-node src/worker.ts

The receiver listens; the worker drains. Both can be horizontally scaled independently in production — multiple receivers behind a load balancer, multiple workers consuming the same queue.

The receiver is on localhost:4000. IntelliAuth needs to reach it over HTTPS. Use a tunnel:

ngrok http 4000
# or
cloudflared tunnel --url http://localhost:4000

Take the HTTPS URL the tunnel gives you (e.g., https://abc123.ngrok.app).

In the tenant admin: Authentication → Webhooks → New subscription.

  • URL: <your-tunnel-url>/webhooks/intelliauth.
  • Events: user.signed_up, user.deleted, security.brute_force_detected. (Or * to receive everything.)
  • Description: "Local test handler."

Copy the signing secret into your INTELLIAUTH_WEBHOOK_SECRET env var.

The console has a "Test event" button. Click it. Both terminals should show activity: the receiver acknowledges the delivery, and the worker logs "unhandled event type; skipping" (the test event type is test.ping, which has no case in the switch).

Now trigger a real event: sign up a new user in the tenant. The user.signed_up event flows through.
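
For quick local iteration without going through the console, you can also build a signed request yourself. This is a sketch that mirrors what the receiver's verify() in this tutorial checks (hex HMAC-SHA256 over the raw body, plus the three headers it reads); whether that matches IntelliAuth's actual signing scheme byte for byte is an assumption. The helper name buildSignedRequest and the sample event are hypothetical.

```typescript
import { createHmac } from 'node:crypto'

interface SignedRequest {
  body: string
  headers: Record<string, string>
}

// Builds a body and headers that the tutorial's verify() accepts,
// provided the receiver was started with the same secret.
function buildSignedRequest(
  event: object,
  eventId: string,
  secret: string,
  nowMs: number = Date.now(),
): SignedRequest {
  const body = JSON.stringify(event)
  const signature = createHmac('sha256', secret).update(Buffer.from(body, 'utf8')).digest('hex')
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'X-IntelliAuth-Signature': signature,
      'X-IntelliAuth-Timestamp': String(Math.floor(nowMs / 1000)),
      'X-IntelliAuth-Event-Id': eventId,
    },
  }
}

const signedRequest = buildSignedRequest(
  { event_id: 'evt_local_1', event_type: 'user.signed_up', data: { user: { id: 'u_1', email: 'a@example.com' } } },
  'evt_local_1',
  'local-test-secret', // must equal INTELLIAUTH_WEBHOOK_SECRET on the receiver
)

// Send it with Node 18's built-in fetch:
// await fetch('http://localhost:4000/webhooks/intelliauth', { method: 'POST', ...signedRequest })
```

Sending the same request twice is a cheap way to watch the dedupe path: the first delivery should be enqueued and the second logged as a duplicate.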

Three things worth wiring before this hits production.

The receiver should emit these (instrument with Prometheus / StatsD / OpenTelemetry):

  • webhook.received{event_type} — counter
  • webhook.signature_failed — counter
  • webhook.duplicate — counter
  • webhook.enqueued{event_type} — counter

The worker:

  • webhook.processed{event_type} — counter
  • webhook.failed{event_type, attempt} — counter
  • webhook.processing_seconds{event_type} — histogram
  • webhook.queue_depth — gauge (read from BullMQ)

Alert on three conditions: webhook.signature_failed > 0 (a spike means a signing-secret mismatch or an attack), webhook.failed{attempt>3} > N/min (workers are struggling), and webhook.queue_depth growing without bound.

Every log line should carry event_id, event_type, tenant_id, and request_id (the one from the IntelliAuth-side delivery). Match on these fields across the system to trace one event's full path.

OpenTelemetry spans across receiver → queue → worker make incident postmortems faster. The handoff via Redis isn't natively traced; pass the trace context through the BullMQ job's data and reattach on the worker side.
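
One way to carry the trace context through the job data is sketched below. It is dependency-free and only moves a W3C traceparent string in and out of the payload; the _trace field and both helper names are hypothetical. With @opentelemetry/api you would typically fill the carrier using propagation.inject(context.active(), carrier) on the receiver and rebuild the context with propagation.extract(context.active(), carrier) on the worker.

```typescript
// Carrier fields follow the W3C Trace Context header names.
interface TraceCarrier {
  traceparent?: string
  tracestate?: string
}

type WithTrace<T> = T & { _trace?: TraceCarrier }

// Receiver side: attach the carrier to the event before queue.add(...).
function attachTrace<T extends object>(event: T, carrier: TraceCarrier): WithTrace<T> {
  return { ...event, _trace: carrier }
}

// Worker side: split the carrier back off so handlers see the original event shape.
function detachTrace<T extends object>(data: WithTrace<T>): { event: T; carrier: TraceCarrier } {
  const { _trace, ...rest } = data
  return { event: rest as unknown as T, carrier: _trace ?? {} }
}

// Example traceparent value in the W3C format: version-traceid-spanid-flags.
const jobData = attachTrace(
  { event_id: 'evt_1', event_type: 'user.signed_up' },
  { traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' },
)
const detached = detachTrace(jobData)
console.log(detached.carrier.traceparent) // 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Keeping the carrier under a single reserved key makes it easy to strip before handlers run, so business logic never depends on tracing fields.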

After 5 attempts (the first run plus 4 retries), BullMQ marks a job failed and stops retrying. The job stays in the queue's "failed" set (because we set removeOnFail: false). bull-board, a separate open-source dashboard package, shows the failed jobs and lets you retry or inspect them manually.
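
It helps to know how long a job survives before it lands in the failed set. The sketch below works out the retry schedule for the receiver's settings (attempts: 5, exponential backoff with delay: 5_000), assuming BullMQ's documented exponential formula of delay * 2^(attemptsMade - 1). The function name is hypothetical, and processing time is not included.

```typescript
// Delay before each retry, where attemptsMade counts the attempts already run.
// With attempts = 5 there is one initial run plus up to four retries.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = []
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(baseDelayMs * 2 ** (attemptsMade - 1)))
  }
  return delays
}

const retryDelays = retryDelaysMs(5, 5_000)
const totalBackoffMs = retryDelays.reduce((sum, d) => sum + d, 0)

console.log(retryDelays)    // [ 5000, 10000, 20000, 40000 ]
console.log(totalBackoffMs) // 75000
```

Under these assumptions a downstream outage longer than about 75 seconds pushes every affected job into the failed set, so size attempts and delay against the outages you expect to ride out.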

This is the worker-side DLQ. There's a separate dead-letter on the IntelliAuth side, for deliveries that never reached you successfully — see retries and dead-letter.

In production, monitor both:

  • IntelliAuth's DLQ for your subscription — events that never got delivered (your receiver was down or returning errors).
  • Your worker's failed-jobs queue — events that were delivered but processing failed.

The first is signature / connectivity; the second is your business logic.

  • The signature secret is on multiple instances. Make sure your secret rollout deploys to every receiver instance before the subscription's secret rotates. The two-secret pattern (verify against either old or new) carries you through.
  • Replay on cold start. When the worker starts, it processes whatever's in the queue. Make sure your worker code is idempotent for each event type — if a processing crash interrupted a half-applied change, the retry should produce the same final state.
  • Queue depth blowing up. Workers crash; queue grows; alerts fire; recovery. Run multiple workers so one going down isn't a full halt; alert on queue depth growth rate.
  • Test events leaking into production. The console's test event flows through the same path as real events. If your test sends a fake user.signed_up, you may end up with a fake user in your CRM. Filter on data.test === true and skip when set.
  • Add per-event-type rate limiting in the worker if some events fan out to expensive downstream calls.
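  • Add a deadline per job so a stuck job doesn't hold a worker forever. BullMQ has no built-in per-job timeout option (lifo only controls queue ordering), so enforce the deadline inside the processor, for example with Promise.race or an AbortController.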
  • Build a small admin page that surfaces "events processed today, by type" and "currently failing events" — much faster than digging through logs every time.
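
The two-secret pattern mentioned in the list above can be sketched as a small variation on the receiver's verify(). This is a minimal version under assumptions: the signature is still hex HMAC-SHA256 over the raw body, and the function names and the idea of a second environment variable for the previous secret are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

function matchesSecret(rawBody: Buffer, signature: string, secret: string): boolean {
  const expected = Buffer.from(createHmac('sha256', secret).update(rawBody).digest('hex'))
  const received = Buffer.from(signature)
  // timingSafeEqual throws on unequal lengths, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected)
}

// Accept a signature made with any currently valid secret (new and old).
function verifyWithRotation(rawBody: Buffer, signature: string, secrets: string[]): boolean {
  let ok = false
  for (const secret of secrets) {
    // No early exit: every secret is checked regardless of which one matched.
    if (matchesSecret(rawBody, signature, secret)) ok = true
  }
  return ok
}

// Illustration with made-up secrets: the sender still signs with the old one.
const payload = Buffer.from('{"event_type":"test.ping"}', 'utf8')
const signedWithOld = createHmac('sha256', 'old-secret').update(payload).digest('hex')

console.log(verifyWithRotation(payload, signedWithOld, ['new-secret', 'old-secret'])) // true
console.log(verifyWithRotation(payload, signedWithOld, ['new-secret']))               // false
```

During a rotation you would deploy both secrets to every receiver first, rotate the subscription's secret, and drop the old secret only once deliveries signed with it have stopped arriving.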