Stripe + idempotency: the patterns every SaaS gets wrong

The first time we got paged at 3am about Stripe charging a customer twice, the root cause was three lines of code: a webhook handler that did real work before it had marked the event as seen, then crashed before it could ACK. Stripe retried. We charged twice. Customer support spent the next week refunding.

Idempotency in Stripe (and any webhook-driven system) is one of those topics where 90% of the writeups say “use the idempotency key” and stop. Real production code needs a tighter pattern. Here’s the one we ship.

The retry-safe handler in one diagram

The pattern: dedupe by event ID BEFORE doing real work, using the database’s UNIQUE constraint as the source of truth. Not application code, not Redis, not memory.

-- Once, in your migrations:
CREATE TABLE webhook_events (
  id        TEXT PRIMARY KEY,    -- the Stripe event.id
  received  TIMESTAMPTZ DEFAULT now(),
  payload   JSONB NOT NULL
);

-- In your handler (Postgres dialect):
INSERT INTO webhook_events (id, payload)
VALUES ($1, $2)
ON CONFLICT (id) DO NOTHING
RETURNING id;

If RETURNING id gives you a row, this is the first time you’ve seen the event, so do the work. If it comes back empty, you’ve already seen this event: ACK with a 200 and move on. The database does the heavy lifting; your application code stays simple.
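A minimal sketch of the application side, assuming Python and using the standard-library sqlite3 module as a stand-in for Postgres so it runs anywhere. The names `open_db` and `claim_event` are hypothetical. In Postgres you would run the INSERT ... RETURNING id statement above and check whether a row came back; the sketch checks `rowcount` instead, which answers the same question.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory SQLite database standing in for Postgres (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE webhook_events ("
        " id TEXT PRIMARY KEY,"
        " received TEXT DEFAULT CURRENT_TIMESTAMP,"
        " payload TEXT NOT NULL)"
    )
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str, payload: dict) -> bool:
    """Return True only for the first delivery of event_id."""
    cur = conn.execute(
        "INSERT INTO webhook_events (id, payload) VALUES (?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        (event_id, json.dumps(payload)),
    )
    # One inserted row means the event is new; zero means the primary key
    # already existed and the insert was skipped.
    return cur.rowcount == 1


conn = open_db()
event = {"id": "evt_123", "type": "invoice.paid"}
first = claim_event(conn, event["id"], event)   # new event: do the work
second = claim_event(conn, event["id"], event)  # retry: ACK 200 and skip
print(first, second)  # True False
```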

The four mistakes that cause double charges

1. Doing work before deduping

Common pattern: receive event → charge the customer → record the event. If the process crashes between step 2 and step 3, Stripe retries, you charge again. Wrong order. Always dedupe first.

2. Treating a 500 response as a retry signal but doing the work anyway

Stripe retries on any non-2xx response, on timeouts, and on connection errors. If your handler does the work and then returns 500 because of a downstream error, Stripe will hit you again. Make sure that EVERY path that does work writes the dedupe row first. EVERY path.
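One way to make every path safe is to put the dedupe insert and the work in a single transaction and derive the HTTP status from the outcome. The Python sketch below uses sqlite3 as a stand-in for Postgres. `handle_webhook`, the `credits` table, and the two work functions are hypothetical. It assumes the work only touches the same database. An external side effect such as a card charge cannot be rolled back, so it needs its own idempotency key.

```python
import json
import sqlite3


def handle_webhook(conn: sqlite3.Connection, event: dict, work) -> int:
    """Return the HTTP status to send back to the webhook sender."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO webhook_events (id, payload) VALUES (?, ?) "
                "ON CONFLICT (id) DO NOTHING",
                (event["id"], json.dumps(event)),
            )
            if cur.rowcount == 0:
                return 200  # already processed: ACK and do nothing
            work(conn, event)  # runs inside the same transaction
        return 200
    except Exception:
        # The rollback removed the dedupe row as well, so the sender's retry
        # gets a clean second attempt instead of being skipped.
        return 500


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webhook_events (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
conn.execute("CREATE TABLE credits (user_id TEXT, amount INTEGER)")


def grant_credit(conn, event):
    conn.execute("INSERT INTO credits VALUES (?, ?)", (event["user"], 10))


def flaky(conn, event):
    grant_credit(conn, event)
    raise RuntimeError("downstream error")


event = {"id": "evt_1", "user": "u_1"}
statuses = [
    handle_webhook(conn, event, flaky),         # fails, nothing is committed
    handle_webhook(conn, event, grant_credit),  # the retry does the work
    handle_webhook(conn, event, grant_credit),  # a later duplicate is skipped
]
print(statuses)  # [500, 200, 200]
```

The credit is granted exactly once even though the event was delivered three times.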

3. Skipping the same logic for outgoing API calls

Idempotency cuts both ways. When YOU call Stripe (charge a card, create a subscription), pass an Idempotency-Key header so retries from YOUR side don’t double-create on Stripe’s side. Generate the key deterministically from your own business event (e.g. `payment-${userId}-${invoiceId}`), not randomly.
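A sketch of deterministic key generation in Python. The helper `payment_idempotency_key` and the `FakeStripe` class are hypothetical. The fake imitates only one behavior: a repeated key returns the original result instead of creating a second object. With the official stripe Python library, the key is passed as an `idempotency_key` option on the request rather than set as a raw header.

```python
def payment_idempotency_key(user_id: str, invoice_id: str) -> str:
    """Same business event in, same key out, on every retry."""
    return f"payment-{user_id}-{invoice_id}"


class FakeStripe:
    """Hypothetical in-memory stand-in for the Stripe API, for illustration."""

    def __init__(self):
        self._by_key = {}
        self.charges = []

    def create_charge(self, amount: int, idempotency_key: str) -> dict:
        # A repeated key replays the stored result and creates nothing new.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        charge = {"id": f"ch_{len(self.charges) + 1}", "amount": amount}
        self.charges.append(charge)
        self._by_key[idempotency_key] = charge
        return charge


stripe_api = FakeStripe()
key = payment_idempotency_key("u_42", "inv_1001")
first = stripe_api.create_charge(4900, key)
retry = stripe_api.create_charge(4900, key)  # e.g. resent after a timeout
print(key, first["id"] == retry["id"], len(stripe_api.charges))
# payment-u_42-inv_1001 True 1
```

A random key (a fresh UUID per attempt) would defeat this: each retry would look like a new request.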

4. Relying on application-level locks

“If event.id is in this in-memory set, skip it” doesn’t survive a deploy or a horizontal scale-out. The dedupe state has to be in the database, in the same transaction as the work itself, or you’ll lose the protection right when you most need it.
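To see why in-memory state fails under scale-out, here is a small Python sketch in which two hypothetical worker objects stand in for two app instances behind a load balancer. Each `MemoryWorker` keeps its own set, while the `DbWorker` instances share one database (sqlite3 as a stand-in for Postgres). All class names are illustrative.

```python
import sqlite3


class MemoryWorker:
    """Dedupes with a per-process set, which other instances cannot see."""

    def __init__(self):
        self.seen = set()
        self.processed = 0

    def handle(self, event_id: str) -> None:
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.processed += 1  # stands in for the real work


class DbWorker:
    """Dedupes through a table that every instance shares."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.processed = 0

    def handle(self, event_id: str) -> None:
        with self.conn:  # dedupe row and work commit together
            cur = self.conn.execute(
                "INSERT INTO webhook_events (id) VALUES (?) "
                "ON CONFLICT (id) DO NOTHING",
                (event_id,),
            )
            if cur.rowcount == 1:
                self.processed += 1  # stands in for the real work


# The first delivery lands on instance A, the retry on instance B.
mem_a, mem_b = MemoryWorker(), MemoryWorker()
mem_a.handle("evt_1")
mem_b.handle("evt_1")
memory_total = mem_a.processed + mem_b.processed

shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE webhook_events (id TEXT PRIMARY KEY)")
db_a, db_b = DbWorker(shared), DbWorker(shared)
db_a.handle("evt_1")
db_b.handle("evt_1")
db_total = db_a.processed + db_b.processed

print(memory_total, db_total)  # 2 1
```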

Edge cases that trip up most implementations

Event ordering. Stripe doesn’t guarantee webhooks arrive in order. Don’t code as if they do. Use event.created timestamps and a state machine.
Event types you don’t handle yet. Always ACK 200, even if you don’t know what to do with the event. Otherwise Stripe treats the delivery as failed and keeps retrying it (with exponential backoff, for up to three days in live mode).
Slow handlers. Stripe times out after 30s. If your work might take longer, ACK fast and enqueue the heavy work to a job queue.
Signature verification. Always verify the Stripe-Signature header before processing. Otherwise anyone with your endpoint URL can mint fake events.
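Signature verification is normally a single call in the official Stripe libraries (`stripe.Webhook.construct_event` in Python), and that is the better choice in production. The sketch below reimplements the documented scheme with the standard library only to show what the check involves. The header carries a timestamp `t` and one or more `v1` signatures, each an HMAC-SHA256 of the timestamp, a dot, and the raw request body, keyed with the endpoint secret. The function name, the sample secret, and the 300-second tolerance are illustrative assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(raw_body: bytes, header: str, secret: str,
                            tolerance: int = 300,
                            now: Optional[float] = None) -> bool:
    """Check a Stripe-Signature header against the raw request body."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False  # missing or malformed timestamp
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance:
        return False  # outside the allowed window: possible replay
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison, so timing does not leak the expected value.
    return any(hmac.compare_digest(expected.encode(), c.encode())
               for c in candidates)


secret = "whsec_example_only"  # made-up secret for the demo
body = b'{"id": "evt_1", "type": "invoice.paid"}'
t = 1_700_000_000
sig = hmac.new(secret.encode(), f"{t}.".encode() + body, hashlib.sha256).hexdigest()
header = f"t={t},v1={sig}"

ok = verify_stripe_signature(body, header, secret, now=t + 10)
tampered = verify_stripe_signature(body + b" ", header, secret, now=t + 10)
stale = verify_stripe_signature(body, header, secret, now=t + 3600)
print(ok, tampered, stale)  # True False False
```

The check must run on the raw bytes of the request body. Re-serializing parsed JSON can change whitespace or key order and break the signature.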

The same pattern works everywhere

This isn’t Stripe-specific. The same “UNIQUE constraint on event_id” pattern works for SQS, EventBridge, webhooks from any vendor, S3 event notifications, Slack interactions, and anything else where the producer might deliver the same event twice. Once you internalize it, the production-incident surface drops noticeably.
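The generalization might look like the following Python sketch, which keys the dedupe table on a (source, message ID) pair so that several producers can share it without their IDs colliding. The table name `processed_messages`, the `process_once` helper, and the sample IDs are hypothetical; sqlite3 stands in for Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_messages ("
    " source TEXT NOT NULL,"
    " message_id TEXT NOT NULL,"
    " PRIMARY KEY (source, message_id))"
)


def process_once(conn: sqlite3.Connection, source: str, message_id: str, work) -> bool:
    """Run work() only for the first delivery of (source, message_id)."""
    with conn:  # dedupe row and work commit or roll back together
        cur = conn.execute(
            "INSERT INTO processed_messages (source, message_id) VALUES (?, ?) "
            "ON CONFLICT (source, message_id) DO NOTHING",
            (source, message_id),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: skip
        work()
    return True


handled = []
results = [
    process_once(conn, "sqs", "msg-1", lambda: handled.append("sqs:msg-1")),
    process_once(conn, "sqs", "msg-1", lambda: handled.append("sqs:msg-1")),      # redelivery
    process_once(conn, "slack", "msg-1", lambda: handled.append("slack:msg-1")),  # other producer
]
print(results)  # [True, False, True]
```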

How we approach this

Every SaaS we ship via our SaaS Product Development service ships with this pattern as the webhook spine on day one. It’s a 5-minute decision that pays back for the lifetime of the product.

Takeaways

Dedupe by event.id BEFORE doing real work.
Use a Postgres UNIQUE constraint, not application state.
Use Idempotency-Key on outgoing Stripe calls too.
Always ACK 200 for events you don’t handle.
Verify the Stripe-Signature header. Always.
