Dezen Technology
Engineering · Mar 10, 2026 · 8 min read

Production-ready in 25 items

Five categories of five items. If you can't tick every box, you're not production-ready — you're demo-ready. The exact checklist we run before any launch.


“Production-ready” is one of the most overloaded phrases in software. Engineering teams mean “it works on staging.” Product means “we can demo it.” Customers mean “it works at 2am on a Sunday when we need it.” Only one of those definitions matters.

Here’s the 25-item checklist we run through before any launch we put our name on — five categories of five items each. If you can’t tick every box, you’re not production-ready. You’re demo-ready.

Figure: five categories with five items each (Reliability, Observability, Security, Performance, Operations).

Reliability

  1. SLOs published. A specific number, not “high uptime.” 99.9% availability leaves an error budget of about 43 minutes per 30-day month. Pick yours and write it down.
  2. Auto-scaling configured. Both horizontal and vertical. Tested by actually generating load, not just assumed because the platform says it can.
  3. Health checks at every layer. Load balancer → service → dependencies. Health checks return real signals (can the service reach its database?), not just “process is alive.”
  4. Idempotent retries. Every external call, especially webhooks and payments, is safe to fire twice. Verified on staging by deliberately retrying.
  5. Backups + restore drill. The first time you try to restore a backup must not be during an incident. Drill it quarterly.
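
The 43-minute figure is the allowed failure fraction multiplied by the minutes in the window. A minimal sketch, assuming a 30-day month; `error_budget_minutes` is a name made up for illustration:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("SLO must be a fraction such as 0.999")
    window_minutes = window_days * 24 * 60  # 30 days = 43,200 minutes
    return window_minutes * (1 - slo)


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} -> {error_budget_minutes(slo):.1f} min per 30 days")
# 99.00% -> 432.0 min per 30 days
# 99.90% -> 43.2 min per 30 days
# 99.99% -> 4.3 min per 30 days
```

Each extra nine cuts the budget tenfold, which is why the number has to be chosen deliberately rather than assumed.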
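
One way to make a health check return real signals is to probe each dependency and let only critical failures mark the instance unhealthy for the load balancer. A sketch under assumptions: `Check`, `run_health_checks` and the probe lambdas are hypothetical stand-ins for real calls (such as a `SELECT 1` against the database), and 503 is simply the status chosen here to mean “take me out of rotation”:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Check:
    probe: Callable[[], bool]  # True if the dependency answered correctly
    critical: bool = True      # a critical failure makes the whole instance unhealthy


def run_health_checks(checks: Dict[str, Check]) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, per-dependency report) for a /health style endpoint."""
    report: Dict[str, str] = {}
    healthy = True
    for name, check in checks.items():
        try:
            ok = bool(check.probe())
        except Exception:  # a probe that raises counts as a failing dependency
            ok = False
        report[name] = "ok" if ok else "failing"
        if not ok and check.critical:
            healthy = False
    # 503 tells the load balancer to take this instance out of rotation.
    return (200 if healthy else 503), report


status, report = run_health_checks({
    "database": Check(probe=lambda: True),                # e.g. run SELECT 1
    "cache": Check(probe=lambda: False, critical=False),  # degraded, still serving
    "payments_api": Check(probe=lambda: True),            # e.g. call its status endpoint
})
print(status, report)  # 200 {'database': 'ok', 'cache': 'failing', 'payments_api': 'ok'}
```

Marking the cache non-critical is a design choice: the instance stays in rotation and serves more slowly instead of dropping out, while the report still shows exactly which layer is failing.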
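
A common way to make retries safe is an idempotency key: the caller sends the same key on every attempt, and the receiver replays the stored result instead of executing twice. A minimal in-memory sketch; `IdempotentExecutor` and `charge_card` are illustrative names, not a library API, and a real service would keep keys in a shared database with a unique constraint so that retries landing on another instance are covered too:

```python
import threading
from typing import Callable, Dict


class IdempotentExecutor:
    """Run an operation at most once per idempotency key; replay the stored result after that."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}
        self._lock = threading.Lock()

    def run(self, key: str, operation: Callable[[], object]) -> object:
        # One lock around check-and-execute so two concurrent deliveries of the
        # same key cannot both run. (It also serializes different keys; fine for a sketch.)
        with self._lock:
            if key in self._results:
                return self._results[key]  # duplicate: replay, do not re-execute
            result = operation()           # if this raises, nothing is stored and a retry runs again
            self._results[key] = result
            return result


charges = []  # hypothetical stand-in for a payment provider's ledger


def charge_card() -> dict:
    charges.append(4999)
    return {"status": "charged", "amount_cents": 4999}


executor = IdempotentExecutor()
first = executor.run("order-123", charge_card)
retry = executor.run("order-123", charge_card)  # e.g. the webhook was delivered twice
print(first == retry, len(charges))  # True 1
```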

Observability

  1. Structured logs. JSON, tagged with request ID, user ID and tenant ID. Searchable in one tool (Datadog, Loki or CloudWatch; pick one).
  2. Metrics with real signal. Request rate, error rate, latency (p50/p95/p99). Not just CPU/RAM, which tell you nothing about user experience.
  3. Distributed traces. Cross-service traces with OpenTelemetry. When something is slow, you can answer “which call?” in seconds.
  4. Alerting on burn rate, not point-in-time spikes. Burn-rate alerts fire when your error budget is being consumed faster than expected, not on every two-minute spike.
  5. On-call rota. A real human gets paged. The rota is documented, rotated weekly, and includes a backup.
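
A minimal sketch of JSON logs tagged with request, user and tenant IDs, using only Python's standard `logging`, `json` and `contextvars` modules. `JsonFormatter` and `request_ctx` are our own names; in a real service, request middleware would set the context once per request:

```python
import contextvars
import json
import logging

# Request-scoped tags; middleware would set this at the start of each request.
request_ctx = contextvars.ContextVar("request_ctx", default={})


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, tagged with request context."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            **request_ctx.get(),  # request_id, user_id, tenant_id
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # avoid duplicate lines if the root logger also has a handler

request_ctx.set({"request_id": "req-8f2c", "user_id": "u-42", "tenant_id": "acme"})
log.info("payment authorized")  # one JSON line with ts, level, logger, message and the three IDs
```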
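
Burn rate is the observed error rate divided by the error rate the SLO allows, so 1.0 means the budget runs out exactly at the end of the window. The sketch below pages only when both a long and a short window burn fast; the 14.4 threshold with 1-hour and 5-minute windows follows the multiwindow example in Google's SRE Workbook (14.4 × 1 hour is 2% of a 30-day budget), and the function names are ours:

```python
from typing import Tuple

Window = Tuple[int, int]  # (errors, requests) observed in the window


def burn_rate(errors: int, requests: int, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO allows (1.0 = exactly on budget)."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1 - slo)


def should_page(long_window: Window, short_window: Window,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if BOTH windows burn faster than the threshold.

    The long window (e.g. 1 hour) ignores brief spikes; the short window
    (e.g. 5 minutes) stops the page once the problem has already recovered.
    """
    return (burn_rate(*long_window, slo) >= threshold
            and burn_rate(*short_window, slo) >= threshold)


# Sustained 2% errors against a 99.9% SLO: burn rate ~20 in both windows.
print(should_page(long_window=(1_200, 60_000), short_window=(100, 5_000)))  # True
# A two-minute spike of 600 errors: hot in the 5-minute window, only ~10x over the hour.
print(should_page(long_window=(600, 60_000), short_window=(600, 5_000)))    # False
```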

Security

  1. Authentication + RBAC. Real identity (not shared accounts), role-based authorization, audit-logged.
  2. Secrets management. Secrets in a real vault (AWS Secrets Manager, 1Password, HashiCorp Vault). Never in env files committed to Git.
  3. Encryption everywhere. TLS in transit, AES-256 at rest. Both turned on, both tested.
  4. Penetration tested. By a third party, with a written report. Findings remediated or formally accepted with a date.
  5. Audit logs. Who did what, immutable, retained per your industry’s requirements (90 days to 7 years).
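
A sketch of role-based authorization that writes an audit record for every decision, allowed or denied. The role map, permission strings and the in-memory `AUDIT_LOG` list are assumptions for illustration; a production audit log would go to an append-only store that the application cannot rewrite:

```python
import json
import time
from typing import Dict, List, Set

# Hypothetical role -> permission map; real systems load this from config or the identity provider.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"invoice:read"},
    "billing_admin": {"invoice:read", "invoice:refund"},
}

# Stand-in for an append-only store the application cannot edit or delete from.
AUDIT_LOG: List[str] = []


def authorize(user_id: str, roles: List[str], permission: str) -> bool:
    """Role-based check that records every decision, allowed or denied."""
    allowed = any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "actor": user_id,  # a real, individual identity, never a shared account
        "action": permission,
        "decision": "allow" if allowed else "deny",
    }))
    return allowed


print(authorize("u-42", ["viewer"], "invoice:read"))    # True
print(authorize("u-42", ["viewer"], "invoice:refund"))  # False, and still audited
print(len(AUDIT_LOG))                                   # 2
```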

Performance

  1. p95 latency budget. Under 200ms for the API, under 2s to first paint on the web. Measured continuously.
  2. CDN in front of static assets. CloudFront, Cloudflare or Fastly; any of them will do. Images, JS and CSS are served from the edge, not from your origin.
  3. Caching strategy documented. Application cache, CDN cache, DB cache. Each layer with explicit invalidation rules.
  4. Database indexes for hot paths. Slow query log running; the top 5 queries indexed.
  5. Bundle budget. The web app’s first JS payload stays under 200KB. CI fails if a PR pushes it over.
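
Why p95 rather than the average: a small slow tail barely moves the mean but shows up immediately in the percentile. A sketch using the nearest-rank percentile definition (other definitions interpolate and give slightly different numbers); `percentile` and `within_budget` are illustrative helpers:

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def within_budget(samples_ms: Sequence[float], budget_ms: float = 200, pct: float = 95) -> bool:
    """True if the chosen percentile is under the latency budget (default p95 < 200 ms)."""
    return percentile(samples_ms, pct) < budget_ms


samples = [80.0] * 94 + [450.0] * 6  # 6% of requests hit a slow path
print(sum(samples) / len(samples))   # 102.2: the mean looks healthy
print(percentile(samples, 95))       # 450.0: p95 blows the 200 ms budget
print(within_budget(samples))        # False
```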
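
A sketch of the CI gate for the bundle budget. It assumes “200KB” means 200 × 1024 bytes of uncompressed JS and that the build step can list the files loaded on first view; both are our assumptions, so adjust to however your bundler reports sizes:

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Assumption: "200KB" read as 200 KiB of uncompressed JavaScript.
BUDGET_BYTES = 200 * 1024


def check_budget(sizes: Iterable[int], budget: int = BUDGET_BYTES) -> int:
    """Return a process exit code: 0 within budget, 1 over it (fails the CI job)."""
    total = sum(sizes)
    if total > budget:
        print(f"JS budget exceeded: {total:,} B > {budget:,} B", file=sys.stderr)
        return 1
    print(f"JS budget ok: {total:,} B <= {budget:,} B")
    return 0


def main(argv: List[str]) -> int:
    """Entry point for CI: argv lists the JS files loaded on first page view."""
    return check_budget(Path(arg).stat().st_size for arg in argv)
```

In a CI step this would run as `sys.exit(main(sys.argv[1:]))` over the built files, for example `python check_bundle.py dist/main.js` (script name and path are hypothetical); the nonzero exit code is what fails the job.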

Operations

  1. CI/CD with automated tests. Every merge runs tests + deploys to staging. Production deploy is one button or one approval, never a manual sequence.
  2. Infrastructure as code. Terraform, Pulumi or CloudFormation. The whole stack can be recreated from a Git commit.
  3. Feature flags. Ship dark, roll out gradually, kill fast. Day-one investment, lifelong payback.
  4. Runbooks for the top 10 incidents. Database down, third-party API down, payment processor down. Five-minute response, not a thirty-minute scramble.
  5. Public status page. Customers find out about incidents from you, not from each other. Atlassian Statuspage or Better Uptime: cheap, fast, professional.
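
Gradual rollout usually means hashing a stable user ID into a bucket, so raising the percentage only ever adds users and never flips someone back off. A minimal sketch; `Flag`, `bucket` and `is_enabled` are illustrative names, not any feature-flag vendor's API:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    name: str
    rollout_percent: int = 0  # 0 = shipped dark, 100 = on for everyone
    killed: bool = False      # kill switch: overrides the rollout percentage


def bucket(flag_name: str, user_id: str) -> int:
    """Deterministic bucket in [0, 100): the same user always lands in the same bucket."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    if flag.killed:
        return False
    # Buckets below the percentage are on, so raising 10 -> 50 only adds users.
    return bucket(flag.name, user_id) < flag.rollout_percent


flag = Flag("new-checkout", rollout_percent=10)
users = [f"user-{i}" for i in range(10_000)]
print(sum(is_enabled(flag, u) for u in users))  # close to 1,000; exact count depends on the hash
flag.rollout_percent = 50  # widen the rollout
flag.killed = True         # kill fast: off for everyone on the next check
```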

How to use the list

Print the checklist. Walk the team through it 4 weeks before launch. Anything red on the list either ships fixed, ships with a documented mitigation, or doesn’t ship. No “we’ll fix it in the first sprint after launch” — those items are still red on the list a quarter later.

We bake this into our Ongoing Maintenance retainer so the checklist stays green months after launch — not just on day one.

Takeaways

  • “Production-ready” needs a written definition or it’s meaningless.
  • Five categories: reliability, observability, security, performance, operations.
  • Backup + restore drill matters more than backup.
  • Health checks at every layer beat one big “is the app alive” check.
  • If you can’t tick every item, write down the mitigation. Don’t pretend.