High availability infrastructure: patterns, trade-offs and what to actually build.

High availability is not a product you buy. It is a set of architectural decisions that each add capability and cost. The right design depends on what you are protecting against and what an hour of downtime actually costs your business.

Frequently asked questions

Is 99.99% uptime really achievable with a single cloud provider?

Yes, as long as the design places components across multiple availability zones within the region. 99.99% (about 52 minutes of downtime per year) is comfortably within reach on a well-architected single-region, multi-AZ setup. Going to 99.999% generally requires multi-region, which roughly doubles the operational cost.
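The downtime budgets behind those percentages are simple arithmetic. A quick sketch of the per-year budget for each availability tier:

```python
# Downtime budget per availability target, assuming a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_minutes(availability: float) -> float:
    """Maximum minutes of downtime per year at a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

for target in (0.999, 0.9999, 0.99999):
    print(f"{target:.3%} -> {downtime_budget_minutes(target):.1f} min/year")
# 99.900% -> 525.6 min/year
# 99.990% -> 52.6 min/year
# 99.999% -> 5.3 min/year
```

Note how each extra nine shrinks the budget tenfold: at 99.999% you have about five minutes per year, which is less time than most on-call humans need to even acknowledge a page. That is why the last nine forces automation and, usually, multi-region.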

How do we measure our actual uptime vs. the target?

External synthetic monitoring from multiple geographic locations — never measure your uptime from inside your own infrastructure. We run minute-granularity checks against the user-facing endpoints, log every failure, and publish a transparent monthly uptime report per client environment.
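A single external probe is only a few lines of code. A minimal sketch, using only the standard library and a hypothetical health endpoint; a real deployment runs this on a scheduler from several geographic locations and ships the results to a log store:

```python
# Minimal synthetic check: probe a user-facing endpoint from outside the
# infrastructure, recording status, latency, and failures.
# The endpoint URL and 5-second timeout below are illustrative assumptions.
import time
import urllib.error
import urllib.request

def synthetic_check(url: str, timeout: float = 5.0) -> dict:
    """Probe `url` once and return a structured result for logging."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
            status = resp.status
    except (urllib.error.URLError, OSError) as exc:
        ok, status = False, repr(exc)  # connection refused, DNS failure, timeout
    return {
        "url": url,
        "ok": ok,
        "status": status,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "checked_at": time.time(),
    }

# result = synthetic_check("https://example.com/healthz")  # hypothetical endpoint
```

Logging every result, not just failures, is what makes the monthly uptime report verifiable rather than anecdotal.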

What is the biggest single cause of failed failover in practice?

Untested capacity headroom. The secondary node is configured correctly and replication is healthy, but when it takes over the full load at 9am on a Monday, it turns out to be undersized. Every failover plan needs a documented N+1 capacity check that gets revalidated every quarter.
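The N+1 check itself is a small calculation once you have peak-load and capacity numbers from your metrics system. A sketch under illustrative assumptions (per-node peak loads and a uniform rated capacity, with a 20% spare-headroom requirement):

```python
# Quarterly N+1 capacity check: can the cluster survive losing its
# most-loaded node while keeping spare headroom? All numbers below are
# illustrative; real values come from your monitoring system's peaks.

def n_plus_one_ok(peak_loads: list[float], node_capacity: float,
                  headroom: float = 0.2) -> bool:
    """Return True if N-1 nodes can absorb the total peak load.

    After losing one node, the survivors must carry the whole peak while
    keeping `headroom` (20% by default) of their capacity unused.
    """
    if len(peak_loads) < 2:
        return False  # nothing to fail over to
    total_peak = sum(peak_loads)
    surviving_capacity = (len(peak_loads) - 1) * node_capacity
    return total_peak <= surviving_capacity * (1 - headroom)

# Two nodes rated at 100 units, each peaking at 45: failover leaves one
# node carrying 90 units against 80 usable (100 minus 20% headroom),
# so the check fails even though replication was perfectly healthy.
print(n_plus_one_ok([45, 45], node_capacity=100))  # False
print(n_plus_one_ok([30, 30], node_capacity=100))  # True
```

The point of revalidating quarterly is that `peak_loads` drifts upward as the business grows, so a check that passed at go-live can silently fail a year later.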

Do we need multi-region for HA?

Almost never for mid-market platforms. Multi-AZ within a single region protects against data-centre-level failure, which is the realistic failure mode. Multi-region protects against regional cloud-provider outages, which are rare enough that the ongoing operational cost usually exceeds the expected benefit. Regulatory or latency requirements are the two cases where multi-region is genuinely needed.

Designing or fixing an HA setup?

We run 99.99%-class infrastructure for European businesses. Tell us about your target and we will map out what it takes.

Talk to an engineer