Sooner or later, every website goes down. The question is not whether it will happen but how badly it hurts when it does — and that depends almost entirely on what you put in place beforehand and how you behave afterwards. This guide covers the three things that decide the outcome: knowing why sites fail, limiting the damage, and communicating honestly when the worst happens.

What downtime is

Website downtime is any period when your site is unreachable or not working as intended for visitors. It ranges from a total outage (nothing loads) to partial failures (checkout breaks, pages crawl, images vanish).

Downtime costs money directly through lost sales and indirectly through lost trust. A visitor who hits an error once may forgive you; one who hits it repeatedly tends to leave for good. That is why reliability is a feature, not a luxury — a point worth weighing when you choose a web hosting provider in the first place.

Why sites go down

Most outages trace back to a short list of causes:

  1. Traffic spikes. A burst of visitors — a viral post, a campaign, a busy sale — can overwhelm a server that was sized for normal load.
  2. Code and configuration changes. A faulty deployment or a single mistyped setting is one of the most common triggers. Many outages are self-inflicted.
  3. Hosting or hardware failure. Servers, disks and data-centre power do fail. How fast you recover depends on your host and your backups.
  4. Network and DNS problems. If DNS, the system that translates your domain name into a server address, misbehaves, the site is effectively unreachable even though the server is fine.
  5. Expired domains or certificates. A lapsed domain registration takes a site offline, and an expired security (TLS) certificate triggers browser warnings that scare visitors away. Both are avoidable with automatic renewal where your provider offers it, backed by calendar reminders.
  6. Cyber-attacks. A denial-of-service attack floods a site with fake traffic to knock it over. Other intrusions can force you to take the site down deliberately.

Knowing the categories helps because each has a different defence.
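Certificate expiry is the easiest of these causes to check automatically. The sketch below is one way to do it in Python, using only the standard library. The function names and the 14-day threshold in the usage note are illustrative assumptions, not part of any particular tool.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days left before a certificate's notAfter date.

    not_after uses the format found in SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". A negative result means
    the certificate has already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to host over TLS and return its certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run daily, something like days_until_expiry(fetch_not_after("example.com")) < 14 could trigger a reminder. Note that fetch_not_after raises an error if the certificate has already expired, because the default context refuses the connection; that failure is itself a signal worth alerting on.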

Preventing and limiting downtime

You cannot eliminate downtime, but you can shrink how often it happens and how long it lasts. Four practices do most of the work.

Monitoring

The first rule of downtime is simple: you should never learn your site is down from an angry customer.

An uptime monitor checks your site from outside your own infrastructure, typically every minute or so, and alerts you within a check or two of it failing to respond. Better tools check from several locations, which helps tell a real outage from a local network blip, and test key journeys (such as loading the checkout), not just the home page. Monitoring turns a slow, embarrassing discovery into a fast, controlled response.
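To make the idea concrete, here is a minimal sketch of the core of an uptime check in Python, using only the standard library. It is not a substitute for a hosted monitoring service, and the names is_up and should_alert, along with the three-failure threshold, are assumptions chosen for illustration.

```python
import http.client
import urllib.request
from typing import List


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers successfully within the timeout.

    urlopen raises HTTPError (a subclass of OSError) for 4xx and 5xx
    responses, so those count as "down" here, as do timeouts, DNS
    failures and refused connections.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        return False


def should_alert(recent_checks: List[bool], failures_needed: int = 3) -> bool:
    """Alert only when the last `failures_needed` checks all failed.

    Requiring consecutive failures avoids paging someone over a single
    dropped request.
    """
    if len(recent_checks) < failures_needed:
        return False
    return not any(recent_checks[-failures_needed:])
```

A scheduler would call is_up once a minute, append the result to a list, and send a notification when should_alert returns True. The script has to run somewhere other than the server it watches, otherwise it goes down with the site.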

Backups

Backups are your safety net when prevention fails. Three rules make them trustworthy:

  • Recent: back up often enough that losing everything since the last one would not be catastrophic.
  • Off-site: keep copies separate from the live server, so one failure does not destroy both.
  • Tested: restore from a backup periodically. An untested backup is only a hope, not a plan.
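The first two rules can be checked automatically. The sketch below, in Python, tests whether the latest backup is recent enough and whether the off-site copy is byte-for-byte identical to the local one. The function names and the idea of comparing SHA-256 checksums are assumptions for illustration. It does not replace the third rule: only an actual restore into a scratch environment proves a backup works.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path


def is_recent(backup_time: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if the backup is no older than max_age."""
    return now - backup_time <= max_age


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(local: Path, offsite: Path) -> bool:
    """Return True if the local and off-site copies have the same content."""
    return sha256_of(local) == sha256_of(offsite)
```

How old is too old depends on how much data you can afford to lose: a shop taking orders all day might set max_age to an hour, a brochure site to a week.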

Sensible change management

Since so many outages come from changes, treat changes carefully: test in a staging environment first, deploy during quieter periods where possible, and keep a way to roll back quickly. A small amount of discipline here prevents a large share of incidents.
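The "roll back quickly" part works best when it is automatic. The Python sketch below shows the general pattern under stated assumptions: deploy, health_check and rollback are hypothetical callables that you would supply for your own setup, since the mechanics differ from host to host.

```python
from typing import Callable


def deploy_with_rollback(
    deploy: Callable[[], None],
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Deploy, verify, and roll back if anything goes wrong.

    Returns True if the new version is live and healthy, False if the
    deployment failed or the health check did not pass and the
    previous version was restored.
    """
    try:
        deploy()
        healthy = health_check()
    except Exception:
        # A crash during deployment or checking counts as unhealthy.
        healthy = False
    if not healthy:
        rollback()
    return healthy
```

The design choice worth noting is that the health check runs immediately after every deployment, so a bad change is reversed in seconds rather than discovered by customers.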

A status page

A status page lives separately from your main site, ideally on different hosting so that it stays up even when your site does not, and shows the current operational state. During an incident it becomes the single, calm source of truth, which is far better than leaving customers to guess or flood your inbox.

Writing an honest post-mortem

When an outage is over, how you explain it shapes whether customers trust you again. A vague "we experienced technical difficulties, sorry for any inconvenience" satisfies nobody. A good post-mortem does five things:

  1. States what happened, plainly. What broke, and what visitors experienced.
  2. Gives a timeline. When it started, when you noticed, when it was fixed.
  3. Owns the impact. Who was affected and how. Resist the urge to downplay it.
  4. Explains the root cause. What actually went wrong, in honest terms.
  5. Commits to specific fixes. The concrete steps you will take so it does not recur.
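The timeline in step 2 yields three figures worth stating outright: how long the problem went unnoticed, how long the fix took once you knew, and the total downtime. A small Python sketch of that arithmetic follows; the class name and fields are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    started: datetime   # when the outage began
    detected: datetime  # when you noticed
    resolved: datetime  # when service was restored

    @property
    def time_to_detect(self) -> timedelta:
        """How long the outage ran before anyone noticed."""
        return self.detected - self.started

    @property
    def time_to_resolve(self) -> timedelta:
        """How long the fix took once the outage was known."""
        return self.resolved - self.detected

    @property
    def total_downtime(self) -> timedelta:
        """The full period visitors were affected."""
        return self.resolved - self.started
```

For a made-up incident that started at 09:00, was detected at 09:12 and was resolved at 10:00, the three figures are 12 minutes, 48 minutes and one hour. A long time to detect points at monitoring; a long time to resolve points at backups and rollback.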

The instinct after an outage is to say as little as possible. Usually that is the wrong call. Transparency, done well, tends to build trust rather than erode it, because it shows competence and respect.

A useful real-world model is the way some businesses publish their incident notes openly. London consultancy CM Beyer, for example, published a candid account of a site outage and an apology rather than letting it pass in silence — the kind of plain, accountable write-up that reassures customers far more than a generic notice would. The lesson for any site owner is that admitting a problem clearly is a strength, an idea explored further in our piece on the value of admitting mistakes in business.

A quick incident checklist

  • Detect: automated monitoring alerts you first.
  • Communicate: post to your status page early, even before you have the full picture.
  • Diagnose and fix: find the root cause, restore from backup if needed, verify the fix.
  • Follow up: publish an honest post-mortem and make the changes you promised.

Run that loop calmly and even a serious outage becomes a story about competence rather than chaos.

The bottom line

Downtime is inevitable; lasting damage is not. Understand the handful of causes, put monitoring, tested backups and careful change management in place, keep a status page ready, and — when something does break — explain it honestly. Customers rarely abandon a business for having a bad day. They abandon one that hides it.