As an ISO 27001 Lead Auditor, I've reviewed dozens of organizations' security and continuity practices. The most consistent finding? Almost everyone underestimates disaster probability while simultaneously being unprepared for the impact.
There's a simple tool that fixes this: the risk matrix. And there's a practice that validates it: the disaster drill. Together, they transform "hoping nothing bad happens" into a structured, testable plan.
Risk = Probability × Impact
The fundamental equation of risk management is deceptively simple:
Risk = Probability × Impact
A meteor strike has catastrophic impact but negligible probability. A password being guessed has low impact (usually) but higher probability. Both might produce similar risk scores, but they require completely different responses.
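To make that concrete, here is a toy calculation on an assumed 1-to-5 scale for both probability and impact; the scores are illustrative, not calibrated:
```python
# Illustrative only: assumed 1-5 scales for probability and impact.
meteor_strike  = {"probability": 1, "impact": 5}  # negligible probability, catastrophic impact
password_guess = {"probability": 4, "impact": 2}  # higher probability, usually low impact

for name, s in [("Meteor strike", meteor_strike), ("Password guessed", password_guess)]:
    risk = s["probability"] * s["impact"]
    print(f"{name}: {s['probability']} x {s['impact']} = {risk}")

# Meteor strike: 1 x 5 = 5
# Password guessed: 4 x 2 = 8
# Similar scores, completely different responses.
```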
The risk matrix visualizes this by plotting probability on one axis and impact on the other:
| | Low Impact | Medium Impact | High Impact |
|---|---|---|---|
| High Probability | Medium | High | Critical |
| Medium Probability | Low | Medium | High |
| Low Probability | Low | Low | Medium |
The magic happens when you populate this matrix with your actual scenarios. Suddenly, vague anxieties become prioritized action items.
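If you want the matrix to drive tooling (a risk register, a dashboard), one option is to encode it directly as a lookup table. A minimal sketch in Python, using the labels from the table above:
```python
# Sketch: the matrix above as a lookup table, keyed by (probability, impact).
RISK_MATRIX = {
    ("High",   "Low"):    "Medium",
    ("High",   "Medium"): "High",
    ("High",   "High"):   "Critical",
    ("Medium", "Low"):    "Low",
    ("Medium", "Medium"): "Medium",
    ("Medium", "High"):   "High",
    ("Low",    "Low"):    "Low",
    ("Low",    "Medium"): "Low",
    ("Low",    "High"):   "Medium",
}

def risk_level(probability: str, impact: str) -> str:
    """Map a (probability, impact) pair to its matrix cell."""
    return RISK_MATRIX[(probability, impact)]

print(risk_level("Medium", "High"))  # -> High
```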
Building Your Risk Matrix
Here's a practical example for an e-commerce operation:
| Scenario | Probability | Impact | Risk Level | Mitigation |
|---|---|---|---|---|
| Database server failure | Medium | Critical | High | Active-passive cluster |
| DDoS attack | High | High | Critical | CDN + DDoS protection |
| Developer pushes bug to production | High | Medium | High | Staging environment, code review |
| Datacenter fire | Low | Critical | Medium | Offsite backups, DR site |
| Payment provider outage | Medium | High | High | Secondary payment processor |
| Core developer quits | Medium | High | High | Documentation, knowledge sharing |
Notice that some high-impact events (datacenter fire) rank lower than frequent events with moderate impact (production bugs). This is intentional—you have limited resources, and the matrix helps you allocate them where they matter most.
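If the scenarios live in code or a spreadsheet export, a few lines turn the matrix into a ranked worklist. A sketch that simply sorts the rows above by their assigned risk level:
```python
# Sketch: sort the scenarios from the table above into a worklist,
# most severe first. Risk levels are copied from the table.
SEVERITY = {"Critical": 4, "High": 3, "Medium": 2, "Low": 1}

scenarios = [
    ("Database server failure",            "High"),
    ("DDoS attack",                        "Critical"),
    ("Developer pushes bug to production", "High"),
    ("Datacenter fire",                    "Medium"),
    ("Payment provider outage",            "High"),
    ("Core developer quits",               "High"),
]

for name, level in sorted(scenarios, key=lambda s: SEVERITY[s[1]], reverse=True):
    print(f"{level:<8} {name}")
```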
RTO and RPO: The Recovery Metrics
Two metrics define your disaster recovery requirements:
RTO (Recovery Time Objective): How long can you be down? If your RTO is 4 hours, your systems must be restorable within 4 hours of an incident.
RPO (Recovery Point Objective): How much data can you lose? If your RPO is 1 hour, you need backups at least hourly. An RPO of zero requires real-time replication.
These metrics should be defined by business requirements, not technical convenience:
| System | RTO | RPO | Implication |
|---|---|---|---|
| E-commerce storefront | 1 hour | 15 minutes | Hot standby, frequent DB replication |
| Email server | 4 hours | 24 hours | Daily backups sufficient |
| Analytics platform | 24 hours | 1 week | Weekly backups, cold restore acceptable |
| Customer database | 1 hour | 0 | Real-time replication mandatory |
Don't set RTO/RPO based on what you currently have. Set them based on what the business needs, then build infrastructure to meet those requirements.
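A simple gap analysis makes this concrete: list the RTO and RPO each system needs, list what your restore times and backup intervals currently deliver, and flag every mismatch. A minimal sketch, where the "current" figures are illustrative placeholders rather than measurements:
```python
# Compare required RTO/RPO against current capability to spot gaps.
from datetime import timedelta

requirements = {
    # system: (RTO, RPO) required by the business
    "E-commerce storefront": (timedelta(hours=1),  timedelta(minutes=15)),
    "Email server":          (timedelta(hours=4),  timedelta(hours=24)),
    "Analytics platform":    (timedelta(hours=24), timedelta(weeks=1)),
    "Customer database":     (timedelta(hours=1),  timedelta(0)),
}

current = {
    # system: (measured restore time, backup interval) -- assumed values
    "E-commerce storefront": (timedelta(hours=3),  timedelta(hours=1)),
    "Email server":          (timedelta(hours=2),  timedelta(hours=24)),
    "Analytics platform":    (timedelta(hours=12), timedelta(days=7)),
    "Customer database":     (timedelta(hours=3),  timedelta(hours=1)),
}

for system, (rto, rpo) in requirements.items():
    restore_time, backup_interval = current[system]
    if restore_time > rto:
        print(f"{system}: restore takes {restore_time}, RTO is {rto} -> gap")
    if backup_interval > rpo:
        print(f"{system}: backups every {backup_interval}, RPO is {rpo} -> gap")
```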
The Three Types of Drills
Having a disaster recovery plan is necessary but not sufficient. Untested plans fail. There are three levels of testing:
1. Tabletop Exercise
What: Team gathers in a room (or video call). Facilitator presents a scenario: "It's 3 AM, the database server is unresponsive, and the on-call engineer can't SSH in. What do you do?"
Duration: 1-2 hours
Frequency: Quarterly
Value: Reveals gaps in documentation, unclear responsibilities, and missing contact information. Low cost, no production impact.
2. Simulation Drill
What: Partial execution without affecting production. Restore a backup to a test server and verify the data. Failover to the DR network and confirm connectivity. Test the notification chain without actually paging everyone.
Duration: 4-8 hours
Frequency: Semi-annually
Value: Validates that procedures actually work, not just that they exist on paper. Moderate cost, minimal production impact.
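As one example of such a step, the sketch below restores the latest dump into a throwaway database and runs a basic sanity query. The paths, the orders table, and the PostgreSQL tools (createdb, pg_restore, psql) are assumptions here; substitute your own backup tooling and checks:
```python
# Simulation-drill step: restore the latest dump to a scratch database
# and verify it is not empty. Paths and names are placeholders.
import subprocess

BACKUP_FILE = "/backups/latest.dump"   # assumed path to the newest dump
SCRATCH_DB = "drill_restore_test"      # throwaway database for the drill

def run(cmd: list[str]) -> str:
    """Run a command, fail loudly on error, return its stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# 1. Restore the backup into the scratch database.
run(["createdb", SCRATCH_DB])
run(["pg_restore", "--dbname", SCRATCH_DB, BACKUP_FILE])

# 2. Sanity check: the restored data should not be empty.
orders = run(["psql", "-t", "-d", SCRATCH_DB, "-c", "SELECT count(*) FROM orders;"])
assert int(orders) > 0, "Restore produced an empty orders table"

# 3. Clean up so the next drill starts fresh.
run(["dropdb", SCRATCH_DB])
print(f"Restore verified: {orders} rows in orders on {SCRATCH_DB}")
```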
3. Live Failover
What: Actually fail over to DR systems during a maintenance window. Real traffic hits the backup infrastructure. Then fail back.
Duration: 8-24 hours
Frequency: Annually
Value: Proves true recoverability. There is no substitute. High cost, real (controlled) production impact.
The Drill Checklist
Before any drill, verify:
- The scenario, scope, and success criteria are agreed in advance
- Backups exist and their timestamps are current
- The contact list and escalation chain are up to date
- A rollback plan exists in case the drill itself causes problems
- Someone is assigned to keep time and take notes
After every drill, document:
- What worked and what failed
- How long each recovery step actually took, compared with the RTO target
- Gaps found in documentation, tooling, or responsibilities
- Action items, each with an owner and a due date
The post-drill review is where the real value lives. A drill without follow-up is just theater.
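One way to keep that follow-up honest is to record every drill in a structured form rather than loose notes. A suggested shape (field names and example values are illustrative, not a standard):
```python
# Sketch of a post-drill record, so findings turn into tracked work.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DrillReport:
    drill_type: str                       # "tabletop", "simulation", or "live failover"
    held_on: date
    scenario: str
    time_to_recover_minutes: int | None   # None for tabletop exercises
    what_worked: list[str] = field(default_factory=list)
    what_failed: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)  # each with an owner and due date

report = DrillReport(
    drill_type="simulation",
    held_on=date.today(),
    scenario="Primary database unresponsive",
    time_to_recover_minutes=95,
    what_failed=["On-call contact list was out of date"],
    action_items=["Automate contact-list sync from the HR system (owner: ops lead)"],
)
```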
The Cultural Challenge
The hardest part of disaster preparedness isn't technical—it's cultural. Drills cost time. They interrupt "real work." They surface uncomfortable truths about gaps and failures.
Management support is essential. If leadership treats drills as optional overhead, staff will deprioritize them. If leadership participates and takes findings seriously, the organization builds genuine resilience.
I've seen companies where the CEO joins annual DR drills—not to supervise, but to understand what happens when systems fail. Those companies recover faster from real incidents.
"Better Safe Than Sorry"
The English phrase captures it perfectly. A Turkish proverb, "denize düşen yılana sarılır" (one who falls into the sea clings even to a snake), describes what happens when you don't prepare: desperate improvisation.
Planning beats improvisation every time. Not because plans survive contact with reality unchanged—they don't—but because the act of planning builds the mental models and muscle memory needed to adapt when reality diverges.
A disaster will happen. The only question is whether you'll meet it with a tested playbook or frantic googling at 3 AM.
Choose the playbook.

