Redundancy Investment Decision
A 2x2 matrix for systematically setting redundancy investment levels. Each system is mapped against its failure probability (Low: <5%/year; High: >15%/year) and its failure cost (Low: <$100K; High: >$1M).
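The mapping can be sketched as a small classifier. This is a minimal illustration, not a definitive implementation: the function and constant names are hypothetical, the quadrant labels are the ones used under How to Apply, and returning None for values that fall between the stated thresholds (5% to 15% per year, $100K to $1M) is an assumption, since the matrix does not say how to treat them.

```python
from enum import Enum


class Quadrant(Enum):
    STRATEGIC = "Quadrant 1: Strategic Redundancy"
    CRITICAL = "Quadrant 2: Critical Redundancy"
    SELECTIVE = "Quadrant 3: Selective Redundancy"
    EFFICIENCY = "Quadrant 4: Efficiency Focus"


# Thresholds taken from the matrix definition.
LOW_PROBABILITY_MAX = 0.05      # below 5% per year counts as low
HIGH_PROBABILITY_MIN = 0.15     # above 15% per year counts as high
LOW_COST_MAX_USD = 100_000      # below $100K counts as low
HIGH_COST_MIN_USD = 1_000_000   # above $1M counts as high


def band(value, low_max, high_min):
    """Return 'low', 'high', or None when the value sits between the thresholds."""
    if value < low_max:
        return "low"
    if value > high_min:
        return "high"
    return None


def classify(annual_failure_probability, failure_cost_usd):
    """Map a system to a quadrant, or None if either input is in the undefined middle band."""
    probability = band(annual_failure_probability, LOW_PROBABILITY_MAX, HIGH_PROBABILITY_MIN)
    cost = band(failure_cost_usd, LOW_COST_MAX_USD, HIGH_COST_MIN_USD)
    if probability is None or cost is None:
        return None  # assumption: borderline systems need case-by-case judgment
    return {
        ("low", "high"): Quadrant.STRATEGIC,
        ("high", "high"): Quadrant.CRITICAL,
        ("low", "low"): Quadrant.SELECTIVE,
        ("high", "low"): Quadrant.EFFICIENCY,
    }[(probability, cost)]
```

For example, `classify(0.02, 5_000_000)` returns `Quadrant.STRATEGIC`, while `classify(0.10, 5_000_000)` returns None because a 10% annual failure probability is neither low nor high under the stated thresholds.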
When to Use Redundancy Investment Decision
Use this matrix when prioritizing redundancy investments across multiple systems or dependencies, or when making go/no-go decisions on specific redundancy proposals.
How to Apply
Quadrant 1: Strategic Redundancy
Low probability, High cost. Unlikely failures with severe consequences justify moderate investment. Favor cost-effective approaches: standby rather than active redundancy, insurance, or cold backup. Test regularly so the redundancy works when it is finally needed.
Questions to Ask
- Is standby redundancy sufficient or is active required?
- Would insurance be more cost-effective?
- How will we ensure backup works after long dormancy?
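One way to approach the insurance and standby questions is to compare the expected total annual cost of each option. The sketch below is illustrative only: the Option type, the option names, and every figure are hypothetical, and the model assumes a simple expected-value comparison (annual cost of the option plus the share of the expected loss it leaves uncovered), which ignores risk tolerance and the chance that a dormant backup fails when called on.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    annual_cost_usd: float  # premium or running cost per year (hypothetical)
    loss_covered: float     # fraction of the failure cost avoided or reimbursed, 0..1


def expected_total_annual_cost(option, annual_failure_probability, failure_cost_usd):
    """Annual cost of the option plus the expected loss it does not cover."""
    expected_loss = annual_failure_probability * failure_cost_usd
    return option.annual_cost_usd + expected_loss * (1.0 - option.loss_covered)


def cheapest(options, annual_failure_probability, failure_cost_usd):
    """Pick the option with the lowest expected total annual cost."""
    return min(
        options,
        key=lambda o: expected_total_annual_cost(
            o, annual_failure_probability, failure_cost_usd
        ),
    )


# Hypothetical Quadrant 1 system: 3% annual failure probability, $2M failure cost.
OPTIONS = [
    Option("accept risk", annual_cost_usd=0, loss_covered=0.0),
    Option("insurance", annual_cost_usd=25_000, loss_covered=0.80),
    Option("cold standby", annual_cost_usd=40_000, loss_covered=0.95),
    Option("active redundancy", annual_cost_usd=150_000, loss_covered=0.99),
]

best = cheapest(OPTIONS, annual_failure_probability=0.03, failure_cost_usd=2_000_000)
print(best.name)
```

With these made-up figures insurance comes out cheapest, followed by cold standby, but the ranking depends entirely on the inputs, so treat the output as a starting point for discussion rather than a decision.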
Quadrant 2: Critical Redundancy
High probability, High cost. Frequent failures with severe consequences demand heavy investment: active redundancy for instant failover, multiple independent backup layers, and diverse approaches to avoid common-mode failures.
Questions to Ask
- Do we have instant failover capability?
- Are our redundant systems truly independent?
- Have we addressed common-mode failure risks?
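The independence questions matter because a shared cause can erase most of the benefit of a second unit. The sketch below illustrates this with a simplified beta-factor style approximation, which is an assumption brought in for illustration and not part of this framework: each unit fails with probability p over the period, and a fraction beta of those failures comes from a cause that takes out both units at once. The function name and the numbers are hypothetical.

```python
def pair_failure_probability(p, beta=0.0):
    """Approximate probability that both redundant units fail in the same period.

    p:    failure probability of a single unit over the period
    beta: fraction of unit failures caused by a shared (common-mode) cause;
          beta=0 means fully independent units, beta=1 means no real redundancy.
    """
    common_mode = beta * p                      # one shared cause fails both units
    independent = ((1.0 - beta) * p) ** 2       # both units fail for separate reasons
    return common_mode + independent


single_unit = 0.20                                        # hypothetical 20% per year
fully_independent = pair_failure_probability(single_unit, beta=0.0)
ten_percent_shared = pair_failure_probability(single_unit, beta=0.10)
print(fully_independent, ten_percent_shared)
```

Under these assumptions, a truly independent pair fails together about 4% of the time, while a pair with 10% shared causes fails together a little over 5% of the time, and the shared-cause term dominates quickly as p gets smaller. This is why diverse approaches, and not just duplicated ones, are recommended for this quadrant.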
Quadrant 3: Selective Redundancy
Low probability, Low cost. Unlikely failures with manageable consequences warrant at most minimal redundancy (cold backup, documentation, recovery procedures). Simply accepting the risk, or insuring against it, may be more cost-effective.
Questions to Ask
- Are documentation and recovery procedures sufficient?
- Should we simply accept this risk?
- Would insurance cover this more efficiently?
Quadrant 4: Efficiency Focus
High probability, Low cost. Frequent failures with low consequences don't justify redundancy. Fix root causes, improve reliability, or accept failures and repair quickly.
Questions to Ask
- Why are failures so frequent?
- Can we fix the root cause instead of adding backup?
- Is it cheaper to just handle failures as they occur?