
Incident Response

Incidents are unplanned investments; their costs have already been incurred. Your org’s challenge is to get ROI on those events.

– John Allspaw [1]

Preventing future incidents requires learning from past ones. However, simply following a template and a process will not magically produce that learning. As John Allspaw notes [1], we must examine why we have an incident response process in the first place. This raises an interesting question: when do the costs of an incident justify engineers spending large amounts of time learning from it, and at what scale? At FAANG scale it certainly makes sense. But if an hour-long outage costs you only a few thousand dollars in lost revenue, is it worth many thousands of dollars of engineering time for a group to meet and study an incident that is unlikely to recur (assuming the root cause identified in the RCA is fixed)?
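The trade-off in that question can be put as rough arithmetic. The sketch below is only an illustration: the function names, the head count, the hourly rate, and the recurrence probability are all hypothetical assumptions, not figures from any real incident.

```python
def review_cost(engineers: int, hours: float, hourly_rate: float) -> float:
    """Cost of the engineering time spent on a post-incident review."""
    return engineers * hours * hourly_rate


def expected_avoided_loss(outage_cost: float, recurrence_probability: float) -> float:
    """Expected loss avoided if the review prevents a repeat of the outage."""
    return outage_cost * recurrence_probability


def review_pays_off(outage_cost: float, recurrence_probability: float,
                    engineers: int, hours: float, hourly_rate: float) -> bool:
    """True when the expected avoided loss covers the cost of the review."""
    avoided = expected_avoided_loss(outage_cost, recurrence_probability)
    return avoided >= review_cost(engineers, hours, hourly_rate)


if __name__ == "__main__":
    # Hypothetical inputs: a $3,000 outage, a 10% chance of recurrence,
    # and six engineers spending four hours each at $100 per hour.
    print(review_cost(engineers=6, hours=4, hourly_rate=100))
    print(review_pays_off(outage_cost=3000, recurrence_probability=0.1,
                          engineers=6, hours=4, hourly_rate=100))
```

With these assumed numbers the review costs more than the repeat outage it might prevent. The model counts only direct lost revenue for this one failure mode, so anything else the team learns from the review is left out of the comparison.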


References

1. John Allspaw. 📌 Incidents are unplanned investments; their costs have already been incurred. Your org’s challenge is to get ROI on those events. Right now, in most companies, this ROI is left sitting in the dark because of the “template-driven” approaches and “action item” myopia. Tweet by @allspaw, https://twitter.com/allspaw/status/1051252775311613952 (2018).
