4 Surprising Truths About IT Resilience I Learned from a Lead Auditor
Does your IT team feel stuck in a frustrating loop, constantly fighting the same fires and dealing with recurring service incidents? This cycle isn't inevitable; it's a sign that your approach to reliability is reactive, not strategic. The ISO/IEC 20000-1 standard introduces the concept of Service Assurance to answer one crucial strategic question: Can this organization consistently deliver services as promised—under normal conditions and during disruption? From a lead auditor’s perspective, the answer lies in four counter-intuitive truths that separate truly resilient organizations from the rest.
--------------------------------------------------------------------------------
1. Your Recurring Incidents Aren't Bad Luck—They're a Capacity Failure
When a critical service slows down or fails repeatedly, teams often blame operational errors or treat each event as a separate incident. But from an auditor’s perspective, this is a classic symptom of failed capacity management. The organization isn't proactively planning for the current and future demand for essential resources—not just compute, storage, and network, but also tooling, licensing, and human capacity. What an auditor immediately looks for is evidence that demand trends are being analyzed and that capacity is managed to prevent performance issues before they become incidents.
Repeated performance incidents often signal failed capacity management, not operational error.
This insight shifts the focus from blaming individuals for operational mistakes to fixing the underlying, systemic failure to plan and provision resources effectively.
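To make "analyzing demand trends" concrete, here is a minimal sketch of the kind of evidence an auditor might hope to see: a simple linear trend fitted to daily utilization samples, used to estimate how long until a resource crosses a planning threshold. The function name, the 80% threshold, and the sample figures are all illustrative assumptions, not anything the standard prescribes, and real demand is rarely this linear.

```python
def days_until_threshold(utilization, threshold=0.80):
    """Estimate days until a linear utilization trend reaches `threshold`.

    `utilization` holds one sample per day as a fraction of total capacity.
    Returns 0.0 if the latest sample is already at or above the threshold,
    and None if the fitted trend is flat or falling.
    """
    n = len(utilization)
    if n < 2:
        raise ValueError("need at least two daily samples")
    if utilization[-1] >= threshold:
        return 0.0

    # Ordinary least-squares fit of utilization against day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(utilization) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(utilization))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Count from the most recent sample, never returning a negative value.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical storage utilization, growing one point per day.
storage = [0.60, 0.61, 0.62, 0.63, 0.64]
print(days_until_threshold(storage))  # roughly 16 days for this made-up series
```

The same idea applies to licences or staffing hours: what matters for the audit is that someone is looking at the trend before the incident, not which forecasting method is used.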
--------------------------------------------------------------------------------
2. An Untested Recovery Plan Is Just an Expensive Document
Service Continuity Management ensures services can be recovered after a major disruption, using plans, business impact analysis, and defined recovery objectives: the recovery time objective (RTO) and the recovery point objective (RPO). Yet many organizations invest heavily in creating these plans only to let them sit on a shelf. For an auditor, an untested plan is a "Major Audit Red Flag" because it provides a false sense of security.
An auditor's focus here is on effectiveness. Without regular testing, there is no evidence that the plan actually works, that people understand their roles, or that recovery objectives can be met. The auditor will ask for proof not only that plans were tested, but that the test results were used to improve them. Evidence of iterative improvement is what turns a continuity plan from a theoretical document into a reliable safety net.
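As a rough illustration of what "evidence of effectiveness" can look like, the sketch below checks one recovery test record against its RTO and RPO targets, its age, and whether the findings were fed back into the plan. The `RecoveryTest` fields, the function name, and the 365-day maximum interval are assumptions made for this example; your own continuity policy defines the real interval and record format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecoveryTest:
    service: str
    tested_on: date
    rto_target: timedelta      # agreed maximum time to restore the service
    rpo_target: timedelta      # agreed maximum window of lost data
    recovery_time: timedelta   # time actually measured during the test
    data_loss: timedelta       # data-loss window actually measured
    actions_closed: bool       # were test findings fed back into the plan?


def continuity_findings(test, today, max_age=timedelta(days=365)):
    """Return a list of findings for one test record (empty means no gaps found)."""
    findings = []
    if today - test.tested_on > max_age:  # max_age is an assumed policy value
        findings.append("test is older than the allowed interval")
    if test.recovery_time > test.rto_target:
        findings.append("RTO not met")
    if test.data_loss > test.rpo_target:
        findings.append("RPO not met")
    if not test.actions_closed:
        findings.append("test findings not fed back into the plan")
    return findings


# Hypothetical test record for an invented service.
payments_test = RecoveryTest(
    service="payments",
    tested_on=date(2024, 1, 15),
    rto_target=timedelta(hours=4),
    rpo_target=timedelta(hours=1),
    recovery_time=timedelta(hours=6),
    data_loss=timedelta(minutes=30),
    actions_closed=False,
)
print(continuity_findings(payments_test, today=date(2024, 6, 1)))
# ['RTO not met', 'test findings not fed back into the plan']
```

A record like this one shows a plan that was tested but not improved, which is exactly the gap an auditor is looking for.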
--------------------------------------------------------------------------------
3. A Service Isn't Truly "Available" If It's Not Secure
Traditionally, IT measures availability with uptime metrics, while security is handled by a separate team with separate goals. This silo creates a dangerous blind spot. In Clause 8.7 of the ISO/IEC 20000-1 standard, information security management sits alongside availability and continuity as a component of service assurance. The focus shifts from merely protecting data to actively protecting services and ensuring their secure availability and continuity. A service must not only be accessible but must also preserve confidentiality and integrity.
A service that is “available but insecure” is not assured.
This is critical because a service compromised by a security breach is, for all practical purposes, unavailable to users who need to trust it. True resilience means being both operational and secure.
--------------------------------------------------------------------------------
4. Your Biggest Risk Might Be Your Own Organizational Chart
Many IT organizations have separate teams for availability, capacity, continuity, and security. While each team may be an expert in its domain, this siloed structure is often the root cause of service fragility. Your capacity failures, untested plans, and security gaps are often symptoms of this one core problem: your organizational chart prevents your teams from seeing the whole picture.
Clause 8.7 of the standard groups service availability, service continuity, and information security under the single heading of service assurance, and capacity management (Clause 8.4.3) underpins all three, so an auditor expects these controls to be coordinated. An auditor sees it as a "Major Audit Red Flag" when "each assurance area is managed independently with no coordination." To expose this risk, auditors perform an end-to-end service audit: they select a critical service and trace it from its SLA targets for availability and security, through the capacity and continuity controls meant to uphold them, to the actual performance outcomes. This technique quickly reveals the gaps created by internal silos. True service assurance comes from integrated systems, not just hardworking but disconnected teams.
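You can rehearse a simplified version of this trace yourself before the auditor does. The sketch below assumes each critical service has a single record linking its SLA targets to the supporting evidence, and reports which links are missing. The record layout, the link names, and the example values are invented for illustration; a real audit examines the content of each item, not just whether it exists.

```python
# Hypothetical links an end-to-end trace would expect to find for a service.
REQUIRED_LINKS = (
    "sla_targets",
    "capacity_plan",
    "continuity_plan",
    "security_controls",
    "measured_performance",
)


def trace_service(record):
    """Return the assurance links that are missing or empty for one service."""
    return [link for link in REQUIRED_LINKS if not record.get(link)]


# Invented record for a payments service with two gaps.
payments = {
    "sla_targets": {"availability": "99.9%"},
    "capacity_plan": "CAP-2024-payments",
    "continuity_plan": None,              # plan owned by another team, never linked
    "security_controls": ["access review"],
    # no "measured_performance" entry at all
}
print(trace_service(payments))  # ['continuity_plan', 'measured_performance']
```

If a single service record cannot be assembled without chasing four different teams, that difficulty is itself a finding.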
--------------------------------------------------------------------------------
Conclusion: Are You Managing Incidents or Engineering Resilience?
Genuine service assurance is a proactive, integrated discipline that goes far beyond reactive incident management. Recurring incidents, untested plans, security gaps, and siloed teams aren't four separate issues; they are four symptoms of the same missing discipline: integrated service assurance. Shifting your focus from fighting fires to engineering resilience builds services your organization and its customers can truly trust.
Are you managing incidents, or are you engineering resilience?
