The Impact Of Partner Outages and Changing the Downtime Conversation

Balancing the need to update tools to keep them effective against outages that create downtime.

Cybersecurity In A Bubble

In cybersecurity, three pillars matter most: confidentiality, integrity, and availability. While outages may not compromise data confidentiality or integrity, the resulting loss of availability can be just as damaging as a security incident. 

When systems go offline, business operations grind to a halt, and organizations that are unprepared often face significant operational and financial impact until service is restored.

Outages over the past year have brought this conversation to the forefront, and examining them not only shines a light on a current “problem,” but offers perspective. Why are these outages happening? What do they mean for the companies affected? And is there ever a time when they are justified?

In the security industry, attitudes towards availability have shifted. Convincing IT teams to take systems offline during the containment step of incident response (IR) used to be an uphill battle. Priorities lay with keeping systems online while remediation took place, risking lateral movement while persistence mechanisms were removed and the initial entry vector patched.

This attitude has shifted, driven by the ever-shrinking dwell time threat actors need to achieve their objectives and by greater general awareness of security impacts. Organizations are playing it safe with containment, partially impacting their own availability to reduce the risk of a more significant operational impact downstream.

This attitude of resilience also benefits organizations when availability is impacted by a vendor outage.

Outages at Major Cloud & Software Providers

Downtime happens to all, even the biggest and the best. Software providers from Azure to Cloudflare to CrowdStrike all have experienced outages, and all on what is starting to seem like a fairly regular basis. 

  • A networking bug in DNS caused a major outage in October for Azure, with over 30,000 outage reports within the first hour. The cause? An inadvertent configuration change in the Azure Front Door (AFD) global routing service.
  • A permission misconfiguration affecting a bot management threat database was to blame for Cloudflare’s November outage, which lasted a few hours and caused worldwide ripples. In December, a short outage resulted from emergency security measures to detect and mitigate React2Shell, an actively exploited vulnerability, and protect customers.
  • A faulty security content update to CrowdStrike’s Falcon security EDR underpinned the July 2024 outage, affecting over eight million systems worldwide. While the crash was devastating, CrowdStrike is still trusted to protect endpoints as a leading EDR provider and their public response has been praised by some.

These instances can occur hundreds of times per year per provider and mostly go unnoticed. Unfortunately, when they do make headlines, it’s because the outage has spread to thousands of customers: a problem of scale inherent to doing business with big tech companies.

It takes an incredible amount of dedicated resources to support a network of thousands of cloud-dependent companies one hundred percent of the time. Still, most cloud providers promise 99.99 percent uptime in their SLAs and, for the most part, deliver.
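To put the 99.99 percent figure in perspective, the arithmetic below converts an availability target into the downtime it leaves room for. This is only a rough sketch: the function name `downtime_budget_minutes` is a hypothetical helper, and it assumes availability is measured over a simple calendar period, whereas real SLAs define their own measurement windows and exclusions.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the minutes of downtime an availability target allows over a period.

    Hypothetical helper for illustration; assumes a plain calendar period
    with no SLA-specific exclusions.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.9, 99.99, 99.999):
        yearly = downtime_budget_minutes(target)
        monthly = downtime_budget_minutes(target, period_days=30)
        print(f"{target}% -> {yearly:.1f} min/year, {monthly:.1f} min per 30 days")
```

Under these assumptions, a 99.99 percent target leaves roughly 52.6 minutes of downtime per year, which is why a single multi-hour outage stands out so sharply against the rest of a provider's record.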

The examples above are a selection of a few high-profile outages from 2024 and 2025, but many share a similar theme. Bar the Azure outage, they were caused by mistakes during updates to security content and configurations, and this is widely seen as an unfortunate consequence of a necessary process. Security content needs to be updated frequently to keep the tooling effective.

With some EDRs hitting around 2,000 content updates a year, the consequences of not updating could mean compromise. This tradeoff is understood, and much like attitudes to containment in IR, cyber resilience accepts a reasonable impact on availability in order to preserve confidentiality and integrity.

Should vendors do better at QA and staging security content updates before releasing them to production? Yes. Is the occasional outage an acceptable consequence of rapid content development and release cycles? Also, yes.

The appetite for availability impacts seems to have been growing, but we are unlikely to see the same level of acceptance for consistent outages repeated by the same source. 

Cyberattacks vs. Outages: Comparing Impact

Downtime is undeniably challenging for organizations, and all teams should plan for it. However, outages are often the lesser of two evils. While they disrupt availability, they do not inherently compromise confidentiality or integrity, whereas cyberattacks can damage all three.

First, restoring systems after a controlled shutdown is typically faster and more reliable than recovering lost or stolen data. Ransomware has evolved beyond simple encryption in exchange for a decryption key. Even as organizations improve backup and recovery, attackers now rely on data theft and extortion, making recovery far more complex.

Second, once data is exfiltrated, it’s rarely fully recoverable, even if a ransom is paid. The resulting data loss undermines business operations, erodes customer trust, triggers compliance penalties, and may fuel further criminal activity if the data is sold or reused.

Finally, incident response practices reinforce this reality. In most cases, availability is intentionally deprioritized during containment to protect confidentiality and integrity. Downtime is accepted until systems can be safely restored because security, ultimately, outweighs uptime.

Outages do impact availability, but for the most part they stop there. That said, organizations can benefit from bringing the same risk-aware approach they use for developing IR playbooks to outage readiness planning. This is why many organizations weigh the odds and even favor outages when the alternative is worse. And the only thing worth trading uptime for is security.

Lessons Learned from Major Outages

Outages are never convenient for businesses, but they need to be compared against their alternatives. In some sense, this may be the price of reaping the benefits of 99.99 percent availability 364 days of the year, and of leveraging big cloud infrastructure that otherwise would be unavailable (or unattainable) to millions of businesses worldwide.

It’s a complex issue. Is there an overreliance on big tech companies? Possibly. Or perhaps it's just a necessary reliance as organizations of all sizes look for affordable ways to scale. 

Either way, cloud service providers (CSPs) are operating on the same logic as most of their customers: an outage beats a full-blown compromise - always. We see it on a small scale in incident response playbooks and on a larger one in outages such as those at Cloudflare and CrowdStrike.

These incidents were not caused by external attackers, but by well-intentioned efforts to protect customers - specifically, the rapid deployment of security updates. While execution can always be scrutinized, the objective remains clear: preserve confidentiality and integrity, even at the cost of short-term availability.

And that trade-off, protecting data rather than risking all three pillars of security, is one most organizations will accept any day.
