Post-Outage Recovery and Lessons from the CrowdStrike Incident

Since the CrowdStrike outage on July 19, 2024, companies worldwide have been working to restore normal operations and strengthen their resilience against future incidents. The outage, caused by a faulty content update, crashed approximately 8.5 million Windows devices, disrupting hospitals, airlines, and other businesses.


Although fewer than 1% of all Windows machines were affected, the incident caused significant disruptions, including appointment cancellations at hospitals. Mass General Brigham, for instance, canceled all non-urgent visits on the day the outage began. Other healthcare organizations, including Memorial Sloan Kettering Cancer Center, Cleveland Clinic, and Mount Sinai, also faced operational challenges.

The outage was not a cyberattack. It was caused by a defective content configuration update to CrowdStrike’s Falcon threat detection platform. According to CrowdStrike’s preliminary post-incident review, a bug in the content validator allowed the faulty update to pass validation.

David Finn, Executive Vice President of Governance, Risk, and Compliance at First Health Advisory, told TechTarget Editorial, “The recovery is well underway, and most healthcare organizations are back up and running. While the scope was smaller compared to other recent incidents in healthcare, the response was effective. There are valuable lessons to be learned.”

Preparing for Future Incidents

Finn, who has 40 years of experience in health IT security, emphasized that incidents are inevitable. “The challenge is to plan, prepare, and be able to recover and stay resilient,” he said. Whether the cause is a major cyberattack, such as the February 2024 Change Healthcare incident, or an IT outage with no malicious intent, healthcare organizations must be ready for a range of incidents affecting critical systems.

He stressed the importance of thorough due diligence and incident response planning. Organizations that identify potential operational challenges in advance and plan for both cybersecurity events and IT failures will be better positioned to respond when an incident occurs. “We need to rethink how we deploy software,” Finn added. “Human errors will always happen, and it’s our job to protect against those mistakes.”

Building Cyber-Resilience

Cyber-resilience is what allows an organization to recover quickly and resume operations. Rather than assuming incidents can be prevented entirely, organizations should anticipate them and focus on building resilience. Finn noted, “While I still trust CrowdStrike, trust does not guarantee perfection. Resilience and redundancy are vital.”

Healthcare organizations responded swiftly to the CrowdStrike incident, with Mass General Brigham activating its incident command to manage the situation. The organization ensured that clinics and emergency departments remained open for urgent health concerns and resumed scheduled appointments and procedures by July 22.

Evaluating Risk and Updating Protocols

Erik Weinick, co-head of the privacy and cybersecurity practice at Otterbourg, urged organizations to use the CrowdStrike incident as an opportunity to reevaluate their risk management protocols. “Even if the incident was accidental, organizations should conduct information audits, penetration testing, update system mappings, and reinforce security practices like multifactor authentication and strong password policies.”

Addressing Third-Party Risk

The outage underscored the importance of managing third-party risk. The interconnectedness of healthcare systems amplifies that risk: several of the largest healthcare data breaches in recent years originated with third-party vendors.

Finn suggested that, beyond conducting risk analyses on vendors like CrowdStrike, organizations should also ask about the tools those vendors use to develop their software. “We need standards and certifications for software used in critical infrastructure sectors,” he said.

In response to the incident, CrowdStrike committed to enhancing its software resilience by adding more validation checks and conducting independent third-party security code reviews.

Weinick also advised organizations to review vendor agreements, update their business disruption insurance coverage, and run tabletop exercises to rehearse business continuity and recovery procedures for the full range of potential disruptions.

Overall, the CrowdStrike outage highlighted critical IT and security considerations, emphasizing the need for resilience, effective third-party risk management, and robust incident response and recovery plans.
