With Hurricane Harvey having just wreaked havoc in Texas and several more storms spinning up in the Atlantic, here's a look at what you can do to mitigate damage to your data center in all sorts of extreme weather conditions.
When Disaster Strikes
Natural disasters in 2017 continue the trend of previous years in destruction, cost, and number of events. Although some reports indicate that more than 60 percent of data center failures are caused by human error, that still leaves as much as 40 percent attributable to other causes, including natural disasters. Most recently, Hurricane Harvey has proven to be yet another historically deadly event, with more than 50 lives lost, almost 30 trillion gallons of rain, and approximately ten percent of the buildings in Harris County, Texas flooded. Costs could exceed $190 billion. So far this year, the US has experienced nine weather and climate disaster events with losses exceeding $1 billion each, including flooding, hailstorms, tornado outbreaks, freezes, wildfires, and other severe weather.
Avoiding catastrophes requires understanding risks and probabilities. Geologic data is available, along with modeling and simulation tools for assessing the likelihood of flooding (for example, this page). Published reports can also help determine where to locate a new data center or identify the risks associated with an existing location; the Cushman & Wakefield 2016 Data Center Risk Index is one example.
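Flood risk is commonly expressed as an annual probability; a "100-year" flood zone, for instance, has a 1 percent chance of flooding in any given year. Over the service life of a facility, that chance compounds. The following is a minimal sketch, assuming each year is independent and the annual probability stays fixed (both simplifications). The function name and the 30-year lifespan are illustrative, not taken from any particular risk tool.

```python
def prob_at_least_one_flood(annual_prob: float, years: int) -> float:
    """Chance of at least one flood over `years`, assuming each year is
    independent with the same annual probability (a simplification)."""
    if not 0.0 <= annual_prob <= 1.0:
        raise ValueError("annual_prob must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Complement rule: 1 - P(no flood in every one of the years)
    return 1.0 - (1.0 - annual_prob) ** years


if __name__ == "__main__":
    # Hypothetical example: a site in a 100-year flood zone (1% per year)
    # evaluated over an assumed 30-year facility lifespan.
    p = prob_at_least_one_flood(0.01, 30)
    print(f"{p:.1%}")
```

For a 1 percent annual chance over 30 years this prints 26.0%, roughly one in four. A "100-year" designation therefore does not mean a site will flood only once a century.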
Data center senior management should have a disaster recovery plan in place that covers contingency operations, scenarios, and identified resources (including outsourced organizations), and that requires periodic training for all personnel. In addition, risk mitigation and continuity-of-service procedures should be exercised regularly so that all personnel know how to respond in an emergency and any shortfalls are identified and corrected. These efforts should leverage established standards and related policies:
- National Institute of Standards and Technology (NIST) 800-34, Contingency Planning Guide for Federal Information Systems
- SANS Institute Disaster Recovery Plan Strategies and Processes
- ISO/IEC 24762, Information technology -- Security techniques -- Guidelines for information and communications technology disaster recovery services
Emergency and incident response plans, with appropriate escalation procedures, should include redundancies and preparations for the following:
- Electrical generators, fuel cells, UPS/batteries, solar, and fuel
- Server farms
- Heating, ventilation, and air conditioning (HVAC)
- Storage systems
- Network infrastructure
- Enterprise emergency notification system and customer communications
- Vendor emergency response resources, including service level agreements (SLAs)
- Spare equipment and tools, pre-coordinated with suppliers or outsourced providers
- Evacuation plans
- Call-up trees
- Shelter-in-place supplies (e.g., food, blankets, pillows)
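A call-up tree can be modeled as a simple hierarchy in which each person is responsible for contacting a short list of others. The sketch below is hypothetical: the names and structure are illustrative and not drawn from any particular notification product. It walks the tree breadth-first to show the round in which each person is reached, and it flags anyone listed in the tree who cannot be reached from the starting contact.

```python
from collections import deque

# Hypothetical call-up tree: each key calls the people in its list.
CALL_TREE = {
    "DR coordinator": ["Facilities lead", "Network lead", "Systems lead"],
    "Facilities lead": ["HVAC tech", "Electrician"],
    "Network lead": ["NOC on-call"],
    "Systems lead": ["Storage admin", "Server admin"],
}


def contact_rounds(tree: dict[str, list[str]], root: str) -> dict[str, int]:
    """Return the call round in which each person is reached (root = 0),
    walking the tree breadth-first. Anyone listed twice is reached once."""
    rounds = {root: 0}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in rounds:  # skip people already contacted
                rounds[callee] = rounds[caller] + 1
                queue.append(callee)
    return rounds


def unreachable(tree: dict[str, list[str]], root: str) -> set[str]:
    """People who appear in the tree but cannot be reached from root."""
    everyone = set(tree) | {p for callees in tree.values() for p in callees}
    return everyone - set(contact_rounds(tree, root))


if __name__ == "__main__":
    rounds = contact_rounds(CALL_TREE, "DR coordinator")
    for person, r in sorted(rounds.items(), key=lambda kv: (kv[1], kv[0])):
        print(r, person)
    print("Unreachable:", unreachable(CALL_TREE, "DR coordinator") or "none")
```

Running a check like this whenever staff change helps keep the tree shallow and free of orphaned entries.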
Plans and procedures must be reviewed and updated frequently, because emergency response is a continuum, not a single event to be forgotten until the next crisis. Maintaining uptime is imperative for mission-critical customer businesses: the average cost of downtime can range from $8,000 to over $600,000 per hour, and outages average more than two days. Lessons learned from recent and historical events must be considered, and all plans updated where necessary, with contingencies that include traffic failover and rerouting. No person or organization can plan for every emergency or natural disaster, but scenarios must be given due diligence when making changes and improvements in preparation for that dreaded day.
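To put the quoted range in perspective, a rough estimate multiplies the hourly cost of downtime by the outage duration. The following is a minimal sketch. The function name is illustrative, and the 48-hour duration treats "more than two days" as exactly two days.

```python
def downtime_cost(cost_per_hour: float, hours: float) -> float:
    """Rough outage cost: hourly cost of downtime times outage duration."""
    if cost_per_hour < 0 or hours < 0:
        raise ValueError("inputs must be non-negative")
    return cost_per_hour * hours


if __name__ == "__main__":
    outage_hours = 2 * 24  # "more than two days", rounded down to 48 hours
    low = downtime_cost(8_000, outage_hours)
    high = downtime_cost(600_000, outage_hours)
    print(f"${low:,.0f} to ${high:,.0f}")
```

This prints $384,000 to $28,800,000 for a single two-day outage. Because the outage length is understated, these figures lean conservative.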