Spending Less Time Putting Out Fires, More Time On Solutions

Posted by Instor on Oct 12, 2017 3:22:00 AM

There is a phrase that has been used by many in the Amish community and that is often attributed to the 19th-century English writer Lewis Carroll: “The faster I go, the behinder I get.” This rings especially true in an age when technology changes rapidly while data center staff are busy chasing proverbial fires.


The Four Stages of Maturity in the Data Center Lifecycle

Aside from a shortage of qualified staff, training, funding, and other resources, the data center manager must balance high operational availability and reliability against budgets and service level agreements (SLAs), while also driving the implementation of new technologies. Throughout the data center lifecycle, there are typically four stages of maturity based on availability, efficiency, and flexibility: basic, consolidated, available, and strategic. At every stage, employing intelligent tools and techniques will pay dividends for daily operations and the bottom line.

Tools and automation technology can ease the workload and act as a force multiplier for manpower and capabilities. They can take on tasks such as mundane scripts, software patching, configuration deployment, network performance monitoring, and more. There are four main areas on which to focus the deployment of automation and tools:

  • Server management (moving virtual machines, or VMs, based on SLAs)
  • Storage management (tiered storage)
  • Network management (provisioning and performance services, rapid recovery times)
  • Facilities management (energy consumption and availability, air flow, cooling)

In every area, strategic data center plans should be continuously aligned with business goals.

As technology evolves, different technologies are converging, for example the Internet of Things (IoT) and Artificial Intelligence (AI). Tools to consider for automation include:

  • Predictive Analytics / Predictive Maintenance (PdM)
  • Condition-based Maintenance (CBM)
  • Building Information Modeling (BIM)
  • Building Automation System (BAS)
  • Root Cause Analysis tools
  • Modeling and Simulation (M&S) tools


Predictive Analytics / Predictive Maintenance (PdM) uses machine learning, analytics, probability and statistics, and AI to predict when maintenance should be performed, so that it can be carried out before an unexpected failure occurs. One concept related to PdM is the Observe, Orient, Decide, and Act (OODA) loop. It was developed by military strategist John Boyd, who drew on his experience as a fighter pilot in the Korean War. The OODA loop focuses on the rate of change and resembles a typical closed-loop system, which makes it useful when timely decision making is critical.
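To make the PdM idea concrete, the following minimal sketch frames a single predictive-maintenance check as one pass through an OODA-style loop. It assumes a single sensor, such as fan vibration, that is sampled at known times and compared against a fixed alarm limit. The code fits a straight-line trend by ordinary least squares and extrapolates the time at which that limit would be crossed. Real PdM systems use much richer models. The linear-trend assumption, the maintenance horizon, and all function names here are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    slope: float           # change in the reading per hour
    hours_to_limit: float  # hours after the last sample until the limit is hit (inf if not rising)


def fit_trend(hours: list[float], readings: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of: reading = intercept + slope * hour."""
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, readings))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(hours: list[float], readings: list[float], limit: float) -> Forecast:
    """Orient: extrapolate the fitted trend to the alarm limit."""
    intercept, slope = fit_trend(hours, readings)
    if slope <= 0:
        return Forecast(slope, float("inf"))  # flat or improving: no crossing predicted
    t_cross = (limit - intercept) / slope
    return Forecast(slope, max(0.0, t_cross - hours[-1]))


def ooda_step(hours: list[float], readings: list[float], limit: float, horizon_hours: float) -> str:
    """Observe (inputs) -> Orient (forecast) -> Decide (compare to horizon) -> Act (return action)."""
    f = forecast(hours, readings, limit)
    if f.hours_to_limit <= horizon_hours:
        return "schedule maintenance"
    return "continue monitoring"
```

For example, suppose the readings are 2.0, 2.2, 2.4, 2.6, and 2.8 at hours 0 through 4, with a limit of 4.5. The fitted slope is 0.2 per hour, and the limit would be reached about 8.5 hours after the last sample. A 24-hour horizon therefore returns "schedule maintenance".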

CBM balances the costs of preventive and corrective maintenance while improving system and subsystem reliability through diagnostics and prognostics. It all starts with hardware, sensors, software algorithms, troubleshooting aids, and data acquisition. A statistical analysis model is built from environmental inputs. The model monitors for out-of-norm readings and configuration deviations, then combines risk assessment with a comparison against outage data to estimate when maintenance should be performed.
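The following minimal sketch covers the CBM monitoring step. It assumes that a baseline of healthy readings and a baseline configuration are both available. Readings outside the baseline mean ± k standard deviations count as out-of-norm, using the classic statistical control-limit approach, which is swapped in here for illustration. Any difference between the baseline configuration and the current configuration counts as a deviation. The thresholds (k = 3, and two excursions before maintenance is flagged) and the function names are hypothetical.

```python
import statistics


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Lower/upper limits at mean +/- k sample standard deviations of healthy readings."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def out_of_norm(readings: list[float], limits: tuple[float, float]) -> list[int]:
    """Indices of readings that fall outside the control limits."""
    low, high = limits
    return [i for i, r in enumerate(readings) if r < low or r > high]


def config_drift(baseline_cfg: dict, current_cfg: dict) -> dict:
    """Map each changed, added, or removed setting to its (baseline, current) values."""
    keys = baseline_cfg.keys() | current_cfg.keys()
    return {
        k: (baseline_cfg.get(k), current_cfg.get(k))
        for k in sorted(keys)
        if baseline_cfg.get(k) != current_cfg.get(k)
    }


def needs_maintenance(readings, baseline, baseline_cfg, current_cfg, max_excursions: int = 2) -> bool:
    """Hypothetical rule: flag maintenance on repeated excursions or any configuration drift."""
    excursions = out_of_norm(readings, control_limits(baseline))
    return len(excursions) >= max_excursions or bool(config_drift(baseline_cfg, current_cfg))
```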

BIM models typically include 3D building data, construction components and parts, visualization of piping, cable layout, electrical power distribution, lighting systems, HVAC (heating, ventilation, and air conditioning), integrated raised flooring, and spatial geometries to scale. They can also incorporate computational fluid dynamics (CFD) simulations for airflow analysis. Some operations choose to integrate Data Center Infrastructure Management (DCIM) with BIM to provide a comprehensive view of IT and facility operations.
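CFD simulation is far beyond the scope of a short example. A common first-order check that sits alongside it is the sensible-heat airflow estimate. For standard air at sea level, the required airflow in CFM is approximately 3.412 × W / (1.08 × ΔT in °F), or roughly 3.16 × W / ΔT. The following sketch joins a BIM-style location (the row) with a DCIM-style power reading for each rack, then totals the airflow each row needs. The `Rack` fields, the 20 °F temperature rise, and the rack values are illustrative assumptions. This estimate is not a substitute for a CFD model.

```python
from collections import defaultdict
from dataclasses import dataclass

WATTS_TO_BTU_PER_HR = 3.412
SENSIBLE_HEAT_FACTOR = 1.08  # BTU/hr per (CFM x degF) for standard air at sea level


@dataclass
class Rack:
    rack_id: str
    row: str        # spatial location, as a BIM model might record it
    power_w: float  # measured IT load, as a DCIM tool might report it


def required_cfm(power_w: float, delta_t_f: float) -> float:
    """Airflow needed to remove power_w watts of heat with a delta_t_f (degF) air temperature rise."""
    return power_w * WATTS_TO_BTU_PER_HR / (SENSIBLE_HEAT_FACTOR * delta_t_f)


def cfm_by_row(racks: list[Rack], delta_t_f: float = 20.0) -> dict[str, float]:
    """Total required airflow per row, combining location and power data."""
    totals: dict[str, float] = defaultdict(float)
    for rack in racks:
        totals[rack.row] += required_cfm(rack.power_w, delta_t_f)
    return dict(totals)
```

As a usage example, a 5,000 W rack with a 20 °F temperature rise needs roughly 790 CFM.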

BAS has evolved beyond HVAC control to include lighting, fire, safety, and physical security systems. Its value lies in energy savings, maintenance scheduling, extended asset life, and cost savings, with a focus on predictability and protection from unexpected failures. Similarly, Industrial Internet of Things (IIoT) systems use sensors, actuators, motors, Programmable Logic Controllers (PLCs), Remote Terminal Units (RTUs), and gateways.
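Much of what a BAS does comes down to closed-loop control of equipment such as cooling units. The following minimal sketch shows one common building block, a deadband (hysteresis) controller, which keeps equipment from cycling rapidly on and off near the setpoint. The class name, setpoint, and band width are hypothetical. Production BAS controllers often use more sophisticated control logic.

```python
class DeadbandCooling:
    """Hypothetical controller: cooling turns on at or above setpoint + band/2
    and turns off at or below setpoint - band/2."""

    def __init__(self, setpoint_c: float, band_c: float):
        self.on_at = setpoint_c + band_c / 2
        self.off_at = setpoint_c - band_c / 2
        self.cooling_on = False

    def update(self, temp_c: float) -> bool:
        """Process one temperature reading and return the new cooling state."""
        if temp_c >= self.on_at:
            self.cooling_on = True
        elif temp_c <= self.off_at:
            self.cooling_on = False
        # Inside the deadband the previous state is kept, which prevents short cycling.
        return self.cooling_on
```

With a 24 °C setpoint and a 2 °C band, cooling turns on at 25 °C and stays on until the temperature falls to 23 °C.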

Root Cause Analysis tools may be used to define a problem, collect data, identify possible causal factors, identify root causes, and recommend solutions. There are many software applications from which to choose. Many of them support techniques such as the “5 Whys,” drill-down analysis, and cause-and-effect (fishbone) diagrams.
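Drill-down analysis can be sketched as a walk over a dependency map. The walk starts from the failed service and follows only failed dependencies. Any failed component whose own dependencies are all healthy is reported as a candidate root cause. Each step down the map resembles asking one more “why.” The dependency map, component names, and function below are hypothetical, and the sketch assumes that such a map is available from an inventory source.

```python
def root_causes(symptom: str, depends_on: dict[str, list[str]], failed: set[str]) -> list[str]:
    """Drill down from a failed symptom to failed components with no failed dependencies."""
    causes: list[str] = []
    seen: set[str] = set()
    stack = [symptom]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        failed_deps = [d for d in depends_on.get(node, []) if d in failed]
        if failed_deps:
            stack.extend(failed_deps)  # ask "why" again: keep drilling into failed dependencies
        elif node in failed:
            causes.append(node)        # failed, but nothing beneath it failed
    return sorted(causes)


# Hypothetical topology: an application on a VM, on a host and a storage LUN.
DEPENDS_ON = {
    "app": ["vm-12"],
    "vm-12": ["host-3", "san-lun-7"],
    "host-3": ["pdu-a"],
    "san-lun-7": ["array-1"],
}
```

If the application, the VM, the host, and the PDU are all alarming, the only candidate root cause reported is the PDU.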

Modeling and simulation (M&S) tools span a wide variety of products, including OPNET, EXata, MATLAB, RStudio, TensorFlow, scikit-learn, Torch, ETAP, and more. A model is created that can be composed of multiple components, and its behavior is defined so that it realistically represents the real-world application.
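As a small illustration of modeling and simulation, the following sketch models a redundant power path as n independent units, each available with probability a. The system is considered up if at least k units are up; for example, an N+1 design with n = 3 and k = 2. The code estimates availability by Monte Carlo simulation and compares the estimate with the exact binomial result. The independence assumption, the 0.99 unit availability, and the function names are all illustrative.

```python
import random
from math import comb


def analytic_availability(n: int, k: int, a: float) -> float:
    """Exact probability that at least k of n independent units are up."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def simulated_availability(n: int, k: int, a: float, trials: int = 200_000, seed: int = 42) -> float:
    """Monte Carlo estimate: draw each unit's state independently, count trials with >= k units up."""
    rng = random.Random(seed)
    up = 0
    for _ in range(trials):
        units_up = sum(1 for _ in range(n) if rng.random() < a)
        if units_up >= k:
            up += 1
    return up / trials
```

For n = 3, k = 2, and a = 0.99, the exact value is 3a²(1 − a) + a³ ≈ 0.999702, and the simulated estimate should land close to it. The analytic formula is only available because the model is this simple. Simulation becomes the practical option once the model adds repair times, shared failure modes, or other realistic behavior.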

The data center is similar to an onion – peel it back and you’ll find layers of complexity. Data center staff should investigate other industries’ tools and technology:

  • Industrial Control Systems (ICS)
  • Supervisory Control and Data Acquisition (SCADA) systems
  • Distributed Control Systems (DCS)
  • Industry 4.0 (convergence of IT and operational technology, or OT)
  • IIoT and Building IoT (BIoT)

The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) is another source of guidance on these technologies. For example, NIST Special Publication 800-82 is a guide to Industrial Control Systems (ICS) security.



Topics: Efficiency