Automation of data centers must be carefully planned and implemented to achieve business goals. Rather than letting technology drive the need, the focus should be on capabilities that complete tasks and deliver results. Done well, automation addresses common pain points by taking over mundane tasks, avoiding human error, and reducing inefficiency, which saves both time and cost.
Weigh These Costs vs. Benefits for Data Center Automation
The average cost of a data center outage was $740,357 in 2015, with a maximum downtime cost of $2,409,991, according to a Ponemon Institute study. The U.S. government also struggles to budget for operations and maintenance (O&M) costs exceeding $80 billion that support outdated technology and a national IT infrastructure that still uses Windows 3.1, 8-inch floppy disks, and source code more than 50 years old. Rather than remaining reactive, a proactive IT staff can make better use of time and resources by focusing on development and customer service improvements, all while meeting the organization's business mission.
No automation solution is a silver bullet that cures every pain point, but automation can reduce waste, cost, and inefficiency in data center operations. A balance must be struck between having no automated mechanisms at all and the ideal "lights out" data center, with options ranging from simple virtual server creation to DevOps. Maintaining legacy applications and systems remains as crucial as implementing new services and scaling workloads.
Data center automation means different things depending on one's perspective. In general, automation should target repetitive tasks, or aspects of workloads, that must be performed on a schedule and without error. Before deciding to automate, all risks should be identified and mitigated, and the automation should be tested, if possible, before "going live." A plan B should be identified for cases where the automated tasks do not work, including a means to "roll back" the processes manually. Here are some considerations when deciding on data center automation:
- Timeliness of pending issues or failures: staff can be notified promptly and escalate the issue if needed, while logging captures the contributing factors that led to the event for auditing and compliance.
- Costs: automating redundant, mundane tasks saves time, which allows staff to focus on more important functions.
- Repeatability: as an automated process is refined and matures, tasks can be improved over time, resulting in greater confidence in the solution.
- Force multiplier: as more tasks become automated, existing staff can be redeployed to provide more coverage despite staff shortages and to tackle bigger inefficiencies.
- Reduced human error: many data center outages are attributed to human error, so removing manual steps removes opportunities for those mistakes.
- Recognition of unauthorized or unexpected changes: staff can be alerted and take action when an incident occurs.
- Real-time trigger action: instead of just alerting the staff, the automated solution can act in response to thresholds with minimal human intervention.
- Alarm fatigue: anything that can happen will happen, including false or nuisance alarms, which soon get ignored.
- Propagating errors: when code fails, errors can propagate throughout the system and potentially cause cascading effects.
- Prioritization: when multiple issues occur simultaneously, staff must be trained to intervene manually and "override" some automated functions depending on circumstances.
- Job security: some staff members will always be concerned that their jobs will be relegated to a machine or process.
- Confidence: it may take time before everyone fully accepts and buys into the automated solution.
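Several of these considerations (real-time trigger action, alarm fatigue, prioritization via manual override, and logging for audit) can be combined in a single small mechanism. The following is a minimal Python sketch, not a production design: the metric name `rack-12-inlet-temp`, the threshold, the `migrate_workloads` remediation hook, and the `ThresholdTrigger` class are all hypothetical. The trigger acts only after several consecutive readings breach the threshold, which filters out one-off spikes that would otherwise become nuisance alarms, and it escalates instead of acting while staff have an override in place.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("dc-automation")


@dataclass
class ThresholdTrigger:
    """Hypothetical trigger: fires `action` only after `required_breaches`
    consecutive readings exceed `threshold` (debouncing nuisance spikes)."""
    name: str
    threshold: float
    required_breaches: int
    action: Callable[[str, float], None]  # hypothetical remediation hook
    manual_override: bool = False         # set by staff to pause automated action
    _consecutive: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Record one reading; return True only if the action was fired."""
        if value <= self.threshold:
            self._consecutive = 0  # back to normal: reset the breach streak
            return False
        self._consecutive += 1
        # Every breach is logged so contributing factors are available for audit.
        log.info("%s breach %d/%d (value=%.1f)",
                 self.name, self._consecutive, self.required_breaches, value)
        if self._consecutive < self.required_breaches:
            return False
        if self.manual_override:
            # Staff have taken control: escalate rather than act automatically.
            log.warning("%s: override active, escalating to staff", self.name)
            return False
        self.action(self.name, value)
        self._consecutive = 0  # reset so the action does not fire on every reading
        return True


fired: List[str] = []


def migrate_workloads(name: str, value: float) -> None:
    # Hypothetical remediation; a real system would call its orchestration tool here.
    log.info("%s: triggering workload migration (value=%.1f)", name, value)
    fired.append(name)


trigger = ThresholdTrigger("rack-12-inlet-temp", threshold=27.0,
                           required_breaches=3, action=migrate_workloads)
readings = [25.0, 28.5, 26.0, 28.0, 29.1, 30.2, 26.5]
results = [trigger.observe(r) for r in readings]
print(results)  # [False, False, False, False, False, True, False]
```

The isolated spike at 28.5 is logged but never acted on; only the sustained run of 28.0, 29.1, and 30.2 fires the remediation. Requiring consecutive breaches trades a little response time for fewer false alarms, and the appropriate count depends on how quickly the monitored condition can become dangerous.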
Waste is one driver for automation: for example, operational expense (OPEX) waste in the time spent provisioning resources, and capital expense (CAPEX) waste from idle resources and overprovisioning. A study by the International Data Corporation (IDC), sponsored by Hewlett Packard Enterprise and titled Quantifying Data Center Inefficiency: Making the Case for Composable Infrastructure, surveyed over 300 IT users at medium-sized and large enterprises regarding data center efficiency. It found that the time spent on provisioning tasks (e.g., provisioning virtual server instances, provisioning physical server instances, decommissioning physical servers, etc.) ranged from 5.9 to 11 hours per task. The time spent on maintenance tasks (e.g., new service requests, approval management, software installation, configuration management, etc.) ranged from 12.1 to 17.4 hours per task. The study identified the top three priorities for infrastructure optimization:
- Maintenance activities (e.g., firmware updates, patching, etc.)
- Troubleshooting (e.g., monitoring, remediation of incidents, etc.)
- Provisioning (e.g., storage, servers, network, etc.)