Data center outages can last anywhere from seconds to hours. But regardless of how long a system is down, unexpected downtime can cost you revenue, customers and brand loyalty. This is why it’s critical to continuously monitor for the common causes of data center downtime. Even if you have a data center infrastructure management (DCIM) system, it is best to have a redundant system that can provide immediate notification of alarms from your UPS, PDU and other critical equipment.
Here are some of the top causes of data center downtime and strategies for preventing them.
UPS System Failure
Power failures are going to happen for one reason or another. Weather is a prime culprit. Most data centers rely on uninterruptible power supplies (UPS) to maintain power to the servers until the automatic transfer switch (ATS) calls for generator power. But what happens when your UPS system fails too? Even if it’s only a few seconds until the generator takes over, the damage has been done. That’s why it’s critical to have a UPS backup system and routinely test it for optimum performance.
Cybercrime
Cybercrime is the fastest-growing and most rapidly evolving cause of data center downtime today. Staying up to date on best practices is key to preventing cybercriminals from reaching the secure information housed within your data center. All employees should undergo background checks and sign confidentiality and security agreements before they begin work. Passwords should be complex and must be changed every 60-90 days. Access to servers containing sensitive data should be restricted to employees who need that information to do their jobs.
Power and Equipment Failure
Power and equipment failures can happen for a variety of reasons, from weather to human error. Because they can be so unpredictable, it’s essential to continuously monitor the power distribution units (PDUs) that supply power to your equipment. Each server rack typically has an outlet power strip powered by the PDU, and the PDU should also monitor each strip’s load. When a power issue is detected, the PDU’s built-in alarm panel will trigger an output relay for any of the strips or for the main PDU. Your DCIM will alarm if there is an issue, but a remote monitoring system can serve as a redundant alarm notification system.
Physical Threats
Physical dangers are often hard to notice, yet they can cause severe damage to data center equipment and the information housed within. Water, excessive temperature, high humidity and insufficient airflow are just a few of the threats that can occur at any time, often without warning. Remote monitoring systems can be your eyes and ears, even when your staff can’t be physically present. These systems enable facility managers to remotely monitor the server room’s status and be notified immediately if any condition moves out of its predetermined range.
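To make the idea of a predetermined range concrete, here is a minimal Python sketch of an out-of-range check. The sensor names and limit values are illustrative assumptions, not recommendations or settings from any particular product; real limits depend on your equipment and facility.

```python
# Hypothetical limits for illustration only: (low, high) per sensor.
LIMITS = {
    "temperature_f": (64.0, 80.0),
    "humidity_pct": (20.0, 80.0),
}


def out_of_range(readings, limits=LIMITS):
    """Return one alert message for each reading outside its configured range."""
    alerts = []
    for sensor, value in readings.items():
        low, high = limits[sensor]
        if not (low <= value <= high):
            alerts.append(f"{sensor}={value} outside {low}-{high}")
    return alerts


if __name__ == "__main__":
    # Made-up sample readings: the temperature is above its upper limit.
    sample = {"temperature_f": 86.5, "humidity_pct": 45.0}
    for message in out_of_range(sample):
        print("ALARM:", message)
```

In practice, each alert would be handed to whatever notification channel you use, such as text message, email or phone call.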
How Remote Monitoring Systems Can Help Prevent Downtime
Remote environmental monitoring systems give facility managers the ability to know exactly what is going on in their server rooms at all times. Systems like the Sensaphone Stratus EMS provide real-time status of all monitored equipment and environmental conditions from a smartphone, tablet or computer. For server rooms that don’t have a DCIM, these systems are a cost-effective way to monitor the whole data center environment, including devices like UPSs and PDUs, and to send alarm notifications when a condition falls out of range.
The Stratus EMS can monitor virtually any condition including temperature from -109°F to 168°F (-85°C to 76°C), humidity, pressure, power or equipment failure, vibration and water leakage. Because it supports Modbus RTU/485 and Modbus TCP, the Stratus EMS can read data from building automation and uninterruptible power supply systems.
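For readers curious about what reading data over Modbus TCP involves, the following Python sketch builds a standard "read holding registers" request and decodes the matching response. It illustrates the protocol framing only and does not represent the Stratus EMS’s internals. The register address and the sample response bytes are assumptions made for the example; real register maps come from each device’s documentation.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code


def build_read_request(transaction_id, unit_id, start_address, count):
    """Build a Modbus TCP request frame for reading holding registers."""
    # MBAP header: transaction id, protocol id (always 0), number of bytes
    # that follow the length field (unit id + 5-byte request = 6), unit id.
    # Then the request itself: function code, start address, register count.
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id,
                       READ_HOLDING_REGISTERS, start_address, count)


def parse_read_response(frame):
    """Return the 16-bit register values carried in a response frame."""
    _tid, _proto, _length, _unit, function, byte_count = struct.unpack(
        ">HHHBBB", frame[:9])
    if function & 0x80:
        # In an exception response, the byte after the function code
        # holds the exception code instead of a byte count.
        raise ValueError(f"Modbus exception code {frame[8]}")
    return list(struct.unpack(f">{byte_count // 2}H",
                              frame[9:9 + byte_count]))


if __name__ == "__main__":
    request = build_read_request(transaction_id=1, unit_id=1,
                                 start_address=0, count=2)
    print(request.hex())
    # Hypothetical response carrying two register values.
    response = bytes.fromhex("00010000000701030400e60037")
    print(parse_read_response(response))
```

A production setup would normally use an established Modbus library and a live network connection rather than hand-built frames.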
Operators can change settings, disable alarms and readjust temperature limits remotely through an easy-to-use mobile app or website. All data is stored in the cloud, so it can be easily retrieved when needed and never has to be overwritten because of space constraints. When a sensor detects a reading that is out of range, the Stratus EMS immediately notifies designated personnel via text message, email or phone call. The Stratus EMS also constantly sends a signal to the Sensaphone cloud to confirm that it is online. If communication is interrupted, the system will immediately send an alert to users indicating that the internet connection is lost.
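The online-status check described above follows a general pattern often called a heartbeat: a device checks in regularly, and a separate service raises an alert when the check-ins stop. Below is a minimal Python sketch of that pattern. It is a generic illustration, not Sensaphone’s implementation; the class name, device names and 120-second timeout are all assumptions.

```python
import time


class HeartbeatMonitor:
    """Track device check-ins and report devices that have gone quiet."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the example can simulate time
        self.last_seen = {}

    def check_in(self, device_id):
        """Record that a device just reported in."""
        self.last_seen[device_id] = self.clock()

    def offline_devices(self):
        """Return devices whose last check-in is older than the timeout."""
        now = self.clock()
        return sorted(device for device, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)


if __name__ == "__main__":
    fake_now = [0.0]  # simulated clock, in seconds
    monitor = HeartbeatMonitor(timeout_seconds=120,
                               clock=lambda: fake_now[0])
    monitor.check_in("server-room-1")
    fake_now[0] = 60.0
    monitor.check_in("server-room-2")
    fake_now[0] = 150.0
    # server-room-1 last checked in 150 seconds ago, past the timeout.
    print(monitor.offline_devices())
```

Because the check runs on the receiving side, it can still raise an alert when the monitored site loses power or its internet connection entirely.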
Downtime is costly, but being proactive by implementing best practices doesn’t have to be. By incorporating a remote monitoring system like the Stratus EMS, you can rest easy. If a threat occurs, you have the proper fail-safes in place and will be alerted immediately. To learn more about how the Stratus EMS can help you avoid costly downtime in your data center, contact a Sensaphone expert today.