Two weeks ago, Microsoft suffered a significant outage that struck platforms, programs, databases, and even email accounts for more than 24 hours.
Microsoft chalked the problem up to failed automation and infrastructure stability problems at a data center in Australia, according to a post-incident analysis report. A power blink on August 30 caused several cooling units to go offline, which raised temperatures and then triggered a shutdown to preserve critical hardware.
There was a human element to the outage, however. Microsoft revealed that staff could have turned the cooling units back on manually, but too few people were on site when the issue arose to act quickly.
That was cold comfort to millions of users who had to go without mission-critical services through Outlook, Teams, and Azure. Luckily, the outage was intermittent, affecting some people but not others. Many users reported problems with applications like Word and Excel, while others said Outlook issues were quickly resolved.
Cybersecurity experts pointed out that Microsoft’s recent outage follows several other high-profile incidents. In July, OneDrive and SharePoint were affected; in June, Outlook for Web users were prevented from accessing email accounts for eight hours. In April, all of Microsoft 365’s apps were knocked out for a short time, which followed a global outage in February.
What can your business do to prepare for an IT disaster?
Preparedness is key, no matter what industry your business operates in or where it’s located. Many states hold disaster prep exercises for earthquakes, fires, floods, and other natural catastrophes.
But even when we’re equipped for the worst-case scenario, sometimes milder problems like email outages or application lag time can wreak the most havoc. The lessons learned during “big” disaster prep exercises can also be extended to “smaller” incidents, when computers, servers, networks, smartphones, hard drives, and other IT devices don’t work optimally.
Here are a few strategies that CMIT Solutions recommends to reduce the risk of an IT disaster and mitigate the negative impacts if one occurs:
- First and foremost, don’t overlook the importance of disaster preparedness. Many business owners assume a natural disaster will never affect them. Annual surveys by the National Small Business Association (NSBA) show that nearly two-thirds of small business owners don’t have a disaster recovery plan or access to a backup generator. However, the NSBA estimates that 65% of U.S. businesses are situated in geographic areas that regularly suffer from natural disasters.
- Implement off-site, redundant, and encrypted data backups. The vast majority of business backups are done on-site—often on drives located directly next to the computers they’re backing up. If an outage affects your business, you can’t expect those backups to be spared. It’s also crucial to store backups in multiple locations—that’s the “redundant” part. IT experts estimate that the data centers housing the bulk of U.S. business information are primarily located in populous states like California, Texas, New York, Florida, and Washington—coincidentally, the same five states that lead the pack in FEMA disaster declarations.
- Plan for the aftereffects of any outage, not just specific threats that might lead to one. Instead of getting bogged down in the details of a particular disruption, comprehensive disaster recovery planning addresses the steps necessary to get your business up and running, no matter the event. Many business owners think that even if an outage strikes, they’ll only be affected for a few hours. In practice, companies that can maintain communications with employees and clients during outages are typically the ones that see their reputations strengthened, while companies scrambling to figure out how to respond are the ones affected for longer.
- Create (and test) a virtualization strategy in advance. The best preparedness plans include virtualization, which takes the data you have backed up remotely and rebuilds it on existing or secondary equipment in case of disaster. But if you haven’t tested your solution to see how quickly it can retrieve information and get you back up and running, you could face extended downtime when it matters most. Best-in-class offerings from trusted IT providers should be able to perform a full restore in less than 48 hours and, if needed, a quick restore in less than 24 hours.
- Understand what role your employees play. This is all about responsibility and communication: how will everyone be notified in the wake of an outage? Who will handle specific parts of a recovery plan? Who plays a primary role as opposed to a supporting one? What lines of communication will be used? Discussing these in advance—and having them documented in an easy-to-reach repository—is crucial to success in the wake of an outage.
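For teams that want a concrete starting point, the advice above about redundant backups and testing them in advance can be sketched in a few lines of Python. This is a minimal illustration, not a complete backup tool: the function names `sha256_of` and `verify_copies` and the file paths in the usage note are hypothetical, and it assumes each backup location is reachable as an ordinary file path. It simply checks that every copy of a file matches the original.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(source: Path, copies: list[Path]) -> dict[str, str]:
    """Compare each backup copy against the source file.

    Returns a mapping of copy path -> "ok", "missing", or "mismatch".
    """
    expected = sha256_of(source)
    results: dict[str, str] = {}
    for copy in copies:
        if not copy.is_file():
            results[str(copy)] = "missing"      # backup never arrived or was deleted
        elif sha256_of(copy) == expected:
            results[str(copy)] = "ok"           # copy is byte-for-byte identical
        else:
            results[str(copy)] = "mismatch"     # copy exists but is stale or corrupted
    return results
```

A call such as `verify_copies(Path("ledger.db"), [Path("/mnt/offsite/ledger.db"), Path("/mnt/second-site/ledger.db")])` (paths assumed for illustration) reports the status of each location. Running a check like this on a schedule is one way to find a missing or corrupted backup before an outage does.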
Hopefully, this information convinces you of the importance of preparedness—but you may still need help implementing such a plan. That’s where CMIT Solutions comes in: We specialize in helping businesses across North America prepare for and weather both the roughest storms and the most mundane IT issues.
Over the last few years, we’ve helped companies of every size survive hurricanes, floods, wildfires, tornadoes, earthquakes, and the day-to-day perils of human error and hardware failure.
As business owners ourselves, we take pride in helping other organizations overcome such obstacles. Contact CMIT Solutions today to better prepare for the inevitable and better position your company for short-term and long-term success.