When The Cloud Goes Down

Has everyone seen the ‘Slip Slidin’ Away’ Bridgestone Blizzak television commercial? It brilliantly depicts what it’s like to lose control of an automobile on a snowy or icy road. Anyone who lives in or visits a winter wonderland can easily relate to that terrifying feeling of helplessness when something you depend on is no longer under your control.

Imagine a similar feeling of helplessness when your Cloud provider has a major outage that affects your entire business. Perhaps it’s your email system, intranet, documents, CRM site, or another key application. Worse, what if it’s your IaaS provider, which houses your development or customer-facing systems?

Naturally, you quickly turn to your Cloud provider for help. Suddenly, your love of instant messaging, web forms, status pages, and social media turns into sheer panic as you realize you’ll never get to speak to a human being. You are greeted with a message saying, “We’re sorry. Our services are temporarily down. Our technicians and engineers are hard at work to resolve the problem. Please check back or follow us on the myriad social networking outlets we support. We’ll be updating this site every 15 minutes. We appreciate your business and your patience.”

Don’t worry; the next 15 minutes will pass faster than a quantum torpedo detonating against a Borg cube, because you’ll spend them fielding calls from people at every level of the organization. After hitting the browser’s refresh button, you’re greeted with another message: “We are in the process of restoring our services. The approximate time to completion is 5 to 6 hours. We apologize for the outage, but remember we haven’t had one in 2 years. We appreciate your business and your patience.”

Far-fetched? I thought so too, until I had the pleasure of experiencing it firsthand. In the aftermath, no guaranteed SLAs or service credits could make up for the headache I had. As someone who ponders, evangelizes, analyzes, and designs the next-generation datacenter (the Cloud), I took this as a firsthand lesson in why we must keep radically rethinking how we design, manage, monitor, predict, and recover the Cloud. In other words, it’s time to stop putting lipstick on the technology and ideas of yesterday and make room for something different and innovative.

Finally, I’ve never really liked the term Cloud. It implies a simplicity and ease of use that may hold on the front end (for users) but masks the complexity on the back end (for administrators). The reality is that nothing is 100 percent reliable, and “even the best-laid plans go awry.” The key is to understand that while technology is awesome, it pales in comparison to the power of being human.