Amazon / Cloud Customers: “Trust, but verify”

While Amazon continues to recover from its Cloud outage, some in the industry seem to be throwing FUD (fear, uncertainty, doubt) its way. After all, Amazon has been on an amazing run in the Cloud business, building out new datacenter regions while continuing to reduce prices for its customers.

Based on the many articles written about Amazon, I have the following observations.

  • Confusion reigns supreme within the world of Cloud computing
  • It’s time to stop pontificating and start solving problems
  • Cloud computing isn’t simple
  • Putting Cloud in front of every product’s name isn’t helpful
  • Availability zones are misunderstood
  • Cloud does not detract from personal responsibility
  • We need more information / facts
  • Those who think this outage proves Cloud computing isn’t ready for prime time are missing the boat

While Amazon has the ultimate responsibility for this outage, why didn’t their customers have a contingency plan for this scenario? Did everyone simply think Amazon could never go down? Haven’t we learnt anything from past outages of Google, Microsoft, and others?

Earlier this year I wrote a blog post entitled “When the Cloud Goes Down” detailing an experience I had with a provider.  The black box mentality of the Cloud needs to be replaced with an openness and transparency that does not exist today.  A dashboard showing status and health of the Cloud is simply not enough.  We need the ability to monitor and manage our slice of the Cloud independently of the Cloud provider.
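To make "monitor our slice of the Cloud independently" a little more concrete, here is a minimal sketch of what such a check might look like when run from a machine outside the provider. Everything in it is my own illustration, not any provider's API: the names `http_probe`, `OutageDetector`, and `check_once` are hypothetical, and the rule of "three failed probes in a row means an outage" is an assumption you would tune for your own services.

```python
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


class OutageDetector:
    """Declares an outage after `threshold` consecutive failed probes (assumed rule)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True while an outage is declared."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures >= self.threshold


def check_once(detector: OutageDetector, probe: Callable[[], bool]) -> bool:
    """Run one probe and feed its result to the detector."""
    return detector.record(probe())
```

In practice you might call `check_once(detector, lambda: http_probe(url))` on a schedule, where `url` is a health endpoint of your own application (a placeholder here), and page someone when it returns True. The point is that the verdict comes from your probe, not from the provider's status page.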

Ultimately, we may have seen the perfect argument for the Hybrid cloud, defined as the ability to provide some resources on a private cloud while accessing additional resources on a public cloud. In a Hybrid cloud model, customers would have the ability to swing services from the public cloud to their private cloud or to other public cloud providers to avoid outages. Where is the Amazon VM Export capability?
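The "swing services" idea can be sketched as a simple preference-ordered choice among deployment targets. This is only an illustration under my own assumptions: `choose_target` and the target names are hypothetical, and the hard part in real life (moving the data and VM images between clouds) is exactly what this sketch leaves out.

```python
from typing import Mapping, Optional, Sequence


def choose_target(preferred: Sequence[str], healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first target, in preference order, that is currently healthy.

    Targets missing from `healthy` are treated as down. Returns None when
    nothing is available.
    """
    for name in preferred:
        if healthy.get(name, False):
            return name
    return None


# Hypothetical targets: a primary public cloud, a private cloud, a second public cloud.
order = ["public-primary", "private", "public-secondary"]
status = {"public-primary": False, "private": True, "public-secondary": True}
active = choose_target(order, status)  # falls back to the private cloud
```

The health map would come from your own monitoring rather than from the provider's dashboard, which is the "verify" half of the phrase below.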

In the end, I’ll borrow a famous phrase from President Ronald Reagan, “Trust, but verify.”  Your business may depend on it!


When The Cloud Goes Down

Has everyone seen the ‘Slip Slidin’ Away’ Bridgestone Blizzak television commercial? It brilliantly depicts what it’s like to lose control of an automobile on a snowy or icy road. Anyone who lives in a winter wonderland or visits one can easily relate to that terrifying feeling of helplessness when something you depend on slips out of your control.

Imagine having a similar feeling of helplessness when your Cloud provider has a major outage that affects your entire business. Perhaps it’s your email system, Intranet, documents, CRM site, or another key application. Worse, what if it’s your IaaS provider that houses development or customer-facing systems?

Naturally, you quickly turn to your Cloud provider for help. Suddenly, your love of instant messaging, web forms, status pages, and social media turns into sheer panic as you realize you’ll never get to speak to a human being. You are greeted with a message saying, “We’re sorry. Our services are temporarily down. Our technicians and engineers are hard at work to resolve the problem. Please check back or follow us on the myriad of social networking outlets we support. We’ll be updating this site every 15 minutes. We appreciate your business and your patience.”

Don’t worry; the next 15 minutes will pass faster than a quantum torpedo detonating against a Borg cube, as you’ll be spending it fielding calls from people of all levels in the organization.  After hitting the refresh button on the browser you’re greeted with another message saying, “We are in the process of restoring our services.  The approximate time to completion is 5 to 6 hours.  We apologize for the outage but remember we haven’t had one in 2 years.  We appreciate your business and your patience.”

Far-fetched? I thought so too until I had the pleasure of experiencing it firsthand. In the aftermath, no guaranteed SLAs or credits could make up for the headache I had. As someone who ponders, evangelizes, analyzes, and designs the next-generation datacenter (the Cloud), this was a firsthand lesson in the importance of continuing to radically rethink how we design, manage, monitor, predict, and recover the Cloud. In other words, it’s time to stop putting lipstick on the technology and ideas of yesterday and make room for something different and innovative.

Finally, I’ve never really liked the term Cloud. It implies a simplicity or ease of use that may be prevalent on the front-end (users) but masks the reality of the complexity on the back-end (administrators). The reality is nothing is 100 percent and “even the best laid plans go awry.” The key is to understand that while technology is awesome, it pales in comparison to the power of being human.