Last week Sydney saw some pretty big storms – and not only in the physical sense. AWS datacentres were badly affected, which caused some disturbance, to say the least, for many big-name companies hosting on its cloud.
AWS promptly responded with a public report detailing the issue, along with an apology to their customers.
“This failure resulted in a total loss of utility power to multiple AWS facilities. In one of the facilities, our power redundancy didn't work as designed, and we lost power to a significant number of instances in that Availability Zone.”
“During this weekend’s event, the instances that lost power lost access to both their primary and secondary power as several of our power delivery line-ups failed to transfer load to their generators.”
Since Sydney is one of the AWS Regions closest to the SEA region (the closest being Singapore), we naturally wanted to know how this might affect us, and our region is definitely not known for its kind weather conditions.
AWS Senior Public Relations Manager Regina Tan explained, “Customers that were running their applications across multiple Availability Zones in the AWS Region were able to maintain availability throughout the event.”
As Regina explained, “‘Availability Zones’ refer to clusters of datacentres in separate distinct locations within a single AWS Region that are engineered to be operationally independent of other Availability Zones, with independent power, cooling, physical security, and are connected via a low latency network. AWS customers that need the highest availability for their applications can architect their applications to run in multiple Availability Zones to achieve even higher fault-tolerance. Additionally, our customers could also use another AWS Region as a fail-over site for highly mission critical apps/services, as part of their disaster recovery planning.”
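To make the multi-AZ point concrete, here is a minimal sketch of the reasoning in Python. It is our own toy model and not anything AWS provided: the `Deployment` class, the instance counts and the choice of which zone "fails" are all hypothetical, and the zone names are used purely as labels. It only shows why an application spread over two Availability Zones stays up when one zone loses power, while a single-zone application does not.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Deployment:
    """Hypothetical model: how many instances an app runs in each zone."""
    name: str
    instances_per_zone: Dict[str, int]

    def surviving_instances(self, failed_zones: Set[str]) -> int:
        # Count only instances in zones that still have power.
        return sum(
            count
            for zone, count in self.instances_per_zone.items()
            if zone not in failed_zones
        )

    def is_available(self, failed_zones: Set[str]) -> bool:
        # The app is "up" if at least one instance survives.
        return self.surviving_instances(failed_zones) > 0


# Illustrative deployments; zone labels and counts are assumptions.
single_az = Deployment("single-az-app", {"ap-southeast-2a": 4})
multi_az = Deployment(
    "multi-az-app", {"ap-southeast-2a": 2, "ap-southeast-2b": 2}
)

# Assume, for illustration only, that one zone loses power.
failed = {"ap-southeast-2a"}

for app in (single_az, multi_az):
    status = "available" if app.is_available(failed) else "down"
    print(f"{app.name}: {status}")
```

In this toy model the multi-AZ application keeps half its capacity, which is also why such designs are usually sized so the surviving zones can carry the full load. The same check, applied across Regions rather than zones, would model the fail-over site Regina mentions.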
The DSA team tried to follow up with the Australian PR team on a few questions, looking into three aspects. The reply we received:
“We aren’t commenting on the specific issue beyond the public message we posted here this week. I trust that message will provide the information you need regarding the event.”
We were only directed back to the public report, with the PR team highlighting the following quote:
“For this event, customers that were running their applications across multiple Availability Zones in the Region were able to maintain availability throughout the event. For customers that need the highest availability for their applications, we continue to recommend running applications with this architecture.”
In all fairness though, the AWS team was detailed about what happened in the public report. We’re sure the customers appreciated their apology and their dedication to an absolutely perfect service.
“We apologize for any inconvenience this event caused. We know how critical our services are to our customers’ businesses. We are never satisfied with operational performance that is anything less than perfect, and we will do everything we can to learn from this event and use it to drive improvement across our services.”
While we do agree that AWS has awesome technology and a pretty good track record with their customers and support, in our view they certainly do not have a very good track record at keeping journalists happy.
We really want to love them, and the work they do, whether on the technical side or in their other services for customers, definitely has the potential to be recognised in a better light. As evidenced by their response after the Sydney power failure, they were quick to act and gave full support and accountability for their services. We like that they were thorough, not only in their explanation but also in ensuring customers’ data integrity.
That said, we feel we asked valid questions that would have allowed AWS to position their DR capability in light of other options available. We suspect this would actually have come out favourably, but for now we are left guessing.
AWS, we wait for you to be less cold to us. We know you’re good, and you’ve certainly earned your place and bragging rights; maybe get off your pedestal once in a while.
We shall wait for the day they allow us to report on them in a better light.
Read the full AWS report here.