By any standards, 15 consecutive hours of unplanned downtime in a modern-day production system is not acceptable. When those 15 hours are with one of the world’s largest SaaS providers, the impact and reach are enormous. In the case of Salesforce.com, which has close to 200,000 customers and more than 4 million users, downtime can affect whole business ecosystems.
When these downtime incidents happen, they highlight how one of the strengths of cloud can also be one of its greatest risk factors. Public cloud providers operate at massive scale, so when something goes wrong, it tends to go wrong at scale too. A small mistake can replicate instantly across massive infrastructure, impacting millions of users.
So what happened on Friday last week? The simplified version goes something like this. Salesforce developers deployed a new database script to their live production environment. The script had the unintended consequence of changing read/write access permissions for all company employees. This gave people unauthorised access to all company files, and according to Salesforce, it impacted one of their SaaS offerings, affecting customers of Salesforce Pardot. To be clear, unintended read and write permissions mean that data could be stolen or even maliciously amended or tampered with. The issue was a massive security compromise. A risk like that is unacceptable, so “switching off” the service was the only sensible option.
A statement directly from Salesforce explained the situation as follows: “To protect our customers, we have blocked access to all instances that contain affected customers until we can complete the removal of the inadvertent permissions in the affected customer orgs. As a result, customers who were not impacted may experience service disruption. In parallel, we are working to restore the original permissions as quickly as possible.”
The event is a stark reminder of the importance of “shared responsibility” for data protection and data security when it comes to any cloud or “as a service” offering. Cloud services are far from immune to mistakes, outages and vulnerabilities, so supplementing their services with your own lines of defence is a sensible thing to do for your business-critical data.
Druva, which delivers cloud-native data protection and management across data centres, cloud workloads and endpoints, has a solution for Salesforce users. According to information shared by Druva’s Chief Technologist, W. Curtis Preston, Salesforce recommends that users “use a partner backup solution that can be found on the AppExchange.” Preston cited several reasons why organisations may not want Salesforce to restore their account: “Salesforce.com charges $10,000 to give you an export of your account; Recovery is not quick, as it is described as a manual process that will take Salesforce 6 to 8 weeks to complete, and when they do, it is provided in .CSV files with no metadata. It is up to you to upload the data using the Data Loader. It will be a very involved process that may require the use of professional services, costing you even more.”
Typically, the massive public cloud and “as a service” operators don’t experience such widespread global outages, but unscheduled downtime on a smaller scale is not unusual at all. When big incidents like this Salesforce one occur, they are a reminder that relying on your provider alone to protect and secure your applications and data is not enough for most businesses.