Authored by Hu Yoshida, Chief Technology Officer, Hitachi Vantara
Over the past year there has been a surge in reports of data breaches from Amazon S3 buckets that were left accessible online, exposing private information from all sorts of companies and their customers. The list of high-profile victims includes Accenture, Booz Allen Hamilton, Dow Jones, Time Warner, Tesla, TSA, Verizon, and the U.S. National Geospatial-Intelligence Agency, and the breaches exposed hundreds of millions of private records to public access.
How did this come about? When you create an Amazon S3 (Simple Storage Service) cloud storage bucket, you can store and retrieve data from anywhere on the web. In most of the breaches, the companies left Amazon S3 buckets configured to allow public access. This means that anyone on the internet with the name of the S3 bucket could access and download its content.
According to statistics from security firm Skyhigh Networks, 7% of all S3 buckets have unrestricted public access, and 35% are unencrypted, meaning this is a major problem for the entire Amazon S3 ecosystem. To compound the problem, there are many tools available on the web to comb through leaky AWS datasets, so finding these exposed S3 buckets is relatively easy. BuckHacker, a search engine tool, provides one of the most accessible ways to search for exposed S3 buckets.
While encryption is one way to limit the consequences of an S3 data breach, many people think they are protected by encryption when they actually aren’t. They believe that if the data is encrypted, its content cannot be readily compromised. In fact, it depends on where the encryption is done. The most easily adopted approach, server-side encryption (SSE), doesn’t solve the open bucket problem. With SSE, Amazon S3 encrypts your data before saving it to disks in its datacenter, but the data is automatically decrypted when it is downloaded, so data in an encrypted open bucket is accessible in clear text despite being stored in encrypted form.
Protection is provided by client-side encryption, where a client encrypts the data and then uploads the encrypted data to Amazon S3. In this case the client manages the encryption process, the encryption keys, and related tools, and maintains ownership of the encryption keys. Even if the bucket is publicly readable, the content is still encrypted when it is accessed.
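The following Python sketch illustrates the client-side pattern under some loud assumptions. It uses the Fernet recipe from the third-party `cryptography` package as a stand-in for whatever encryption tooling a client actually uses, and a plain dictionary (`fake_bucket`) stands in for a publicly readable S3 bucket so the example runs without an AWS account. The helper names and the sample record are hypothetical.

```python
from cryptography.fernet import Fernet, InvalidToken

# Stand-in for a publicly readable bucket: object key -> stored bytes.
fake_bucket = {}


def upload_encrypted(bucket, object_key, plaintext, fernet):
    """Encrypt on the client, then store only the ciphertext."""
    bucket[object_key] = fernet.encrypt(plaintext)


def download_decrypted(bucket, object_key, fernet):
    """Fetch the ciphertext and decrypt it with the client-held key."""
    return fernet.decrypt(bucket[object_key])


# The key is generated and kept by the client; it is never uploaded.
secret_key = Fernet.generate_key()
owner = Fernet(secret_key)

record = b"name,email\nAlice,alice@example.com\n"
upload_encrypted(fake_bucket, "customers.csv", record, owner)

# Anyone who can read the open bucket sees only ciphertext.
leaked = fake_bucket["customers.csv"]
print(b"Alice" in leaked)

# A reader without the owner's key cannot decrypt what they downloaded.
outsider = Fernet(Fernet.generate_key())
try:
    outsider.decrypt(leaked)
except InvalidToken:
    print("decryption failed without the owner's key")

# The key holder recovers the original content.
print(download_decrypted(fake_bucket, "customers.csv", owner) == record)
```

The point of the design is that the key never leaves the client, so the storage side, open or not, only ever holds ciphertext. The trade-off is that key management becomes the client's responsibility: losing the key means losing the data.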
Another solution for S3 data leaks is to take an on-premises or hybrid approach. Systems that run as a public cloud have a much broader attack surface and a one-size-fits-all security approach. Running the system on-premises minimizes the risk of accidental public exposure of confidential data. With a hybrid approach, you can take advantage of the scalability and low cost of Amazon S3 storage while enjoying the protection provided by your enterprise firewall and access controls, along with client data protection services like data-at-rest and data-in-flight encryption.
An on-premises or hybrid approach with a client to S3 can provide another layer of protection if hackers are able to penetrate AWS and gain access to the underlying computing infrastructure, which may enable direct access to the physical media that Amazon uses for S3 or for any backups or replicas. This assumes that the client has encrypted the data. Also, since Amazon is a public cloud service, there is always the possibility of a National Security Letter or other legal hold being placed on the data, which would require users to retain their files in another location that could also be attacked.