Amazon Web Services (AWS) has unveiled Amazon Redshift Spectrum, a new feature that enables Amazon Redshift customers to run SQL queries against exabytes of their data in Amazon Simple Storage Service (Amazon S3). With Redshift Spectrum, customers can extend the analytic power of Amazon Redshift beyond data stored on local disks in their data warehouse to query huge amounts of unstructured data in their Amazon S3 “data lake”, without needing to load or transform any data. Redshift Spectrum applies sophisticated query optimisation and scales processing across thousands of nodes, so results are fast even with large data sets and complex queries.
Amazon Redshift is one of AWS’s fastest-growing services because it enables customers to run complex queries on petabytes of structured data stored on high-performance local disks and get very fast performance, all for a tenth of the cost of traditional data warehouses. However, as the cost of data storage has continued to drop, customers are increasingly storing vast amounts of data in Amazon S3 “data lakes,” including unstructured data that may never make it into a data warehouse. Now, with Redshift Spectrum, analysing all of this data is as easy as running a standard Amazon Redshift SQL query.
Redshift Spectrum directly queries data in Amazon S3, with no loading or transformation needed, using the open data formats customers already use, including CSV, TSV, Parquet, Sequence, and RCFile. Since Redshift Spectrum supports the same SQL syntax as Amazon Redshift, customers can run queries using the same Business Intelligence (BI) tools they use today. They can also run queries that span both the frequently accessed data stored locally in Amazon Redshift and their full data sets stored cost-effectively in Amazon S3. Redshift Spectrum automatically scales query compute capacity based on the data being retrieved, so queries against Amazon S3 data run fast, whether they process a few terabytes, petabytes, or even exabytes.
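To make the idea of a query that spans local and Amazon S3 data more concrete, the following is a minimal sketch, not taken from the announcement, of the kind of SQL a customer might issue. It is written as small Python helpers that only build the statements as strings, so nothing here connects to a cluster. All schema, table, column, bucket, and IAM role names are hypothetical placeholders, and the exact options a given cluster needs may differ.

```python
# Illustrative sketch only: every schema, table, column, bucket, and role name
# used here is a hypothetical placeholder, not something named in the article.


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a statement that registers an external schema for data in S3."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema: str, table: str, s3_location: str) -> str:
    """Build a statement that describes Parquet files in S3 as an external table."""
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        "    customer_id INTEGER,\n"
        "    page_url    VARCHAR(2048),\n"
        "    clicked_at  TIMESTAMP\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


def spanning_query_sql(schema: str, external_table: str, local_table: str) -> str:
    """Build a query that joins S3-resident data with a local Redshift table."""
    return (
        "SELECT c.segment, COUNT(*) AS clicks\n"
        f"FROM {schema}.{external_table} AS e\n"
        f"JOIN {local_table} AS c ON c.customer_id = e.customer_id\n"
        "GROUP BY c.segment\n"
        "ORDER BY clicks DESC;"
    )


if __name__ == "__main__":
    # Print the three statements in the order they would typically be run.
    print(external_schema_sql(
        "spectrum", "clickstream_db",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole"))
    print(external_table_sql("spectrum", "clicks", "s3://example-bucket/clicks/"))
    print(spanning_query_sql("spectrum", "clicks", "public.customers"))
```

The data files stay in Amazon S3 throughout: the external table only describes where they are and how they are laid out, while the final statement reads like an ordinary Amazon Redshift join against a local table.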
According to Raju Gulabani, Vice President, Databases, Analytics, and AI, AWS, “Customers such as Amgen, Boingo Wireless, Electronic Arts, Hearst, Lyft, Nasdaq, Scholastic, TripAdvisor, and Yahoo! are migrating to Amazon Redshift in droves because it leverages the scale of AWS to analyse petabytes of data with ten times the performance at one-tenth the cost of old guard data warehouses. Many of these customers have asked us to extend the speed and flexibility of Amazon Redshift beyond the data warehouse to analyse all of the data they have in Amazon S3. Redshift Spectrum does just this, offering the best of both worlds by making it incredibly easy to query exabytes of data in Amazon S3 directly from Amazon Redshift. We’re excited to now make exabyte-scale analytics fast, simple and accessible to companies of all sizes.”
Vladimir Barkov, Director of Data Architecture and Engineering at Time Inc., said, “As a media company, we receive a large quantity of data from a number of ad serving providers. This data comes in a variety of formats and needs to be integrated with our own internal systems in order for our teams to be able to analyse it. Redshift Spectrum enables us to directly operate on our data in its native format in Amazon S3 with no pre-processing or transformation. Our data pipeline is much simpler now, and our execution time has been lowered significantly.”
Edmunds offers detailed, constantly updated information about vehicles to 20 million monthly visitors. “Amazon Redshift’s scalability allows us to support our ever-growing data volumes, unlike our previous, on-premises data warehouse solution,” said Ajit Zadgaonkar, Edmunds’s Executive Director of Operations and Infrastructure. “With Redshift Spectrum, we no longer need to think about what data to retain for analysis and what to throw away. We can now run real SQL queries directly on many years of data stored cost-effectively in Amazon S3. Redshift Spectrum’s fast performance across massive data sets is unprecedented.”
Yong Huang, Director of Big Data and Analytics at Redfin, said, “With millions of users and hundreds of millions of property listings, our website and internal systems generate a vast amount of data. Our data analytics platform has been built from the ground up on AWS, using Amazon S3 for storage, Amazon Kinesis for streaming, Amazon EMR for data processing and real-time applications, and Amazon Redshift for data warehousing. We love Redshift Spectrum because it allows us to directly and flexibly query our most up-to-date data coming from many different complex pipelines in many different file formats. Our data science team using Amazon EMR can now collaborate with our marketing and product teams using Redshift Spectrum to analyse the same Amazon S3 data sets.”
Customers can get started with Redshift Spectrum from the AWS Management Console. Amazon Redshift Spectrum is available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions and will expand to additional Regions in the coming months.