Richard Jones, Cloudera’s Vice President for Asia Pacific and Japan
Hadoop, essentially a distributed data infrastructure, distributes massive data collections across multiple nodes within a cluster of commodity servers, which means you don't need to buy and maintain expensive custom hardware.
It also indexes and keeps track of that data, enabling big-data processing and analytics far more effectively than was possible previously.
Hadoop has been around for a few years, but as the ecosystem surrounding it has grown, many large companies are starting to ask what it can do for their business.
Cloudera, one of the leaders in next generation data management, claims to be the first company to commercialise Apache Hadoop and to develop enterprise-grade solutions built on this powerful open source technology.
Today, Cloudera is the leading innovator in and largest contributor to the Hadoop open source software community. Cloudera has the largest share and is widely recognized as the standard in enterprise Hadoop.
In this week’s interview series, Richard Jones, Cloudera’s Vice President for Asia Pacific and Japan, shares his take on Hadoop’s challenges and opportunities, particularly in the ASEAN market.
DataStorageAsean: In simple terms, what’s the difference between Hadoop and a normal database?
Richard Jones: Apache Hadoop is a popular open source storage and analytics platform. Compared to traditional databases, Hadoop is capable of handling more data and more kinds of data, and supporting more flexible types of analytics beyond just SQL. Because it is open source and can run on commodity hardware – or even in the cloud – it is also very economical.
Hadoop complements existing systems in two ways. First, it can offload expensive storage and processing of large data volumes to a more affordable platform, freeing up existing systems to perform high-value analytics. Second, it enables new use cases by integrating new types of unstructured data and expanding the types of analysis that are possible. Examples include full-text search, machine learning, stream processing, and real-time data applications, all on the same platform.
However, Hadoop is not a transaction processing platform and traditional databases are best for that workload.
DataStorageAsean: Can you tell us about the newest developments in Hadoop and where you think the technology is heading?
Richard Jones: Originally, Hadoop was only useful for cheap storage and batch processing. It was not secure. Today, Hadoop has evolved and matured into a modern enterprise-grade data platform. Three years ago, Hadoop became interactive for data discovery through analytic SQL engines like Impala, and through Solr for text search. Two years ago, Cloudera added Spark to the Hadoop ecosystem for next-generation batch and stream data processing. Cloudera also offers comprehensive software to manage, secure, and govern Hadoop.
More importantly, organisations are now looking beyond the technology to solve real business problems. By aligning data to business objectives, they can derive even greater value. Popular use cases include developing a 360-degree view of the customer to build new revenue streams; driving efficiency in product and service delivery; and managing risk, compliance, and cybersecurity.
DataStorageAsean: Is Hadoop suitable for companies of any size, and can anyone in an organisation get access to data held in a Hadoop cluster?
Richard Jones: The flexible, scalable and cost-efficient data management capabilities of Hadoop make it ideal for any organisation with large volumes of data. Hadoop allows organisations to maintain enterprise-grade data security while permitting only authorised users to view, use and contribute to a particular data set.
Securing enterprise data requires a comprehensive approach, including authentication, fine-grained authorisation, data encryption, and auditing.
Cloudera Navigator, for example, maintains a full audit history of all data and activity in Hadoop. It tracks every access attempt, right down to the user ID, IP address, and full query text. This provides visibility into who has been accessing what data, the ability to see point-in-time permissions and how they have changed, and a way to review and verify HDFS permissions.
Incidentally, only Cloudera Enterprise is certified compliant with the Payment Card Industry (PCI) Data Security Standards. The platform is used by MasterCard, which requires that any technology handling its applications or payment card data files be PCI certified.
DataStorageAsean: What are the specific challenges for Hadoop adoption in the ASEAN region?
Richard Jones: While the need to leverage big data for insights is growing quickly and universally, general awareness of the benefits of extracting value from data is still a challenge for the ASEAN region. Alongside this, there is a shortage of skills in the region. Increasingly, organisations across all verticals are using Hadoop to extract value from big data, so there is clearly a demand for this particular skillset and a gap that needs to be filled.
This is something we are working towards at Cloudera. We have recently announced that over 100 leading tertiary educational institutions, from more than 17 countries, have joined the Cloudera Academic Partnership (CAP) programme – a programme that gives computer science students access to free advanced Apache Hadoop curriculum, training and tools to prepare them for careers in big data.
Out of these 100 leading universities, 20% are based in Asia, and we are continuing to explore opportunities to grow this initiative in this region and beyond.
With CAP, the end result is a new generation of Hadoop-aware IT professionals who will fill the skills gap for employers in the region.
DataStorageAsean: What’s unique about your Hadoop offering?
Richard Jones: Powered by the world’s most popular Hadoop distribution, Cloudera Enterprise makes Hadoop fast, easy, and secure, so our customers can focus on results instead of the technology. Cloudera offers the highest performance and overall lowest TCO platform for using data to drive better business outcomes.
First, Cloudera offers the fastest analytic SQL with Impala and the industry’s best support for the high-performance Spark framework. The new Kudu storage engine is the first to enable fast analytics on real-time data. And Cloudera Navigator Optimizer now helps customers put data and workloads in the right system and format for the best performance and economics.
Second, Cloudera makes Hadoop easy to manage on-premises and in the public cloud through Cloudera Manager and Cloudera Director. Our customer support leads the industry in its breadth and depth of expertise across the Hadoop ecosystem.
Third, only Cloudera is comprehensively secure, offering unique encryption, fine-grained data access protection, and full audit and lineage capabilities.
Cloudera is the enterprise choice for leading global organisations that want to gain value from all their data.