Kenneth Lim, Cisco IT Site Leader for ASEAN
Hadoop has evolved into one of the predominant tools in the enterprise Big Data industry, making it possible for companies to extract maximum value from their data. In many instances, Hadoop saves businesses money and provides scale and speed, so its adoption has naturally exploded.
Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
Today, the Hadoop framework is just beginning to edge into regular enterprise operations in many companies, and Hadoop best practices for production are only now emerging.
In DataStorageAsean’s “Hadoop in a Hurry” executive interview we looked into the Hadoop opportunities and challenges for IT vendors in the ASEAN market.
In this first installment of the Hadoop in a Hurry series, we talked to Kenneth Lim, Cisco IT Site Leader for ASEAN. Among other things, he shared how developing Hadoop as part of an overall information plan allows Cisco to support use cases that go beyond the first few original test cases, and to manage Hadoop so that the conversation with business leaders is about getting value out of the data, not about code patches and node failures.
DataStorageAsean: In simple terms, what’s the difference between Hadoop and a normal database?
Kenneth Lim: It is good to establish the differences between Hadoop and a standard database in order to understand the concept of Big Data.
In a normal database, data is stored in the form of tables made up of columns and rows, and the simplest way to extract the data is through Structured Query Language (SQL). As the name implies, the database stores only structured data, e.g. a column may allow a character string with a maximum length of 255. These attributes give structure and integrity to the data that is added.
Hadoop, by contrast, is an open source framework built around a distributed file system that supports storing and processing large amounts of data of any kind (structured or unstructured) quickly. It does not pre-process the data before storing it, so we can store as much data as we want and decide how to use it later. And there is a wide variety of ways to access the data, from batch processing to real-time access.
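The contrast Lim describes is often summarised as schema-on-write versus schema-on-read. A minimal sketch of the two approaches, using Python's built-in sqlite3 and json modules as stand-ins (the table and record names are illustrative, not from any real deployment):

```python
import json
import sqlite3

# Schema-on-write: a relational table enforces structure up front.
# A record that does not fit the declared columns cannot be inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("Alice", "alice@example.com"))
rows = conn.execute("SELECT name FROM users").fetchall()
print(rows)  # [('Alice',)]

# Schema-on-read (the Hadoop approach): store raw records as-is,
# even when their shapes differ, and impose structure only at read time.
raw_records = [
    '{"name": "Bob", "email": "bob@example.com"}',
    '{"name": "Carol", "clicks": 42}',  # different shape -- still stored
]
names = [json.loads(r)["name"] for r in raw_records]
print(names)  # ['Bob', 'Carol']
```

The first half rejects anything outside its declared schema; the second accepts any record and defers interpretation, which is what lets a Hadoop cluster ingest data now and decide on its use later.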
DataStorageAsean: Can you tell us about the newest developments in Hadoop and where you think the technology is heading?
Kenneth Lim: We will see Hadoop projects mature, and enterprises that are already using it will plan on doing more with it. Hadoop will become a core part of the enterprise IT landscape, and investment will grow in security, with more self-service analytics and more data being collected.
DataStorageAsean: Is Hadoop for all companies of any size and can anyone in an organisation get access to data held in a Hadoop cluster?
Kenneth Lim: Generally, yes, and it really depends on your business use case for implementing Hadoop. The National Library Board Singapore, for example, used Hadoop to analyse and optimise its nationwide network of libraries and examine how they are used.
Security is provided at every layer of the Hadoop stack via Apache Knox, which provides a single point of authentication and access.
DataStorageAsean: What are the specific challenges for Hadoop adoption in the ASEAN region?
Kenneth Lim: First of all, there are legal and regulatory challenges associated with data. There are different data protection laws across ASEAN, and it can be tricky to navigate them. Within ASEAN, only Singapore and Malaysia have data protection laws. Furthermore, consent for data collection must be expressly given in some countries. In addition, the location of data storage gives rise to different interpretations of what data can be used or how it must be stored, e.g. personal data may only be stored within the country. This means that enterprises cannot centralise their data in regional datacentres.
The other challenge is data quality. Countries across ASEAN are at different stages of development and have their own unique languages, so the datasets kept may differ, which affects the quality of the data.
DataStorageAsean: What’s unique about your Hadoop offering?
Kenneth Lim: The Cisco UCS Common Platform Architecture for Big Data with MapR delivers an integrated Hadoop solution that is specifically engineered to handle the most demanding big data workloads. It provides a highly scalable platform that can be optimised and easily scaled for any size of Hadoop cluster. With Cisco UCS Manager, there is unified embedded management of all computing, networking and storage-access resources.