Until the arrival of Big Data, datacentre design focused on availability, reliability and efficiency. It could be argued that Big Data is what pushed scalability up the list of priorities: it is forcing organisations to recognise how difficult it is to predict just how much storage they will need in the future.
Raju Chellam, head of Big Data and Cloud, South Asia at Dell, says that “by its very nature, Big Data is about analytics using all kinds of data. That includes structured data in traditional databases, data marts with unstructured data housed in networked storage as well as on public or private clouds.”
Chellam concedes that Big Data is not just storage-intensive; it is also compute-intensive, because much of the analytics runs in memory. The other area affected is the network, given that Hadoop by default keeps three copies of every block of data and spreads them across different nodes.
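To make that storage and network effect concrete, here is a back-of-the-envelope sketch. It assumes the HDFS default replication factor of 3 (the `dfs.replication` setting) and HDFS's pipelined writes, in which the client sends a block to the first DataNode and each node forwards it to the next. If the writing client is not itself a DataNode, every replica crosses the network once; if it is, the first replica is written locally. The function name and the 100 TB figure are hypothetical.

```python
def replicated_footprint_tb(logical_tb: float, replication: int = 3,
                            client_is_datanode: bool = False) -> dict:
    """Estimate raw disk and write-path network volume for data stored in HDFS.

    Assumes every block is stored `replication` times. When the writing
    client runs on a DataNode, the first replica is written locally, so one
    fewer copy has to travel over the network.
    """
    if logical_tb < 0 or replication < 1:
        raise ValueError("logical_tb must be >= 0 and replication >= 1")
    raw_disk_tb = logical_tb * replication
    copies_over_network = replication - 1 if client_is_datanode else replication
    network_write_tb = logical_tb * copies_over_network
    return {"raw_disk_tb": raw_disk_tb, "network_write_tb": network_write_tb}

# Hypothetical example: 100 TB of logical data written from an off-cluster client.
print(replicated_footprint_tb(100))
# {'raw_disk_tb': 300, 'network_write_tb': 300}
```

Even in this simplified view, every terabyte of business data turns into three terabytes of disk and up to three terabytes of network traffic, before any analytics jobs run.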
With 20 percent of the world’s Internet traffic coursing through its global network infrastructure, Tata Communications knows all too well the impact of Big Data on the datacentre. Srinivasan CR, vice president for Global Product Management, Data Centre Services at Tata Communications, says that “collecting data from different data generation points in the IT infrastructure requires the right kind of networking infrastructure to ensure that data can be collected in real-time. Big Data would bring along with it related business intelligence applications and that would require appropriate infrastructure.”
He also argues that the advent of Big Data brings with it the need for new skills in storage, networking and related applications. In addition, the security and integrity of the data being collected, processed, analysed and managed require clear policies and procedures for information management and assurance.
Robert Chu, Vice President & General Manager, Storage APJ at Hewlett-Packard, sums it all up nicely: “The rewards of Big Data are restricted due to the magnitude of the data being created, the flash flooding of database systems, the time consuming nature of traditional data integration methods, and the inability to properly secure this data. Enterprises are further challenged by the escalating demands required by the line of business for real-time analytical insights.”
Srinivasan estimates that 90 percent of the world’s data was created in the last two years alone, and that personal data is expected to overtake business data in volume within the next two to three years. With the Internet of Things, and Gartner’s estimate that 26 billion devices will be generating data on the network by 2020, the world will have a great deal of data to deal with in the coming years.
“Enterprises will be a part of this challenge as the amount of data that they would need to manage for sustaining business and growing it would increase dramatically. Smarter analytics and understanding of customer behaviour would be key to maintaining growth and this would mean that enterprises must have the right kind of data sets and derive useful business intelligence out of them,” according to Srinivasan.
Chart: Cisco Systems forecasts that annual datacentre traffic will reach 6.6 zettabytes (10^21 bytes) by the end of 2016, a compound annual growth rate (CAGR) of 31 percent from 2011 to 2016. The chart’s Y axis is in exabytes (1 exabyte = 10^18 bytes). Source: Cisco Systems Global Cloud Index.
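The two figures in the Cisco forecast can be cross-checked with the compound-growth relationship: end value = start value × (1 + CAGR)^years. The sketch below back-calculates the 2011 traffic implied by 6.6 ZB in 2016 and a 31 percent CAGR over five years, then projects the intermediate years. It is an arithmetic illustration only, not Cisco’s published 2011 figure, and the function names are hypothetical.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Invert end = start * (1 + cagr) ** years to recover the starting value."""
    return end_value / (1 + cagr) ** years


def project(start_value: float, cagr: float, years: int) -> list[float]:
    """Year-by-year values growing at a constant compound annual rate."""
    return [start_value * (1 + cagr) ** n for n in range(years + 1)]


start_2011 = implied_start_value(6.6, 0.31, 2016 - 2011)
print(f"Implied 2011 traffic: {start_2011:.2f} ZB")  # Implied 2011 traffic: 1.71 ZB
for year, zb in zip(range(2011, 2017), project(start_2011, 0.31, 5)):
    print(year, f"{zb:.2f} ZB")
```

At 31 percent a year, traffic multiplies roughly 3.9 times over five years, which is why the later years dominate the chart.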
He believes that enterprises will need to manage data growth carefully and decide which data to collect, retain, process, analyse and discard. Once the data requirements are clearly understood, the enterprise can decide on the next steps towards a Big Data solution.
So how does an enterprise plan for Big Data when it is difficult to predict accurately how much data it will generate or accumulate and, as a consequence, how large the datacentre where the processing takes place will need to be?
Chellam suggests that enterprises planning for Big Data take an integrated approach for best results. Ideally, solutions should provide an open-source, end-to-end, scalable infrastructure that enables users to:
Chu adds that organisations need secure solutions that can scale while sourcing data at high speed. This brings us to the one constant concern about putting data in the cloud: data security. Vendors are taking great pains to ease user and enterprise concerns about the security of the cloud and compliance with regulatory requirements (where they apply).
Does taking a cloud approach mitigate the risks of running out of storage?
Proponents of cloud computing argue that one of the benefits of moving data to the cloud is the cloud’s near-limitless capacity to house that data.
“The cloud approach to storage provides a viable option for enterprises to scale as per demand. However, any IT infrastructure and more so for storage would require careful planning and capacity management to ensure that there is enough space for critical data,” cautions Srinivasan.
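The capacity management Srinivasan describes can be illustrated with a simple runway calculation: given current usage, installed capacity and an assumed constant monthly growth rate, how many months until demand outgrows what is provisioned? This is a minimal sketch under that constant-growth assumption. The function name and the example figures (400 TB used, 1,000 TB installed, 5 percent monthly growth) are hypothetical, and real planning would use measured growth and revisit the estimate regularly.

```python
from typing import Optional


def months_until_full(used_tb: float, capacity_tb: float,
                      monthly_growth: float, max_months: int = 600) -> Optional[int]:
    """Return the first month in which projected usage exceeds capacity.

    Assumes usage compounds at a constant monthly rate. Returns 0 if usage
    already exceeds capacity, and None if usage never outgrows capacity
    within max_months (for example, when growth is zero or negative).
    """
    if used_tb > capacity_tb:
        return 0
    usage = used_tb
    for month in range(1, max_months + 1):
        usage *= 1 + monthly_growth  # compound one month of growth
        if usage > capacity_tb:
            return month
    return None


# Hypothetical figures: 400 TB used of 1,000 TB installed, growing 5% a month.
print(months_until_full(400, 1000, 0.05))  # 19
```

Because growth compounds, more than doubling the installed capacity buys well under two years in this example, which is the case for ongoing capacity reviews rather than a one-off sizing exercise.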
He goes on to remind us that it’s important for enterprise customers to take a hard look at their data and understand what is relevant for their core business. “It is also important for them to understand what sets of data are being used to generate the intelligence that they need for their business. Performance requirements would also dictate where data is stored and what type of storage is used for storing it,” he adds.
A blanket solution to Big Data is not the most cost-effective option and may not address all of the business requirements. It is better for a Big Data solution to evolve from the existing architecture than to be built as a separate solution. Big Data requires a great deal of planning, investment and ongoing management, so taking small steps towards the final solution is the better path.
Dell’s Chellam believes that the right approach in all cases is to start with a proof of concept, secure small wins, train the internal team in areas such as Linux, Hadoop, R and other data analytics tools, and scale out on demand. “Changing the datacentre is expensive, complex and complicated. Instead, it would be better to ensure the big data stack is well architected with application programming interfaces (APIs) to connect to existing apps, database, data warehouse and data,” he suggests.
HP’s Chu opines that information is the core of any business or government. However, the real customer challenge around Big Data isn’t just the traditional volume increase in information. “The challenge is around gaining command over the volume as well as the variety, velocity and vulnerability of data. The industry is seeing massive capacity growth in unstructured and semi-structured data – types of information that organizations have not fully tapped into, but that are becoming increasingly vital to the enterprise decision-making process,” he concludes.