Some reading this may be too young to remember the Kevin Costner movie Field of Dreams, in which Costner’s character hears a voice tell him, “If you build it, he will come”, a line popularly remembered as “If you build it, they will come”. In the movie, the voice is referring to a baseball field and some long-departed baseball players, but it’s surprising how well that concept holds true in real-world situations.
Take traffic and road building. Town planners have long been aware of the concept of “induced traffic”: if you build new roads to ease congestion then, ironically, more cars are attracted to those roads over time and congestion gets worse. The answer is integrated transport planning.
Data is no different. The term data gravity was coined in 2010 by software engineer Dave McCrory, who described the concept in a blog post. Here is a paraphrased outline of what he described:
“Data gravity is the ability of bodies of data to attract applications, services and other data. The force of gravity, in this context, is manifested in the way software, services and business logic are drawn to data relative to its mass (the amount of data) and as a result, are physically located closer to the data”.
The concept makes a lot of sense. Where you make data available at scale, no doubt smart data scientists, analysts, marketers and the like will be drawn to the data, like prospectors to California during the gold rush of the 1800s.
When McCrory came up with his definition, he didn’t just talk about the pull of data gravity. He also suggested that as a body of data grows bigger, it becomes more “immovable”. It requires centralised infrastructure to “house it”, and it draws those who want access to be physically closer, because distance adds latency and delays the generation of insights.
Enter Cloudera with an enterprise data cloud that defies the laws of physics. Well, at least the metaphorical physics of data gravity!
Today’s demand for instant analytics at scale leads to two imperatives that are held back by data gravity.
The first is that we need to collect data from the edge – i.e. from devices and remote locations. Not only does this data need to be collected, it also needs to be analysed at the edge in real time, sometimes pulling data from the core to enhance this “local” analysis.
The second is that the speed at which analytic processes need to be carried out is accelerating rapidly. It is no longer good enough to send data to a central store and then do historical point-in-time analysis. Data has to be analysed in transit and in real time, often from many locations at once.
The only way to make this happen is to break one of the shackles of data gravity, namely the idea of all data being placed in one immovable location. Instead, your body of data can be geographically dispersed and fluid, changing in real time.
Cloudera’s Enterprise Data Cloud makes this possible. You can find out more about it here.
One thing does not change, however. The body of data may no longer be shackled by location, but the virtualised lake of data that Cloudera enables you to collate still follows the idea of “if you build it, they will come”, and in fact accelerates it. Once you put data at scale on Cloudera’s cloud-based data platform, it becomes easier for more people to access and use. The net result is that the gravitational pull for more data-driven applications gets even stronger.