A client recently complained that he was struggling to meet a reporting requirement because his technology platform could not cope with big data. On investigation, it emerged that he was dealing with approximately 20 million rows from a relatively small table, says Gary Allemann, MD of Master Data Management.
This is not what I would consider big data. However, it raised the question: at what point does data become big data?
In a recent post, Robin Bloor asked this very question. Bloor observed that the current growth in data volumes is not new: companies in sectors such as telecommunications and retail have historically managed significant volumes of data. He argued that whether or not a company has a genuine big data requirement, it will still have to manage ever-increasing volumes of data.
It is nonetheless essential to consider the impact of new technologies such as Hadoop, which, according to Wikipedia, is an open-source software framework that supports data-intensive distributed applications.
It is also important to consider the complexity of managing not just large volumes of data but also a distributed architecture that combines Hadoop components, built for speed or for processing unstructured data, with traditional database components.
Bloor concluded that if you are having problems managing your data, you have big data, irrespective of volume.
By this definition, many companies have big data issues today. For many of them, data management remains an afterthought, or is implemented at a tactical level that does not address enterprise-wide complexity.
The complexities of big data require a more rigorous management approach if business benefits are to be realised.
There are four foundational parameters for exploiting big data, namely:
* The ability to identify the right data to solve the problem;
* The ability to integrate and match varied data from multiple data sources;
* The necessary IT infrastructure to support big data initiatives; and
* The right capabilities and skills to exploit the data.
Of course, these are traditional pillars of data quality and data governance. In essence, a big data strategy should form part of the overall data management strategy. Simply throwing technology at the problem will create unnecessary cost and complexity.