“The business case for big data is virtually unlimited, presenting new challenges and opportunities for ICT players bold enough to take on the challenge of supporting big data use,” says Grant Vine, technical director at Cybervine.
Big data, much like cloud computing, has existed in some form for quite a while, under various names such as analytics and BI. As with cloud computing, the greatest differentiator is scale, which in turn has driven new thinking on how to handle data of such magnitude.
There are already organisations running analytics and reporting against vast amounts of data, but they are not necessarily applying big data thinking to handle it. One thing to bear in mind is that current big data products are primarily designed to look at static, historical data – the traditional approach to analytics.
But the landscape is changing fast. Companies are now starting to explore the potential of realtime analytics of big data, where frameworks are built for specific purposes, pulling data from multiple feeds to give a realtime view of trending subjects or market sentiment. Open source products such as Storm are at the forefront of realtime analytics of this nature.
The potential uses of this information are vast – traders, for example, would be aware of trends before SENS announcements were made. The impact of new product sets could be assessed immediately through the aggregation of multiple realtime feeds such as Pinterest, Twitter or Facebook, with the concept of “trending” used directly in decision processing.
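As a rough sketch of the idea – not Storm's actual API – the Python snippet below merges events from several simulated feeds and counts hashtag mentions over a sliding time window, flagging any tag that crosses a threshold as “trending”. The feed contents, window length and threshold are all invented for illustration.

```python
from collections import Counter, deque

# Hypothetical sketch: merge events from several feeds and flag any
# hashtag whose mention count within a sliding window crosses a
# threshold. Feeds, window length and threshold are invented.
WINDOW_SECONDS = 60
TREND_THRESHOLD = 3

events = deque()    # (timestamp, tag) pairs currently inside the window
counts = Counter()  # live per-tag counts for the current window

def ingest(timestamp, tag):
    """Record one event and expire anything older than the window."""
    events.append((timestamp, tag))
    counts[tag] += 1
    while events and events[0][0] < timestamp - WINDOW_SECONDS:
        _, old_tag = events.popleft()
        counts[old_tag] -= 1

def trending():
    return [tag for tag, n in counts.items() if n >= TREND_THRESHOLD]

# Simulated aggregation of multiple realtime feeds: (seconds, source, tag).
feed = [
    (0, "twitter", "#newproduct"), (5, "facebook", "#newproduct"),
    (12, "pinterest", "#holiday"), (20, "twitter", "#newproduct"),
]
for ts, source, tag in feed:
    ingest(ts, tag)
print(trending())  # -> ['#newproduct']
```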
The ability to use, in realtime, streams that were untapped in the past would enable faster business decisions, allow significant world events to be tracked, or serve as a civil defence early warning system. The possibilities are endless.
Realtime big data principles have the potential to catalyse massive changes in information analysis. Information is power, at the end of the day.
Supporting this new demand for high-performance computing, to enable realtime big data processing and analysis, is where both the challenges and the opportunities lie.
In South Africa, it is likely that in future we will have to keep our data within the country's borders. In addition, the cost of taking petabytes of data offshore and then making it readily available for access and use could prove prohibitive.
There is also the question of which systems make big data processing and analysis most efficient and cost-effective.
To prepare for big data analysis services, both purpose-built architectures and cloud platforms that deliver elasticity need to be made available within our borders. In this type of environment, data, processing and management are all separated, allowing independent control over each of the elements required.
Data is requested from the storage array only when required, to be processed by multiple parallel processing nodes – lightweight virtual machines deployed as needed from a standardised template – with a controller node managing the distribution of “jobs” and the source data locations to the processing nodes.
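A minimal sketch of that controller/worker pattern, assuming a pool of local processes stands in for the lightweight virtual machines and plain strings stand in for the source data locations:

```python
from multiprocessing import Pool

# Hypothetical sketch of the controller/worker split described above:
# the controller hands out (job, data location) pairs, and each worker
# fetches and processes its slice independently.

def fetch(location):
    # Placeholder for "request data from the storage array when required".
    return f"records from {location}"

def process(job):
    job_id, location = job
    data = fetch(location)
    return job_id, len(data)  # stand-in for real analysis work

if __name__ == "__main__":
    # The controller: distribute jobs and source data locations to workers.
    jobs = [(i, f"/array/shard-{i}") for i in range(8)]
    with Pool(processes=4) as pool:  # four "processing nodes"
        for job_id, result in pool.map(process, jobs):
            print(f"job {job_id}: {result}")
```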
Elasticity implies the ability to expand the environment as rapidly as needed, but only when needed, with billing applied only for the actual processing time used. In this environment, the only pseudo-static cost is the data storage itself – all analytical processing costs are incurred only when actually analysing the data.
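To make that billing model concrete, a hedged example with invented rates: storage is the only pseudo-static cost, while processing is metered per node-second actually used.

```python
# Illustrative cost model only; both rates below are invented.
STORAGE_RATE_PER_TB_MONTH = 20.0          # pseudo-static: billed regardless of use
PROCESSING_RATE_PER_NODE_SECOND = 0.0005  # billed only while jobs actually run

def monthly_bill(stored_tb, node_seconds_used):
    storage = stored_tb * STORAGE_RATE_PER_TB_MONTH
    processing = node_seconds_used * PROCESSING_RATE_PER_NODE_SECOND
    return storage, processing, storage + processing

# 500 TB at rest, plus one burst: 200 nodes for a 30-minute analysis run.
storage, processing, total = monthly_bill(500, 200 * 30 * 60)
print(f"storage {storage:.2f}, processing {processing:.2f}, total {total:.2f}")
# -> storage 10000.00, processing 180.00, total 10180.00
```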
The challenge is building these environments. It would take a significant investment to build an environment capable of this performance in South Africa, where data volumes are currently limited and threats of an economic downturn are looming. Supporting these environments is a further issue: specialised skills and resources will be needed.
Another challenge is the question of how clients should be billed. Unlike existing hosted services, an elastic system such as this would have to be billed at a very granular level – technically achievable, but difficult for current South African cloud vendors to fit into their existing sales strategies.
There are people in South Africa developing solutions around object-based storage and using services available from cloud providers such as Amazon Web Services, so some of the necessary skills around how a platform of this nature functions are already here – but the local infrastructure is not.
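As a taste of those object-based storage skills, a minimal sketch using boto3 against Amazon S3; the bucket and key names are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
import boto3

# Minimal object-storage round trip against Amazon S3 using boto3.
# Bucket and key names are hypothetical; credentials are assumed to be
# configured via the usual AWS environment/config mechanisms.
s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-bigdata-archive",
    Key="feeds/twitter-sample.json",
    Body=b'{"tag": "#newproduct", "count": 3}',
)

response = s3.get_object(
    Bucket="example-bigdata-archive",
    Key="feeds/twitter-sample.json",
)
print(response["Body"].read().decode("utf-8"))
```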
However, the future is coming. The techies of today, who see the vast potential of realtime big data management and the infrastructure to support it, are the CIOs of tomorrow. It will be a few years before they are in a position to influence the implementation, building and selling of cloud-orientated services to support big data. Once they do, the way we use and process information in South Africa could change forever.