Architecture remains the fundamental issue in realising the benefits of big data in organisations today, whether they are big or small enterprises, says Mervyn Mooi, director at Knowledge Integration Dynamics (KID).

A problem with big data is that it involves large volumes of complex data, data that is typically in many different formats and scattered across the organisation. It extends well beyond the traditional sales figures, employee data, product specifications, inventories, production schedules and the like. In fact, just dealing with big data generates data of its own, which only adds to the pile.

Integrating so many source systems, often completely disparate, required tools, and the industry turned to those that already existed, which in the past comprised mostly extraction, transformation and loading (ETL) software. Those tools, however, were not designed for the demands of businesses today, chiefly scalability and flexibility coupled with low latency.
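As a purely illustrative sketch, the snippet below shows the classic ETL pattern in miniature: extract records from two hypothetical sources (a sales table and a CSV export), transform them into one common record layout, and load the result into a target store. The source names, fields and target are assumptions made for the example, not a reference to any particular product.

```python
# Minimal ETL sketch (illustrative only): extract from two hypothetical
# sources, transform into a common record format, load into a target store.
import csv
import sqlite3
from io import StringIO

def extract_db(conn):
    # Extract: read raw rows from a (hypothetical) sales table.
    return conn.execute("SELECT id, amount, sold_on FROM sales").fetchall()

def extract_csv(text):
    # Extract: read rows from a CSV export of another system.
    return list(csv.DictReader(StringIO(text)))

def transform(db_rows, csv_rows):
    # Transform: map both source shapes onto one common record layout.
    common = [{"source": "db", "id": r[0], "amount": float(r[1]), "date": r[2]}
              for r in db_rows]
    common += [{"source": "csv", "id": r["id"], "amount": float(r["amount"]),
                "date": r["date"]} for r in csv_rows]
    return common

def load(records, target):
    # Load: append the unified records into the target store.
    target.extend(records)
    return target

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, sold_on TEXT)")
    conn.execute("INSERT INTO sales VALUES (1, 100.0, '2015-01-01')")
    csv_text = "id,amount,date\n2,250.5,2015-01-02\n"
    warehouse = load(transform(extract_db(conn), extract_csv(csv_text)), [])
    print(warehouse)
```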

The underlying issue is that the systems that store data were often not designed to communicate with one another, so they had to be jury-rigged together. With the advent of purpose-built tools, however, that situation has changed. These tools sit in the layer between data sources, be they databases, CRM, ERP or other business systems, hence the term middleware.
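A hedged sketch of the adapter idea behind such middleware follows: each source system keeps its own interface, while a thin layer presents them all to consumers through a single call. The class and method names are hypothetical and exist only to illustrate the pattern.

```python
# Illustrative middleware sketch: a common interface over disparate systems.
# All names here are hypothetical, not a real product's API.
from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    @abstractmethod
    def fetch_customers(self) -> list[dict]:
        """Return customer records in a shared, source-neutral format."""

class CrmAdapter(SourceAdapter):
    def fetch_customers(self) -> list[dict]:
        # In reality this would call the CRM's API; here it is stubbed.
        return [{"id": "C-1", "name": "Acme", "origin": "crm"}]

class ErpAdapter(SourceAdapter):
    def fetch_customers(self) -> list[dict]:
        # In reality this would query the ERP database; here it is stubbed.
        return [{"id": "E-7", "name": "Acme Ltd", "origin": "erp"}]

def consolidated_view(adapters: list[SourceAdapter]) -> list[dict]:
    # The middleware layer: callers make one call, not one per source system.
    records = []
    for adapter in adapters:
        records.extend(adapter.fetch_customers())
    return records

print(consolidated_view([CrmAdapter(), ErpAdapter()]))
```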

One issue this creates is latency, precisely the opposite of the desired effect. Adding a step in the middle of an existing process is bound to slow it down and add complexity. An effective solution is in-memory computing, particularly if the systems are connected in a grid to provide elasticity and scalability on demand.
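The latency trade-off can be illustrated with a minimal in-memory cache placed in front of a slow source; real in-memory data grids distribute and replicate the same idea across many nodes for elasticity. The delay, key and function names below are invented purely for illustration.

```python
# Minimal illustration of the in-memory principle: serve repeat reads from
# memory instead of going back to a slow source system each time.
import time
from functools import lru_cache

def slow_source_lookup(key: str) -> str:
    # Stand-in for a slow query against a disk-based source system.
    time.sleep(0.2)
    return f"value-for-{key}"

@lru_cache(maxsize=1024)
def cached_lookup(key: str) -> str:
    # First call pays the source-system latency; repeats are served from memory.
    return slow_source_lookup(key)

start = time.perf_counter()
cached_lookup("customer-42")            # cold read: hits the slow source
cold = time.perf_counter() - start

start = time.perf_counter()
cached_lookup("customer-42")            # warm read: answered from memory
warm = time.perf_counter() - start

print(f"cold read: {cold:.3f}s, warm read: {warm:.6f}s")
```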

One of the most serious concerns is that this can develop into a huge cost centre: wasted resources, under-utilised resources, excess capacity, and the management and administrative overheads they imply.

Replicated, duplicated and overlapping processes, together with disparate models, drive costs up, and as data systems overtake business systems in size, complexity and cost, the figure can quickly become astronomical. Another curiosity of this situation is an ever-growing reliance on quick-win practices and projects that ultimately amount to futility and wasted resources.

Architecture, therefore, is a crucial component of advancing any big data strategy, regardless of how good and efficient the underlying systems are at their specific jobs. Accessing, integrating, disseminating, verifying, transforming, qualifying and consuming or interpreting any data or information must be done according to standards or an architected approach.
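As one hedged example of what an architected, standards-based approach can mean in practice, the sketch below validates every incoming record against an agreed canonical definition before it is integrated or consumed. The schema and field names are assumptions made for the example.

```python
# Illustrative only: enforce an agreed canonical record definition at the
# point of integration, rather than letting each feed invent its own shape.
CANONICAL_CUSTOMER = {          # hypothetical agreed standard
    "id": str,
    "name": str,
    "country": str,
}

def validate(record: dict, schema: dict) -> list[str]:
    # Return a list of violations; an empty list means the record conforms.
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

good = {"id": "C-1", "name": "Acme", "country": "ZA"}
bad = {"id": 1, "name": "Acme"}

print(validate(good, CANONICAL_CUSTOMER))   # [] -> accepted
print(validate(bad, CANONICAL_CUSTOMER))    # violations -> rejected or quarantined
```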

If not, you run the risk of creating a spaghetti junction, or hairball, of crossed connections between data sources and systems, one that defeats the purpose of almost every data project of the past decade: a single version of the truth that yields reliable, consistent information businesspeople can use to keep the organisation running smoothly and moving forward.