Since its public introduction in 2008 by Nature Publishing Group editors, the term “big data” has become a buzzword across the industry, says Dmitry Aleshin, head of Endpoint Protection and deputy CTO (Products) at Kaspersky Lab.

Although this technology still sits in the shadow of the bigger problems CIOs have to solve every day, Hadoop, the widely adopted approach to handling vast amounts of data, has gained considerable momentum. While most IT specialists concentrate on issues such as operations, availability and implementation, few think about the associated security risks.

Here are a few things to consider:
The need for big data solutions arose from the necessity of dealing with huge amounts of information that arrive from different sources, at different rates, and in different formats.

For example, imagine a large international company with distributed production facilities and marketing, sales and R&D branches. Each branch office generates dozens of reports every day, and the corporate data centres are flooded with technical documentation, procurement documents and the like.

Thanks to strict categorisation rules and good management, everything is neatly arranged within each area of responsibility, but extracting the wider context from this structured information is quite a challenge.

But does one really need it? To answer this question, consider the scale of the business and how deeply the business processes running in parallel are understood. For enterprises operating thousands of nodes and millions of documents, the struggle for a few extra percent of business efficiency may, given the scale of operation, be more than worth the effort.

Also remember that Hadoop is an open-source platform, originally conceived and developed by Internet companies to simplify the page rank calculations for indexed Web pages.

The technology was developed with little or no thought for security. For example, it relied on the underlying Unix authentication system to determine which users were submitting tasks to name nodes, and it allowed data blocks to be retrieved from data nodes over unsecured HTTP connections.
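Both weaknesses map to Hadoop configuration properties that an administrator can check mechanically. The sketch below is a minimal, hypothetical audit helper (`parse_hadoop_conf` and `audit` are illustrative names, not part of Hadoop) that reads `*-site.xml` text and flags `hadoop.security.authentication` left at `simple`, `dfs.http.policy` not set to `HTTPS_ONLY`, and `dfs.encrypt.data.transfer` not enabled. It assumes Hadoop's shipped defaults for any missing key and, for brevity, treats core-site and hdfs-site properties as one dictionary; verify the defaults against the documentation for your release.

```python
import xml.etree.ElementTree as ET

# Assumed defaults for keys absent from the files (Hadoop's shipped values;
# check core-default.xml / hdfs-default.xml for your release).
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "dfs.http.policy": "HTTP_ONLY",
    "dfs.encrypt.data.transfer": "false",
}


def parse_hadoop_conf(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def audit(props: dict) -> list:
    """Flag OS-trusted identity, plain-HTTP web endpoints and unencrypted block transfer."""

    def get(key: str) -> str:
        return props.get(key, DEFAULTS[key])

    findings = []
    if get("hadoop.security.authentication").lower() == "simple":
        findings.append("authentication is 'simple': user identity comes from the client OS account")
    if get("dfs.http.policy").upper() != "HTTPS_ONLY":
        findings.append("web endpoints may serve data over plain HTTP")
    if get("dfs.encrypt.data.transfer").lower() != "true":
        findings.append("DataNode block transfer is not encrypted")
    return findings


if __name__ == "__main__":
    # Hypothetical merged core-site + hdfs-site content for illustration.
    sample = """<configuration>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
  <property><name>dfs.http.policy</name><value>HTTP_ONLY</value></property>
</configuration>"""
    for finding in audit(parse_hadoop_conf(sample)):
        print(finding)
```

Note that an empty configuration produces all three findings: with these assumed defaults, an untouched cluster is insecure on every count the article raises.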

Admittedly, the distributed file system spreads and balances these data blocks across physical drives, but reassembling them is not exceptionally hard, especially since the reference designs and whitepapers originally written to support adoption of the technology are available to a wide audience, cybercriminals included.
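To see why distribution alone offers no confidentiality, consider a toy model (not HDFS code; all function names are hypothetical). A file is cut into fixed-size blocks, each block is replicated onto several randomly chosen drives, and anyone who can read the drives and knows each block's position, which HDFS keeps in name node metadata, recovers the original bytes by sorting and joining. The 4-byte block size is purely illustrative; current HDFS releases default to 128 MB blocks and three replicas.

```python
import random

BLOCK_SIZE = 4  # bytes, tiny for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut data into consecutive fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def scatter(blocks: list, drives: int, replicas: int = 3, seed: int = 0) -> list:
    """Place each (index, block) pair on `replicas` distinct drives chosen at random."""
    rng = random.Random(seed)
    layout = [[] for _ in range(drives)]
    for index, block in enumerate(blocks):
        for drive in rng.sample(range(drives), replicas):
            layout[drive].append((index, block))
    return layout


def reassemble(layout: list) -> bytes:
    """Gather blocks from every drive, drop duplicate replicas, join in index order."""
    by_index = {}
    for drive in layout:
        for index, block in drive:
            by_index[index] = block
    return b"".join(by_index[i] for i in sorted(by_index))


if __name__ == "__main__":
    secret = b"confidential procurement report"
    layout = scatter(split_into_blocks(secret), drives=5)
    print(reassemble(layout) == secret)
```

Placement and replication protect against drive failure, not against someone reading the blocks; that requires authentication and encryption.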

Since data storage is one of the most fault-prone parts of the big data ecosystem, businesses must recognise that dedicated data protection solutions with multiple layers of redundancy are essential. Without them, the whole effort of extracting extra knowledge to give the company a competitive advantage could be hindered or even lost. Kaspersky Security for Storage, which scans stored files for malicious software, is one such solution offering the required protection.