Do You Really Know What Big Data Is?

The term Big Data has become increasingly popular, although it is still poorly understood. In many of my lectures I observe that there is no consensus on what Big Data really is or on which key technologies support it. What's more, there are many doubts about how to make the concept tangible, i.e., how to move beyond the conceptual and create business solutions that add value to companies.

Eliminating these doubts is essential, and it is the first step for companies that want to venture into Big Data projects.

To put the term in context, Big Data has been drawing attention because of the accelerating scale at which society creates ever larger volumes of data. We now commonly speak of petabytes of data generated every day, and zettabytes are becoming a real scale rather than an imaginary, futuristic one. Terabytes, which were the future a decade ago, are now found in our own homes.

The technologies that support Big Data can be analyzed from two points of view: those involved with analytics, with Hadoop and MapReduce as the major names, and the infrastructure technologies, which store and process petabytes of data. In this respect, we highlight the NoSQL databases ("No" here means "not only SQL"). Why these technologies? Because Big Data is, in practice, the simple fact that the immense volume of data generated every day exceeds the capacity of current technologies to handle it properly.
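To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) applied to a word count. It is only an illustration of the programming model, not Hadoop's API: the function names map_phase, shuffle_phase, reduce_phase and word_count are invented for this example, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

The point of the model is that each map call looks at only one document and each reduce call at only one key, which is what allows the work to be spread over many servers.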

Starting at the beginning…

What is Big Data? Big Data = volume + variety + velocity + veracity + value.

We will detail these topics a little more.

Volume is clear. We generate petabytes of data every day, and it is estimated that this volume doubles every 18 months. Variety, because these data come from structured systems (now the minority) and unstructured sources (the vast majority), generated by emails, social media (Facebook, Twitter, YouTube and others), electronic documents, PowerPoint-style presentations, instant messaging, sensors, RFID tags, video cameras, etc.
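The estimate that data volume doubles every 18 months is easy to turn into a rough projection. The sketch below assumes steady exponential growth, which is a simplification; the function name projected_volume and the 1 petabyte starting point are chosen only for illustration.

```python
def projected_volume(current_volume, months, doubling_months=18):
    """Project a data volume forward, assuming it doubles every doubling_months."""
    return current_volume * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start_pb = 1.0  # hypothetical starting volume, in petabytes
    for years in (3, 5, 10):
        volume = projected_volume(start_pb, years * 12)
        print(f"after {years} years: {volume:.1f} PB")
```

Under this assumption the volume quadruples in three years and grows roughly a hundredfold in ten, which is why storage that looks generous today can become a bottleneck quickly.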

Velocity, because we often need to act in near real time on this huge volume of data, as in the automatic control of street traffic. Veracity, because we need to make sure that the data make sense and are authentic. And value, because it is absolutely necessary that an organization implementing Big Data projects gets a return on these investments. An example might be the insurance sector, where fraud analysis could be greatly improved, minimizing risks, by using data from outside the insurers' structured databases, such as the data circulating daily in social media.

We said that the current technologies for data processing are no longer adequate. Why? Consider the relational model, proposed by IBM researcher Edgar F. Codd in 1969. When it was proposed, the demand was for access to structured data generated by the internal systems of corporations. It was not designed for unstructured data (futurology at the time), nor for volumes on the order of petabytes (unimaginable at the time). What was needed was a model that could categorize and normalize data with ease. The relational model was very successful at that, so much so that it is the most widely used data model today.

Handling data at the scale of volume, variety and velocity of Big Data requires other models. Thus emerged NoSQL database software, designed to handle massive volumes of structured and unstructured data. There are several different models, such as columnar systems like Bigtable, used internally by Google (it is the database underneath Google App Engine). In summary, there are many options … It is interesting to remember that, before the relational model, there was already database software that dealt with large data volumes: IMS, IBM's hierarchical-model database, designed to support the Apollo project's effort to reach the Moon, and which is still the basis of most of the financial transactions that flow through the world.
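The difference between a fixed relational schema and a schema-less store can be sketched in a few lines. This is only an illustration under simple assumptions: the relational side uses Python's built-in sqlite3 module, while the "NoSQL" side is a toy in-memory key-value store of JSON documents written for this example (the put and get helpers and the sample records are hypothetical). It is not Bigtable, nor any particular NoSQL product.

```python
import json
import sqlite3

# Relational side: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    (1, "Ana", "Sao Paulo"),
)
row = conn.execute("SELECT name, city FROM customers WHERE id = ?", (1,)).fetchone()

# Schema-less side: a toy key-value store where each value is a JSON document.
store = {}


def put(key, document):
    """Save a document under a key; no schema is enforced."""
    store[key] = json.dumps(document)


def get(key):
    """Load the document saved under a key."""
    return json.loads(store[key])


# Two documents with different shapes can live side by side.
put("customer:1", {"name": "Ana", "city": "Sao Paulo"})
put("customer:2", {"name": "Rui", "posts": ["hello"], "sensor": {"rfid": "A1"}})

if __name__ == "__main__":
    print(row)
    print(get("customer:2"))
```

Adding the nested sensor field to the relational table would mean changing the schema first, whereas the document store accepts it as is. The price is that the application, not the database, becomes responsible for keeping the data consistent.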

On the other hand, this diversity of alternatives demands that leaders of Big Data projects choose the most appropriate solution, or even adopt more than one option, according to their specific needs.

Additionally, cloud computing is also a driver for Big Data, because we can use public clouds to support huge volumes of data and take advantage of their elasticity.

Anyway, Big Data is already knocking on our doors. Its potential is not yet fully recognized, but we already see clear signs of its importance in how much is being written about "Big Data".

For businesses, Big Data opens a new and unexplored territory. Knowledge, experience and professional expertise are still lacking. It is inevitable that CIOs will have to put Big Data on their radar screens.

The opportunities coming from Big Data should not be wasted.

