The first thing I would like to say before writing anything about Big Data is that it is not new.
Big data is generated by machines, organisations, and people, and it’s everywhere. Most big data sources existed before, but the scale at which we use and apply them today has changed. Just look at this image of linked open data on the Internet. It shows not only that there are many sources of data, but also that they are connected.
Big data is often boiled down to a few varieties of data generated by machines, people, and organizations.
With machine-generated data, we refer to data generated from real-time sensors in industrial machinery or vehicles, web logs that track user behavior online, environmental sensors or personal health trackers, and many other sensor data sources. With human-generated data, we refer to the vast amount of social media data: status updates, tweets, photos, and other media. With organisation-generated data, we refer to more traditional types of data, including transaction information in databases and structured data often stored in data warehouses.
Big data can be either structured, semi-structured, or unstructured.
Let’s take an airplane as an example of machine-generated data. Does a big plane require big data? Yes. Did you know that a Boeing 787 produces half a terabyte of data every time it flies? Almost every part of the plane constantly updates both the flight crew and the ground team about its status. This is an example of machine-generated data coming from sensors.
Let’s take another example. What makes a smart device smart? There are three main properties of smart devices, based on what they do with the sensors and things they encapsulate. First, they can connect to other devices or networks. Second, they can execute services and collect data autonomously, that is, on their own. Third, they have some knowledge of their environment.
The availability of smart devices and their interconnectivity led to a new term: the Internet of Things. Think of a world of smart devices at home, in your car, in the office, in the city, in remote rural areas, in the sky, and even in the ocean, all connected and all generating data.
Let’s return to our first example, the airplane. Data is generated from accelerometers that measure turbulence. There are also other sensors built into the engines for temperature, pressure, and many other measurable factors, used to detect engine malfunctions.
Constant real-time analysis of all the collected data helps with monitoring and problem detection at 40,000 feet. That’s approximately 12,000 meters above the ground, where the plane is flying.
We call this type of analytical processing in-situ. Previously, in traditional relational database management systems, data was often moved to the computational space for processing. In the Big Data space, in-situ means bringing the computation to where the data is located or generated.
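The in-situ idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensor names, the limits in `LIMITS`, and the `detect_in_situ` function are all hypothetical and do not come from any real avionics system. The point is that each reading is checked where it is produced, and only the alerts travel further.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Reading:
    sensor: str   # hypothetical sensor name
    value: float


# Hypothetical upper limits, invented for this sketch.
LIMITS = {"engine_temp_c": 900.0, "oil_pressure_psi": 95.0}


def detect_in_situ(readings: Iterable[Reading]) -> Iterator[str]:
    """Check each reading at the source and yield only the alerts."""
    for r in readings:
        limit = LIMITS.get(r.sensor)
        if limit is not None and r.value > limit:
            yield f"ALERT {r.sensor}={r.value} exceeds {limit}"


# A tiny made-up stream of readings.
stream = [
    Reading("engine_temp_c", 640.0),
    Reading("oil_pressure_psi", 97.5),
    Reading("engine_temp_c", 912.0),
]

alerts = list(detect_in_situ(stream))
for alert in alerts:
    print(alert)
```

Because `detect_in_situ` is a generator, it never needs the whole data set in one place; three readings go in, and only the two that cross a limit come out.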
A key feature of these types of real-time notifications is that they enable real-time actions. This is where Big Data helps!
People are generating massive amounts of data every day through their activities on social networking sites like Facebook, Twitter, and LinkedIn, on online photo sharing sites like Instagram, Flickr, or Picasa, and on video sharing websites like YouTube. In addition, an enormous amount of information gets generated via blogging and commenting, internet searches, text messages, email, and personal documents.
Most of this data is text-heavy and unstructured, that is, it does not conform to a well-defined data model. Humans generate a lot of unstructured data in the form of text, and there’s no given format to it. For example, look at all the documents that you have written by hand so far. By commonly cited estimates, 80 to 90% of all data in the world is unstructured, and this number is rapidly growing.
Working with unstructured data is often time-consuming and costly. The cost and time of acquiring, storing, cleaning, retrieving, and processing unstructured data can add up to quite an investment before we can start reaping value from it. The challenges of working with unstructured data should not be taken lightly.
For example, a well-structured printed bill is easy to identify, and we can extract data from it easily, whereas a handwritten bill does not look the same everywhere. Text recognition, identification, and format detection are all quite challenging for an unstructured, handwritten, human-generated bill.
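To make the contrast concrete, here is a minimal Python sketch. The bill contents, the field names, and the `guess_total` helper are all made up for illustration, and the free-text note assumes the handwriting has already been turned into text, which is itself a hard step. The structured bill can be read by field name; the free-text one needs a guess.

```python
import csv
import io
import re

# Structured bill: fixed columns, so the total can be read by name.
structured_bill = "invoice_id,customer,total\n1001,Asha,249.50\n"
row = next(csv.DictReader(io.StringIO(structured_bill)))
structured_total = float(row["total"])

# Unstructured bill: free text with no agreed format (invented example).
unstructured_bill = "Recd from Asha on 3rd March, for repairs - Rs 249.50 only, thanks"


def guess_total(text):
    """Guess the amount as the last number with two decimals, else None."""
    matches = re.findall(r"\d+\.\d{2}", text)
    return float(matches[-1]) if matches else None


unstructured_total = guess_total(unstructured_bill)
print(structured_total, unstructured_total)

# A differently worded note defeats the guess entirely.
print(guess_total("paid two hundred fifty rupees"))
```

The regular expression works for this one note only by luck of its wording. Every new way of writing a bill would need another rule, which is why unstructured data is costly to process.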
Big data tools are designed from scratch to manage unstructured information and analyze it. A majority of these tools are based on an open source big data framework called Hadoop.
Hadoop can handle big batches of distributed information. Storm and Spark are two other open source frameworks that handle real-time data generated at a fast rate, for example, real-time processing of people-generated data like Twitter or Facebook updates.
Now the question is, how do today’s businesses get around this problem?
Many businesses today use a hybrid approach in which their smaller structured data remains in their relational databases, and large unstructured datasets get stored in NoSQL databases in the cloud. NoSQL data technologies are based on non-relational concepts.
Did You Know? →
How much Twitter data do companies analyse every day to measure sentiment around their products? (Sentiment analysis examines social media and other data to find out whether people associate positively or negatively with your business.)
The answer is 12 terabytes a day. For comparison, you would need to listen continuously for two years to finish listening to 1 terabyte of music. Yes, you heard right!
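The two-year figure can be sanity-checked with a little arithmetic. The sketch below assumes music encoded at 128 kilobits per second and a decimal terabyte of 10^12 bytes; both are assumptions of this example, not figures from the text, and a different bitrate would change the result.

```python
BYTES_PER_TERABYTE = 10**12          # decimal terabyte (assumption)
BITRATE_BITS_PER_SECOND = 128_000    # assumed audio bitrate: 128 kbps
SECONDS_PER_YEAR = 365 * 24 * 3600

# Bytes of audio consumed per second of listening.
bytes_per_second = BITRATE_BITS_PER_SECOND / 8

# Total listening time for one terabyte, in seconds and then years.
seconds = BYTES_PER_TERABYTE / bytes_per_second
years = seconds / SECONDS_PER_YEAR

print(f"{years:.2f} years of continuous listening")
```

Under these assumptions the result comes out just under two years, which is consistent with the comparison above.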