BIG DATA
What is Big Data?
Big Data is a collection of data that is huge in volume and keeps growing
exponentially with time.
Its size and complexity are so great that traditional data management tools
cannot store or process it efficiently.
In short, Big Data is still data, but at a very large scale.
Characteristics of Big Data
Big Data contains a large amount of data that cannot be processed by
traditional data storage or processing systems.
Many multinational companies use it to process data and run the business
of many organizations.
The data flow would exceed 150 exabytes per day before replication.
The five V's of Big Data describe its characteristics.
5 V's of Big Data
o Volume
o Veracity
o Variety
o Value
o Velocity
Volume
The name Big Data itself refers to enormous size. Big Data means vast
volumes of data generated daily from many sources, such as business
processes, machines, social media platforms, networks, human
interactions, and many more.
For example, Facebook generates approximately a billion messages, records
about 4.5 billion clicks of the "Like" button, and receives more than 350
million new posts each day. Big Data technologies are designed to handle
data at this scale.
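To get a feel for these numbers, the daily figures quoted above can be
converted into average per-second rates. This is only a back-of-the-envelope
sketch: the dictionary keys and the per_second helper are names made up for
illustration, and real traffic is not spread evenly over the day.

```python
# Daily Facebook activity figures quoted in the text above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_counts = {
    "messages": 1_000_000_000,
    "likes": 4_500_000_000,
    "new_posts": 350_000_000,
}


def per_second(daily_total: int) -> float:
    """Average number of events per second for a given daily total."""
    return daily_total / SECONDS_PER_DAY


# Average rate for each kind of activity.
rates = {name: per_second(total) for name, total in daily_counts.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
```

Even as an average, a billion messages a day works out to more than 11,000
messages every second.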
Variety
Big Data can be structured, unstructured, or semi-structured, and it is
collected from many different sources.
In the past, data was collected only from databases and spreadsheets, but
today it arrives in many forms, such as PDFs, emails, audio, social media
posts, photos, and videos.
The data is categorized as below:
a) Structured data: Structured data follows a fixed schema with all the
required columns and is in tabular form. It is stored in a relational database
management system. OLTP (Online Transaction Processing) systems are built to
work with structured data, which is stored in relations, i.e., tables.
b) Semi-structured data: In semi-structured data, the schema is not strictly
defined, e.g., JSON, XML, CSV, TSV, and email.
c) Unstructured data: Files with no predefined schema, such as log files,
audio files, and image files, are unstructured data. Some organizations
have a lot of data available but do not know how to derive value from it
because the data is raw.
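The three categories above can be illustrated with a small sketch that uses
only the Python standard library. The table name, field names, and sample
values are invented for illustration and do not come from any real system.

```python
import json
import sqlite3

# Structured: rows with a fixed schema in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO users (id, name, age) VALUES (?, ?, ?)", (1, "Asha", 31))
structured_row = conn.execute("SELECT id, name, age FROM users").fetchone()
conn.close()

# Semi-structured: self-describing, but fields can vary between records.
semi_structured = json.loads('[{"id": 1, "name": "Asha"}, {"id": 2, "tags": ["new"]}]')
field_sets = [sorted(record.keys()) for record in semi_structured]

# Unstructured: raw content with no schema; any structure must be derived.
unstructured = "Server restarted at noon. Users reported slow page loads afterwards."
word_count = len(unstructured.split())

print(structured_row)
print(field_sets)
print(word_count)
```

The table rejects rows that do not fit its columns, the JSON records carry
their own field names and may differ from one another, and the raw text has
to be analyzed before anything useful can be extracted from it.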
Veracity
Veracity means how reliable the data is. Because data arrives from many
sources with varying quality, there are many ways to filter or translate it
before use.
Veracity also covers the ability to handle and manage such data effectively,
which is essential in business development.
One example is Facebook posts with hashtags, where the same topic can be
tagged in inconsistent ways.
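As one possible illustration of filtering and translating data, the sketch
below extracts hashtags from a post, normalizes them, and drops empty or
repeated tags. The clean_hashtags function and the sample post are
hypothetical, and real data-quality pipelines are far more involved.

```python
def clean_hashtags(post: str) -> list[str]:
    """Extract hashtags from a post, normalize them, and drop duplicates."""
    seen = set()
    cleaned = []
    for word in post.split():
        if not word.startswith("#"):
            continue
        # Translate: lowercase the tag and strip surrounding punctuation.
        tag = word[1:].lower().strip(".,!?;:")
        # Filter: skip empty tags and tags already seen.
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return cleaned


post = "Loving the #BigData talk! #bigdata #Analytics, # #AI."
tags = clean_hashtags(post)
print(tags)
```

Here "#BigData" and "#bigdata" are treated as the same tag, and the stray
"#" is discarded.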
Value
Value is an essential characteristic of Big Data. Value does not come from
simply storing or processing data; it comes from valuable, reliable data
that we store, process, and also analyze.
Velocity
Velocity plays an important role compared to the other characteristics.
Velocity is the speed at which data is created in real time. It covers the
speed of incoming data sets, the rate of change, and bursts of activity. A
primary aim of Big Data is to provide in-demand data rapidly.
Big Data velocity deals with the speed at which data flows from sources
such as application logs, business processes, networks, social media sites,
sensors, and mobile devices.
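Incoming speed and activity bursts can be measured with a sliding time
window. The sketch below is a minimal illustration: the RateMonitor class,
its parameters, and the sample arrival times are assumptions made for this
example, not part of any particular Big Data tool.

```python
from collections import deque


class RateMonitor:
    """Tracks how many events arrived within the last `window` seconds."""

    def __init__(self, window: float, burst_threshold: int):
        self.window = window
        self.burst_threshold = burst_threshold
        self.timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event; return True if the window now holds a burst."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.burst_threshold

    def rate(self) -> float:
        """Average events per second over the current window."""
        return len(self.timestamps) / self.window


# Hypothetical arrival times in seconds: two slow events, then a burst.
monitor = RateMonitor(window=1.0, burst_threshold=4)
arrivals = [0.0, 0.5, 2.0, 2.1, 2.2, 2.3]
bursts = [monitor.record(t) for t in arrivals]
print(bursts, monitor.rate())
```

Only the last arrival is flagged, because it is the first moment at which
four events fall inside the same one-second window.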