INDEX
1. Introduction
1.1 What is Big Data?
1.2 The History of Big Data
1.3 Benefits of Big Data and Data Analytics
2. Types
2.1 Characteristics
2.2 Why is Big Data Important?
3. Architecture
4. Technology
4.1 Introduction to Big Data Technologies
4.2 Types of Big Data Technologies
5. Applications
6. Conclusion
7. Reference
1.1 What is Big Data?
According to Gartner, Big Data is defined as follows:
“Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and
decision making.”
In other words, Big Data refers to large, complex data sets that have to be
processed and analyzed to uncover valuable information that can benefit
businesses and organizations.
A few basic tenets of Big Data make the answer simpler still:
1. It refers to a massive amount of data that keeps on growing exponentially with time.
2. It is so voluminous that it cannot be processed or analyzed using conventional data
processing techniques.
3. It includes data mining, data storage, data analysis, data sharing, and data
visualization.
4. The term is an umbrella one that covers the data, the data frameworks, and
the tools and techniques used to process and analyze the data.
1.2 The History of Big Data
Although the concept of big data itself is relatively new, the origins of large data sets
go back to the 1960s and '70s when the world of data was just getting started with the
first data centers and the development of the relational database.
Around 2005, people began to realize just how much data users generated through
Facebook, YouTube, and other online services. Hadoop (an open-source framework
created specifically to store and analyze big data sets) was developed that same year.
NoSQL also began to gain popularity during this time.
The development of open-source frameworks such as Hadoop (and, more recently,
Spark) was essential for the growth of big data because they make big data easier to
work with and cheaper to store. In the years since then, the volume of big data has
skyrocketed. Users are still generating huge amounts of data—but it’s not just
humans who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices are
connected to the internet, gathering data on customer usage patterns and product
performance. The emergence of machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing
has expanded big data possibilities even further. The cloud offers truly elastic
scalability, where developers can simply spin up ad hoc clusters to test a subset of
data.
1.3 Benefits of Big Data and Data Analytics
1. Big data makes it possible for you to gain more complete answers because
you have more information.
2. More complete answers mean more confidence in the data—which means a
completely different approach to tackling problems.
2. Types of Big Data
With the definition of big data established, the three main types of big data are described below:
1. Structured
Structured data is data that can be stored, processed, and retrieved in a fixed
format. It refers to highly organized information that can be readily stored in and
accessed from a database by simple search algorithms. For instance, the employee
table in a company database is structured: the employee details, their job
positions, their salaries, etc., are present in an organized manner.
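The employee table described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production schema: the table name, column names, and sample rows are all assumptions made up for the example.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: every row has the same fixed set of typed columns.
conn.execute(
    """
    CREATE TABLE employee (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        position TEXT NOT NULL,
        salary   REAL NOT NULL
    )
    """
)

# Illustrative sample rows (invented for this example).
conn.executemany(
    "INSERT INTO employee (name, position, salary) VALUES (?, ?, ?)",
    [
        ("Ravi", "Engineer", 78000),
        ("Asha", "Engineer", 85000),
        ("Meera", "Analyst", 64000),
    ],
)
conn.commit()

# Because the format is fixed, retrieval is a simple declarative query.
rows = conn.execute(
    "SELECT name, salary FROM employee WHERE position = ? ORDER BY name",
    ("Engineer",),
).fetchall()

print(rows)
conn.close()
```

Because every row follows the same schema, the query can filter and sort on any column without first inspecting the shape of the data.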
2. Unstructured
Unstructured data is data that lacks any specific form or structure, which makes
it difficult and time-consuming to process and analyze. The free-form body text
of an email is an example of unstructured data.
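To illustrate why free-form text is hard to process, the following sketch tries to pull a date out of three email bodies using a single regular expression. The email texts and the helper name find_iso_dates are hypothetical, invented for this example; real systems typically rely on more capable text-processing tools.

```python
import re

# Invented sample email bodies; each mentions a date in a different way.
emails = [
    "Hi team, the review is moved to 2024-03-15. Thanks!",
    "Let's meet on March 15th instead.",
    "Can we push it to next Friday?",
]

# This pattern only recognizes dates written as YYYY-MM-DD.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_iso_dates(text):
    """Return every YYYY-MM-DD style date found in the text."""
    return ISO_DATE.findall(text)


found = [find_iso_dates(body) for body in emails]

for body, dates in zip(emails, found):
    print(dates, "<-", body)
```

Only the first message matches the pattern. The other two express a date in ways the pattern does not anticipate, and with no fixed format every new phrasing needs its own handling.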
3. Semi-structured
Semi-structured data contains elements of both formats mentioned above, that is,
structured and unstructured data. To be precise, it refers to data that has not
been classified under a particular repository (database) but still contains vital
information or tags that segregate individual elements within the data.
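JSON is a common example of such tagged data. The sketch below, using Python's built-in json module, parses a small set of records whose keys act as tags even though the records do not share a fixed schema. The records and field names are assumptions invented for illustration.

```python
import json

# Invented sample records: each is tagged with keys, but the set of keys varies.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Meera", "note": "prefers contact by post"}
]
"""

records = json.loads(raw)

# Tags let us pick out individual elements without a database schema.
names = [record["name"] for record in records]

# Optional fields must be handled explicitly, since not every record has them.
emails = [record.get("email") for record in records]

# The union of all keys shows that no single fixed format covers every record.
all_keys = sorted({key for record in records for key in record})

print(names)
print(emails)
print(all_keys)
```

The keys make each element addressable, which is the structured part, while the varying fields and free-text values are what keep such data from fitting a fixed table.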