Apache Hadoop and HDFS
Apache Hadoop is an open-source software framework for scalable,
distributed computing over very large data sets:
• Hides the complexities and details of the underlying system from the user.
• Written in Java.
• Consists of these subprojects:
▪ Hadoop Common
▪ HDFS
▪ Hadoop YARN
▪ MapReduce
▪ Hadoop Ozone
• Designed to run on heterogeneous commodity hardware.
Hadoop grew out of Google's work from the late 1990s and early 2000s,
notably the paper describing the Google File System (GFS), published in 2003,
and the MapReduce paper, published in 2004.
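The MapReduce model named above can be illustrated with a minimal, single-process word-count sketch in plain Java. This is only a conceptual illustration, not the real Hadoop API: a production job would use classes from `org.apache.hadoop.mapreduce` and run distributed across a cluster, whereas here the map, shuffle, and reduce phases are simulated in memory.

```java
import java.util.*;
import java.util.stream.*;

// Conceptual sketch of MapReduce: map emits (word, 1) pairs, the
// framework groups pairs by key (the "shuffle"), and reduce sums the
// counts per key. Hadoop distributes these same phases across nodes.
public class WordCountSketch {
    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phases: group pairs by key, then sum each group.
    static Map<String, Integer> run(List<String> lines) {
        return lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big ideas", "big data");
        System.out.println(run(input)); // "big" counted 3 times
    }
}
```

The same map and reduce functions could be handed to a real Hadoop job; the framework's value is executing them in parallel over data that does not fit on one machine.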
Hadoop infrastructure: large and constantly growing
Beyond the core components, the Hadoop ecosystem includes additional parts
that support each stage of big data processing, and it continues to expand.
It comprises Apache open-source projects as well as contributions from other
companies.
• Hadoop-related projects:
▪ HBase
▪ Apache Hive
▪ Apache Pig