HDFS and MapReduce
Driving ideas:
Programs are moved to the data, not the data to the programs.
The Distributed File System (DFS)
stores data across the entire cluster:
The file system spans all nodes of the cluster.
A single file's blocks are distributed across the nodes of the cluster.
For resilience, each block is typically replicated on several nodes.
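The distribution and replication of blocks described above can be sketched roughly as follows. This is only an illustration: the function name `place_blocks`, the node names, and the round-robin placement are assumptions made for this example. The actual HDFS placement policy is more involved (it takes racks into account, for instance) and is not reproduced here.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Map each block index to the nodes holding a replica of it.

    Uses naive round-robin placement (an assumption for illustration,
    not the real HDFS placement policy).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        # Replica r of block b goes to the r-th node after node b,
        # wrapping around, so the replicas land on distinct nodes.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    # A 300-unit file with 128-unit blocks on a 4-node cluster.
    layout = place_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"])
    for block, holders in layout.items():
        print(block, holders)
```

With three replicas per block, losing any single node still leaves at least two copies of every block in this sketch.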
HDFS architecture
• Master/Worker architecture
• Master: NameNode (NN):
▪ Manages the file system namespace and metadata, stored in:
▪ FsImage (a checkpoint of the namespace)
▪ Edits Log (changes made since the last checkpoint)
▪ Regulates client access to files.
• Worker: DataNode:
▪ Many per cluster.
▪ Manages the storage attached to the node it runs on.
▪ Periodically reports its status to the NN.
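The division of roles above can be illustrated with a toy model. The class `ToyNameNode`, its method names, and the 30-second timeout are all hypothetical and chosen for this sketch. It only mimics the idea that the current namespace is the FsImage with the Edits Log replayed on top, and that the NameNode tracks DataNodes through their periodic reports. It is not the Hadoop API.

```python
class ToyNameNode:
    """Toy in-memory model of NameNode bookkeeping (illustrative only)."""

    def __init__(self, heartbeat_timeout=30.0):
        self.fsimage = {}         # last checkpoint: path -> list of block ids
        self.edits_log = []       # namespace changes since the last checkpoint
        self.last_heartbeat = {}  # DataNode id -> time of its last report
        self.heartbeat_timeout = heartbeat_timeout  # assumed value, in seconds

    def create_file(self, path, block_ids):
        # Changes are appended to the log; the checkpoint is not rewritten.
        self.edits_log.append(("create", path, list(block_ids)))

    def delete_file(self, path):
        self.edits_log.append(("delete", path, None))

    def namespace(self):
        """Current namespace: the FsImage with the Edits Log replayed on top."""
        ns = dict(self.fsimage)
        for op, path, blocks in self.edits_log:
            if op == "create":
                ns[path] = blocks
            elif op == "delete":
                ns.pop(path, None)
        return ns

    def checkpoint(self):
        """Fold the Edits Log into a new FsImage and start an empty log."""
        self.fsimage = self.namespace()
        self.edits_log = []

    def heartbeat(self, datanode_id, now):
        """Record a status report from a DataNode at time `now`."""
        self.last_heartbeat[datanode_id] = now

    def live_datanodes(self, now):
        """DataNodes that reported within the timeout window."""
        return sorted(
            d for d, t in self.last_heartbeat.items()
            if now - t <= self.heartbeat_timeout
        )
```

A DataNode that stops reporting simply drops out of `live_datanodes` once the timeout has passed. In a real cluster, the NameNode would then arrange for the affected blocks to be re-replicated.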
HDFS blocks
HDFS supports very large files: each file is split into large blocks (128 MB by default in recent Hadoop versions).
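As a rough illustration of how a large file maps onto blocks, the sketch below computes the byte range of each block. The function name `block_ranges` is hypothetical, and the 128 MiB block size is an assumption that matches the usual default in recent Hadoop versions. The actual value is configurable per cluster and per file.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed block size; configurable in a real cluster


def block_ranges(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block of the file."""
    ranges = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    for offset, length in block_ranges(300 * MIB):
        print(offset // MIB, length // MIB)
```

In this sketch, a 300 MiB file yields two full 128 MiB blocks and a final 44 MiB block. The last block occupies only the space it needs.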