HDFS
Secondary NameNode
• The primary NN cannot merge the edit log into the fsimage while it is serving requests.
• The secondary NN performs this task instead:
▪ Every few minutes, it copies the latest edit log from the primary NN.
▪ It merges the edit log into the fsimage.
▪ It returns the merged fsimage to the primary NN.
• This gives a quicker startup time but is not HA:
▪ Any in-flight transactions are lost, because the secondary NN lacks a complete image.
▪ During startup, the primary NN requires less merging.
The NN keeps the HDFS file system metadata in a file named fsimage. When blocks
are added to or removed from the file system, the fsimage file is not updated; the
change is appended to the edit log instead, so the I/O is fast sequential appends
rather than random file writes. On restart, the NN reads the fsimage and then applies
all the changes from the edit log, rebuilding the file system state in memory. This
operation takes time.
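The append-then-replay behavior described above can be sketched in a few lines of Python. This is a toy model and not Hadoop's actual on-disk format: the JSON encoding, the "add"/"delete" operation names, and the helper names `append_edit` and `load_namespace` are all assumptions made for illustration.

```python
import json
import os


def append_edit(edits_path, op, path):
    """Record one namespace change by appending a line to the edit log.

    The image file is never touched here; writes are sequential appends only.
    """
    with open(edits_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "path": path}) + "\n")


def load_namespace(fsimage_path, edits_path):
    """Rebuild the in-memory state: read the image, then replay the log.

    Returns the namespace (a set of paths) and the number of edits replayed.
    """
    namespace = set()
    if os.path.exists(fsimage_path):
        with open(fsimage_path, encoding="utf-8") as f:
            namespace = set(json.load(f))

    replayed = 0
    if os.path.exists(edits_path):
        with open(edits_path, encoding="utf-8") as f:
            for line in f:
                edit = json.loads(line)
                if edit["op"] == "add":
                    namespace.add(edit["path"])
                elif edit["op"] == "delete":
                    namespace.discard(edit["path"])
                replayed += 1
    return namespace, replayed
```

The longer the log grows between restarts, the more lines `load_namespace` has to replay, which is the startup cost the text refers to.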
The secondary NN's only responsibility is to periodically read the file system edit
log, apply the changes to the fsimage file, and keep the NN's fsimage up to date. It
does not serve as a backup or failover NN. Thanks to this task, the next NN start is
quicker.
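As a rough illustration of why this checkpointing shortens the next start, the sketch below models the image as a set of paths and the edit log as a list of (operation, path) pairs. The names `apply_edits` and `checkpoint` are hypothetical, and the in-memory data structures stand in for the real files; it only shows the idea of merging and then starting with a short log.

```python
def apply_edits(image, edits):
    """Return a new namespace with every edit applied in order."""
    namespace = set(image)
    for op, path in edits:
        if op == "add":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
    return namespace


def checkpoint(image, edits):
    """Merge the log into the image and hand back the image plus an empty log."""
    merged = apply_edits(image, edits)
    return merged, []


# State before the checkpoint: a small image and a log of pending changes.
image = {"/a"}
edits = [("add", "/b"), ("delete", "/a"), ("add", "/c")]

# The merge that the secondary NN carries out on behalf of the primary NN.
new_image, remaining = checkpoint(image, edits)

# One more change arrives after the checkpoint.
remaining.append(("add", "/d"))

# A restart now replays only the edits logged since the checkpoint.
state = apply_edits(new_image, remaining)
```

Replaying `remaining` on top of `new_image` gives the same state as replaying the full log on top of the old image, but with fewer edits to process at startup.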
Federated NameNode (HDFS)
• New in Hadoop V2: NNs can be federated:
▪ Historically, NNs could become a bottleneck on huge clusters.
▪ One million blocks, or roughly 100 TB of data, require about 1 GB of RAM
in an NN.
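The rule of thumb above lends itself to a quick back-of-the-envelope calculation. The sketch below only applies the 1 GB per million blocks figure from the text; the function names and the per-NN heap budget passed to `namenodes_needed` are hypothetical inputs, not values taken from Hadoop.

```python
import math

# Rule of thumb from the text: about 1 GB of NN RAM per one million blocks.
GB_PER_MILLION_BLOCKS = 1.0


def estimate_heap_gb(num_blocks):
    """Estimate the NN memory (in GB) needed to track num_blocks blocks."""
    return num_blocks / 1_000_000 * GB_PER_MILLION_BLOCKS


def namenodes_needed(num_blocks, heap_per_nn_gb):
    """Estimate how many federated NNs are needed for a given heap budget.

    Assumes the blocks are spread evenly across the NNs, which a real
    namespace partitioning will not guarantee.
    """
    return math.ceil(estimate_heap_gb(num_blocks) / heap_per_nn_gb)
```

For example, `estimate_heap_gb(250_000_000)` applies the rule to 250 million blocks, and `namenodes_needed(250_000_000, 64)` shows how that load might be divided if each NN were limited to an assumed 64 GB.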