Buffers and buffer heads
When a block is stored in memory (say, after a read or pending a write), it is stored in
a buffer. Each buffer is associated with exactly one block. The buffer serves as the object
that represents a disk block in memory. Recall that a block comprises one or more sectors,
but is no more than a page in size. Therefore, a single page can hold one or more blocks in
memory. Because the kernel requires control information to accompany the data (such as
which block device and which specific block the buffer corresponds to), each buffer
is associated with a descriptor. The descriptor is called a buffer head and is of type struct
buffer_head. The buffer_head structure holds all the information that the kernel needs to
manipulate buffers and is defined in <linux/buffer_head.h>.
Take a look at this structure, with comments describing each field:
struct buffer_head {
        unsigned long b_state;              /* buffer state flags */
        atomic_t b_count;                   /* buffer usage counter */
        struct buffer_head *b_this_page;    /* buffers using this page */
        struct page *b_page;                /* page storing this buffer */
        sector_t b_blocknr;                 /* logical block number */
        u32 b_size;                         /* block size (in bytes) */
        char *b_data;                       /* buffer in the page */
        struct block_device *b_bdev;        /* device where block resides */
        bh_end_io_t *b_end_io;              /* I/O completion method */
        void *b_private;                    /* data for completion method */
        struct list_head b_assoc_buffers;   /* list of associated mappings */
};
The b_state field specifies the state of this particular buffer. It can be one or more of the
flags in Table 13.1. The legal flags are stored in the bh_state_bits enumeration, which is
defined in <linux/buffer_head.h>.
Table 13.1. bh_state Flags
Status Flag       Meaning
BH_Uptodate       Buffer contains valid data
BH_Dirty          Buffer is dirty (the contents of the buffer are newer than the
                  contents of the block on disk and therefore the buffer must
                  eventually be written back to disk)
BH_Lock           Buffer is undergoing disk I/O and is locked to prevent
                  concurrent access
BH_Req            Buffer is involved in an I/O request
BH_Mapped         Buffer is a valid buffer mapped to an on-disk block
BH_New            Buffer is newly mapped via get_block() and not yet accessed
BH_Async_Read     Buffer is undergoing asynchronous read I/O
                  via end_buffer_async_read()
BH_Async_Write    Buffer is undergoing asynchronous write I/O
                  via end_buffer_async_write()
BH_Delay          Buffer does not yet have an associated on-disk block
BH_Boundary       Buffer forms the boundary of contiguous blocks; the next
                  block is discontinuous
The last value in the bh_state_bits enumeration is BH_PrivateStart. This is not a valid state
flag; instead, it marks the first bit that other code can use. The block I/O layer proper does
not use any bit value equal to or greater than BH_PrivateStart, so these bits are safe for
individual drivers that want to store information in the b_state field. Drivers can base the
bit values of their internal flags off this flag and rest assured that they are not encroaching
on an official bit used by the block I/O layer.
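The numbering scheme can be illustrated with a small userspace model. This is only a sketch: the MODEL_ and MYDRV_ names, the shortened enumeration, and the helper functions are hypothetical stand-ins, the numeric values are not the kernel's, and the helpers are plain (non-atomic) bit operations rather than the kernel's own bit primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Shortened stand-in for bh_state_bits; values are illustrative only. */
enum model_bh_state_bits {
    MODEL_BH_Uptodate,
    MODEL_BH_Dirty,
    MODEL_BH_Lock,
    MODEL_BH_PrivateStart   /* not a state flag: first bit free for other code */
};

/* Hypothetical driver flags, numbered upward from the private start. */
enum mydrv_state_bits {
    MYDRV_Journaled = MODEL_BH_PrivateStart,
    MYDRV_Pinned
};

/* Stand-in for the b_state field of a buffer head. */
struct flag_bh {
    unsigned long b_state;
};

/* Flags are bit numbers, not masks, so each helper shifts by nr. */
static void model_set_bit(int nr, unsigned long *word)
{
    assert(nr >= 0);
    *word |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *word)
{
    assert(nr >= 0);
    *word &= ~(1UL << nr);
}

static bool model_test_bit(int nr, const unsigned long *word)
{
    assert(nr >= 0);
    return ((*word >> nr) & 1UL) != 0;
}
```

Because the driver's enumeration starts at the private-start value, setting MYDRV_Pinned can never collide with MODEL_BH_Dirty or any other flag below it, even if flags are later added ahead of the private start.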
The b_count field is the buffer's usage count. The value is incremented and decremented by
two inline functions, both of which are defined in <linux/buffer_head.h>:
static inline void get_bh(struct buffer_head *bh)
{
        atomic_inc(&bh->b_count);
}

static inline void put_bh(struct buffer_head *bh)
{
        atomic_dec(&bh->b_count);
}
Before manipulating a buffer head, you must increment its reference count via get_bh() to
ensure that the buffer head is not deallocated out from under you. When finished with the
buffer head, decrement the reference count via put_bh().
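The get-use-put pattern might look like the following in a userspace model. This is a sketch under stated assumptions: struct ref_bh and the ref_ functions are hypothetical stand-ins, and C11 atomic_int takes the place of the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for a buffer head: only the usage count and some data. */
struct ref_bh {
    atomic_int b_count;   /* models the kernel's atomic_t b_count */
    char data[16];        /* models the buffer contents */
};

/* Take a reference, mirroring get_bh(). */
static void ref_get_bh(struct ref_bh *bh)
{
    atomic_fetch_add(&bh->b_count, 1);
}

/* Drop a reference, mirroring put_bh(). */
static void ref_put_bh(struct ref_bh *bh)
{
    /* Dropping a reference that was never taken is a caller bug. */
    assert(atomic_load(&bh->b_count) > 0);
    atomic_fetch_sub(&bh->b_count, 1);
}

/* Pin the buffer head, use it, then unpin it. */
static char ref_peek_first_byte(struct ref_bh *bh)
{
    char c;

    ref_get_bh(bh);     /* the count is now nonzero while we work */
    c = bh->data[0];
    ref_put_bh(bh);     /* finished: release our reference */
    return c;
}
```

Every path out of a function that calls the get operation should reach the matching put; an unbalanced count either leaks the buffer head or lets it be freed while still in use.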
The physical block on disk to which a given buffer corresponds is the b_blocknr-th logical
block on the block device described by b_bdev.
The physical page in memory to which a given buffer corresponds is the page pointed to
by b_page. More specifically, b_data is a pointer directly to the block (that exists
somewhere in b_page), which is b_size bytes in length. Therefore, the block is located in
memory starting at address b_data and ending just before address (b_data + b_size).
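To make the address arithmetic concrete, here is a userspace sketch that carves one page into equal-sized buffers. The model_ names, the 4096-byte page size, the consecutive block numbers, and the ring formed through b_this_page are all assumptions of this model; only the field names come from the structure shown earlier.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/* Stand-in for struct page: just the raw bytes. */
struct model_page {
    char bytes[MODEL_PAGE_SIZE];
};

/* Stand-in for the mapping fields of a buffer head. */
struct model_buf {
    struct model_buf *b_this_page;   /* next buffer on the same page */
    struct model_page *b_page;       /* page storing this buffer */
    unsigned long b_blocknr;         /* logical block number */
    unsigned int b_size;             /* block size in bytes */
    char *b_data;                    /* start of the block inside the page */
};

/*
 * Describe every block_size-byte slice of the page with one entry of bufs.
 * Block numbers are assumed consecutive from first_blocknr.
 * Returns the number of buffers that fit in the page.
 */
static size_t model_attach_buffers(struct model_page *page,
                                   struct model_buf *bufs, size_t nbufs,
                                   unsigned int block_size,
                                   unsigned long first_blocknr)
{
    size_t n, i;

    assert(block_size > 0 && block_size <= MODEL_PAGE_SIZE);
    n = MODEL_PAGE_SIZE / block_size;
    assert(n <= nbufs);

    for (i = 0; i < n; i++) {
        bufs[i].b_page = page;
        bufs[i].b_blocknr = first_blocknr + i;
        bufs[i].b_size = block_size;
        bufs[i].b_data = page->bytes + i * block_size;
        bufs[i].b_this_page = &bufs[(i + 1) % n];   /* last links to first */
    }
    return n;
}
```

With 1024-byte blocks, four buffers share the page, and each buffer's b_data + b_size equals the next buffer's b_data, which is why that address is one past the end of the block rather than its last byte.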
The purpose of a buffer head is to describe this mapping between the on-disk block and the
physical in-memory buffer (which is a sequence of bytes on a specific page). Acting as a
descriptor of this buffer-to-block mapping is the data structure's only role in the kernel.
Before the 2.6 kernel, the buffer head was a much more important data structure: It
was the unit of I/O in the kernel. Not only did the buffer head describe the disk-block-to-
physical-page mapping, but it also acted as the container used for all block I/O. This had two
primary problems. First, the buffer head was a large and unwieldy data structure (it has
shrunk a bit since), and it was neither clean nor simple to manipulate data in terms
of buffer heads. Instead, the kernel prefers to work in terms of pages, which are simple and
allow for greater performance. A large buffer head describing each individual buffer (which
might be smaller than a page) was inefficient. Consequently, in the 2.6 kernel, much work
has gone into making the kernel work directly with pages and address spaces instead of
buffers. Some of this work is discussed in Chapter 15, "The Page Cache and Page Writeback,"
where the address_space structure and the pdflush daemons are discussed.
The second issue with buffer heads is that they describe only a single buffer. When used as
the container for all I/O operations, the buffer head forces the kernel to break up potentially
large block I/O operations (say, a write) into multiple buffer_head structures. This results in
needless overhead and space consumption. As a result, the primary goal of the 2.5
development kernel was to introduce a new, flexible, and lightweight container for block I/O
operations. The result is the bio structure, which is discussed in the next section.
The bio structure
The basic container for block I/O within the kernel is the bio structure, which is defined
in <linux/bio.h>. This structure represents block I/O operations that are in flight (active) as a
list of segments. A segment is a chunk of a buffer that is contiguous in memory. Thus,
individual buffers need not be contiguous in memory. By allowing the buffers to be
described in chunks, the bio structure provides the capability for the kernel to perform block
I/O operations of even a single buffer from multiple locations in memory. Vector I/O such as
this is called scatter-gather I/O.
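The idea of describing one operation as a list of segments can be sketched in userspace as follows. This is not the kernel's bio: struct model_segment and both functions are hypothetical, a plain byte pointer stands in for a page, and the gather step copies into a contiguous buffer purely to show which bytes the segments cover.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One chunk that is contiguous in memory. */
struct model_segment {
    const char *base;   /* stand-in for the page holding the chunk */
    size_t offset;      /* where the chunk starts within base */
    size_t len;         /* chunk length in bytes */
};

/* Total number of bytes described by the segment list. */
static size_t model_total_len(const struct model_segment *segs, size_t nsegs)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < nsegs; i++)
        total += segs[i].len;
    return total;
}

/*
 * Gather the scattered chunks, in order, into dst.
 * Returns the number of bytes copied; dst must be large enough.
 */
static size_t model_gather(const struct model_segment *segs, size_t nsegs,
                           char *dst, size_t dst_len)
{
    size_t copied = 0;
    size_t i;

    for (i = 0; i < nsegs; i++) {
        assert(segs[i].len <= dst_len - copied);
        memcpy(dst + copied, segs[i].base + segs[i].offset, segs[i].len);
        copied += segs[i].len;
    }
    return copied;
}
```

A caller builds an array of segments that point into unrelated memory regions and hands the whole array over as one operation, instead of issuing one operation per region.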
Here is struct bio, defined in <linux/bio.h>, with comments added for each field:
struct bio {