Hashing
Hashing can be used to build a table, search it, or delete items from it.
The basic idea behind hashing is to take a field in a record, known as the key, and
convert it through some fixed process to a numeric value, known as the hash key,
which represents the position to either store or find an item in the table.
The numeric value will be in the range of 0 to n-1, where n is the maximum
number of slots (or buckets) in the table.
The fixed process to convert a key to a hash key is known as a hash function. This
function will be used whenever access to the table is needed.
One common method of determining a hash key is the division method of
hashing. The formula that will be used is:
hash key = key % number of slots in the table
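The formula above can be sketched in a few lines of Python. The function name `division_hash` and the table size of 11 are illustrative choices, not part of the notes. The sketch assumes non-negative integer keys.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division method: map a non-negative integer key to a slot in 0..table_size-1."""
    return key % table_size

# Hypothetical table with 11 slots
for key in (25, 48, 100):
    print(key, "->", division_hash(key, 11))
# 25 -> 3
# 48 -> 4
# 100 -> 1
```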
The division method is generally a reasonable strategy, unless the keys happen to
have some undesirable properties. For example, if the table size is 10 and all of the
keys end in zero, every key hashes to slot 0.
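In this case, the choice of hash function and table size needs to be carefully
considered. Prime numbers usually make the best table sizes, because a prime is
unlikely to share a factor with a regular pattern in the keys (such as every key
being a multiple of 10).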
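To see the effect of table size, this sketch hashes ten keys that all end in zero into a 10-slot table and into an 11-slot table (11 being a prime picked for illustration). The helper `slots_used` is hypothetical; it just collects the distinct slots the division method produces.

```python
def slots_used(keys, table_size):
    """Return the set of distinct slots the division method assigns to these keys."""
    return {key % table_size for key in keys}

keys = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # every key ends in zero

print(sorted(slots_used(keys, 10)))  # [0]  (all ten keys collide)
print(len(slots_used(keys, 11)))     # 10   (each key gets its own slot)
```

With 10 slots, every key lands in slot 0; with 11 slots, the same keys spread across ten different slots.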
One problem, though, is that keys are not always numeric. In fact, they are commonly
strings.
One possible solution: add up the ASCII values of the characters in the string to get
a numeric value and then perform the division method.
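A minimal sketch of the ASCII-sum approach follows. The name `string_hash` is hypothetical. Python's `ord` returns the Unicode code point of a character, which equals its ASCII value for ASCII characters.

```python
def string_hash(key: str, table_size: int) -> int:
    """Sum the character codes of the string, then apply the division method."""
    total = sum(ord(ch) for ch in key)
    return total % table_size

# 'c' + 'a' + 't' = 99 + 97 + 116 = 312, and 312 % 11 = 4
print(string_hash("cat", 11))  # 4
```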
No matter which hash function is used, two different keys may resolve to the same
hash key. This situation is known as a collision.
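Two small collisions, assuming a hypothetical table size of 11: the integers 15 and 26 share a remainder, and because the ASCII-sum hash ignores character order, any two anagrams such as "stop" and "pots" always collide regardless of table size.

```python
def string_hash(key: str, table_size: int) -> int:
    """ASCII-sum string hash followed by the division method."""
    return sum(ord(ch) for ch in key) % table_size

TABLE_SIZE = 11  # illustrative size

print(15 % TABLE_SIZE, 26 % TABLE_SIZE)                                # 4 4
print(string_hash("stop", TABLE_SIZE), string_hash("pots", TABLE_SIZE))  # 3 3
```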
When a collision occurs, there are two simple solutions:
1. Chaining
2. Linear probe (aka linear open addressing)
and two slightly more difficult solutions:
3. Quadratic probe
4. Double hashing
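Chaining can be sketched as a table whose slots each hold a list (a "chain") of the items that hashed there, so colliding keys simply share a slot. This is a minimal sketch assuming integer keys, the division method, and a hypothetical class name `ChainedHashTable`; `search` returns `None` for a missing key, so it cannot distinguish a stored `None` value.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list (chain) of (key, value) pairs."""

    def __init__(self, size: int = 11):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size  # division method

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))    # new key (possibly a collision): extend the chain

    def search(self, key: int):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None                   # key not found

    def delete(self, key: int) -> bool:
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

table = ChainedHashTable(11)
table.insert(15, "a")
table.insert(26, "b")          # 26 % 11 == 15 % 11 == 4, so both share slot 4
print(table.search(26))        # b
```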
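The three open-addressing strategies differ only in which slot is tried on the i-th attempt: linear probing uses `home + i`, quadratic probing uses `home + i*i`, and double hashing uses `home + i*step`, where `step` comes from a second hash function. The sketch below is a hypothetical `OpenAddressTable` that stores integer keys only. It assumes a prime table size, and for double hashing it assumes the second hash `1 + key % (size - 1)`, which is one common textbook choice rather than the one these notes prescribe. Deleted slots are marked with a "tombstone" so later searches keep probing past them.

```python
class OpenAddressTable:
    """Open addressing: on a collision, probe other slots of the same array."""

    _DELETED = object()  # tombstone: slot was used, keep probing past it

    def __init__(self, size: int = 11, probe: str = "linear"):
        self.size = size      # assumed prime so double hashing can reach every slot
        self.probe = probe    # "linear", "quadratic", or "double"
        self.keys = [None] * size

    def _slot(self, key: int, i: int) -> int:
        """Slot examined on attempt i (i = 0, 1, 2, ...)."""
        home = key % self.size
        if self.probe == "linear":
            return (home + i) % self.size
        if self.probe == "quadratic":
            return (home + i * i) % self.size
        step = 1 + key % (self.size - 1)   # assumed second hash; never zero
        return (home + i * step) % self.size

    def search(self, key: int) -> bool:
        for i in range(self.size):
            s = self._slot(key, i)
            if self.keys[s] is None:
                return False      # a never-used slot ends the search
            if self.keys[s] == key:
                return True
        return False

    def insert(self, key: int) -> bool:
        if self.search(key):
            return True           # already present
        for i in range(self.size):
            s = self._slot(key, i)
            if self.keys[s] is None or self.keys[s] is self._DELETED:
                self.keys[s] = key
                return True
        return False              # no free slot on this key's probe sequence

    def delete(self, key: int) -> bool:
        for i in range(self.size):
            s = self._slot(key, i)
            if self.keys[s] is None:
                return False
            if self.keys[s] == key:
                self.keys[s] = self._DELETED
                return True
        return False
```

For example, 15, 26 and 37 all have home slot 4 in an 11-slot table. Linear probing places them in slots 4, 5 and 6; quadratic probing places 37 in slot 8 (4 + 2*2); double hashing sends 26 to slot 0, since its step is 1 + 26 % 10 = 7. Note that quadratic probing on a prime-sized table only reaches about half of the slots, so its `insert` can fail before the table is full.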