Hash table

In computing, a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.
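One common way to accommodate collisions is separate chaining, in which each bucket holds a small list of entries. The text above does not single out any one strategy, so the following is only a minimal sketch under that assumption. The class name ChainedHashTable and its methods are hypothetical, and Python's built-in hash() stands in for the hash function.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by separate chaining."""

    def __init__(self, num_buckets=8):
        # Each bucket is a list of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Built-in hash() is an illustrative stand-in for the hash function.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the existing entry
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(num_buckets=4)
table.put(1, "one")
# In CPython small integers hash to themselves, so 1 and 5 both map to
# bucket 1 when there are 4 buckets; the chain keeps both entries.
table.put(5, "five")
```

Both keys remain retrievable even though they share a bucket; the cost is a short scan of the chain on lookup.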

In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at (amortized) constant average cost per operation.

Hashing is an example of a space-time tradeoff. If memory is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if time is infinite, values can be stored without regard for their keys, and a linear search (or, if the entries are kept sorted, a binary search) can be used to retrieve the element. The advantage of using hashing is that the table address of a record can be directly computed from the key. Hashing implies a function h that, when applied to a key k, produces a hash M. However, since M could be very large, the hash result must be mapped to one of the finite number of entries, or slots, in the hash table. Several methods can be used to map hashes into a table of size N. The most common is the division method, in which modular arithmetic is used to compute the slot:

h(k) = M mod N

In many situations, hash tables turn out to be on average more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

The slot computation is often done in two steps:

hash = hash_function(key)

index = hash % hash_table_size
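The two steps translate almost directly into code. The sketch below is illustrative only: the function name slot_for is hypothetical, and Python's built-in hash() is assumed as the hash function.

```python
def slot_for(key, table_size):
    """Map a key to a slot using the division method."""
    h = hash(key)          # step 1: compute the (possibly very large) hash
    return h % table_size  # step 2: reduce it to a slot in [0, table_size)


# Python's % with a positive modulus never returns a negative number,
# so the result is always a valid slot, even when hash(key) is negative.
slot = slot_for(-7, 16)
```

In languages where the remainder operator can return a negative value, the second step needs an extra adjustment before the result can be used as an array index.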

Choosing a hash function

A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., Pearson's chi-squared test for discrete uniform distributions.
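As a rough illustration of such an empirical check, the sketch below computes Pearson's chi-squared statistic for the bucket counts produced by a hash function. The function name chi_squared_statistic and its parameters are hypothetical. Turning the statistic into a p-value would additionally require the chi-squared distribution with num_buckets - 1 degrees of freedom, which is left out here.

```python
from collections import Counter


def chi_squared_statistic(keys, num_buckets, hash_function=hash):
    """Pearson's chi-squared statistic against a uniform bucket distribution."""
    keys = list(keys)
    counts = Counter(hash_function(k) % num_buckets for k in keys)
    expected = len(keys) / num_buckets  # count per bucket if perfectly uniform
    return sum(
        (counts.get(b, 0) - expected) ** 2 / expected
        for b in range(num_buckets)
    )


# A perfectly even spread gives a statistic of 0; a hash that sends
# every key to the same bucket gives a very large value.
even = chi_squared_statistic(range(1000), 10, hash_function=lambda k: k)
skewed = chi_squared_statistic(range(1000), 10, hash_function=lambda k: 0)
```

A small statistic is consistent with uniformity for the tested keys and table size only; it says nothing about other key sets.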

The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size, then the hash function needs to be uniform only when the size is a power of two. Here the index can be computed as some range of bits of the hash function. On the other hand, some hashing algorithms prefer to have the size be a prime number. The modulus operation may provide some additional mixing; this is especially useful with a poor hash function.
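The difference between bit masking and a prime modulus can be seen in a small experiment. This is a minimal sketch: index_by_mask is a hypothetical helper, and the "poor hash" is simulated by hash values that are all multiples of 8.

```python
def index_by_mask(h, table_size):
    """Take the low bits of h; valid only when table_size is a power of two."""
    assert table_size > 0 and table_size & (table_size - 1) == 0
    return h & (table_size - 1)


# For a power-of-two size, masking and modulo select the same slot.
same = all(index_by_mask(h, 16) == h % 16 for h in range(-100, 100))

# A poor hash whose values are all multiples of 8:
poor_hashes = [8 * k for k in range(100)]
slots_pow2 = {index_by_mask(h, 16) for h in poor_hashes}  # only the low 4 bits
slots_prime = {h % 17 for h in poor_hashes}               # prime table size
```

With 16 slots the poor hash reaches only 2 of them, while the prime size 17 spreads the same values over all 17 slots, which is the extra mixing the paragraph refers to.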

For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash[3] is claimed to have particularly poor clustering behavior.
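The effect of clustering can be illustrated with linear probing, one open addressing scheme. The sketch below is a simplified model, not a full table: the helper insert_linear_probing is hypothetical, and each key's home slot is passed in directly instead of being computed by a hash function.

```python
def insert_linear_probing(slots, key, home):
    """Place key at its home slot or the next free one; return probes used."""
    n = len(slots)
    for probes in range(1, n + 1):
        i = (home + probes - 1) % n
        if slots[i] is None:
            slots[i] = key
            return probes
    raise RuntimeError("table is full")


slots = [None] * 16
# Five keys with distinct but consecutive home slots (3..7): no collisions,
# load factor 5/16, yet together they form one contiguous cluster.
for key, home in zip("abcde", range(3, 8)):
    insert_linear_probing(slots, key, home)

# A new key whose home slot is the start of the cluster must walk past
# every occupied slot before finding a free one.
probes = insert_linear_probing(slots, "f", 3)
```

The sixth key needs 6 probes and lands in slot 8, extending the cluster further, even though fewer than a third of the slots are in use.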

Cryptographic hash functions are believed to provide good hash functions for any table size, either by modulo reduction or by bit masking. They may also be appropriate if there is a risk of malicious users trying to sabotage a network service by submitting requests designed to generate a large number of collisions in the server's hash tables. However, the risk of sabotage can also be avoided by cheaper methods (such as applying a secret salt to the data). A drawback of cryptographic hash functions is that they are often slower to compute, so where uniformity for every table size is not necessary, a non-cryptographic hash function might be preferable.
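The idea of mixing a secret into the hash can be sketched as follows. This example uses keyed BLAKE2b from Python's hashlib only because it is available in the standard library; it is itself a cryptographic function and therefore not the cheaper method the text has in mind. The names SECRET_KEY and keyed_index are hypothetical.

```python
import hashlib
import secrets

# Chosen once per process and never revealed to clients (an assumption of
# this sketch); without it, an attacker cannot predict which slot a key hits.
SECRET_KEY = secrets.token_bytes(16)


def keyed_index(data: bytes, table_size: int, key: bytes = SECRET_KEY) -> int:
    """Slot for data under a secret key, using keyed BLAKE2b."""
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big") % table_size


slot = keyed_index(b"user-supplied-key", 1024)
```

Within one process the mapping is stable, so lookups still work, but it changes whenever a new secret is generated.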

k-independent hashing offers a way to prove that a certain hash function does not have bad key sets for a given type of hash table. A number of such results are known for collision resolution schemes such as linear probing and cuckoo hashing.
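A standard textbook way to build a k-independent family is to evaluate a random polynomial of degree k - 1 over a prime field and then reduce the result to the table size. The sketch below assumes integer keys in the range [0, P). The names P and make_k_independent_hash are hypothetical, the value k = 5 is only an illustrative choice, and the final reduction modulo table_size makes the independence approximate, not exact.

```python
import random

P = (1 << 61) - 1  # the Mersenne prime 2^61 - 1, used as the field size


def make_k_independent_hash(k, table_size, rng=random):
    """Draw one function from a polynomial hash family of degree k - 1."""
    coeffs = [rng.randrange(P) for _ in range(k)]  # k random coefficients

    def h(x):
        acc = 0
        for c in coeffs:             # Horner's rule, all arithmetic mod P
            acc = (acc * x + c) % P
        return acc % table_size      # reduce the field element to a slot

    return h


h = make_k_independent_hash(5, 1024, rng=random.Random(0))
slots = [h(x) for x in range(1000)]
```

Each call to make_k_independent_hash draws fresh coefficients, so different tables (or a rebuilt table) can use different functions from the same family.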
