- It is a method for computing table index from key.
- A good hash function is simple, so it can be computed quickly.
- The major advantage of hash tables is their speed.
- If the hash function is slow, this speed will be degraded.
- The purpose of a hash function is to take a range of key values and transform them into index values in such a way that the key values are distributed randomly across all the indices of the hash table.
- Goal of hash function:The probability of any two keys hashing to the same slot is 1/N.
There are many hash functions approaches as follows:
- Mapping a key K into one of m slots by taking the remainder of K divided by m.
h (K) = K mod m
- Example: Assume a table has 8 slots (m=8). Using division method, insert the following elements into the hash table. 36, 18, 72, 43, and 6 are inserted in the order.
h(k) = K2 and get middle (log10 m) digits
Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a table size of 1000)
- Shift folding (123456 is folded as 12+34+56)
- Folding at the boundaries (123456 is folded as 12+43+56)
- Example: Addresses from 0 to 99, key = 453 in base 11 = 382. Hash address = 382 mod 99 = 85.
Problems with hashing:
- Collision: No matter what the hash function, there is the possibility that two different keys could resolve to the same hash address. This situation is known as a collision.
- Handling the Collisions: The following techniques can be used to handle the collisions.
- Double hashing (Re-hashing)
- Open Addressing (Linear probing, Quadratic probing, Random probing), etc.
- A chain is simply a linked list of all the elements with the same hash key.
- A linked list is created at each index in the hash table.
- Hash Function: h (K) = K mod m
- Example: Assume a table has 8 slots (m=8). Using the chaining, insert the following elements into the hash table. 36, 18, 72, 43, 6, 10, 5, and 15 are inserted in the order.
- A data item’s key is hashed to the index in simple hashing, and the item is inserted into the linked list at that index.
- Other items that hash to the same index are simply added to the linked list.
- Cost is proportional to length of list.
- Average length = N / M, where N is size of array and M is the number of linked lists.
- Theorem: Let α = N / M > 1 be average length of list. For any t > 1, probability that list length > t α is exponentially small in t.
- Analysis of Chaining:
- If M is too large, then too many empty chains.
- If M is too small, then chains are too long.
- There is no need to search for empty cells in the array or table.
- Advantages of Chaining: Unbounded elements can be stored (No bound to the size of table).
- Disadvantage of Chaining: Too many linked lists (overhead of linked lists)
In open addressing, when a data item can’t be placed at the hashed index value, another location in the array is sought. We’ll explore three methods of open addressing, which vary in the method used to find the next empty location. These methods are linear probing, quadratic probing, and double hashing.
- When using a linear probing method the item will be stored in the next available slot in the table, assuming that the table is not already full.
- This is implemented via a linear searching for an empty slot, from the point of collision.
- If the end of table is reached during the linear search, the search will wrap around to the beginning of the table and continue from there.
- Example: Assume a table has 8 slots (m=8). Using Linear probing, insert the following elements into the hash table. 36, 18, 72, 43, 6, 10, 5, and 15 are inserted in the order.
- Relationship between probe length (P) and load factor (L) for linear probing:
- For a successful search: P = ( 1 + 1 / (1–L)2 ) / 2
- For an unsuccessful search: P = ( 1 + 1 / (1–L) ) / 2
- Analysis of Linear Probing:
- If load factor is too small then too many empty cells.
- Example: Assume a table has 10 slots. Using Quadratic probing, insert the following elements in the given order.
89, 18, 49, 58 and 69 are inserted into a hash table.
- Advantages of Quadratic probing: Compared to linear probing access becomes inefficient at a higher load factor.
- Disadvantages of Quadratic probing: Insertion sometimes fails although the table still has free fields.
- Double hashing requires that the size of the hash table is a prime number.
- Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
- The primary hash function determines the home address. If the home address is occupied, apply a second hash function to get a number c (c relatively prime to N). This c is added to the home address to produce an overflow addresses: if occupied, proceed by adding c to the overflow address, until an empty slot is found.
- Primary hash function is similar to direct hashing. Hash1(key) = Key % table size. A popular secondary hash function is: Hash2(key) = R - ( key % R ) where R is a prime number that is smaller than the size of the table.
- Example: Assume a table has 10 slots. Primary hash function is H1(key) = key mod 10, and secondary hash function is H2(key)=7-(key mod 7). With Double hashing, insert the following elements in the given order.
89, 18, 49, 58 and 69 are inserted into a hash table
- Advantages of Double hashing:
- Compared to linear probing access becomes inefficient at a higher load factor.
- Resolution sequences for different elements are different even if the first hash function hashes the elements to the same field.
- If the hash functions are chosen appropriately, insertion never fails if the table has at least one free field.
So that's all for hashing.
Candidates can also practice 110+ Mock tests for exams like GATE, NIELIT with BYJU'S Exam Prep Test Series check the following link: