Hashing Study Notes

By BYJU'S Exam Prep

Updated on: September 25th, 2023

Hashing is a common method of accessing data records using the hash table. Hashing can be used to build, search, or delete from a table.

Hash Table: A hash table is a data structure that stores records in an array, called a hash table. A Hash table can be used for quick insertion and searching.

Load Factor: The ratio of the number of items in a table to the table’s size is called the load factor.

Table of content

1. Click Here to Avail GATE CSE Test Series!

Hash Function:

It is a method for computing table index from key.
A good hash function is simple, so it can be computed quickly.
The major advantage of hash tables is their speed.
If the hash function is slow, this speed will be degraded.
The purpose of a hash function is to take a range of key values and transform them into index values in such a way that the key values are distributed randomly across all the indices of the hash table.
Goal of hash function:The probability of any two keys hashing to the same slot is 1/N.

There are many hash functions approaches as follows:

Division Method:

Mapping a key K into one of m slots by taking the remainder of K divided by m.

h (K) = K mod m

Example: Assume a table has 8 slots (m=8). Using division method, insert the following elements into the hash table. 36, 18, 72, 43, and 6 are inserted in the order.

hashing-image1

Mid-Square Method:

Mapping a key K into one of m slots, by getting the some middle digits from value K².

h(k) = K² and get middle (log₁₀ m) digits

Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a table size of 1000)

Folding Method:

Divide the key K into some sections, besides the last section, have same length. Then, add these sections together.

Shift folding (123456 is folded as 12+34+56)
Folding at the boundaries (123456 is folded as 12+43+56)

Radix Hashing:

Transform the number into another base and then divide by the maximum address.

Example: Addresses from 0 to 99, key = 453 in base 11 = 382. Hash address = 382 mod 99 = 85.

Problems with hashing:

Collision: No matter what the hash function, there is the possibility that two different keys could resolve to the same hash address. This situation is known as a collision.
Handling the Collisions: The following techniques can be used to handle the collisions.
- Chaining
- Double hashing (Re-hashing)
- Open Addressing (Linear probing, Quadratic probing, Random probing), etc.

Chaining:

A chain is simply a linked list of all the elements with the same hash key.
A linked list is created at each index in the hash table.
Hash Function: h (K) = K mod m
Example: Assume a table has 8 slots (m=8). Using the chaining, insert the following elements into the hash table. 36, 18, 72, 43, 6, 10, 5, and 15 are inserted in the order.

hashing-image2

hashing-image3

A data item’s key is hashed to the index in simple hashing, and the item is inserted into the linked list at that index.
Other items that hash to the same index are simply added to the linked list.
Cost is proportional to length of list.
Average length = N / M, where N is size of array and M is the number of linked lists.
Theorem: Let α = N / M > 1 be average length of list. For any t > 1, probability that list length > t α is exponentially small in t.
Analysis of Chaining:
- If M is too large, then too many empty chains.
- If M is too small, then chains are too long.
There is no need to search for empty cells in the array or table.

Advantages of Chaining: Unbounded elements can be stored (No bound to the size of table).
Disadvantage of Chaining: Too many linked lists (overhead of linked lists)

Open Addressing:

In open addressing, when a data item can’t be placed at the hashed index value, another location in the array is sought. We’ll explore three methods of open addressing, which vary in the method used to find the next empty location. These methods are linear probing, quadratic probing, and double hashing.

Linear Probing:

When using a linear probing method the item will be stored in the next available slot in the table, assuming that the table is not already full.
This is implemented via a linear searching for an empty slot, from the point of collision.
If the end of table is reached during the linear search, the search will wrap around to the beginning of the table and continue from there.
Example: Assume a table has 8 slots (m=8). Using Linear probing, insert the following elements into the hash table. 36, 18, 72, 43, 6, 10, 5, and 15 are inserted in the order.

hashing-image2

Linear

Relationship between probe length (P) and load factor (L) for linear probing:
- For a successful search: P = ( 1 + 1 / (1–L)² ) / 2
- For an unsuccessful search: P = ( 1 + 1 / (1–L) ) / 2
Analysis of Linear Probing:
- If load factor is too small then too many empty cells.

Quadratic Probing:

If the collision occurs, instead of moving one cell, move i² cells from the point of collision, where i is the number of attempts to resolve the collision. Example:

Example: Assume a table has 10 slots. Using Quadratic probing, insert the following elements in the given order.

89, 18, 49, 58 and 69 are inserted into a hash table.

Quadratic-1

\ Quadratic-2

Advantages of Quadratic probing: Compared to linear probing access becomes inefficient at a higher load factor.

Disadvantages of Quadratic probing: Insertion sometimes fails although the table still has free fields.

Double Hashing:

Double hashing requires that the size of the hash table is a prime number.
Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
The primary hash function determines the home address. If the home address is occupied, apply a second hash function to get a number c (c relatively prime to N). This c is added to the home address to produce an overflow addresses: if occupied, proceed by adding c to the overflow address, until an empty slot is found.
Primary hash function is similar to direct hashing. Hash1(key) = Key % table size. A popular secondary hash function is: Hash₂(key) = R – ( key % R ) where R is a prime number that is smaller than the size of the table.

Example: Assume a table has 10 slots. Primary hash function is H1(key) = key mod 10, and secondary hash function is H2(key)=7-(key mod 7). With Double hashing, insert the following elements in the given order.

89, 18, 49, 58 and 69 are inserted into a hash table

Double

Advantages of Double hashing:
- Compared to linear probing access becomes inefficient at a higher load factor.
- Resolution sequences for different elements are different even if the first hash function hashes the elements to the same field.
- If the hash functions are chosen appropriately, insertion never fails if the table has at least one free field.

So that’s all for hashing.

Candidates can also practice 110+ Mock tests for exams like GATE, NIELIT with BYJU’S Exam Prep Test Series check the following link:

Click Here to Avail GATE CSE Test Series!

Thanks

#DreamStriveSucceed