Hashing Study Notes

By Mukesh Kumar|Updated : December 12th, 2021

Hashing is a common method of accessing data records using the hash table. Hashing can be used to build, search, or delete from a table.

Hash Table: A hash table is a data structure that stores records in an array, called a hash table. A Hash table can be used for quick insertion and searching.

Load Factor: The ratio of the number of items in a table to the table’s size is called the load factor.

Hash Function:

• It is a method for computing table index from key.
• A good hash function is simple, so it can be computed quickly.
• The major advantage of hash tables is their speed.
• If the hash function is slow, this speed will be degraded.
• The purpose of a hash function is to take a range of key values and transform them into index values in such a way that the key values are distributed randomly across all the indices of the hash table.
• Goal of hash function:The probability of any two keys hashing to the same slot is 1/N.

There are many hash functions approaches as follows:

Division Method:

• Mapping a key K into one of m slots by taking the remainder of K divided by m.

h (K) = K mod m

• Example: Assume a table has 8 slots (m=8). Using division method, insert the following elements into the hash table. 36, 18, 72, 43, and 6 are inserted in the order.

Mid-Square Method:

Mapping a key K into one of m slots, by getting the some middle digits from value K2.

h(k) = K2 and get middle (log10 m) digits

Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a table size of 1000)

Folding Method:

Divide the key K into some sections, besides the last section, have same length. Then, add these sections together.
• Shift folding (123456 is folded as 12+34+56)
• Folding at the boundaries (123456 is folded as 12+43+56)

Transform the number into another base and then divide by the maximum address.
• Example: Addresses from 0 to 99,  key = 453 in base 11 = 382. Hash address = 382 mod 99 = 85.

Problems with hashing:

• Collision: No matter what the hash function, there is the possibility that two different keys could resolve to the same hash address. This situation is known as a collision.
• Handling the Collisions: The following techniques can be used to handle the collisions.
• Chaining
• Double hashing (Re-hashing)

Chaining:

• A chain is simply a linked list of all the elements with the same hash key.
• A linked list is created at each index in the hash table.
• Hash Function: h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using the chaining, insert the following elements into the hash table. 36, 18, 72, 43, 6, 10, 5, and 15 are inserted in the order.

• A data item’s key is hashed to the index in simple hashing, and the item is inserted into the linked list at that index.
• Other items that hash to the same index are simply added to the linked list.
• Cost is proportional to length of list.
• Average length = N / M, where N is size of array and M is the number of linked lists.
• Theorem: Let α = N / M > 1 be average length of list. For any t > 1, probability that list length > t α is exponentially small in t.
• Analysis of Chaining:
• If M is too large, then too many empty chains.
• If M is too small, then chains are too long.
• There is no need to search for empty cells in the array or table.
• Advantages of Chaining: Unbounded elements can be stored (No bound to the size of table).

In open addressing, when a data item can’t be placed at the hashed index value, another location in the array is sought. We’ll explore three methods of open addressing, which vary in the method used to find the next empty location. These methods are linear probing, quadratic probing, and double hashing.

Linear Probing:

• When using a linear probing method the item will be stored in the next available slot in the table, assuming that the table is not already full.
• This is implemented via a linear searching for an empty slot, from the point of collision.
• If the end of table is reached during the linear search, the search will wrap around to the beginning of the table and continue from there.
• Example: Assume a table has 8 slots (m=8). Using Linear probing, insert the following elements into the hash table. 36, 18, 72, 43, 6, 10, 5, and 15 are inserted in the order.

• Relationship between probe length (P) and load factor (L) for linear probing:
• For a successful search: P = ( 1 + 1 / (1–L)2 ) / 2
• For an unsuccessful search: P = ( 1 + 1 / (1–L) ) / 2
• Analysis of Linear Probing:
• If load factor is too small then too many empty cells.

If the collision occurs, instead of moving one cell, move i2 cells from the point of collision, where i is the number of attempts to resolve the collision. Example:
• Example: Assume a table has 10 slots. Using Quadratic probing, insert the following elements in the given order.

89, 18, 49, 58 and 69 are inserted into a hash table.

\

• Advantages of Quadratic probing: Compared to linear probing access becomes inefficient at a higher load factor.
• Disadvantages of Quadratic probing: Insertion sometimes fails although the table still has free fields.

Double Hashing:

• Double hashing requires that the size of the hash table is a prime number.
• Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
• The primary hash function determines the home address. If the home address is occupied, apply a second hash function to get a number c (c relatively prime to N). This c is added to the home address to produce an overflow addresses: if occupied, proceed by adding c to the overflow address, until an empty slot is found.
• Primary hash function is similar to direct hashing.  Hash1(key) = Key % table size. A popular secondary hash function is: Hash2(key) = R - ( key % R ) where R is a prime number that is smaller than the size of the table.
• Example: Assume a table has 10 slots.  Primary hash function is H1(key) = key mod 10, and secondary hash function is H2(key)=7-(key mod 7). With Double hashing, insert the following elements in the given order.

89, 18, 49, 58 and 69 are inserted into a hash table

• Compared to linear probing access becomes inefficient at a higher load factor.
• Resolution sequences for different elements are different even if the first hash function hashes the elements to the same field.
• If the hash functions are chosen appropriately, insertion never fails if the table has at least one free field.

So that's all for hashing.

Candidates can also practice 110+ Mock tests for exams like GATE, NIELIT with BYJU'S Exam Prep Test Series check the following link:

Thanks

#DreamStriveSucceed

Posted by:

Member since Feb 2020

write a comment

Simern BhullarMay 9, 2017

Explain double hashing

Tania AroraJun 23, 2017

Explain rehashing too

Sofia SunamMar 24, 2018

bandameedi lahariJul 22, 2018

Can anyone explain in quadratic probing 58 given as 3 attempts, can you explain how to know that ?

Akash ShuklaJul 22, 2018

Varshitha BOct 28, 2018

If key value is 6 then how to do mid square method

Rakesh NamaAug 18, 2019

This comment is hidden because it was marked spam.

Pavan Kumar TAug 18, 2019

Please explain example of double hashing

PriyanshuAug 12, 2020

Anyone having algorithms video lectures link??

Raj MajhiDec 30, 2021

Five  call

Related Posts

GradeStack Learning Pvt. Ltd.Windsor IT Park, Tower - A, 2nd Floor, Sector 125, Noida, Uttar Pradesh 201303 help@byjusexamprep.com