Hashing

Consider the use case of an electronic health record (EHR) system, where every record has a unique key and some associated data such as the person's name and address. How can this data be stored? Possible options include an array, a linked list, a tree or a direct access table, but each of these has problems:

With an array or a linked list, searching for a record requires a linear search, which takes O(n) time. We can keep the records sorted so that a faster search algorithm such as binary search can be applied, but this creates another problem for insertion and deletion: every time a record is inserted or deleted, the sorted order of the array or linked list must be maintained.
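A minimal Python sketch of the sorted-array approach, using the standard `bisect` module, shows where the trade-off comes from. The class name `SortedRecords` and the integer record numbers are illustrative assumptions. Search is a binary search, but `list.insert` and `del` have to shift every later element.

```python
import bisect


class SortedRecords:
    """Records kept in Python lists sorted by (hypothetical) EHR record number."""

    def __init__(self):
        self.keys = []    # sorted record numbers
        self.values = []  # data stored at the same positions as the keys

    def search(self, key):
        # Binary search: O(log n) comparisons.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def insert(self, key, value):
        # Finding the position is O(log n), but list.insert shifts every
        # later element one place to the right, so insertion is O(n).
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # key already present: overwrite
            return
        self.keys.insert(i, key)
        self.values.insert(i, value)

    def delete(self, key):
        # Deletion also shifts the elements after the removed one: O(n).
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]
            del self.values[i]


records = SortedRecords()
for key, name in [(1042, "Alice"), (1007, "Bob"), (1023, "Carol")]:
    records.insert(key, name)
print(records.keys)          # [1007, 1023, 1042]
print(records.search(1023))  # Carol
```

The same trick does not help a linked list: binary search needs to jump to the middle element in constant time, which a linked list cannot do.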

A balanced binary search tree performs search, insert and delete in a moderate amount of time: O(log n).

Of all these options, the direct access table shows the most promise. A direct access table is simply a huge array whose index is the EHR record number, so given a record number, search, insert and delete all take O(1) time. It has one limitation, however: space for every possible record number must be allocated before any record is stored, so a huge amount of extra space is required when the range of record numbers is large, even if only a few records are actually stored.
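A direct access table can be sketched in a few lines of Python. `DirectAccessTable` and its `max_key` parameter are hypothetical names; the point is that every operation is a single list index, while the list must be sized for the largest possible record number up front.

```python
class DirectAccessTable:
    """A direct access table: the record number itself is the array index."""

    def __init__(self, max_key):
        # Space for every possible key is allocated up front,
        # even if only a few records are ever stored.
        self.slots = [None] * (max_key + 1)

    def insert(self, key, value):
        self.slots[key] = value  # O(1)

    def search(self, key):
        return self.slots[key]   # O(1); None means "no record"

    def delete(self, key):
        self.slots[key] = None   # O(1)
```

For example, `DirectAccessTable(max_key=999_999)` allocates a million slots even if only a handful of records are ever inserted.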

Because of these limitations, none of the above solutions is ideal. Hashing provides a solution to such problems. It uses a hash function that maps a big number (such as a record number) to a small integer that can be used as an index into a hash table. The hash function must be chosen carefully: if it keeps producing the same values, most records are placed in a few slots while the other slots starve, and search/insert/delete become slow. When the hash function produces the same hash value for two different keys, this is called a "collision".
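As an illustration, the division method (key mod table size) is one common way to map a large key to a small index. The function name `hash_slot`, the table sizes and the keys below are assumptions chosen to show a collision and the slot crowding described above, not the hash function any particular system uses.

```python
def hash_slot(key, table_size):
    """Map a large integer key to a slot index in range(table_size).

    The division method (key mod table_size) is an illustrative choice
    here, not the only possible hash function.
    """
    return key % table_size


# Two different record numbers map to the same slot: a collision.
print(hash_slot(1234567, 10))  # 7
print(hash_slot(7654327, 10))  # 7

# A poor fit between keys and table size: all keys land in slot 0.
keys = [10, 20, 30, 40, 50]
print([hash_slot(k, 10) for k in keys])  # [0, 0, 0, 0, 0]
print([hash_slot(k, 7) for k in keys])   # [3, 6, 2, 5, 1]
```

With a table size of 10, keys that are all multiples of 10 pile into slot 0 and the other slots starve; a table size of 7, a prime, spreads the same keys out. This is one reason prime table sizes are often recommended for the division method.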

There are different techniques for resolving collisions, such as:
- Open Addressing
- Chaining

Open Addressing: In this technique, all values are stored in the hash table itself, and when a slot is already occupied the value is inserted in the next available slot. To search, the hash function computes the slot for the key and checks the value stored at that location. If it doesn't match, the search continues from that position onwards, looping back to the beginning when it reaches the end of the table. The search stops when it finds the value, reaches an empty slot, or comes back to the starting position; in the last two cases it reports 'value not found'. This form of open addressing is called linear probing.
Common open addressing techniques are:
- Linear probing
- Quadratic probing
- Double hashing
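The linear probing procedure described above can be sketched as follows. `LinearProbingTable` is a hypothetical name, keys are assumed to be integers hashed with `key % size`, and only insert and search are shown.

```python
class LinearProbingTable:
    """Open addressing with linear probing; keys are assumed to be integers."""

    def __init__(self, size):
        self.size = size
        self.keys = [None] * size  # None marks an empty slot
        self.values = [None] * size

    def _probe(self, key):
        # Visit slots h(k), h(k)+1, ..., wrapping past the end of the table,
        # and stop after every slot has been tried once.
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key, value):
        for slot in self._probe(key):
            if self.keys[slot] is None or self.keys[slot] == key:
                self.keys[slot] = key
                self.values[slot] = value
                return
        raise RuntimeError("hash table is full")

    def search(self, key):
        for slot in self._probe(key):
            if self.keys[slot] is None:
                return None  # an empty slot ends the search
            if self.keys[slot] == key:
                return self.values[slot]
        return None  # came back to the starting slot: not found
```

Deletion is left out on purpose: simply emptying a slot would cut off the probe sequence for keys stored after it, so implementations usually mark removed slots with a special "deleted" value instead. Quadratic probing and double hashing differ only in the step between probes: the i-th probe goes to slot (h(k) + i²) mod m and (h1(k) + i·h2(k)) mod m respectively, for a table of m slots.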

Chaining: In chaining, each slot of the hash table holds a linked list. Every value whose key hashes to that slot is added to that slot's list, so colliding values are stored one after another in the same chain.

Both open addressing and chaining are ways of resolving collisions.
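A chaining table can be sketched in the same way. Here `ChainedHashTable` is a hypothetical name, Python lists stand in for the linked lists in each slot, and keys are again assumed to be integers hashed with `key % size`.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size):
        self.size = size
        # Python lists stand in for the per-slot linked lists.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key % self.size]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: update in place
                return
        bucket.append((key, value))  # new key: add to this slot's chain

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Unlike open addressing, deletion here is straightforward because removing an entry from one chain does not affect the search path of any other key.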

Analysis
Worst case:
If every key hashes to the same slot, all n keys end up in a single chain (or a single run of occupied slots), so access takes O(n) time.

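Average case:
Assumption of Simple Uniform Hashing: each key is equally likely to be hashed to any slot in the table, independently of where other keys are hashed. Under this assumption, with n keys in a table of m slots, the expected length of each chain is the load factor α = n/m, so a search with chaining takes O(1 + α) expected time, which is O(1) when m grows in proportion to n.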
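A small experiment makes the two cases concrete. The helper `chain_lengths` and the 100 sequential record numbers are assumptions for illustration: a degenerate hash function that always returns 0 puts all n keys in one chain (the O(n) worst case), while a hash that spreads the keys evenly gives chains of length n/m.

```python
def chain_lengths(keys, table_size, hash_fn):
    """Return how many keys land in each slot under the given hash function."""
    lengths = [0] * table_size
    for key in keys:
        lengths[hash_fn(key) % table_size] += 1
    return lengths


keys = list(range(1000, 1100))  # 100 hypothetical record numbers
m = 10

# Worst case: every key goes to slot 0, so one chain holds all n keys.
worst = chain_lengths(keys, m, lambda k: 0)
print(max(worst))  # 100

# Keys spread evenly: every chain holds n/m = 10 keys.
good = chain_lengths(keys, m, lambda k: k)
print(max(good))   # 10
```

Sequential keys under key mod m happen to spread perfectly; real keys are not guaranteed to behave this way, which is why the average-case analysis relies on the simple uniform hashing assumption.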
