Skip to main content

Hashing

Consider a use case of Electronic health record where you have a unique key for every record and you have some data associated with it like persons name, address etc. then how can you handle this data. Some ways to store this information would be to use array, linked list, trees or may be direct access table. But all these also have some problems as below :

In case of array and linked list if you want to search for a record you have to search linearly. We can keeps them sorted so we can apply some searching algorithm and can get faster search results but this will create another problem of inserting  and deleting. In this case every time you insert/delete a record you must also check for the order of the array/linked list.

If we consider a balanced tree then we can get the results for search/insert/delete in moderate amount of time. O(Logn) time.

Out of all these, Direct Access Table shows some promising results. Direct Access Table is nothing but a huge array where the index of the array is the EHR record number. Like this if you know the record you can get the result in O(1) for all the operations but it has an limitation. If you have huge amount of records that needs to be stored then we must allocate this space prior to storing the records. So huge amount of extra space is required in order to get this working.

Due to these limitations we cannot use any of the above solution. Hashing provides a solution to such problems. Hashing requires a hash function which maps the big numbers into a small integer that can be used as a index in the hash table. It is very important to select the hash function wisely because if the hashed values are repeating then most of the values will be placed in this slot and other slot will strive resulting in poor search/insert/delete operations. When such a condition occurs where similar hash value (key) is generated for different data then it is called as "Collision".

There are different techniques present for resolving the collision like :
- Open Addressing
- Chaining

Open Addressing : In this technique, the value is inserted in the next available slot in hashtable. While searching through the table, the hash function will calculate the key and will check the value at that location if it doesn't match then it will starting searching from that position onwards if it reaches the end of the table then it will loop back. If it comes back at the same position then it will return 'value not found' message. This is called as linear hashing.
So different open addressing techniques are
- Linear hashing
- Probe sequence
- Double Hashing

Chaining : In chaining, linked list can be used at every key. So for every repeating key the next value is added in the linked list.

This is called as Resolving Collision.

Analysis
Worst case :
If the every key hashes to same slot. Access takes O(n).

Average Case :
Assumption of Simple Uniform Hashing - Each is equally likely to be hashed to any slot in table independent of where other keys are hashed.

References :

1. https://www.youtube.com/watch?v=JZHBa-rLrBA
2. http://geeksquiz.com/hashing-set-1-introduction/

Comments

Popular posts from this blog

Carnivorous Island from "Life of PI"

Yann Martel described about an floating carnivorous Island for symbolizing that something seemingly good in life can turn out to be harmful, so we should keep going with our journey of life... About the carnivorous Island from movie. In the movie 'Life of PI' after PI struggles to live in the pacific with the Bengal tiger suddenly one morning Pi  found himself at a Island. After spending so many days in the middle of the sea he found himself very lucky to find a Island. After spending some time he thought might be this was the end of his suffering and he can spend rest of his life at that island. He makes h is bed on a tree and sleeps there, in the middle of the night he wakes up and find that there were dead fish in the fresh water where he took swim in the day time. While trying to understand the mystery he plucks a fruit from the same tree and peel it off and finds a human tooth inside that fruit and then he realize that it was an carnivorous Island. The...

Create Table in Liquibase

For creating table using liquibase you can use below code and add it in your liquibase file. <createTable tableName=“employee”>      <column name="id" type="int">      <constraints primaryKey="true" nullable="false"/>   </column>      <column name="first_name" type="varchar(255)"/>   <column name="last_name" type="varchar(255)"/>   <column name="username" type="varchar(255)">      <constraints unique="true" nullable="false"/>   </column> </createTable> The use is pretty simple it's the way it looks : Tag: <createTable></createTable> This is an opening/ending tag for creating a table. These tags will enclose column sub tags which will define columns for the table. Attribute:   tableName : Name of the table which you want to create. (This is a mandatory   ...

Everything you need to know about ShellShock.

Shellshock was a security threat that was identified by Stéphane Chazelas on 12 September 2014 and was disclosed and announced to the public on 24 th September 2014 with the fix ready for distribution. It was assigned the identifier as CVE-2014-6271 . By the way, this is another security bug with cool logo after Heartbleed. 1. What is Shellshock? Shellshock is a name given to a vulnerability in Bash which allows an attacker to execute remote commands on vulnerable system. Bash is a command-line interpreter available on most of the operating systems like Apple’s OSx, windows and present on many versions of Linux. Bash also acts as a parser for the functions given to it.  2. Why is it harmful? Shellshock is significantly harmful for the servers connected to internet. Since, it lets attacker to escalate their privileges and potentially they can gain access to root. This gives access to the attacker as if they are the actual user and they can perform any...