Skip to main content

Hashing

Consider a use case of Electronic health record where you have a unique key for every record and you have some data associated with it like persons name, address etc. then how can you handle this data. Some ways to store this information would be to use array, linked list, trees or may be direct access table. But all these also have some problems as below :

In case of array and linked list if you want to search for a record you have to search linearly. We can keeps them sorted so we can apply some searching algorithm and can get faster search results but this will create another problem of inserting  and deleting. In this case every time you insert/delete a record you must also check for the order of the array/linked list.

If we consider a balanced tree then we can get the results for search/insert/delete in moderate amount of time. O(Logn) time.

Out of all these, Direct Access Table shows some promising results. Direct Access Table is nothing but a huge array where the index of the array is the EHR record number. Like this if you know the record you can get the result in O(1) for all the operations but it has an limitation. If you have huge amount of records that needs to be stored then we must allocate this space prior to storing the records. So huge amount of extra space is required in order to get this working.

Due to these limitations we cannot use any of the above solution. Hashing provides a solution to such problems. Hashing requires a hash function which maps the big numbers into a small integer that can be used as a index in the hash table. It is very important to select the hash function wisely because if the hashed values are repeating then most of the values will be placed in this slot and other slot will strive resulting in poor search/insert/delete operations. When such a condition occurs where similar hash value (key) is generated for different data then it is called as "Collision".

There are different techniques present for resolving the collision like :
- Open Addressing
- Chaining

Open Addressing : In this technique, the value is inserted in the next available slot in hashtable. While searching through the table, the hash function will calculate the key and will check the value at that location if it doesn't match then it will starting searching from that position onwards if it reaches the end of the table then it will loop back. If it comes back at the same position then it will return 'value not found' message. This is called as linear hashing.
So different open addressing techniques are
- Linear hashing
- Probe sequence
- Double Hashing

Chaining : In chaining, linked list can be used at every key. So for every repeating key the next value is added in the linked list.

This is called as Resolving Collision.

Analysis
Worst case :
If the every key hashes to same slot. Access takes O(n).

Average Case :
Assumption of Simple Uniform Hashing - Each is equally likely to be hashed to any slot in table independent of where other keys are hashed.

References :

1. https://www.youtube.com/watch?v=JZHBa-rLrBA
2. http://geeksquiz.com/hashing-set-1-introduction/

Comments

Popular posts from this blog

Create Table in Liquibase

For creating table using liquibase you can use below code and add it in your liquibase file. <createTable tableName=“employee”>      <column name="id" type="int">      <constraints primaryKey="true" nullable="false"/>   </column>      <column name="first_name" type="varchar(255)"/>   <column name="last_name" type="varchar(255)"/>   <column name="username" type="varchar(255)">      <constraints unique="true" nullable="false"/>   </column> </createTable> The use is pretty simple it's the way it looks : Tag: <createTable></createTable> This is an opening/ending tag for creating a table. These tags will enclose column sub tags which will define columns for the table. Attribute:   tableName : Name of the table which you want to create. (This is a mandatory   ...

How databasechangeloglock and databasechangelog table used by liquibase?

Liquibase takes care of executing the query on the database while maintaining the list of queries executed and also maintaining the locks over the tables simultaneously. The two tables that are used by the liquibase for this purpose are : Databasechangeloglock : This table have following columns ID | LOCKED| LOCKGRANTED | LOCKEDBY. This maintains the locks information granted to the user. The primary purpose of this table is to make sure that two machines don't attempt to modify the data at the same time. Databasechangelog : This table have following columns ID | AUTHOR | FILENAME | DATEEXECUTED | ORDEREXECUTED | EXECTYPE | MD5SUM | DESCRIPTION | COMMENTS | TAG | LIQUIBASE This table maintains the list of the statements that are executed on the database.

Carnivorous Island from "Life of PI"

Yann Martel described about an floating carnivorous Island for symbolizing that something seemingly good in life can turn out to be harmful, so we should keep going with our journey of life... About the carnivorous Island from movie. In the movie 'Life of PI' after PI struggles to live in the pacific with the Bengal tiger suddenly one morning Pi  found himself at a Island. After spending so many days in the middle of the sea he found himself very lucky to find a Island. After spending some time he thought might be this was the end of his suffering and he can spend rest of his life at that island. He makes h is bed on a tree and sleeps there, in the middle of the night he wakes up and find that there were dead fish in the fresh water where he took swim in the day time. While trying to understand the mystery he plucks a fruit from the same tree and peel it off and finds a human tooth inside that fruit and then he realize that it was an carnivorous Island. The...