Skip to main content

Hashing

Consider a use case of Electronic health record where you have a unique key for every record and you have some data associated with it like persons name, address etc. then how can you handle this data. Some ways to store this information would be to use array, linked list, trees or may be direct access table. But all these also have some problems as below :

In case of array and linked list if you want to search for a record you have to search linearly. We can keeps them sorted so we can apply some searching algorithm and can get faster search results but this will create another problem of inserting  and deleting. In this case every time you insert/delete a record you must also check for the order of the array/linked list.

If we consider a balanced tree then we can get the results for search/insert/delete in moderate amount of time. O(Logn) time.

Out of all these, Direct Access Table shows some promising results. Direct Access Table is nothing but a huge array where the index of the array is the EHR record number. Like this if you know the record you can get the result in O(1) for all the operations but it has an limitation. If you have huge amount of records that needs to be stored then we must allocate this space prior to storing the records. So huge amount of extra space is required in order to get this working.

Due to these limitations we cannot use any of the above solution. Hashing provides a solution to such problems. Hashing requires a hash function which maps the big numbers into a small integer that can be used as a index in the hash table. It is very important to select the hash function wisely because if the hashed values are repeating then most of the values will be placed in this slot and other slot will strive resulting in poor search/insert/delete operations. When such a condition occurs where similar hash value (key) is generated for different data then it is called as "Collision".

There are different techniques present for resolving the collision like :
- Open Addressing
- Chaining

Open Addressing : In this technique, the value is inserted in the next available slot in hashtable. While searching through the table, the hash function will calculate the key and will check the value at that location if it doesn't match then it will starting searching from that position onwards if it reaches the end of the table then it will loop back. If it comes back at the same position then it will return 'value not found' message. This is called as linear hashing.
So different open addressing techniques are
- Linear hashing
- Probe sequence
- Double Hashing

Chaining : In chaining, linked list can be used at every key. So for every repeating key the next value is added in the linked list.

This is called as Resolving Collision.

Analysis
Worst case :
If the every key hashes to same slot. Access takes O(n).

Average Case :
Assumption of Simple Uniform Hashing - Each is equally likely to be hashed to any slot in table independent of where other keys are hashed.

References :

1. https://www.youtube.com/watch?v=JZHBa-rLrBA
2. http://geeksquiz.com/hashing-set-1-introduction/

Comments

Popular posts from this blog

Create Table in Liquibase

For creating table using liquibase you can use below code and add it in your liquibase file. <createTable tableName=“employee”>      <column name="id" type="int">      <constraints primaryKey="true" nullable="false"/>   </column>      <column name="first_name" type="varchar(255)"/>   <column name="last_name" type="varchar(255)"/>   <column name="username" type="varchar(255)">      <constraints unique="true" nullable="false"/>   </column> </createTable> The use is pretty simple it's the way it looks : Tag: <createTable></createTable> This is an opening/ending tag for creating a table. These tags will enclose column sub tags which will define columns for the table. Attribute:   tableName : Name of the table which you want to create. (This is a mandatory  

ReferenceError: dhtmlXGrid is not defined : Resolved

For the errors like : ReferenceError: dhtmlXDataView is not defined ReferenceError: dhtmlXGrid is not defined ReferenceError: dhtmlXTree is not defined ReferenceError: dhtmlXTreeGrid is not defined etc I have been working on dhtmlx for long time now and this is one of the most basic exception faced by the developer and which in fact is very easy to resolve. Reasons: 1. You are actually referring to a wrong location of the JS file.    - This is a very common mistake done and most of the time we are so damn sure that we don't even care about checking the path once. Even though you have copied it from your existing project where it is working, if you face this issue don't forget to check the path once, it won't harm you. 2. Check for relative path.    - So now you have copied it and paste it in your new file where you are going to use the dhtmlx component and when you open your file you get this error and it becomes frustrating knowing that same thing is wo

How databasechangeloglock and databasechangelog table used by liquibase?

Liquibase takes care of executing the query on the database while maintaining the list of queries executed and also maintaining the locks over the tables simultaneously. The two tables that are used by the liquibase for this purpose are : Databasechangeloglock : This table have following columns ID | LOCKED| LOCKGRANTED | LOCKEDBY. This maintains the locks information granted to the user. The primary purpose of this table is to make sure that two machines don't attempt to modify the data at the same time. Databasechangelog : This table have following columns ID | AUTHOR | FILENAME | DATEEXECUTED | ORDEREXECUTED | EXECTYPE | MD5SUM | DESCRIPTION | COMMENTS | TAG | LIQUIBASE This table maintains the list of the statements that are executed on the database.