Top community members
All Wiki Articles Create Wiki Article

In the past we had a dream,
to have place where we could share IT knowledge,
to ask questions without fear that someone will judge us.

Now we are a group of people who make this dream come true. ❤ 💻

If you think sharing knowledge and helping other is valuable.

join our community - Click here

Java - custom implementation of hash map (easy to understand)

0 contributions
1 points

In this post we can see how to implement Hash Map. It is one of the most used data structure in programming. It also depends on the programming language how do we call this data structure, very common synonyms are: hash map, hash table, dictionary, key value store, associative array. Hash Map can be implemented in many different ways but the main functionality it should provide is very similar.


Hash Map time complexity in big O notation:

AlgorithmAverage Worst case
SpaceO(n)O(n)
SearchO(1)O(n)
InsertO(1)O(n)
DeleteO(1)O(n)

Simple hash map implementation

package com.dirask;

import java.util.Arrays;

public class SimpleHashMap {
    private int size;
    private int tableSize;
    private SimpleLinkedList<Entry>[] table;

    public SimpleHashMap(int tabSize) {
        tableSize = tabSize;
        size = 0;
        table = new SimpleLinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            table[i] = new SimpleLinkedList<>();
        }
    }

    public void put(Integer key, String value) {
        String old = get(key);
        if (old != null) {
            remove(key);
        }
        int hash = getHashCode(key);
        Entry entry = new Entry(key, value, hash);
        table[hash].addLast(entry);
        size++;
    }

    public String get(Integer key) {
        int hashCode = getHashCode(key);
        SimpleLinkedList<Entry> list = table[hashCode];
        if (list.size > 0) {
            ListNode<Entry> curr = list.first;
            while (curr != null) {
                if (curr.item.key.equals(key)) {
                    return curr.item.value;
                }
                curr = curr.next;
            }
        }
        return null;
    }

    // logic similar to get method
    public String remove(Integer key) {
        int hashCode = getHashCode(key);
        SimpleLinkedList<Entry> list = table[hashCode];

        if (list.size > 0) {
            ListNode<Entry> curr = list.first;
            while (curr != null) {
                if (curr.item.key.equals(key)) {
                    String value = curr.item.value;
                    list.unlink(curr);
                    size--;
                    return value;
                }
                curr = curr.next;
            }
        }
        return null;
    }

    public boolean containsKey(Integer key) {
        return get(key) != null;
    }

    public int size() {
        return size;
    }

    private int getHashCode(Integer key) {
        int hashCode = key.hashCode();
        if (hashCode < 0) {
            hashCode *= -1;
        }
        return hashCode % tableSize;
    }

    private class Entry {
        Integer key;
        String value;
        int hashcode;

        public Entry(Integer key, String value, int hashcode) {
            this.key = key;
            this.value = value;
            this.hashcode = hashcode;
        }

        @Override
        public String toString() {
            return "{" +
                    "key=" + key +
                    ", hashcode=" + hashcode +
                    ", value=" + value +
                    '}';
        }
    }

    @Override
    public String toString() {
        int[] distribution = new int[table.length];

        for (int i = 0; i < table.length; i++) {
            System.out.print("level " + i + ": ");

            SimpleLinkedList<Entry> list = table[i];
            System.out.println(list.toString());
            distribution[i] = list.size;
        }

        System.out.println("distribution: " + Arrays.toString(distribution));

        return "SimpleHashMapExample{" +
                "size=" + size +
                ", tableSize=" + tableSize +
                '}';
    }


    public void printBucketDistribution() {
        int[] distribution = new int[table.length];

        for (int i = 0; i < table.length; i++) {
            SimpleLinkedList<Entry> list = table[i];
            distribution[i] = list.size;
        }

        String msg = "Bucket distribution: ";
        System.out.println(msg);
        System.out.println(Arrays.toString(distribution));
    }
}

class ListNode<E> {
    ListNode<E> prev;
    E item;
    ListNode<E> next;

    public ListNode(ListNode<E> prev, E item, ListNode<E> next) {
        this.prev = prev;
        this.item = item;
        this.next = next;
    }

    public String printForward() {
        if (next != null) {
            String call = next.printForward();
            String result = item + " -> " + call;
            return result;
        } else {
            return String.valueOf(item);
        }
    }
}

class SimpleLinkedList<E> {
    ListNode<E> first;
    ListNode<E> last;
    int size;

    public SimpleLinkedList() {

    }

    public int size() {
        return size;
    }

    public ListNode<E> addLast(E e) {
        ListNode<E> tmp = last;
        ListNode<E> newNode = new ListNode<>(tmp, e, null);
        last = newNode;
        if (tmp == null) {
            first = newNode;
        } else {
            tmp.next = newNode;
        }
        size++;
        return newNode;
    }

    public E unlink(ListNode<E> unlink) {
        E item = unlink.item;
        ListNode<E> prev = unlink.prev;
        ListNode<E> next = unlink.next;

        if (prev == null) {
            first = next;
        } else {
            prev.next = next;
            unlink.prev = null;
        }

        if (next == null) {
            last = prev;
        } else {
            next.prev = prev;
            unlink.next = null;
        }

        unlink.item = null;
        size--;
        return item;
    }

    @Override
    public String toString() {
        if (first != null) {
            return "" + first.printForward();
        }
        return "empty";
    }
}

Test 1 - testing linked list used as backbone of our hash map

package com.dirask;

import org.assertj.core.api.Assertions;
import org.junit.Test;

public class SimpleHashMapTests1 {

    @Test
    public void runLinkedList() {
        SimpleLinkedList<String> list = new SimpleLinkedList<>();
        ListNode<String> node1 = list.addLast("a");
        ListNode<String> node2 = list.addLast("b");
        ListNode<String> node3 = list.addLast("c");

        System.out.println(list.toString());
        Assertions.assertThat(list.toString().equals("a -> b -> c")).isTrue();

        list.unlink(node2);
        System.out.println(list.toString());
        Assertions.assertThat(list.toString().equals("a -> c")).isTrue();

        ListNode<String> node4 = list.addLast("d");
        System.out.println(list.toString());
        Assertions.assertThat(list.toString().equals("a -> c -> d")).isTrue();

        list.unlink(node3);
        System.out.println(list.toString());
        Assertions.assertThat(list.toString().equals("a -> d")).isTrue();

        list.unlink(node1);
        System.out.println(list.toString());
        Assertions.assertThat(list.toString().equals("d")).isTrue();

        list.unlink(node4);
        System.out.println(list.toString());
        Assertions.assertThat(list.toString().equals("empty")).isTrue();
    }
}

Output:

a -> b -> c
a -> c
a -> c -> d
a -> d
d
empty

Test 2 - put, get, remove, containsKey methods

package com.dirask;

import org.assertj.core.api.Assertions;
import org.junit.Test;

public class SimpleHashMapTests2 {

    @Test
    public void simpleHashMapTest() {
        SimpleHashMap map = new SimpleHashMap(3);

        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        map.put(4, "d");
        map.put(5, "e");

        System.out.println(map.toString());
        Assertions.assertThat(map.size() == 5).isTrue();

        Assertions.assertThat(map.get(3).equals("c")).isTrue();
        Assertions.assertThat(map.get(1).equals("a")).isTrue();

        Assertions.assertThat(map.remove(3).equals("c")).isTrue();
        Assertions.assertThat(map.size() == 4).isTrue();

        Assertions.assertThat(map.remove(5).equals("e")).isTrue();
        Assertions.assertThat(map.size() == 3).isTrue();

        Assertions.assertThat(map.containsKey(1)).isTrue();
        Assertions.assertThat(map.containsKey(2)).isTrue();
        Assertions.assertThat(map.containsKey(4)).isTrue();

        System.out.println(map.toString());
    }
}

Output:

level 0: {key=3, hashcode=0, value=c}
level 1: {key=1, hashcode=1, value=a} -> {key=4, hashcode=1, value=d}
level 2: {key=2, hashcode=2, value=b} -> {key=5, hashcode=2, value=e}
distribution: [1, 2, 2]
SimpleHashMapExample{size=5, tableSize=3}
level 0: empty
level 1: {key=1, hashcode=1, value=a} -> {key=4, hashcode=1, value=d}
level 2: {key=2, hashcode=2, value=b}
distribution: [0, 2, 1]
SimpleHashMapExample{size=3, tableSize=3}

Test 3 - put, get, remove, containsKey methods - more data

In this test we use all small letters of alphabet from a to z (26x) as keys in our hash map.

package com.dirask;

import org.assertj.core.api.Assertions;
import org.junit.Test;

public class SimpleHashMapTests3 {

    @Test
    public void simpleHashMapTest() {
        SimpleHashMap map = new SimpleHashMap(10);

        // put
        for (int i = 0; i < 26; i++) {
            int key = i + 100;
            // ascii: 65 - A, 66 - B, ...
            String asciiValue = "" + (char) ('A' + i);
            map.put(key, asciiValue);
        }

        Assertions.assertThat(map.size()).isEqualTo(26);
        System.out.println(map.toString());

        // get, contains
        for (int i = 0; i < 26; i++) {
            int key = i + 100;
            String asciiValue = "" + (char) ('A' + i);
            String value = map.get(key);

            Assertions.assertThat(value.equals(asciiValue)).isTrue();

            boolean containsKey = map.containsKey(key);
            Assertions.assertThat(containsKey).isTrue();
        }

        // remove
        for (int i = 0; i < 26; i++) {
            int key = i + 100;
            String asciiValue = "" + (char) ('A' + i);

            String value = map.remove(key);
            Assertions.assertThat(value.equals(asciiValue)).isTrue();

            int expectedSize = 26 - i - 1;
            Assertions.assertThat(map.size()).isEqualTo(expectedSize);
        }

        Assertions.assertThat(map.size()).isEqualTo(0);
    }
}

Output:

level 0: {key=100, hashcode=0, value=A} -> {key=110, hashcode=0, value=K} -> {key=120, hashcode=0, value=U}
level 1: {key=101, hashcode=1, value=B} -> {key=111, hashcode=1, value=L} -> {key=121, hashcode=1, value=V}
level 2: {key=102, hashcode=2, value=C} -> {key=112, hashcode=2, value=M} -> {key=122, hashcode=2, value=W}
level 3: {key=103, hashcode=3, value=D} -> {key=113, hashcode=3, value=N} -> {key=123, hashcode=3, value=X}
level 4: {key=104, hashcode=4, value=E} -> {key=114, hashcode=4, value=O} -> {key=124, hashcode=4, value=Y}
level 5: {key=105, hashcode=5, value=F} -> {key=115, hashcode=5, value=P} -> {key=125, hashcode=5, value=Z}
level 6: {key=106, hashcode=6, value=G} -> {key=116, hashcode=6, value=Q}
level 7: {key=107, hashcode=7, value=H} -> {key=117, hashcode=7, value=R}
level 8: {key=108, hashcode=8, value=I} -> {key=118, hashcode=8, value=S}
level 9: {key=109, hashcode=9, value=J} -> {key=119, hashcode=9, value=T}
distribution: [3, 3, 3, 3, 3, 3, 2, 2, 2, 2]
SimpleHashMapExample{size=26, tableSize=10}

Test 4 - 2600 inserts to hash map of size 10

package com.dirask;

import org.assertj.core.api.Assertions;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public class SimpleHashMapTests4 {

    @Test
    public void simpleHashMapTest() {
        SimpleHashMap map = new SimpleHashMap(10);

        Set<Integer> inserts = new HashSet<>();

        for (int i = 0; i < 2600; i++) {
            int randomInt = ThreadLocalRandom.current().nextInt(100, 20_000);
            map.put(randomInt, randomString(3));

            inserts.add(randomInt);
        }

        System.out.println("size: " + map.size());
        System.out.println((2600 - map.size()) + " keys were duplicated " +
                "and we override the value for this key");

        for (Integer key : inserts) {
            boolean containsKey = map.containsKey(key);
            Assertions.assertThat(containsKey).isTrue();
        }

        map.printBucketDistribution();
    }

    private static String randomString(int len) {
        String alphabet =
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
                "abcdefghijklmnopqrstuvwxyz" +
                "0123456789";

        StringBuilder builder = new StringBuilder();

        for (int i = 0; i < len; i++) {
            int randIdx = ThreadLocalRandom.current().nextInt(alphabet.length());
            char randChar = alphabet.charAt(randIdx);
            builder.append(randChar);
        }

        return builder.toString();
    }
}

Output:

size: 2459
141 keys were duplicated and we override the value for this key
Bucket distribution:
[239, 250, 246, 257, 243, 253, 242, 223, 270, 236]

Expalanation

Above we can see very simple implementation of hash map. We use constant size of table for simplicity and easy understanding. Note that library ready hash map need to be able to re-size / re-hash as it grow in size. But as I mentioned this implementation is provided for users to understand the concept of hash map better.

To calculate hash code we call default implementation of the key, in this case Integer.

java.lang.Integer hash code implementation:

public static int hashCode(int value) {
	return value;
}

And to compare if key (Integer) is equals we use java.lang.Integer equals method, internal implementation:

public boolean equals(Object obj) {
    if (obj instanceof Integer) {
        return value == ((Integer)obj).intValue();
    }
    return false;
}

Above hash map interface supported methods:

  • void put(Integer key, String value)
  • String get(Integer key)
  • String remove(Integer key)
  • boolean containsKey(Integer key)
  • int size()

As a backbone of this hash map we use:

SimpleLinkedList<Entry>[] table;

Each entry is defined as:

class Entry {
    Integer key;
    String value;
    int hashcode;
}

As as we can see that this table is defined as array of linked lists and we call it as levels or buckets of hash map. When we make put operation and key has different value but generate the same hash code it will end up in the same bucket (hash collision).
This will happen in 2 cases:

  • when we have poor hash code implementation
  • when we have small hash map size and a lot of inserts

This implementation can be easily changed to generic implementation from key as Integer and value as String.

Also we can add re-size / re-hash functionality to make this dictionary more flexible.

Also we have 4 usage / unit tests. In the last test we can see the bucket distribution:

[239, 250, 246, 257, 243, 253, 242, 223, 270, 236]

It means that under each linked list we have N nodes. In this case hash map is much too small and get / contains methods won't have O(1) time complexity. But this test was created to show if we have correct probability distribution over buckets. And it shows that our hash code method is quite good and distribution is good, from min 223 to max 270 nodes under buckets. If hash code would be poorly implemented we would have node distribution for example from min 10 to max 300.

References

0 contributions

Checkout latest Findings & News:

Checkout latest questions:

Checkout latest wiki articles:

Hey 👋
Would you like to know what we do?
  • Dirask is IT community, where we share coding knowledge and help each other to solve coding problems.
  • We welcome everyone,
    no matter what the experience,
    no matter how basic the question is,
    this community will help you.
Read more