java.util.HashMap is a basic data structure for Java Virtual Machine. It provides functionality to map value by key. The most important interface implemented is Map. It is a generic class, so key and value can be of any type. HashMap will calculate the hashCode of the object to operate with it. What interesting is that this implementation provides constant-time performance for the basic operations (get and put).

In this article I will overview the internal implementation and answer the question, How does it provide constant-time?

Constructors

There are 3 main constructors for HashMap:

  • default one public HashMap(). It doesn’t have any arguments. By default, it assumes initialCapacity=16 and loadFactor=0.75.
  • public HashMap(int initialCapacity) - in this constructor the default loadFactor=0.75.
  • public HashMap(int initialCapacity, float loadFactor) - in this constructor, both parameters can be customized.

Internals

HashMap stores elements in so-called buckets and the number of buckets is called capacity. Every time you insert an element, hashCode as integer needs to be calculated. Integer can store 4,294,967,295 values. For sure we cannot have that amount of buckets. In real live Java starts with 16 buckets. It needs to map calculated hashCode to a number of buckets. It uses a simple algorithm:

targetBuckerNumber = hashCode % buckerNumbers;

Collisions

So what will happen when we have a conflict of hashCodes? It will simply make a list of keys. It can be visualized by the picture below:

HashMap structure

Load factor and capacity

When does HashMap know when to increase the number of buckets? The algorithm is pretty easy:

Let’s say our initial capacity of Hashmap is 16. The default load factor is 0.75. Maximum number of elements for HashMap = 16 * 0.75 = 12. So the HashMap can store 12 elements, when 13-th elements occurred, HashMap will increase capacity by a factor of 2, so it will become 32.

JEP 180: Handle Frequent HashMap Collisions with Balanced Trees

As we can see, even with an increasing number of buckets, there is still possible that one bucket will contain a lot of keys. Iteration over elements create means that time complexity is O(n). But Hashmap guarantees that access time would be constant.

For this to work correctly, equal keys must have the same hash, however, different keys can have the same hash. If two different keys have the same hash, the two values belonging to them will be stored in the same bucket. Inside a bucket, values are stored in a list and retrieved by looping over all elements. The cost of this is O(n).

Java 8 came with a solution of JEP 180 the data structure in which the values inside one bucket are stored is changed from a list to a balanced tree if a bucket contains 8 or more values, and it’s changed back to a list if, at some point, only 6 values are left in the bucket. This improves the performance to be O(log n).

Summary

In this article, we discovered the internal implementation of HashMap and found an optimization mechanism that keeps this data structure extremely efficient.