HashMap internals
java.util.HashMap
is a basic data structure for Java Virtual Machine
.
It provides functionality to map value by key.
The most important interface implemented is Map
.
It is a generic class, so key and value can be of any type.
HashMap
will calculate the hashCode
of the object to operate with it.
What interesting is that this implementation provides constant-time
performance for the basic operations (get
and put
).
In this article I will overview the internal implementation and answer the question, How does it provide constant-time
?
Constructors
There are 3 main constructors for HashMap:
- default one
public HashMap()
. It doesn’t have any arguments. By default, it assumesinitialCapacity=16
andloadFactor=0.75
. public HashMap(int initialCapacity)
- in this constructor the defaultloadFactor=0.75
.public HashMap(int initialCapacity, float loadFactor)
- in this constructor, both parameters can be customized.
Internals
HashMap stores elements in so-called buckets
and the number of buckets is called capacity
.
Every time you insert an element, hashCode
as integer
needs to be calculated.
Integer
can store 4,294,967,295
values. For sure we cannot have that amount of buckets.
In real live Java starts with 16
buckets. It needs to map calculated hashCode to a number of buckets.
It uses a simple algorithm:
targetBuckerNumber = hashCode % buckerNumbers;
Collisions
So what will happen when we have a conflict of hashCodes? It will simply make a list of keys. It can be visualized by the picture below:
Load factor and capacity
When does HashMap know when to increase the number of buckets? The algorithm is pretty easy:
Let’s say our initial capacity of Hashmap is 16. The default load factor is 0.75. Maximum number of elements for HashMap = 16 * 0.75 = 12. So the HashMap can store 12 elements, when 13-th elements occurred, HashMap will increase capacity by a factor of 2, so it will become 32.
JEP 180: Handle Frequent HashMap Collisions with Balanced Trees
As we can see, even with an increasing number of buckets, there is still possible that one bucket will contain a lot of keys. Iteration over elements create means that time complexity is O(n). But Hashmap guarantees that access time would be constant.
For this to work correctly, equal keys must have the same hash, however, different keys can have the same hash. If two different keys have the same hash, the two values belonging to them will be stored in the same bucket. Inside a bucket, values are stored in a list and retrieved by looping over all elements. The cost of this is O(n).
Java 8 came with a solution of JEP 180 the data structure in which the values inside one bucket are stored is changed from a list to a balanced tree if a bucket contains 8 or more values, and it’s changed back to a list if, at some point, only 6 values are left in the bucket. This improves the performance to be O(log n).
Summary
In this article, we discovered the internal implementation of HashMap and found an optimization mechanism that keeps this data structure extremely efficient.