MapDB logo

Introduction

In many of my personal projects, I needed a way to persist information efficiently. I aimed to keep the solution simple, avoiding the complexities of more elaborate infrastructure setups. Implementing a database or Redis would complicate a straightforward project and consume additional server resources, which I wanted to avoid.

Initially, I opted to use the filesystem to store information. My approach involved marshalling objects to JSON and saving them to disk. When modifications were necessary, I would read the JSON string from the disk, unmarshal it, make the required changes, and save it back. This method sufficed for basic problems, but it often introduced performance overhead in more complex scenarios.

After conducting some research, I discovered a solution that effectively meets my needs in many cases: MapDB.

MapDB

MapDB is an open-source (Apache 2.0 licensed), embedded Java database engine and collection framework. Just add dependency:

<dependency>
    <groupId>org.mapdb</groupId>
    <artifactId>mapdb</artifactId>
    <version>VERSION</version>
</dependency>

and you may use it in multiple configurations, for example:

DB db = DBMaker
    .fileDB("file.db")
    .transactionEnable()
    .closeOnJvmShutdown()
    .make();
ConcurrentMap visitedURLs = db.hashMap("visited-urls")
    .createOrOpen();

visitedURLs.put("https://example.com", true);
visitedURLs.close();

Custom type example

To store a complex objects, you need to define serializers, here I’ve chosen java serialization.

ConcurrentMap ads = db.hashMap("map")
    .keySerializer(Serializer.JAVA)
    .valueSerializer(Serializer.JAVA)
    .createOrOpen();

Because of it, my CustomType needed to implement Serializable interface.

record CustomType(String field) implements Serializable {

Key Features of MapDB

MapDB provides various of features:

  • Memory and Disk Optimizations - Efficient use of both memory and disk resources for optimal performance.
  • Disk Write Modes - Supports various modes for writing to disk, enhancing flexibility and control.
  • Memory Management - Robust options for managing memory, suitable for both small and large datasets.
  • Allocation Options - Customizable storage allocation settings to improve performance and resource utilization.
  • Write Ahead Log (WAL) - Ensures atomic and durable file changes, protecting data from corruption.
  • In-Memory Stores - Offers on-heap, serialized byte[], and DirectByteBuffer options for in-memory data storage.
  • File Access Methods - Supports RandomAccessFile, FileChannel, and memory-mapped files (mmap) for versatile file access.
  • Automatic Shutdown Handling - Provides shutdown hooks to close the database automatically before JVM exits.
  • Data Recovery - Options to open corrupted stores in readonly mode for data rescue.
  • Transaction Management - Supports transactions, enabling rollback and commit operations for data integrity.

By leveraging these features, MapDB provides a powerful and flexible solution for data persistence in various applications. Let’s review the most important.

Data Integrity

MapDB offers robust mechanisms to protect data from corruption during JVM crashes or terminations. One key feature is the Write Ahead Log (WAL), which ensures file changes are atomic and durable, a technique also used by major databases like PostgreSQL and MySQL. However, WAL comes with a trade-off in performance, as data must be copied and synced multiple times between files.

By default, WAL is disabled in MapDB. To enable it, use the DBMaker.transactionEnable() method:

DB db = DBMaker
    .fileDB(file)
    .transactionEnable()
    .make();

When WAL is disabled (the default setting), your data is not protected against crashes. In this case, you must ensure the store is correctly closed, or risk losing all data. MapDB detects unclean shutdowns and will refuse to open such corrupted storage. However, there is an option to open corrupted stores in readonly mode for data recovery.

Automatic Shutdown Handling

MapDB offers a shutdown hook that automatically closes the database upon JVM exit, which can be enabled with DBMaker.closeOnJvmShutdown(). It ensures proper closure during normal shutdowns; however, it does not safeguard data in the event of JVM crashes or abrupt terminations.

Caching

One of the interesting feature that I would like to highlight is value expiring. It can be used as a simple caching layer, example:

DB db = DBMaker.fileDB("expire.db").transactionEnable().make();
ConcurrentMap map = db.hashMap("map")
   .expireAfterCreate(2, TimeUnit.SECONDS)
   .expireExecutor(Executors.newScheduledThreadPool(1))
   .createOrOpen();

map.put("key", "value");
db.commit();

TimeUnit.SECONDS.sleep(10);

map.get("key"); // NULL

db.close();

Transaction Management

With transactions disabled, MapDB does not support rollback, and db.rollback() will throw an exception. Since data is stored immediately, db.commit() essentially flushes write caches and synchronizes storage files. This means if you call db.commit() and make no further writes, your data should be safe even in the event of a JVM crash.

MapDB supports transactions as set of changes. You can use methods:

  • commit() - to write changes made from the last commit/rollback,
  • rollback() - to erase changes made from the last commit/rollback.

To use transactions, you need to call transactionEnable() during the creation of DB object.

DB db = DBMaker.fileDB("some.db").transactionEnable().make();
ConcurrentMap map = db.hashMap("map").createOrOpen();

map.put("one", "1");
db.commit(); // ["one", "1"]

map.put("two", "2");
db.rollback(); //["one", "1"]

db.close();

Advanced Features of MapDB

MapDB offers several advanced features that enhance its performance and versatility:

  • Memory and Disk Optimizations - MapDB is designed to optimize both memory and disk usage, ensuring efficient data storage and retrieval.
  • Disk Write Modes - MapDB supports various disk write modes, allowing you to tailor data persistence to your specific needs.
  • Memory Management - MapDB provides robust memory management options suitable for handling both small and large datasets.
  • Allocation Options - MapDB allows customization of storage allocation to improve performance and resource utilization.

A Potential Drawback of Using MapDB

While MapDB is a robust and useful open-source project, it has a notable drawback. The Github repository has lost momentum over the past 3-4 years. Additionally, the number of reported issues appears to be not actively addressed, which may be a concern for ongoing support and updates.

One major drawback of embedded libraries is their inability to allow state (file) access from multiple applications. This integration simplifies development but ties the state to a single application, creating a significant obstacle.

For shared state management, consider alternatives like relational databases or NoSQL solutions. These options enable state accessibility across multiple applications, providing the necessary flexibility.

Summary

In my quest for an efficient data persistence solution for small projects, I discovered MapDB, a highly useful open-source library. MapDB effectively addresses the challenge of saving data without complicating the infrastructure. Its design and functionality make it a battle-tested option, ensuring reliability and simplicity for various project needs.