Guide to MapDB

Last modified: July 5, 2019

1. Introduction

In this article, we’ll look at the MapDB library — an embedded database engine accessed through a collection-like API.

We start by exploring the core classes DB and DBMaker that help configure, open, and manage our databases. Then, we’ll dive into some examples of MapDB data structures that store and retrieve data.

Finally, we’ll look at some of the in-memory modes before comparing MapDB to traditional databases and Java Collections.

2. Storing Data in MapDB

First, let’s introduce the two classes that we’ll be using constantly throughout this tutorial — DB and DBMaker. The DB class represents an open database. Its methods invoke actions for creating and closing storage collections to handle database records, as well as handling transactional events.

DBMaker handles database configuration, creation, and opening. As part of the configuration, we can choose to host our database either in-memory or on our file system.

2.1. A Simple HashMap Example

To understand how this works, let’s walk through a small in-memory example.

First, let’s create a new in-memory database using the DBMaker class:

DB db = DBMaker.memoryDB().make();

Once our DB object is up and running, we can use it to build an HTreeMap to work with our database records:

String welcomeMessageKey = "Welcome Message";
String welcomeMessageString = "Hello Baeldung!";

HTreeMap myMap = db.hashMap("myMap").createOrOpen();
myMap.put(welcomeMessageKey, welcomeMessageString);

HTreeMap is MapDB’s HashMap implementation. So, now that we have data in our database, we can retrieve it using the get method:

String welcomeMessageFromDB = (String) myMap.get(welcomeMessageKey);
assertEquals(welcomeMessageString, welcomeMessageFromDB);

Finally, now that we’re finished with the database, we should close it to avoid further mutation:

db.close();

To store our data in a file, rather than in memory, all we need to do is change the way that our DB object is instantiated:

DB db = DBMaker.fileDB("file.db").make();

Our example above uses no type parameters. As a result, we’re stuck with casting our results to work with specific types. In our next example, we’ll introduce Serializers to eliminate the need for casting.

2.2. Collections

MapDB includes several collection types. To demonstrate, let’s add and retrieve some data using a NavigableSet, which behaves just as we’d expect of a sorted Java Set.

Let’s start with a simple instantiation of our DB object:

DB db = DBMaker.memoryDB().make();

Next, let’s create our NavigableSet:

NavigableSet<String> set = db
  .treeSet("mySet")
  .serializer(Serializer.STRING)
  .createOrOpen();

Here, the serializer tells MapDB to serialize and deserialize the elements of our set as String objects.

Next, let’s add some data:

set.add("Baeldung");
set.add("is awesome");

Now, let’s check that our two distinct values have been added to the database correctly:

assertEquals(2, set.size());

Finally, since this is a set, let’s add a duplicate string and verify that our database still contains only two values:

set.add("Baeldung");

assertEquals(2, set.size());

2.3. Transactions

Much like traditional databases, the DB class provides methods to commit and roll back the data we add to our database.

To enable this functionality, we need to initialize our DB with the transactionEnable method:

DB db = DBMaker.memoryDB().transactionEnable().make();

Next, let’s create a simple set, add some data, and commit it to the database:

NavigableSet<String> set = db
  .treeSet("mySet")
  .serializer(Serializer.STRING)
  .createOrOpen();

set.add("One");
set.add("Two");

db.commit();

assertEquals(2, set.size());

Now, let’s add a third, uncommitted string to our database:

set.add("Three");

assertEquals(3, set.size());

If we’re not happy with our data, we can discard the uncommitted changes using DB’s rollback method:

db.rollback();

assertEquals(2, set.size());
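
To build some intuition for what commit and rollback do, here is a toy model written with plain Java collections. It is only a sketch under our own assumptions, not how MapDB implements transactions: the class name ToyTransactionalSet and its snapshot-copy approach are invented purely for illustration.

```java
import java.util.TreeSet;

// Toy model only: NOT MapDB's implementation. It mimics commit/rollback
// by keeping a committed snapshot next to the working copy.
final class ToyTransactionalSet {
    private TreeSet<String> committed = new TreeSet<>();
    private TreeSet<String> working = new TreeSet<>();

    boolean add(String value) {
        return working.add(value);
    }

    int size() {
        return working.size();
    }

    // Make the current working state the new committed state.
    void commit() {
        committed = new TreeSet<>(working);
    }

    // Discard every change made since the last commit.
    void rollback() {
        working = new TreeSet<>(committed);
    }

    public static void main(String[] args) {
        ToyTransactionalSet set = new ToyTransactionalSet();
        set.add("One");
        set.add("Two");
        set.commit();

        set.add("Three");
        System.out.println(set.size()); // 3

        set.rollback();
        System.out.println(set.size()); // 2
    }
}
```

The flow mirrors our MapDB example: anything added after the last commit disappears on rollback.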

2.4. Serializers

MapDB offers a large variety of serializers, which handle the data within the collection. The most important construction parameter is the name, which identifies the individual collection within the DB object:

HTreeMap<String, Long> map = db.hashMap("identification_name")
  .keySerializer(Serializer.STRING)
  .valueSerializer(Serializer.LONG)
  .create();

While specifying serializers is recommended, it is optional and can be skipped. However, it’s worth noting that MapDB then falls back to a slower generic serialization process.
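
To see why a type-specific serializer tends to be cheaper than a generic one, we can compare two encodings of the same long value. This is a rough sketch that uses plain Java serialization as a stand-in for "generic"; MapDB’s own fallback serializer may behave differently, and the class name SerializationCost is ours.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class SerializationCost {

    // Type-specific encoding: a long always takes exactly 8 bytes.
    static byte[] writeAsLong(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Generic encoding: Java serialization also writes a stream header
    // and class metadata, so the result is larger.
    static byte[] writeAsObject(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("typed:   " + writeAsLong(42L).length + " bytes");
        System.out.println("generic: " + writeAsObject(42L).length + " bytes");
    }
}
```

The typed encoding knows the value is a long and writes nothing else, whereas the generic one has to describe the type in the stream as well.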

3. HTreeMap

MapDB’s HTreeMap provides HashMap and HashSet collections for working with our database. HTreeMap is a segmented hash tree and does not use a fixed-size hash table. Instead, it uses an auto-expanding index tree and does not rehash all of its data as the table grows. To top it off, HTreeMap is thread-safe and supports parallel writes using multiple segments.
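
The idea of segments can be sketched with standard Java classes. The SegmentedMap class below is hypothetical and far simpler than HTreeMap (it has no index tree and no persistence); it only shows how hashing a key to a segment with its own lock lets writes to different segments proceed in parallel.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Conceptual sketch only: NOT HTreeMap's actual code.
final class SegmentedMap<K, V> {
    private final Map<K, V>[] segments;
    private final ReentrantReadWriteLock[] locks;

    @SuppressWarnings("unchecked")
    SegmentedMap(int segmentCount) {
        segments = (Map<K, V>[]) new Map[segmentCount];
        locks = new ReentrantReadWriteLock[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new HashMap<>();
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    // The key's hash decides which segment (and which lock) is used.
    int segmentFor(Object key) {
        return Math.floorMod(key.hashCode(), segments.length);
    }

    // Only the key's own segment is locked, so other segments stay writable.
    V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].writeLock().lock();
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    V get(Object key) {
        int s = segmentFor(key);
        locks[s].readLock().lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].readLock().unlock();
        }
    }
}
```

Two threads that write keys landing in different segments never contend for the same lock, which is the property the paragraph above describes.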

To begin, let’s instantiate a simple HashMap that uses String for both keys and values:

DB db = DBMaker.memoryDB().make();

HTreeMap<String, String> hTreeMap = db
  .hashMap("myTreeMap")
  .keySerializer(Serializer.STRING)
  .valueSerializer(Serializer.STRING)
  .create();

Above, we’ve defined separate serializers for the key and the value. Now that our HashMap is created, let’s add data using the put method:

hTreeMap.put("key1", "value1");
hTreeMap.put("key2", "value2");

assertEquals(2, hTreeMap.size());

Because a hash map locates entries using the key’s hashCode and equals methods, adding data under an existing key causes the value to be overwritten:

hTreeMap.put("key1", "value3");

assertEquals(2, hTreeMap.size());
assertEquals("value3", hTreeMap.get("key1"));

4. SortedTableMap

MapDB’s SortedTableMap stores keys in a fixed-size table and uses binary search for retrieval. It’s worth noting that once prepared, the map is read-only.

Let’s walk through the process of creating and querying a SortedTableMap. We’ll start by creating a memory-mapped volume to hold the data, as well as a sink to add data. On the first invocation of our volume, we’ll set the read-only flag to false, ensuring we can write to the volume:

String VOLUME_LOCATION = "sortedTableMapVol.db";

Volume vol = MappedFileVol.FACTORY.makeVolume(VOLUME_LOCATION, false);

SortedTableMap.Sink<Integer, String> sink =
  SortedTableMap.create(
    vol,
    Serializer.INTEGER,
    Serializer.STRING)
    .createFromSink();

Next, we’ll add our data and call the create method on the sink to create our map:

for(int i = 0; i < 100; i++){
  sink.put(i, "Value " + Integer.toString(i));
}

sink.create();

Now that our map exists, we can define a read-only volume and open our map using SortedTableMap’s open method:

Volume openVol = MappedFileVol.FACTORY.makeVolume(VOLUME_LOCATION, true);

SortedTableMap<Integer, String> sortedTableMap = SortedTableMap
  .open(
    openVol,
    Serializer.INTEGER,
    Serializer.STRING);

assertEquals(100, sortedTableMap.size());

4.1. Binary Search

Before we move on, let’s understand how the SortedTableMap utilizes binary search in more detail.

SortedTableMap splits the storage into pages, with each page containing several nodes comprised of keys and values. Within these nodes are the key-value pairs that we define in our Java code.

SortedTableMap performs three binary searches to retrieve the correct value:

  1. Keys for each page are stored on-heap in an array. The SortedTableMap performs a binary search to find the correct page.
  2. Next, the first key of each node on that page is decompressed. A binary search over these keys establishes the correct node.
  3. Finally, the SortedTableMap searches over the keys within the node to find the correct value.
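
The three searches above can be sketched in plain Java. This is a simplified model under our own assumptions, not SortedTableMap’s code: the PagedLookup class, its nested-array layout, and the demo sizes are invented for illustration, and there is no compression involved.

```java
import java.util.Arrays;

// Simplified model of a page -> node -> key lookup; not MapDB's code.
final class PagedLookup {
    // keys[p][n] holds the sorted keys of node n on page p.
    private final int[][][] keys;
    private final String[][][] values;
    // First key of every page, kept in a plain on-heap array.
    private final int[] pageFirstKeys;

    PagedLookup(int[][][] keys, String[][][] values) {
        this.keys = keys;
        this.values = values;
        this.pageFirstKeys = new int[keys.length];
        for (int p = 0; p < keys.length; p++) {
            pageFirstKeys[p] = keys[p][0][0];
        }
    }

    // Index of the last element <= key, or -1 if every element is larger.
    static int floorIndex(int[] sorted, int key) {
        int r = Arrays.binarySearch(sorted, key);
        return r >= 0 ? r : -(r + 1) - 1;
    }

    String get(int key) {
        // Search 1: pick the page by the first key of each page.
        int page = floorIndex(pageFirstKeys, key);
        if (page < 0) {
            return null;
        }

        // Search 2: pick the node by the first key of each node on the page.
        int[] nodeFirstKeys = new int[keys[page].length];
        for (int j = 0; j < nodeFirstKeys.length; j++) {
            nodeFirstKeys[j] = keys[page][j][0];
        }
        int node = floorIndex(nodeFirstKeys, key);

        // Search 3: look for the exact key inside the node.
        int i = Arrays.binarySearch(keys[page][node], key);
        return i >= 0 ? values[page][node][i] : null;
    }

    // Demo layout: keys 0..99 spread over 5 pages of 4 nodes with 5 keys each.
    static PagedLookup demo() {
        int pages = 5, nodesPerPage = 4, keysPerNode = 5;
        int[][][] k = new int[pages][nodesPerPage][keysPerNode];
        String[][][] v = new String[pages][nodesPerPage][keysPerNode];
        int next = 0;
        for (int p = 0; p < pages; p++) {
            for (int n = 0; n < nodesPerPage; n++) {
                for (int i = 0; i < keysPerNode; i++) {
                    k[p][n][i] = next;
                    v[p][n][i] = "Value " + next;
                    next++;
                }
            }
        }
        return new PagedLookup(k, v);
    }

    public static void main(String[] args) {
        System.out.println(demo().get(57)); // Value 57
    }
}
```

Each search narrows the range (page, then node, then key), so only a small part of the data has to be examined for any lookup.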

5. In-Memory Mode

MapDB offers three types of in-memory store. Let’s take a quick look at each mode, understand how it works, and study its benefits.

5.1. On-Heap

The on-heap mode stores objects in a simple Java Collection Map. It does not employ serialization and can be very fast for small datasets. 

However, since the data is stored on-heap, the dataset is managed by garbage collection (GC). The duration of GC rises with the size of the dataset, resulting in performance drops.

Let’s see an example specifying the on-heap mode:

DB db = DBMaker.heapDB().make();

5.2. Byte[]

The second store type is based on byte arrays. In this mode, data is serialized and stored into arrays up to 1MB in size. While technically on-heap, this method is more efficient for garbage collection.

This is the recommended default in-memory mode, and it’s the one we used in our ‘Hello Baeldung’ example:

DB db = DBMaker.memoryDB().make();
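
To picture why serialized byte arrays are friendlier to the garbage collector, consider the sketch below. It is an illustration under our own assumptions rather than MapDB’s storage format: the ByteArrayStore class and its length-prefixed record layout are hypothetical. The point is that many records live inside a single 1MB array, so the collector tracks one object instead of one object per record.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: a single 1 MB chunk holding length-prefixed records.
final class ByteArrayStore {
    private final byte[] storage = new byte[1024 * 1024];
    private int writePosition = 0;

    // Serializes the value into the chunk and returns the record's offset.
    int append(String value) {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        int offset = writePosition;
        ByteBuffer.wrap(storage, offset, 4 + encoded.length)
            .putInt(encoded.length) // 4-byte length prefix
            .put(encoded);          // record payload
        writePosition += 4 + encoded.length;
        return offset;
    }

    // Deserializes the record that starts at the given offset.
    String read(int offset) {
        int length = ByteBuffer.wrap(storage, offset, 4).getInt();
        return new String(storage, offset + 4, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayStore store = new ByteArrayStore();
        int offset = store.append("Hello Baeldung!");
        System.out.println(store.read(offset)); // Hello Baeldung!
    }
}
```

Every read pays a deserialization cost, which is the trade-off against the plain on-heap mode.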

5.3. DirectByteBuffer

The final store is based on DirectByteBuffer. Direct memory, introduced in Java 1.4, allows the passing of data directly to native memory rather than Java heap. As a result, the data will be stored completely off-heap.

We can invoke a store of this type with:

DB db = DBMaker.memoryDirectDB().make();
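
For a feel of what off-heap storage means at the JDK level, here is a small sketch using java.nio.ByteBuffer directly. It does not use MapDB at all, and the DirectMemoryDemo class is our own; it simply shows data being written to and read back from a direct buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DirectMemoryDemo {

    // Writes text into freshly allocated direct memory and reads it back.
    static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(encoded.length);
        direct.put(encoded);  // copy the bytes into native memory
        direct.flip();        // switch the buffer from writing to reading
        byte[] copy = new byte[direct.remaining()];
        direct.get(copy);     // copy the bytes back onto the heap
        return new String(copy, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocateDirect(16).isDirect()); // true
        System.out.println(roundTrip("Hello Baeldung!"));             // Hello Baeldung!
    }
}
```

Because the contents of a direct buffer live outside the Java heap, they don’t add to the heap the garbage collector has to scan; the JVM’s maximum direct memory setting limits how much can be allocated this way.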

6. Why MapDB?

So, why use MapDB?

6.1. MapDB vs Traditional Database

MapDB offers a large array of database functionality configured with just a few lines of Java code. When we employ MapDB, we can avoid the often time-consuming setup of various services and connections needed to get our program to work.

Beyond this, MapDB allows us to access the complexity of a database with the familiarity of a Java Collection. With MapDB, we do not need SQL, and we can access records with simple get method calls.

6.2. MapDB vs Simple Java Collections

Java Collections will not persist the data of our application once it stops executing. MapDB offers a simple, flexible, pluggable service that allows us to quickly and easily persist the data in our application while maintaining the utility of Java collection types.

7. Conclusion

In this article, we’ve taken a deep dive into MapDB’s embedded database engine and collection framework.

We started by looking at the core classes DB and DBMaker to configure, open and manage our database. Then, we walked through some examples of data structures that MapDB offers to work with our records. Finally, we looked at the advantages of MapDB over a traditional database or Java Collection.

As always, the example code is available over on GitHub.
