Introduction to Big Queue

Last modified: January 16, 2020

1. Overview

In this tutorial, we’re going to take a quick look at Big Queue, a Java implementation of a persistent queue.

We’ll talk a bit about its architecture, and then we’ll learn how to use it through quick and practical examples.

2. Usage

We’ll need to add the bigqueue dependency to our project:

<dependency>
    <groupId>com.leansoft</groupId>
    <artifactId>bigqueue</artifactId>
    <version>0.7.0</version>
</dependency>

We also need to add its repository:

<repository>
    <id>github.release.repo</id>
    <url>https://raw.github.com/bulldog2011/bulldog-repo/master/repo/releases/</url>
</repository>

If we’re used to working with basic queues, it’ll be a breeze to adapt to Big Queue as its API is quite similar.

2.1. Initialization

We can initialize our queue by simply calling its constructor:

@Before
public void setup() throws IOException {
    // Home directory under which Big Queue creates the queue's own folder
    String queueDir = System.getProperty("user.home");
    String queueName = "baeldung-queue";
    bigQueue = new BigQueueImpl(queueDir, queueName);
}

The first argument is the home directory for our queue.

The second argument is our queue’s name. Big Queue creates a folder with this name inside the home directory, and that is where our data is persisted.

We should remember to close our queue when we’re done to prevent memory leaks:

bigQueue.close();

2.2. Inserting

We can add elements to the tail by simply calling the enqueue method:

@Test
public void whenAddingRecords_ThenTheSizeIsCorrect() throws IOException {
    for (int i = 1; i <= 100; i++) {
        bigQueue.enqueue(String.valueOf(i).getBytes());
    }

    assertEquals(100, bigQueue.size());
}

We should note that Big Queue only supports the byte[] data type, so we are responsible for serializing our records when inserting.
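
Because the queue only accepts byte[], one option is a small codec that turns a record into bytes and back. The sketch below is only an illustration: the RecordCodec class, its method names, and the layout (an 8-byte id followed by UTF-8 text) are assumptions made for this example and are not part of Big Queue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Big Queue: converts a simple
// (id, text) record to byte[] and back.
public final class RecordCodec {

    private RecordCodec() {
    }

    // Layout: 8-byte big-endian id, followed by the UTF-8 bytes of the text.
    public static byte[] encode(long id, String text) {
        byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + textBytes.length)
                .putLong(id)
                .put(textBytes)
                .array();
    }

    // Reads the id back from the first 8 bytes.
    public static long decodeId(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    // Decodes everything after the id as UTF-8 text.
    public static String decodeText(byte[] raw) {
        return new String(raw, Long.BYTES, raw.length - Long.BYTES, StandardCharsets.UTF_8);
    }
}
```

With such a helper, we could pass RecordCodec.encode(1L, "new_record") to enqueue and apply decodeText to the bytes we read back. Naming the charset explicitly avoids relying on the platform default, which getBytes() without arguments uses.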

2.3. Reading

As we might’ve expected, reading data is just as easy using the dequeue method:

@Test
public void whenAddingRecords_ThenTheyCanBeRetrieved() throws IOException {
    bigQueue.enqueue("new_record".getBytes());

    String record = new String(bigQueue.dequeue());

    assertEquals("new_record", record);
}

We also have to be careful to properly deserialize our data when reading.

Reading from an empty queue throws a NullPointerException.

We should verify that there are values in our queue using the isEmpty method:

if (!bigQueue.isEmpty()) {
    // read
}

To empty our queue without having to go through each record, we can use the removeAll method:

bigQueue.removeAll();

2.4. Peeking

When peeking, we simply read a record without consuming it:

@Test
public void whenPeekingRecords_ThenSizeDoesntChange() throws IOException {
    for (int i = 1; i <= 100; i++) {
        bigQueue.enqueue(String.valueOf(i).getBytes());
    }

    String firstRecord = new String(bigQueue.peek());

    assertEquals("1", firstRecord);
    assertEquals(100, bigQueue.size());
}

2.5. Deleting Consumed Records

When we call the dequeue method, records are removed from our queue, but they remain persisted on disk.

This could potentially fill up our disk with unnecessary data.

Fortunately, we can delete the consumed records using the gc method:

bigQueue.gc();

Just as the garbage collector in Java cleans up unreferenced objects from the heap, gc cleans up consumed records from our disk.

3. Architecture and Features

What’s interesting about Big Queue is that its codebase is extremely small: just 12 source files occupying about 20KB of disk space.

On a high level, it’s just a persistent queue that excels at handling large amounts of data.

3.1. Handling Large Amounts of Data

The size of the queue is limited only by our available disk space. Every record in our queue is persisted on disk so that the queue is crash-resistant.

Our bottleneck will be the disk I/O, meaning that an SSD will significantly improve the average throughput over an HDD.

3.2. Accessing Data Extremely Fast

If we take a look at its source code, we’ll notice that the queue is backed by a memory-mapped file. The accessible part of our queue (the head) is kept in RAM, so accessing records will be extremely fast.
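
To get a feel for what a memory-mapped file is, here’s a minimal sketch that uses only the JDK’s java.nio API. It illustrates the general technique, not Big Queue’s internal code; the MappedFileSketch class and its length-prefixed layout are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: writes a message through a memory-mapped
// region of a temporary file and reads it back through a second mapping.
public final class MappedFileSketch {

    private MappedFileSketch() {
    }

    public static String roundTrip(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        long size = Integer.BYTES + payload.length;
        try {
            Path file = Files.createTempFile("mapped-sketch", ".dat");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // A READ_WRITE mapping extends the file to cover the region.
                MappedByteBuffer writer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
                writer.putInt(payload.length); // length prefix
                writer.put(payload);           // message bytes
                writer.force();                // flush the changes to the file

                // Map the same region again, read-only, and decode the record.
                MappedByteBuffer reader = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
                byte[] read = new byte[reader.getInt()];
                reader.get(read);
                return new String(read, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reads and writes on the mapped buffers are plain memory accesses, and the operating system takes care of moving pages between RAM and the file.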

Even if our queue grows extremely large and occupies terabytes of disk space, we’ll still be able to read data in O(1) time.

If we need to read lots of messages and speed is a critical concern, we should consider using an SSD over an HDD, as moving data from disk to memory would be much faster.

3.3. Advantages

A great advantage is its ability to grow very large. In theory, we can scale it without limit just by adding more storage, hence the “Big” in its name.

In a concurrent environment, Big Queue can produce and consume around 166MBps of data on a commodity machine.

If our average message size is 1KB, it can process 166k messages per second.

It can go up to 333k messages per second in a single-threaded environment, which is pretty impressive!
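
These message rates follow from dividing the byte throughput by the message size. Here’s a rough sketch of that arithmetic, assuming decimal units (1 MB = 1,000,000 bytes and 1 KB = 1,000 bytes), which the figures above seem to imply; the ThroughputEstimate class is purely illustrative.

```java
// Hypothetical helper for back-of-the-envelope throughput estimates.
public final class ThroughputEstimate {

    private ThroughputEstimate() {
    }

    // Messages per second = bytes per second / bytes per message.
    public static long messagesPerSecond(long bytesPerSecond, long messageSizeBytes) {
        if (messageSizeBytes <= 0) {
            throw new IllegalArgumentException("messageSizeBytes must be positive");
        }
        return bytesPerSecond / messageSizeBytes;
    }

    public static void main(String[] args) {
        // 166 MB/s with 1 KB messages, decimal units assumed
        System.out.println(messagesPerSecond(166_000_000L, 1_000L)); // prints 166000
    }
}
```

Read the other way round, 333k messages per second at 1KB each corresponds to roughly 333MBps.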

3.4. Disadvantages

Our messages remain persisted to disk, even after we’ve consumed them, so we have to take care of garbage-collecting data when we no longer need it.

We are also responsible for serializing and deserializing our messages.

4. Conclusion

In this quick tutorial, we learned about Big Queue and how we can use it as a scalable and persistent queue.

As always, the code is available over on GitHub.