Guide to FastUtil – FastUtil指南

最后修改: 2019年 5月 4日

中文/混合/英文(键盘快捷键:t)

1. Introduction

1.绪论

In this tutorial, we’ll be looking at the FastUtil library.

在本教程中,我们将研究FastUtil库。

First, we’ll code some examples of its type-specific collections.

首先,我们将对其特定类型集合的一些例子进行编码。

Then, we’ll analyze the performance that gives FastUtil its name.

然后,我们将分析给FastUtil 其名字的性能。

Finally, let’s take a peek at FastUtil‘s BigArray utilities.

最后,让我们来看看FastUtilBigArrayutilities.

2. Features

2.特点

The FastUtil Java library seeks to extend the Java Collections Framework. It provides type-specific maps, sets, lists and queues with a smaller memory footprint and fast access and insertion. FastUtil also provides a set of utilities for working with and manipulating large (64-bit) arrays, sets and lists.

FastUtilJava库旨在扩展Java集合框架。它提供了特定类型的映射、集合、列表和队列,具有较小的内存占用率和快速访问和插入。FastUtil还提供了一套用于处理和操作大型(64位)数组、集合和列表的实用程序

The library also includes a multitude of practical Input/Output classes for binary and text files.

该库还包括一个用于二进制和文本文件的多种实用的输入/输出类

Its latest release, FastUtil 8, also released a host of type-specific functions, extending the JDK’s Functional Interfaces.

它的最新版本FastUtil 8也发布了大量特定类型的函数,扩展了JDK的功能接口

2.1. Speed

2.1 速度

In many cases, the FastUtil implementations are the fastest available. The authors have even provided their own in-depth benchmark report, comparing it against similar libraries include HPPC and Trove.

在许多情况下,FastUtil 的实现是目前最快的。作者甚至提供了自己的深入基准报告,将其与类似的库进行比较,包括HPPC Trove。HPPC Trove。

In this tutorial, we’ll look to define our own benchmarks using the Java Microbench Harness (JMH).

在本教程中,我们将使用Java Microbench Harness(JMH)来定义我们自己的基准。

3. Full Sized Dependency

3.全尺寸的依赖性

On top of the usual JUnit dependency, we’ll be using the FastUtils and JMH dependencies in this tutorial.

除了通常的JUnit依赖之外,我们将在本教程中使用FastUtilsJMH 依赖

We’ll need the following dependencies in our pom.xml file:

在我们的pom.xml文件中,我们将需要以下依赖项。

<dependency>
    <groupId>it.unimi.dsi</groupId>
    <artifactId>fastutil</artifactId>
    <version>8.2.2</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.35</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.35</version>
    <scope>test</scope>
</dependency>

Or for Gradle users:

或者对于Gradle用户。

testCompile group: 'org.openjdk.jmh', name: 'jmh-core', version: '1.19'
testCompile group: 'org.openjdk.jmh', name: 'jmh-generator-annprocess', version: '1.19'
compile group: 'it.unimi.dsi', name: 'fastutil', version: '8.2.2'

3.1. Customized Jar File

3.1.定制的Jar文件

Due to the lack of generics, FastUtils generates a large number of type-specific classes. And unfortunately, this leads to a huge jar file.

由于缺乏泛型,FastUtils生成了大量的特定类型的类。而不幸的是,这导致了一个巨大的jar文件。

However, luckily for us, FastUtils includes a find-deps.sh script which allows generation of smaller, more focused jars comprising of only the classes we want to use in our application.

然而,幸运的是,FastUtils 包括一个find-deps.sh 脚本,它允许生成更小、更集中的jars,只包括我们想在应用中使用的类。

4. Type-Specific Collections

4.特定类型的收藏

Before we begin, let’s take a quick peek at the simple process of instantiating a type-specific collection. Let’s pick a HashMap that stores keys and values using doubles. 

在我们开始之前,让我们快速浏览一下实例化特定类型集合的简单过程。让我们挑选一个HashMap,它使用二进制来存储键和值。

For this purpose, FastUtils provides a Double2DoubleMap interface and a Double2DoubleOpenHashMap implementation:

为此,FastUtils提供了一个Double2DoubleMap接口和一个Double2DoubleOpenHashMap实施方案。

Double2DoubleMap d2dMap = new Double2DoubleOpenHashMap();

Now that we’ve instantiated our class, we can simply populate data as we would with any Map from the Java Collections API:

现在我们已经实例化了我们的类,我们可以像使用Java集合API中的任何Map那样简单地填充数据。

d2dMap.put(2.0, 5.5);
d2dMap.put(3.0, 6.6);

Finally, we can check that the data has been added correctly:

最后,我们可以检查数据是否已被正确添加。

assertEquals(5.5, d2dMap.get(2.0));

4.1. Performance

4.1.业绩

FastUtils focuses on its performant implementations. In this section, we’ll make use of the JMH to verify that fact. Let’s compare the Java Collections HashSet<Integer> implementation against FastUtil’s IntOpenHashSet.

FastUtils专注于其高性能的实现。在本节中,我们将利用JMH来验证这一事实。让我们将Java Collections的HashSet<Integer>实现与FastUtil的IntOpenHashSet进行比较。

First, let’s see how to implement the IntOpenHashSet:

首先,让我们看看如何实现IntOpenHashSet:

@Param({"100", "1000", "10000", "100000"})
public int setSize;

@Benchmark
public IntSet givenFastUtilsIntSetWithInitialSizeSet_whenPopulated_checkTimeTaken() {
    IntSet intSet = new IntOpenHashSet(setSize);
    for(int i = 0; i < setSize; i++) {
        intSet.add(i);
    }
    return intSet; 
}

Above, we’ve simply declared the IntOpenHashSet implementation of the IntSet interface. We’ve also declared the initial size setSize with the @Param annotation.

上面,我们简单地声明了IntOpenHashSet IntSet 接口的实现。我们还用@Param注释声明了初始大小setSize

Put simply, these numbers are fed into JMH to produce a series of benchmark tests with different set sizes.

简单地说,这些数字被送入JMH,以产生一系列不同规模的基准测试。

Next, let’s do the same thing using the Java Collections implementation:

接下来,让我们使用Java集合实现做同样的事情:

@Benchmark
public Set<Integer> givenCollectionsHashSetWithInitialSizeSet_whenPopulated_checkTimeTaken() {
    Set<Integer> intSet = new HashSet<>(setSize);
    for(int i = 0; i < setSize; i++) {
        intSet.add(i);
    }
    return intSet;
}

Finally, let’s run the benchmark and compare the two implementations:

最后,让我们运行基准并比较这两种实现。

Benchmark                                     (setSize)  Mode  Cnt     Score   Units
givenCollectionsHashSetWithInitialSizeSet...        100  avgt    2     1.460   us/op
givenCollectionsHashSetWithInitialSizeSet...       1000  avgt    2    12.740   us/op
givenCollectionsHashSetWithInitialSizeSet...      10000  avgt    2   109.803   us/op
givenCollectionsHashSetWithInitialSizeSet...     100000  avgt    2  1870.696   us/op
givenFastUtilsIntSetWithInitialSizeSet...           100  avgt    2     0.369   us/op
givenFastUtilsIntSetWithInitialSizeSet...          1000  avgt    2     2.351   us/op
givenFastUtilsIntSetWithInitialSizeSet...         10000  avgt    2    37.789   us/op
givenFastUtilsIntSetWithInitialSizeSet...        100000  avgt    2   896.467   us/op

These results make it clear the FastUtils implementation is much more performant than the Java Collections alternative.

这些结果表明,FastUtils的实现比Java Collections的实现更有性能。

5. Big Collections

5.大规模收藏

Another important feature of FastUtils is the ability to use 64-bit arrays. Arrays in Java, by default, are limited to 32 bits.

FastUtils的另一个重要功能是能够使用64位数组。Java中的数组,默认情况下,被限制在32位。

To get started, let’s take a look at the BigArrays class for Integer types. IntBigArrays provides static methods for working with 2-dimensional Integer arrays. By using these provided methods, we can essentially wrap our array into a more user-friendly 1-dimensional array.

为了开始,让我们看一下BigArrays类中的Integer类型。IntBigArrays 提供了用于处理二维Integer数组的静态方法。通过使用这些提供的方法,我们基本上可以将我们的数组包装成一个更便于使用的一维数组。

Let’s take a look at how this works.

让我们来看看这是如何运作的。

First, we’ll start by initializing a 1-dimensional array, and converting it into a 2-dimensional array using IntBigArray’s wrap method:

首先,我们先初始化一个一维数组,并使用IntBigArray的wrap方法将其转换为一个二维数组。

int[] oneDArray = new int[] { 2, 1, 5, 2, 1, 7 };
int[][] twoDArray = IntBigArrays.wrap(oneDArray.clone());

We should make sure to use the clone method to ensure a deep copy of the array.

我们应该确保使用clone方法来确保阵列的深度拷贝。

Now, as we’d do with a List or a Map, we can gain access to the elements using the get method:

现在,正如我们对ListMap所做的那样,我们可以使用get方法来访问这些元素。

int firstIndex = IntBigArrays.get(twoDArray, 0);
int lastIndex = IntBigArrays.get(twoDArray, IntBigArrays.length(twoDArray)-1);

Finally, let’s add some checks to ensure our IntBigArray returns the correct values:

最后,让我们添加一些检查来确保我们的IntBigArray返回正确的值。

assertEquals(2, firstIndex);
assertEquals(7, lastIndex);

6. Conclusion

6.结语

In this article, we’ve taken a dive into FastUtils core features.

在这篇文章中,我们对FastUtils核心功能进行了深入研究。

We looked at some of the type-specific collections that FastUtil offers, before playing around with some BigCollections.

我们看了一些FastUtil提供的特定类型的集合,然后玩了一些BigCollections

As always, the code can be found over on GitHub

一如既往,代码可以在GitHub上找到。