Introduction to Spliterator in Java

Last modified: January 27, 2018

1. Overview

The Spliterator interface, introduced in Java 8, can be used for traversing and partitioning sequences. It’s a base utility for Streams, especially parallel ones.

In this article, we’ll cover its usage, characteristics, methods and how to create our own custom implementations.

2. Spliterator API

2.1. tryAdvance

This is the main method used for stepping through a sequence. It takes a Consumer that’s used to consume the elements of the Spliterator one by one, sequentially. If an element remains, tryAdvance passes it to the Consumer and returns true; otherwise, it returns false.
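
A minimal sketch of that contract, using a plain List<String> in place of articles (TryAdvanceDemo and drain() are hypothetical names, and List.of assumes Java 9 or later): every call that returns true has handed exactly one element to the Consumer, and once the source is exhausted, tryAdvance() keeps returning false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class TryAdvanceDemo {

    // Visits every remaining element through tryAdvance() and records it
    static List<String> drain(Spliterator<String> spliterator) {
        List<String> seen = new ArrayList<>();
        // each true result means exactly one element was passed to seen::add
        while (spliterator.tryAdvance(seen::add)) {
            // nothing else to do per element
        }
        return seen;
    }

    public static void main(String[] args) {
        Spliterator<String> spliterator = List.of("a", "b", "c").spliterator();
        System.out.println(drain(spliterator));              // [a, b, c]
        // the source is exhausted, so further calls return false
        System.out.println(spliterator.tryAdvance(s -> { })); // false
    }
}
```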

Here, we’ll take a look at how to use it to traverse and partition elements.

First, let’s assume that we’ve got an ArrayList with 35000 articles and that the Article class is defined as:

import java.util.List;

public class Article {
    private List<Author> listOfAuthors;
    private int id;
    private String name;

    public Article(String name) {
        this.name = name;
    }

    // other standard constructors/getters/setters
}

Now, let’s implement a task that processes the list of articles and appends the suffix “- published by Baeldung” to each article name. Here’s the call() method of a Task class whose constructor receives the Spliterator to process:

public String call() {
    int current = 0;
    // tryAdvance() passes the next article to the lambda and
    // returns false once no articles are left
    while (spliterator.tryAdvance(a -> a.setName(a.getName()
      .concat("- published by Baeldung")))) {
        current++;
    }

    return Thread.currentThread().getName() + ":" + current;
}

Notice that, once it finishes, this task returns the name of the thread that ran it along with the number of articles it processed.

Another key point is that we used the tryAdvance() method to process the next element.

2.2. trySplit

Next, let’s split a Spliterator (hence the name) and process the partitions independently.

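The trySplit method tries to split the Spliterator into two parts. If it succeeds, it returns a new Spliterator covering some of the elements (for an ordered source, a prefix), while the original Spliterator keeps the rest, so the two parts can be processed independently, for example in parallel. If the elements can’t be split, trySplit returns null.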
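
As an illustration, assuming an ArrayList source (whose Spliterator splits the remaining range at its midpoint) and Java 9+ for List.of, the following hypothetical TrySplitDemo shows which part each Spliterator ends up with, and that trySplit() returns null when there’s nothing left to split:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class TrySplitDemo {

    // Collects whatever elements a Spliterator still covers
    static List<Integer> remaining(Spliterator<Integer> spliterator) {
        List<Integer> out = new ArrayList<>();
        spliterator.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9));

        Spliterator<Integer> rest = numbers.spliterator();
        // the returned Spliterator takes the prefix; 'rest' keeps the suffix
        Spliterator<Integer> prefix = rest.trySplit();

        System.out.println(remaining(prefix)); // [0, 1, 2, 3, 4]
        System.out.println(remaining(rest));   // [5, 6, 7, 8, 9]

        // a single-element ArrayList can't be split any further
        Spliterator<Integer> single = new ArrayList<>(List.of(42)).spliterator();
        System.out.println(single.trySplit()); // null
    }
}
```

Other sources may split at different positions, or not at all, so code using trySplit() should always be prepared for a null result.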

Let’s generate our list first:

public static List<Article> generateElements() {
    return Stream.generate(() -> new Article("Java"))
      .limit(35000)
      .collect(Collectors.toList());
}

Next, we obtain our Spliterator instance using the spliterator() method and call trySplit() on it:

@Test
public void givenSpliterator_whenAppliedToAListOfArticle_thenSplittedInHalf() {
    Spliterator<Article> split1 = Executor.generateElements().spliterator(); 
    Spliterator<Article> split2 = split1.trySplit(); 
    
    assertThat(new Task(split1).call()) 
      .containsSequence(Executor.generateElements().size() / 2 + ""); 
    assertThat(new Task(split2).call()) 
      .containsSequence(Executor.generateElements().size() / 2 + ""); 
}

The splitting process worked as intended and divided the records equally: an ArrayList’s Spliterator splits at the midpoint, so each task processed 17500 articles.
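
The same split-then-process idea can be sketched without the Article and Task classes. This hypothetical version assumes a list of 35000 identical strings as a stand-in for the articles and submits each half to an ExecutorService, so the two partitions can run on different threads:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHalvesDemo {

    // Counts the elements of one partition on whichever thread runs it
    static int countAll(Spliterator<String> spliterator) {
        int count = 0;
        while (spliterator.tryAdvance(name -> { })) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // 35000 identical strings stand in for the articles
        List<String> names = new ArrayList<>(Collections.nCopies(35000, "Java"));

        Spliterator<String> split1 = names.spliterator();
        Spliterator<String> split2 = split1.trySplit();

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // each half is handed to its own task
            Future<Integer> first = pool.submit(() -> countAll(split1));
            Future<Integer> second = pool.submit(() -> countAll(split2));
            System.out.println(first.get() + " + " + second.get()); // 17500 + 17500
        } finally {
            pool.shutdown();
        }
    }
}
```

Each Spliterator is used by exactly one task, which matters because Spliterators aren’t expected to be thread-safe themselves.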

2.3. estimateSize

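The estimateSize method returns an estimate of the number of elements left to traverse. Since split1 kept the second half of the list after the split, it reports half of the articles: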

LOG.info("Size: " + split1.estimateSize());

This will output:

Size: 17500
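
The estimate is exact here because an ArrayList’s Spliterator, and every part split from it, is SIZED. For contrast, here’s a hedged sketch (EstimateSizeDemo is a hypothetical name) that compares it with a Spliterator built by Spliterators.spliteratorUnknownSize(), which can’t know its size and therefore reports Long.MAX_VALUE; getExactSizeIfKnown() makes the difference explicit by returning -1 for it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

public class EstimateSizeDemo {

    public static void main(String[] args) {
        List<String> articles = new ArrayList<>(Collections.nCopies(35000, "Java"));

        Spliterator<String> split1 = articles.spliterator();
        Spliterator<String> split2 = split1.trySplit();
        // both halves of an ArrayList are SIZED, so the estimates are exact
        System.out.println(split1.estimateSize() + " " + split2.estimateSize()); // 17500 17500
        System.out.println(split1.getExactSizeIfKnown());                       // 17500

        // a Spliterator built from a bare Iterator doesn't know its size
        Spliterator<String> unknown =
            Spliterators.spliteratorUnknownSize(articles.iterator(), Spliterator.ORDERED);
        System.out.println(unknown.estimateSize() == Long.MAX_VALUE); // true
        System.out.println(unknown.getExactSizeIfKnown());            // -1
    }
}
```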

2.4. hasCharacteristics

The hasCharacteristics() method checks whether the Spliterator reports all of the given characteristics. The related characteristics() method returns those characteristics as an int bit set, which we can log:

LOG.info("Characteristics: " + split1.characteristics());

This will output:

Characteristics: 16464
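
The value 16464 is a bit set. Assuming the list from the earlier examples is an ArrayList, whose Spliterator is documented to report SIZED, SUBSIZED and ORDERED, the following hypothetical CharacteristicsDemo decodes it and uses hasCharacteristics(), which returns true only if all of the requested bits are set:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Spliterator;

public class CharacteristicsDemo {

    public static void main(String[] args) {
        List<String> articles = new ArrayList<>(Collections.nCopies(35000, "Java"));
        Spliterator<String> split1 = articles.spliterator();
        split1.trySplit();

        // 16464 == ORDERED (16) | SIZED (64) | SUBSIZED (16384)
        int expected = Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED;
        System.out.println(expected);                             // 16464
        System.out.println(split1.characteristics() == expected); // true

        // hasCharacteristics() tests for every bit in its argument
        System.out.println(split1.hasCharacteristics(Spliterator.SIZED | Spliterator.ORDERED)); // true
        System.out.println(split1.hasCharacteristics(Spliterator.SORTED));                      // false
    }
}
```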

3. Spliterator Characteristics

A Spliterator can report eight different characteristics that describe its behavior. Clients such as the Streams library can use them as hints to optimize processing:

  • SIZED – if it’s capable of returning the exact number of elements with the estimateSize() method
  • SORTED – if it’s iterating through a sorted source
  • SUBSIZED – if all Spliterators obtained by splitting the instance with trySplit() are SIZED and SUBSIZED as well
  • CONCURRENT – if the source can be safely modified concurrently without external synchronization
  • DISTINCT – if for each pair of encountered elements x, y, !x.equals(y)
  • IMMUTABLE – if the source can’t be structurally modified, that is, elements can’t be added, replaced or removed
  • NONNULL – if the source guarantees that encountered elements won’t be null
  • ORDERED – if iterating over an ordered sequence, that is, the elements have a defined encounter order
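
To see these flags on real sources, here’s a small hypothetical helper, describe(), that lists the characteristics a Spliterator reports. The outputs in the comments follow the JDK’s documented spliterator() contracts for these collections (Java 9+ is assumed for List.of):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class SourceCharacteristicsDemo {

    // the eight flags, paired by index with their names
    private static final int[] FLAGS = {
        Spliterator.ORDERED, Spliterator.DISTINCT, Spliterator.SORTED, Spliterator.SIZED,
        Spliterator.NONNULL, Spliterator.IMMUTABLE, Spliterator.CONCURRENT, Spliterator.SUBSIZED
    };
    private static final String[] NAMES = {
        "ORDERED", "DISTINCT", "SORTED", "SIZED",
        "NONNULL", "IMMUTABLE", "CONCURRENT", "SUBSIZED"
    };

    // Lists the names of the characteristics a Spliterator reports
    static List<String> describe(Spliterator<?> spliterator) {
        List<String> reported = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if (spliterator.hasCharacteristics(FLAGS[i])) {
                reported.add(NAMES[i]);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        List<String> words = List.of("b", "a", "c");

        System.out.println(describe(new ArrayList<>(words).spliterator()));
        // [ORDERED, SIZED, SUBSIZED]
        System.out.println(describe(new HashSet<>(words).spliterator()));
        // [DISTINCT, SIZED]
        System.out.println(describe(new TreeSet<>(words).spliterator()));
        // [ORDERED, DISTINCT, SORTED, SIZED]
        System.out.println(describe(Arrays.spliterator(words.toArray(new String[0]))));
        // [ORDERED, SIZED, IMMUTABLE, SUBSIZED]

        Set<String> concurrent = ConcurrentHashMap.newKeySet();
        concurrent.addAll(words);
        System.out.println(describe(concurrent.spliterator()));
        // [DISTINCT, NONNULL, CONCURRENT]
    }
}
```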

4. A Custom Spliterator

4.1. When to Customize

First, let’s assume the following scenario:

We’ve got an Article class with a list of authors, so an article can have more than one author. Furthermore, we consider an author related to the article if the author’s related article id matches the article’s id.

Our Author class will look like this:

public class Author {
    private String name;
    private int relatedArticleId;

    // standard getters, setters & constructors
}

Next, we’ll implement a class that counts authors while traversing a stream of authors, so that we can use it to perform a reduction on the stream.

Let’s have a look at the class implementation:

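public class RelatedAuthorCounter {
    private final int counter;
    private final boolean isRelated;

    public RelatedAuthorCounter(int counter, boolean isRelated) {
        this.counter = counter;
        this.isRelated = isRelated;
    }

    public int getCounter() {
        return counter;
    }

    public RelatedAuthorCounter accumulate(Author author) {
        if (author.getRelatedArticleId() == 0) {
            // an id of 0 marks a boundary: the next related author will be counted
            return isRelated ? this : new RelatedAuthorCounter(counter, true);
        } else {
            // count this author only if it directly follows a boundary
            return isRelated ? new RelatedAuthorCounter(counter + 1, false) : this;
        }
    }

    // merges two partial results; the right-hand side's state is kept
    public RelatedAuthorCounter combine(RelatedAuthorCounter other) {
        return new RelatedAuthorCounter(
          counter + other.counter,
          other.isRelated);
    }
}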

Each method in the above class performs a specific operation to count while traversing.

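First, the accumulate() method processes the authors one at a time and returns an updated counter: an author whose relatedArticleId is 0 acts as a boundary, and an author with a non-zero id is counted only if it directly follows such a boundary (or starts the stream). Then combine() merges two partial results by summing their counters. Finally, getCounter() returns the count.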

Now, let’s test what we’ve done so far by converting our article’s list of authors to a stream of authors:

Stream<Author> stream = article.getListOfAuthors().stream();

Then we implement a countAutors() method that performs the reduction on the stream using RelatedAuthorCounter:

public static int countAutors(Stream<Author> stream) {
    // identity, accumulator and combiner for the three-argument reduce()
    RelatedAuthorCounter counter = stream.reduce(
      new RelatedAuthorCounter(0, true),
      RelatedAuthorCounter::accumulate,
      RelatedAuthorCounter::combine);
    return counter.getCounter();
}

If we use a sequential stream, the output is the expected “count = 9”. However, the problem arises when we try to parallelize the operation.

Let’s take a look at the following test case:

@Test
void 
  givenAStreamOfAuthors_whenProcessedInParallel_countProducesWrongOutput() {
    assertThat(Executor.countAutors(stream.parallel())).isGreaterThan(9);
}

Apparently, something has gone wrong: splitting the stream at an arbitrary position can cut a run of related authors in two, so that run is counted once in each part and some authors are counted twice.
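
To see why the split position matters, here’s a deterministic sketch that doesn’t use a parallel stream at all. RunCounter is a hypothetical stand-in for RelatedAuthorCounter that works on plain relatedArticleId values (7 is an arbitrary non-zero id), and countWithSplitAt() reduces two parts separately from the identity and then combines them, as a parallel reduction would:

```java
import java.util.List;

public class SplitDoubleCountDemo {

    // Hypothetical stand-in for RelatedAuthorCounter that works on plain ids
    static final class RunCounter {
        final int counter;
        final boolean isRelated;

        RunCounter(int counter, boolean isRelated) {
            this.counter = counter;
            this.isRelated = isRelated;
        }

        // same rules as RelatedAuthorCounter.accumulate()
        RunCounter accumulate(int relatedArticleId) {
            if (relatedArticleId == 0) {
                return isRelated ? this : new RunCounter(counter, true);
            }
            return isRelated ? new RunCounter(counter + 1, false) : this;
        }

        // same rules as RelatedAuthorCounter.combine()
        RunCounter combine(RunCounter other) {
            return new RunCounter(counter + other.counter, other.isRelated);
        }
    }

    // Reduces one part, starting from the identity used by countAutors()
    static RunCounter reduce(List<Integer> ids) {
        RunCounter result = new RunCounter(0, true);
        for (int id : ids) {
            result = result.accumulate(id);
        }
        return result;
    }

    // Simulates a two-way split at splitPos followed by combine()
    static int countWithSplitAt(List<Integer> ids, int splitPos) {
        RunCounter left = reduce(ids.subList(0, splitPos));
        RunCounter right = reduce(ids.subList(splitPos, ids.size()));
        return left.combine(right).counter;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(0, 7, 7, 7, 0, 7);

        System.out.println(reduce(ids).counter);      // 2 (sequential)
        System.out.println(countWithSplitAt(ids, 2)); // 3: the split cuts a run in two
        System.out.println(countWithSplitAt(ids, 4)); // 2: the split lands on a 0 id
    }
}
```

Splitting inside a run makes the second part start from the identity, whose isRelated flag is true, so it counts the tail of that run again; splitting at a 0 id avoids this.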

4.2. How to Customize

To solve this, we need to implement a Spliterator that splits the list of authors only at a boundary, that is, at an author whose relatedArticleId is 0, so that no run of related authors is cut in two. Here’s the implementation of our custom Spliterator:

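import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class RelatedAuthorSpliterator implements Spliterator<Author> {
    private final List<Author> list;
    private final AtomicInteger current = new AtomicInteger();

    public RelatedAuthorSpliterator(List<Author> list) {
        this.list = list;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Author> action) {
        // return false only when no author was left to consume
        if (current.get() >= list.size()) {
            return false;
        }
        action.accept(list.get(current.getAndIncrement()));
        return true;
    }

    @Override
    public Spliterator<Author> trySplit() {
        int currentSize = list.size() - current.get();
        // too few authors left to be worth splitting
        if (currentSize < 10) {
            return null;
        }
        // search from the middle for a boundary author (relatedArticleId == 0)
        for (int splitPos = currentSize / 2 + current.intValue();
          splitPos < list.size(); splitPos++) {
            if (list.get(splitPos).getRelatedArticleId() == 0) {
                // the new Spliterator takes the prefix, this one keeps the rest
                Spliterator<Author> spliterator
                  = new RelatedAuthorSpliterator(
                  list.subList(current.get(), splitPos));
                current.set(splitPos);
                return spliterator;
            }
        }
        return null;
    }

    @Override
    public long estimateSize() {
        return list.size() - current.get();
    }

    @Override
    public int characteristics() {
        // the list is traversed in order, and every size estimate is exact
        return ORDERED | SIZED | SUBSIZED;
    }
}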

Now, applying the countAutors() method to a parallel stream backed by our custom Spliterator gives the correct output, as the following test demonstrates:

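@Test
public void
  givenAStreamOfAuthors_whenProcessedInParallel_countProducesRightOutput() {
    // wrap the article's authors in our boundary-aware Spliterator
    Spliterator<Author> spliterator =
      new RelatedAuthorSpliterator(article.getListOfAuthors());
    // the second argument requests a parallel stream
    Stream<Author> stream2 = StreamSupport.stream(spliterator, true);

    assertThat(Executor.countAutors(stream2)).isEqualTo(9);
}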

The custom Spliterator is created from a list of authors and traverses it by keeping track of the current position.

Let’s discuss the implementation of each method in more detail:

  • tryAdvance – passes the author at the current index position to the Consumer and increments the position
  • trySplit – defines the splitting mechanism; in our case, it searches from the middle of the remaining authors for one whose relatedArticleId is 0, returns a new RelatedAuthorSpliterator for the authors before that position and keeps the rest; it returns null when fewer than 10 authors remain or no such position exists
  • estimateSize – the difference between the list size and the position of the currently iterated author
  • characteristics – returns the Spliterator characteristics; in our case ORDERED, since the list is traversed in order, plus SIZED and SUBSIZED, since the value returned by estimateSize() is exact for this Spliterator and for every part produced by trySplit(); CONCURRENT, which indicates that the source may be safely modified by other threads, doesn’t apply to a plain List
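
When writing a trySplit() like this one, it’s easy to drop or duplicate elements at the split position. One way to check an ORDERED Spliterator, sketched here as a hypothetical SplitCoverageCheck harness, is to split it recursively, traverse the parts prefix-first and compare the result with the source list. It’s shown with an ArrayList, but it could be pointed at new RelatedAuthorSpliterator(authors) in the same way:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitCoverageCheck {

    // Splits recursively up to 'depth' levels, then collects elements prefix-first
    static <T> void collect(Spliterator<T> spliterator, List<T> out, int depth) {
        Spliterator<T> prefix = depth > 0 ? spliterator.trySplit() : null;
        if (prefix == null) {
            // no (further) split: traverse what's left sequentially
            spliterator.forEachRemaining(out::add);
            return;
        }
        collect(prefix, out, depth - 1);      // the prefix comes first in encounter order
        collect(spliterator, out, depth - 1); // then the elements this Spliterator kept
    }

    // True if splitting and then traversing visits every element exactly once, in order
    static <T> boolean coversInOrder(List<T> source, Spliterator<T> spliterator, int depth) {
        List<T> out = new ArrayList<>();
        collect(spliterator, out, depth);
        return out.equals(source);
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            numbers.add(i);
        }
        System.out.println(coversInOrder(numbers, numbers.spliterator(), 6)); // true
    }
}
```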

5. Support for Primitive Values

The Spliterator API provides specialized versions for the primitive types double, int and long.

The main difference between using a generic and a primitive-specialized Spliterator is the type of Consumer it accepts and the type of the Spliterator itself.

For example, when we work with int values, we need to pass an IntConsumer. Here’s the list of primitive-specialized Spliterators, all nested in the Spliterator interface:

  • OfPrimitive<T, T_CONS, T_SPLITR extends Spliterator.OfPrimitive<T, T_CONS, T_SPLITR>>: the parent interface of the other primitive specializations
  • OfInt: a Spliterator specialized for int
  • OfDouble: a Spliterator specialized for double
  • OfLong: a Spliterator specialized for long
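
As a sketch of an OfInt in use, assuming IntStream as the source (PrimitiveSpliteratorDemo and sum() are hypothetical names): trySplit() on an OfInt returns another OfInt, and its tryAdvance() accepts an IntConsumer, so the values are never boxed. The IntConsumer is stored in a variable because an implicitly typed lambda passed directly would be ambiguous between the IntConsumer and Consumer<? super Integer> overloads:

```java
import java.util.Spliterator;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

public class PrimitiveSpliteratorDemo {

    // Sums the remaining ints of a Spliterator.OfInt without boxing them
    static long sum(Spliterator.OfInt spliterator) {
        long[] total = {0};
        // an IntConsumer variable selects the primitive tryAdvance() overload
        IntConsumer adder = value -> total[0] += value;
        while (spliterator.tryAdvance(adder)) {
            // each successful call added exactly one value
        }
        return total[0];
    }

    public static void main(String[] args) {
        Spliterator.OfInt rest = IntStream.range(0, 100).spliterator();
        // trySplit() on an OfInt returns another OfInt (or null)
        Spliterator.OfInt prefix = rest.trySplit();

        long result = sum(rest) + (prefix == null ? 0 : sum(prefix));
        System.out.println("Sum: " + result); // Sum: 4950
    }
}
```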

6. Conclusion

In this article, we covered Java 8 Spliterator usage, methods, characteristics, splitting process, primitive support and how to customize it.

As always, the full implementation of this article can be found over on GitHub.
