1. Introduction
In this tutorial, we’ll compare Java streams and for-loops in depth. Both are everyday data-processing tools for Java developers. Although they differ in many ways, as we’ll see in the rest of the article, their use cases overlap heavily, and we can often use one in place of the other.
Streams, introduced in Java 8, offer a functional and declarative approach, while for-loops provide the traditional imperative one. By the end of the article, we’ll be able to choose the approach that best suits a given programming task.
2. Performance
When we compare solutions to a programming problem, performance usually comes up, and this case is no different. Since both streams and for-loops are used to process large amounts of data, performance can be an important factor in choosing between them.
Let’s walk through a benchmark to understand the performance differences between for-loops and streams. We’ll compare the execution times of a pipeline that filters, maps, and sums a list, implemented once with a for-loop and once with a stream. For this purpose, we’ll use the Java Microbenchmark Harness (JMH), a tool designed specifically for benchmarking Java code.
2.1. Getting Started
We start by adding the JMH dependencies to our Maven build:
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
</dependency>
We can find the latest versions of JMH Core and JMH Annotation Processor on Maven Central.
2.2. Setting Up the Benchmark
In our benchmark, we’ll work with a list of the integers from 0 to 999,999. We want to keep only the even numbers, square them, and then calculate the sum of the squares. To keep the comparison fair, both implementations read the same input, which we prepare in a JMH state class:
@State(Scope.Thread)
public static class MyState {
    List<Integer> numbers;

    @Setup(Level.Trial)
    public void setUp() {
        numbers = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            numbers.add(i);
        }
    }
}
JMH passes an instance of this State class to each benchmark method. Because the setUp() method is annotated with @Setup(Level.Trial), it runs once before each benchmark trial, so building the list isn’t included in the measured time.
2.3. Benchmarking with For-Loops
Our for-loop implementation iterates through the list of numbers, checks each one for evenness, squares the even ones, and accumulates the sum in a variable:
@Benchmark
public int forLoopBenchmark(MyState state) {
    int sum = 0;
    for (int number : state.numbers) {
        if (number % 2 == 0) {
            sum = sum + (number * number);
        }
    }
    return sum;
}
2.4. Benchmarking with Streams
Next, we’ll implement the same operations using Java streams. We filter the even numbers, map them to their squares, and finally calculate the sum:
@Benchmark
public int streamBenchMark(MyState state) {
    return state.numbers.stream()
      .filter(number -> number % 2 == 0)
      .map(number -> number * number)
      .reduce(0, Integer::sum);
}
We use the terminal operation reduce() to compute the sum of the numbers. This is only one of several ways to calculate a sum with streams.
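As a minimal sketch of those alternatives, the hypothetical SumVariants class below (not part of the benchmark) computes the same sum of even squares in three ways: with reduce(), with mapToInt() followed by sum(), and with mapToLong(), which widens each value before squaring. The last variant matters for the full benchmark input, because squares of values above 46,340 no longer fit in an int, so the int-based versions wrap around. Both benchmarks above wrap in the same way, which keeps the timing comparison like-for-like.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of the benchmark.
public class SumVariants {

    // Same approach as the benchmark: boxed Integer values folded with reduce().
    public static int sumWithReduce(List<Integer> numbers) {
        return numbers.stream()
          .filter(n -> n % 2 == 0)
          .map(n -> n * n)
          .reduce(0, Integer::sum);
    }

    // Primitive IntStream: sum() replaces the explicit reduce().
    public static int sumWithMapToInt(List<Integer> numbers) {
        return numbers.stream()
          .mapToInt(Integer::intValue)
          .filter(n -> n % 2 == 0)
          .map(n -> n * n)
          .sum();
    }

    // Widening to long before squaring avoids int overflow for large inputs.
    public static long sumAsLong(List<Integer> numbers) {
        return numbers.stream()
          .mapToLong(Integer::longValue)
          .filter(n -> n % 2 == 0)
          .map(n -> n * n)
          .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumWithReduce(numbers));   // 56
        System.out.println(sumWithMapToInt(numbers)); // 56
        System.out.println(sumAsLong(numbers));       // 56
    }
}
```

For small inputs, all three variants return the same value; they only diverge once a square exceeds the int range.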
2.5. Running the Benchmark
With our benchmark methods in place, we’ll run them using JMH. JMH executes each benchmark many times and reports the average execution time. To configure this, we’ll add the following annotations to our benchmark class:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
With these settings, JMH runs three one-second warmup iterations, followed by five one-second measurement iterations, and reports the average time per operation across the measured iterations. Now, we can run the main method to see the results:
public static void main(String[] args) throws RunnerException {
    Options options = new OptionsBuilder()
      .include(PerformanceBenchmark.class.getSimpleName())
      .build();
    new Runner(options).run();
}
2.6. Analyzing the Results
Once we run the benchmark, JMH reports the average execution time of both the for-loop and the stream implementation:
Benchmark                              Mode  Cnt         Score         Error  Units
PerformanceBenchmark.forLoopBenchmark  avgt    5   3386660.051 ± 1375112.505  ns/op
PerformanceBenchmark.streamBenchMark   avgt    5  12231480.518 ± 1609933.324  ns/op
In our example, the for-loop took about 3.4 ms per operation and the stream about 12.2 ms, so the for-loop was roughly 3.6 times faster. The error margins are wide, so we should treat these numbers as indicative rather than exact. Even though the stream performed worse than the for-loop here, this could change in some cases, especially with parallel streams.
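One factor that may contribute to the gap, though the benchmark above doesn’t isolate it, is that a List<Integer> stores boxed values, so the stream’s map() step has to produce a new Integer for every square. As a rough sketch under that assumption, the hypothetical PrimitivePipeline class below runs the same filter, map, and sum steps over an int[] with IntStream, which works on primitives throughout. It isn’t part of the measured benchmark, and we’d need to add it to the JMH class to see whether it actually narrows the difference.

```java
import java.util.stream.IntStream;

// Hypothetical variant for illustration; not one of the measured benchmarks.
public class PrimitivePipeline {

    // Same filter/map/sum steps as the benchmark, but on primitive ints.
    public static int sumOfEvenSquares(int[] numbers) {
        return IntStream.of(numbers)
          .filter(n -> n % 2 == 0)
          .map(n -> n * n)
          .sum();
    }

    public static void main(String[] args) {
        int[] numbers = IntStream.range(0, 10).toArray();
        System.out.println(sumOfEvenSquares(numbers)); // 0 + 4 + 16 + 36 + 64 = 120
    }
}
```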
3. Syntax and Readability
Readability matters to us as programmers, so it’s an important criterion when we choose between two solutions to the same problem.
First, let’s look at the syntax and readability of streams. Streams promote a concise and expressive style of coding, which is evident when we filter and count data:
List<String> fruits = Arrays.asList("apple", "banana", "orange", "grape");
long count = fruits.stream()
  .filter(fruit -> fruit.length() > 5)
  .count();
The stream code reads as a single fluent chain of operations, with the filtering condition and the count operation clearly expressed. Furthermore, streams are often easier to read because they’re declarative: the code focuses on what needs to be done rather than how to do it.
In contrast, for-loops provide a more traditional, imperative style of coding:
List<String> fruits = Arrays.asList("apple", "banana", "orange", "grape");
long count = 0;
for (String fruit : fruits) {
    if (fruit.length() > 5) {
        count++;
    }
}
Here, the code spells out the iteration and the conditional statement explicitly. While most developers understand this approach well, it tends to be more verbose, which can make it harder to read, especially for complex operations.
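To illustrate the point about more complex operations, here’s a small sketch that groups the same fruits by name length, once with a stream and once with a for-loop. The FruitGrouping class and its method names are made up for this illustration; Collectors.groupingBy() and Map.computeIfAbsent() are standard JDK methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example class; the names are chosen for illustration only.
public class FruitGrouping {

    // Declarative: describe the grouping key and let the collector build the map.
    public static Map<Integer, List<String>> groupWithStream(List<String> fruits) {
        return fruits.stream()
          .collect(Collectors.groupingBy(String::length));
    }

    // Imperative: create the buckets and fill them by hand.
    public static Map<Integer, List<String>> groupWithLoop(List<String> fruits) {
        Map<Integer, List<String>> byLength = new HashMap<>();
        for (String fruit : fruits) {
            byLength.computeIfAbsent(fruit.length(), length -> new ArrayList<>())
              .add(fruit);
        }
        return byLength;
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("apple", "banana", "orange", "grape");
        System.out.println(groupWithStream(fruits));
        System.out.println(groupWithLoop(fruits));
    }
}
```

Both methods return the same map for the sample list. The loop version needs a few more lines to create and fill the buckets, while the stream version states only the grouping key.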
4. Parallelism and Concurrency
Parallelism and concurrency are important aspects to consider when comparing streams and for-loops in Java. The two approaches offer different capabilities and pose different challenges when we want to use multi-core processors and manage concurrent operations.
Streams are designed to make parallel processing more accessible. Java 8 introduced parallel streams, which split the work across multiple threads so that multi-core processors can speed up data processing. We can easily rewrite the stream benchmark from the performance section to compute the sum in parallel:
@Benchmark
public int parallelStreamBenchMark(MyState state) {
    return state.numbers.parallelStream()
      .filter(number -> number % 2 == 0)
      .map(number -> number * number)
      .reduce(0, Integer::sum);
}
The only change needed to parallelize the pipeline is to replace the stream() method with parallelStream(). On the other hand, rewriting the for-loop to compute the sum in parallel is more complicated:
@Benchmark
public int concurrentForLoopBenchmark(MyState state) throws InterruptedException, ExecutionException {
    int numThreads = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(numThreads);
    try {
        List<Callable<Integer>> tasks = new ArrayList<>();
        int chunkSize = state.numbers.size() / numThreads;
        for (int i = 0; i < numThreads; i++) {
            final int start = i * chunkSize;
            // The last chunk also takes the remainder of the division
            final int end = (i == numThreads - 1) ? state.numbers.size() : (i + 1) * chunkSize;
            tasks.add(() -> {
                int sum = 0;
                for (int j = start; j < end; j++) {
                    int number = state.numbers.get(j);
                    if (number % 2 == 0) {
                        sum = sum + (number * number);
                    }
                }
                return sum;
            });
        }
        int totalSum = 0;
        // invokeAll() blocks until every task has completed
        for (Future<Integer> result : executorService.invokeAll(tasks)) {
            totalSum += result.get();
        }
        return totalSum;
    } finally {
        // Shut the pool down even if a task fails
        executorService.shutdown();
    }
}
Here, we use Java’s concurrency utilities, such as ExecutorService, to execute tasks concurrently: we divide the list into chunks and process each chunk on a thread pool.
When deciding between streams and for-loops for parallelism and concurrency, we should consider the complexity of our task. Streams offer a more straightforward way to enable parallel processing for tasks that are easy to parallelize. On the other hand, for-loops with manual concurrency control are suitable for more complex scenarios that require custom thread management and coordination.
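Switching to parallelStream() is only safe when the pipeline’s lambdas don’t modify shared state and the reduction is associative, which holds for a sum. As a minimal sketch, the hypothetical ParallelSumCheck class below compares a sequential loop with a parallel stream on a small range. It uses long arithmetic so that the values don’t overflow, which is an assumption that differs from the int-based benchmark.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical check class for illustration; not part of the benchmark.
public class ParallelSumCheck {

    // Reference result computed with a plain for-loop.
    public static long sumWithLoop(List<Integer> numbers) {
        long sum = 0;
        for (int number : numbers) {
            if (number % 2 == 0) {
                sum += (long) number * number;
            }
        }
        return sum;
    }

    // Stateless lambdas and an associative sum() make this safe to run in parallel.
    public static long sumWithParallelStream(List<Integer> numbers) {
        return numbers.parallelStream()
          .mapToLong(Integer::longValue)
          .filter(n -> n % 2 == 0)
          .map(n -> n * n)
          .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 10_000)
          .boxed()
          .collect(Collectors.toList());
        System.out.println(sumWithLoop(numbers) == sumWithParallelStream(numbers)); // true
    }
}
```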
5. Mutability
Now, let’s explore mutability and how it differs between streams and for-loops. Understanding how each approach handles mutable data helps us make an informed choice.
First, we need to recognize that streams, by their nature, promote immutability. Stream operations don’t modify the elements of the source collection directly. Instead, intermediate operations create new streams, and terminal operations such as collect() produce new collections or values:
List<String> fruits = new ArrayList<>(Arrays.asList("apple", "banana", "orange"));
List<String> upperCaseFruits = fruits.stream()
  .map(fruit -> fruit.toUpperCase())
  .collect(Collectors.toList());
In this stream operation, the original list remains unchanged. The map() operation produces a new stream in which each fruit is transformed to uppercase, and the collect() operation gathers the transformed elements into a new list.
In contrast, for-loops can operate on mutable data structures directly:
List<String> fruits = new ArrayList<>(Arrays.asList("apple", "banana", "orange"));
for (int i = 0; i < fruits.size(); i++) {
    fruits.set(i, fruits.get(i).toUpperCase());
}
In this for-loop, we directly modify the original list, replacing every element with its uppercase equivalent. This can be advantageous when we need to modify existing data in place, but it requires care, because every other reference to the same list sees the change.
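The difference becomes visible when a second reference points to the same list. In the sketch below, the MutabilityDemo class and its method names are hypothetical; it simply wraps the two snippets above so that we can compare their effect on the input list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; it wraps the stream and loop snippets for comparison.
public class MutabilityDemo {

    // Stream version: returns a new list and leaves the input untouched.
    public static List<String> upperCaseCopy(List<String> fruits) {
        return fruits.stream()
          .map(String::toUpperCase)
          .collect(Collectors.toList());
    }

    // Loop version: overwrites each element of the input list in place.
    public static void upperCaseInPlace(List<String> fruits) {
        for (int i = 0; i < fruits.size(); i++) {
            fruits.set(i, fruits.get(i).toUpperCase());
        }
    }

    public static void main(String[] args) {
        List<String> fruits = new ArrayList<>(Arrays.asList("apple", "banana", "orange"));
        List<String> alias = fruits; // second reference to the same list

        List<String> copy = upperCaseCopy(fruits);
        System.out.println(copy);  // [APPLE, BANANA, ORANGE]
        System.out.println(alias); // [apple, banana, orange]

        upperCaseInPlace(fruits);
        System.out.println(alias); // [APPLE, BANANA, ORANGE]
    }
}
```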
6. Conclusion
Both streams and loops have their strengths and weaknesses. Streams offer a functional, declarative approach that enhances code readability and often leads to concise and elegant solutions. Loops, on the other hand, provide a familiar and explicit control structure, which makes them suitable when precise control over execution order or mutability is critical, and they were faster in our sequential benchmark.
The complete source code for this article, including all code snippets, is available over on GitHub.