Branch Prediction in Java – Java中的分支预测

最后修改: 2020年 1月 1日

中文/混合/英文(键盘快捷键:t)

1. Introduction

1.介绍

Branch Prediction is an interesting concept in computer science and can have a profound impact on the performance of our applications. Yet it’s generally not well understood and most developers pay very little attention to it.

分支预测是计算机科学中一个有趣的概念,可以对我们的应用程序的性能产生深远的影响。然而,它通常没有被很好地理解,大多数开发人员很少关注它。

In this article, we are going to explore exactly what it is, how it affects our software, and what we can do about it.

在这篇文章中,我们将探讨它到底是什么,它如何影响我们的软件,以及我们可以做些什么。

2. What Are Instruction Pipelines?

2.什么是教学管道?

When we write any computer program, we are writing a set of commands that we expect the computer to execute in sequence.

当我们编写任何计算机程序时,我们是在编写一组命令,我们希望计算机能够依次执行。

Early computers would run these one at a time. This means that each command gets loaded into memory, executed in its entirety, and only when it’s completed will the next one get loaded.

早期的计算机会一个一个地运行这些命令。这意味着每条命令都会被加载到内存中,完整地执行,只有当它完成后才会加载下一条。

Instruction Pipelines are an improvement over this. They allow the processor to split the work into pieces and then perform different parts in parallel. This would then allow the processor to execute one command while loading the next, ready to go.

指令流水线是在此基础上的一个改进。它们允许处理器将工作分成几部分,然后并行地执行不同的部分。这样一来,处理器就可以在执行一条指令的同时加载下一条指令,准备就绪。

Longer pipelines inside the processor not only allow each part to be simplified but also allow more parts of it to be performed in parallel. This can improve the overall performance of the system.

处理器内部较长的管道不仅允许每个部分被简化,而且允许其中更多的部分被并行执行。这可以提高系统的整体性能。

For example, we could have a simple program:

例如,我们可以有一个简单的程序。

int a = 0;
a += 1;
a += 2;
a += 3;

This might be processed by a pipeline comprising of Fetch, Decode, Execute, Store segments as:

这可能会被一个由获取、解码、执行、存储等环节组成的流水线处理。

We can see here how the overall execution of the four commands is run in parallel, thus making the entire sequence faster.

我们在这里可以看到,四个命令的整体执行是平行运行的,从而使整个序列更快。

3. What Are the Hazards?

3.有哪些危害?

Certain the commands that the processor needs to execute will cause problems for the pipelining. These are any commands where the execution of one part of the pipeline is dependent on earlier parts, but where those earlier parts might not yet have been executed.

处理器需要执行的某些命令会给流水线带来问题。这些是任何命令,其中流水线的一个部分的执行取决于早期的部分,但这些早期的部分可能还没有被执行。

Branches are a specific form of hazard. They cause the execution to go in one of two directions, and it isn’t possible to know which direction until the branch is resolved. This means that any attempt to load the commands past the branch isn’t safe because we have no way of knowing where to load them from.

分支是危险的一种特殊形式。它们会导致执行向两个方向中的一个方向发展,而在分支解决之前,我们不可能知道是哪个方向。这意味着任何试图在分支之后加载命令的行为都是不安全的,因为我们没有办法知道从哪里加载它们。

Let’s change our simple program to introduce a branch:

让我们改变我们的简单程序,引入一个分支。

int a = 0;
a += 1;
if (a < 10) {
  a += 2;
}
a += 3;

The result of this is the same as before, but we’ve introduced an if statement in the middle of it. The computer will see this and won’t be able to load commands past this until it’s been resolved. As such, the flow will look something like:

这样做的结果和之前一样,但我们在中间引入了一个if语句。计算机将看到这一点,在它被解决之前,将无法加载超过这一点的命令。因此,流程看起来会是这样的。

We can immediately see the impact that this has on the execution of our program, and how many clock steps it took to execute the same result.

我们可以立即看到这对我们的程序执行产生的影响,以及执行同样的结果需要多少个时钟步骤。

4. What Is Branch Prediction?

4.什么是分支预测?

Branch Prediction is an enhancement to the above, where our computer will attempt to predict which way a branch is going to go and then act accordingly.

分支预测是对上述内容的加强,我们的计算机将试图预测一个分支的走向,然后采取相应行动。

In our above example, the processor might predict that if (a < 10) is likely to be true, and so it will act as if the instruction a += 2 was the next one to execute. This would then cause the flow to look something like:

在我们上面的例子中,处理器可能预测if (a < 10)可能是true,所以它将会像指令a += 2一样执行。这将导致流程看起来像这样。

We can see straight away that this has improved the performance of our program – it’s now taking nine ticks and not 11, so it’s 19% faster.

我们可以直接看到,这提高了我们程序的性能–它现在需要9次点击,而不是11次,所以它的速度提高了19%。

This is not without risk, though. If the branch prediction gets it wrong, then it will start to queue up instructions that shouldn’t be performed. If this happens, then the computer will need to throw them away and start over.

不过,这并非没有风险。如果分支预测出错,那么它将开始排队等待不应该执行的指令。如果发生这种情况,那么计算机将需要把它们扔掉并重新开始。

Let’s turn our conditional around so that it’s now false:

让我们把我们的条件转过来,让它现在变成false

int a = 0;
a += 1;
if (a > 10) {
  a += 2;
}
a += 3;

This might execute something like:

这可能会执行类似的内容。

This is now slower than the earlier flow, even though we’re doing less! The processor incorrectly predicted that the branch would evaluate to true, started queueing up the a += 2 instruction, and then had to discard it and start over when the branch evaluated to false.

这现在比之前的流程慢了,尽管我们做的事情更少!处理器错误地预测了分支会评估为true,开始排队等待a += 2指令,然后在分支评估为false时不得不丢弃它并重新开始。

5. Real Impact on Code

5.对代码的实际影响

Now that we know what branch prediction is and what the benefits are, how can it affect us? After all, we’re talking about losing a few processor cycles on high-speed computers, so surely it won’t be noticeable.

现在我们知道了什么是分支预测,以及它的好处是什么,它能对我们产生什么影响?毕竟,我们谈论的是在高速计算机上损失几个处理器周期,所以肯定不会被注意到。

And sometimes that’s true. But sometimes it can make a surprising difference to the performance of our applications. It depends a lot on exactly what we’re doing. Specifically, it depends on how much we’re doing in a short time.

有时这也是事实。但有时它也会对我们的应用程序的性能产生惊人的影响。这在很大程度上取决于我们到底在做什么。具体而言,这取决于我们在短时间内做多少事情。

5.1. Counting List Entries

5.1.计算列表中的条目

Let’s try counting entries in a list. We’re going to generate a list of numbers, then count how many of them are less than a certain cutoff. That’s very similar to the above examples, but we’re doing it in a loop instead of just as a single instruction:

让我们试着计算一个列表中的条目。我们将生成一个数字列表,然后计算其中有多少个数字小于某个截止点。这与上面的例子非常相似,但我们是在一个循环中进行,而不是仅仅作为一条指令。

List<Long> numbers = LongStream.range(0, top)
    .boxed()
    .collect(Collectors.toList());

if (shuffle) {
    Collections.shuffle(numbers);
}

long cutoff = top / 2;
long count = 0;

long start = System.currentTimeMillis();
for (Long number : numbers) {
    if (number < cutoff) {
        ++count;
    }
}
long end = System.currentTimeMillis();

LOG.info("Counted {}/{} {} numbers in {}ms",
    count, top, shuffle ? "shuffled" : "sorted", end - start);

Note that we’re only timing the loop that does the counting because this is what we’re interested in. So, how long does this take?

请注意,我们只对进行计数的循环进行计时,因为这正是我们感兴趣的地方。那么,这需要多长时间呢?

If we’re generating sufficiently small lists, then the code runs so fast that it can’t be timed — a list of size 100,000 still shows a time of 0ms. However, when the list gets large enough that we can time it, we can see a significant difference based on whether we have shuffled the list or not. For a list of 10,000,000 numbers:

如果我们生成的是足够小的列表,那么代码的运行速度就会快到无法计时–一个大小为100,000的列表仍然显示时间为0ms。然而,当列表变得足够大,我们可以计时时,我们可以看到基于我们是否对列表进行了洗牌的显著差异。对于一个10,000,000个数字的列表。

  • Sorted – 44ms
  • Shuffled – 221ms

That is, the shuffled list takes 5x longer to count than the sorted list, even though the actual numbers being counted are the same.

也就是说,洗牌后的列表比排序后的列表需要花费5倍的时间来计数,尽管实际被计数的数字是一样的。

However, the act of sorting the list is significantly more expensive than just performing the counting. We should always profile our code and determine if any performance gains are beneficial.

然而,对列表进行排序的行为要比仅仅执行计数要昂贵得多。我们应该始终对我们的代码进行剖析,并确定任何性能提升是否有益。

5.2. Order of Branches

5.2.分支机构的顺序

Following the above, it seems reasonable that the order of branches in an if/else statement should be important. That is, we could expect the following to perform better than if we re-ordered the branches:

根据上述情况,if/else语句中的分支顺序似乎很重要。也就是说,我们可以期望以下的表现比我们重新排列分支的顺序要好。

if (mostLikely) {
  // Do something
} else if (lessLikely) {
  // Do something
} else if (leastLikely) {
  // Do something
}

However, modern computers can avoid this issue by using the branch prediction cache. Indeed, we can test this as well:

然而,现代计算机可以通过使用分支预测缓存来避免这个问题。事实上,我们也可以对此进行测试。

List<Long> numbers = LongStream.range(0, top)
  .boxed()
  .collect(Collectors.toList());
if (shuffle) {
    Collections.shuffle(numbers);
}

long cutoff = (long)(top * cutoffPercentage);
long low = 0;
long high = 0;

long start = System.currentTimeMillis();
for (Long number : numbers) {
    if (number < cutoff) {
        ++low;
    } else {
        ++high;
    }
}
long end = System.currentTimeMillis();

LOG.info("Counted {}/{} numbers in {}ms", low, high, end - start);

This code executes in around the same time – ~35ms for sorted numbers, ~200ms for shuffled numbers – when counting 10,000,000 numbers, irrespective of the value of cutoffPercentage.

无论cutoffPercentage的值是多少,在计算10,000,000个数字时,这段代码的执行时间都是一样的–排序后的数字为~35ms,洗牌后的数字为~200ms。

This is because the branch predictor is handling both branches equally and correctly guessing which way we’re going to go for them.

这是因为分支预测器正在平等地处理这两个分支,并正确地猜测我们要为它们走哪条路。

5.3. Combining Conditions

5.3.合并条件

What if we have a choice between one or two conditions? It might be possible to re-write our logic in a different way that has the same behavior, but should we do this?

如果我们要在一个或两个条件中进行选择怎么办?也许可以用不同的方式重新编写我们的逻辑,使其具有相同的行为,但是我们应该这样做吗?

As an example, if we are comparing two numbers to 0, an alternative approach is to multiply them together and compare the result to 0. This is then replacing a condition with a multiplication. But is this worthwhile?

举例来说,如果我们要将两个数字与0相比较,另一种方法是将它们相乘,然后将结果与0相比较,这就是用乘法代替条件。但这值得吗?

Let’s consider an example:

让我们考虑一个例子。

long[] first = LongStream.range(0, TOP)
  .map(n -> Math.random() < FRACTION ? 0 : n)
  .toArray();
long[] second = LongStream.range(0, TOP)
  .map(n -> Math.random() < FRACTION ? 0 : n)
  .toArray();

long count = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < TOP; i++) {
    if (first[i] != 0 && second[i] != 0) {
        ++count;
    }
}
long end = System.currentTimeMillis();

LOG.info("Counted {}/{} numbers using separate mode in {}ms", count, TOP, end - start);

Our condition inside the loop can be replaced, as described above. Doing so actually does affect the runtime:

我们在循环中的条件可以被替换,如上所述。这样做实际上确实会影响运行时间。

  • Separate conditions – 40ms
  • Multiple and single condition – 22ms

So the option that uses two different conditions actually takes twice as long to execute.

因此,使用两个不同条件的选项实际上需要两倍的时间来执行。

6. Conclusion

6.结论

We’ve seen what branch prediction is and how it can have an impact on our programs. This can give us some additional tools in our belt to ensure that our programs are as efficient as possible.

我们已经看到什么是分支预测,以及它如何对我们的项目产生影响。这可以给我们带来一些额外的工具,以确保我们的程序尽可能地高效。

However, as is always the case, we need to remember to profile our code before making major changes. It can sometimes be the case that making changes to help branch prediction costs more in other ways.

然而,和以往一样,我们需要记得在进行重大修改之前对我们的代码进行剖析。有时会出现这样的情况:为帮助分支预测而进行的修改会在其他方面花费更多。

Examples of the cases from this article are available over on GitHub.

本文中的案例可在GitHub上找到over