1. Overview

1.概述

In this tutorial, we’re going to explore the Caesar cipher, an encryption method that shifts letters of a message to produce another, less readable one.

在本教程中，我们将探讨凯撒密码，这是一种将信息中的字母移位以产生另一个不太容易读懂的信息的加密方法。

First of all, we’ll go through the ciphering method and see how to implement it in Java.

首先，我们来了解一下加密的方法，看看如何在Java中实现它。

Then, we’ll see how to decipher an encrypted message, provided we know the offset used to encrypt it.

然后，我们将看到如何破译一个加密的信息，只要我们知道用于加密的偏移量。

And finally, we’ll learn how to break such a cipher and thus retrieving the original message from the encrypted one without knowing the offset used.

最后，我们将学习如何破解这样的密码，从而在不知道使用的偏移量的情况下从加密的信息中检索出原始信息。

2. Caesar Cipher

2.凯撒密码

2.1. Explanation

2.1.解释

First of all, let’s define what a cipher is. A cipher is a method for encrypting a message, intending to make it less readable. As for the Caesar cipher, it’s a substitution cipher that transforms a message by shifting its letters by a given offset.

首先，让我们定义一下什么是密码。密码是一种对信息进行加密的方法，目的是使其不容易被读取。至于凯撒密码，它是一种替换密码，通过将其字母按给定的偏移量转移来改变信息。

Let’s say we want to shift the alphabet by 3, then letter A would be transformed to letter D, B to E, C to F, and so on.

假设我们想把字母表移3，那么字母A就会变成字母D，B变成E，C变成F，以此类推。

Here is the complete matching between original and transformed letters for an offset of 3:

下面是偏移量为3的原始字母和转换后的字母之间的完整匹配。

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

As we can see, once the transformation goes beyond the letter Z, we go back to the start of the alphabet, so that X, Y and Z are transformed into A, B and C, respectively.

我们可以看到，一旦转换超过了字母Z，我们就回到了字母表的起点，所以X、Y和Z分别被转换为A、B和C。

Therefore, if we choose an offset greater or equal to 26, we loop, at least one time, over the entire alphabet. Let’s imagine we shift a message by 28, that really means we’re shifting it by 2. Indeed, after shifting by 26, all letters are matching themselves.

因此，如果我们选择一个大于或等于26的偏移量，我们至少会在整个字母表上循环一次。让我们想象一下，我们将一条信息移位28，这实际上意味着我们将它移位2。的确，移位26后，所有字母都与自己相匹配。

Really, we can transform any offset into a simpler offset by performing a modulo 26 operation on it:

实际上，我们可以通过对其进行模数26操作，将任何偏移量转化为更简单的偏移量。

offset = offset % 26

2.2. Algorithm in Java

2.2.在Java中的算法

Now, let’s see how to implement the Caesar cipher in Java.

现在，让我们看看如何在Java中实现凯撒密码。

First, let’s create a class CaesarCipher that will hold a cipher() method taking a message and an offset as parameters:

首先，让我们创建一个CaesarCipher类，它将持有一个cipher()方法，以一个消息和一个偏移作为参数。

public class CaesarCipher {
    String cipher(String message, int offset) {}
}

That method will encrypt the message using the Caesar cipher.

该方法将使用凯撒密码对信息进行加密。

We’ll suppose here that offsets are positive and messages only contain lower case letters and spaces. Then, what we want is to shift all alphabetic characters by the given offset:

我们在这里假设偏移量是正数，而且信息只包含小写字母和空格。那么，我们要做的就是将所有的英文字母按给定的偏移量移位。

StringBuilder result = new StringBuilder();
for (char character : message.toCharArray()) {
    if (character != ' ') {
        int originalAlphabetPosition = character - 'a';
        int newAlphabetPosition = (originalAlphabetPosition + offset) % 26;
        char newCharacter = (char) ('a' + newAlphabetPosition);
        result.append(newCharacter);
    } else {
        result.append(character);
    }
}
return result;

As we can see, we rely on the ASCII codes of the alphabet letters to achieve our goal.

我们可以看到，我们依靠字母的ASCII码来实现我们的目标。

First, we compute the position of the current letter in the alphabet, and for that, we take its ASCII code and subtract the ASCII code of letter a from it. Then we apply the offset to this position, carefully using the modulo to remain in the alphabet range. And finally, we retrieve the new character by adding the new position to the ASCII code of letter a.

首先，我们计算出当前字母在字母表中的位置，为此，我们取其ASCII码，并从其中减去字母a的ASCII码。然后，我们将偏移量应用于这个位置，谨慎地使用模数以保持在字母表范围内。最后，我们将新的位置加上字母a的ASCII码，就得到了新的字符。

Now, let’s try this implementation on the message “he told me i could never teach a llama to drive” with an offset of 3:

现在，让我们在 “他告诉我，我永远不可能教一只骆驼开车 “的信息上试试这个实现，偏移量为3。

CaesarCipher cipher = new CaesarCipher();

String cipheredMessage = cipher.cipher("he told me i could never teach a llama to drive", 3);

assertThat(cipheredMessage)
  .isEqualTo("kh wrog ph l frxog qhyhu whdfk d oodpd wr gulyh");

As we can see, the ciphered message respects the matching defined earlier for an offset of 3.

正如我们所看到的，被加密的信息尊重了前面定义的偏移量为3的匹配。

Now, this particular example has the specificity not to exceed the letter z during the transformation, therefore not having to go back to the start of the alphabet. Thus, let’s try again with an offset of 10 so that some letters will be mapped to letters at the beginning of the alphabet, like t which will be mapped to d:

现在，这个特殊的例子具有特殊性，在转换过程中不超过字母z，因此不必再回到字母表的开头。因此，让我们再试一次，偏移量为10，这样一些字母将被映射到字母表开头的字母上，比如t将被映射到d。

String cipheredMessage = cipher.cipher("he told me i could never teach a llama to drive", 10);

assertThat(cipheredMessage)
  .isEqualTo("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo");

It works as expected, thanks to the modulo operation. That operation also takes care of larger offsets. Let’s say we want to use 36 as offset, which is equivalent to 10, the modulo operation ensures that the transformation will give the same result.

它如预期的那样工作，多亏了modulo操作。该操作还能处理更大的偏移量。比方说，我们想用36作为偏移量，这相当于10，取模操作确保了转换会得到相同的结果。

3. Decipher

3.解读

3.1. Explanation

3.1.解释

Now, let’s see how to decipher such a message when we know the offset used to encrypt it.

现在，让我们看看当我们知道用于加密的偏移量时，如何破译这样的信息。

As a matter of fact, deciphering a message encrypted with Caesar cipher can be seen as ciphering it with a negative offset, or also ciphering it with a complementary offset.

事实上，破译用凯撒密码加密的信息可以被看作是用负偏移量来破译，或者也可以用互补偏移量来破译。

So, let’s say we have a message encrypted with an offset of 3. Then, we can either encrypt it with an offset of -3 or encrypt it with an offset of 23. Either way, we retrieve the original message.

因此，假设我们有一个偏移量为3的加密信息，那么，我们可以用偏移量为-3来加密，或者用偏移量为23来加密。无论哪种方式，我们都可以检索到原始信息。

Unfortunately, our algorithm doesn’t handle negative offset out of the box. We’ll have problems converting letters looping back to the end of the alphabet (for example transforming the letter a into the letter z with an offset of -1). But, we can compute the complementary offset, which is positive, and then use our algorithm.

不幸的是，我们的算法并不能处理开箱即用的负偏移。我们在转换循环到字母表末端的字母时会遇到问题（例如，将字母a转换为偏移量为-1的字母z）。但是，我们可以计算出互补的偏移量，它是正的，然后再使用我们的算法。

So, how to obtain this complementary offset? The naïve way of doing this would be to subtract the original offset from 26. Of course, this will work for offsets between 0 and 26 but will give negative results otherwise.

那么，如何获得这个互补的偏移量呢？天真的方法是用26减去原来的偏移量。当然，这对0和26之间的偏移量是有效的，但在其他情况下会产生负面的结果。

That’s where we’ll make use of the modulo operator again, directly on the original offset, before doing the subtraction. That way, we ensure always returning a positive offset.

这时，我们将再次使用模运算符，在做减法之前，直接在原始偏移量上使用。这样一来，我们就能确保总是返回一个正的偏移。

3.2. Algorithm in Java

3.2.在Java中的算法

Let’s now implement it in Java. First, we’ll add a decipher() method to our class:

现在让我们用Java来实现它。首先，我们要给我们的类添加一个decipher()方法。

String decipher(String message, int offset) {}

Then, let’s call the cipher() method with our calculated complementary offset:

然后，让我们用我们计算出的互补偏移量调用cipher()方法。

return cipher(message, 26 - (offset % 26));

That’s it, our deciphering algorithm is set up. Let’s try it on the example with offset 36:

就这样，我们的破译算法已经设置好了。让我们在偏移量为36的例子上试试吧。

String decipheredSentence = cipher.decipher("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo", 36);
assertThat(decipheredSentence)
  .isEqualTo("he told me i could never teach a llama to drive");

As we can see, we retrieve our original message.

正如我们所见，我们找回了我们的原始信息。

4. Breaking the Ceasar Cipher

4.破解 “凯撒 “密码

4.1. Explanation

4.1.解释

Now that we’ve covered ciphering and deciphering messages using the Caesar cipher, we can dive into how to break it. That is, decipher a ciphered message without knowing the used offset at first.

现在，我们已经涵盖了使用凯撒密码对信息进行加密和解密，我们可以深入研究如何破解它。也就是说，在不知道所使用的偏移量的情况下，首先破译一个被加密的信息。

To do that, we’ll make use of the probabilities to find English letters in a text. The idea will be to decipher the message using offsets 0 to 25 and check what shift presents a letter distribution similar to that of English texts.

要做到这一点，我们将利用在文本中找到英文字母的概率。我们的想法是使用0到25的偏移量来破译信息，并检查什么偏移量呈现出与英文文本类似的字母分布。

In order to determine the similarity of two distributions, we’ll use the Chi-squared statistic.

为了确定两个分布的相似性，我们将使用Chi-squared统计学。。

The Chi-squared statistic will provide a number telling us whether two distributions are similar or not. The smaller the number, the more similar they are.

Chi-squared统计将提供一个数字，告诉我们两个分布是否相似。这个数字越小，它们就越相似。

So, we’ll compute the Chi-square for every offset and then return the one with the smallest Chi-square. This should give us the offset used to cipher the message.

因此，我们将计算每个偏移量的Chi-square，然后返回Chi-square最小的那个。这将给我们提供用于加密信息的偏移量。

However, we must keep in mind that this technique is not bulletproof and should the message be too short or using words unfortunately non-representative of a standard English text, it could return a wrong offset.

然而，我们必须记住，这种技术并不是无懈可击的，如果信息太短或使用的词语不幸不代表标准英语文本，它可能会返回一个错误的偏移量。

4.2. Define the Base Letters Distribution

4.2.定义基本信条分布

Let’s now see how to implement the breaking algorithm in Java.

现在让我们看看如何在Java中实现破译算法。

First of all, let’s create a breakCipher() method in our CaesarCipher class, which will return the offset used to encrypt a message:

首先，让我们在我们的CaesarCipher类中创建一个breakCipher()方法，它将返回用于加密信息的偏移量。

int breakCipher(String message) {}

Then, let’s define an array containing the probabilities to find a certain letter in an English text:

然后，让我们定义一个数组，包含在英文文本中找到某个字母的概率。

double[] englishLettersProbabilities = {0.073, 0.009, 0.030, 0.044, 0.130, 0.028, 0.016, 0.035, 0.074,
  0.002, 0.003, 0.035, 0.025, 0.078, 0.074, 0.027, 0.003,
  0.077, 0.063, 0.093, 0.027, 0.013, 0.016, 0.005, 0.019, 0.001};

From this array, we’ll be able to calculate the expected frequencies of the letters in a given message, by multiplying the probabilities by the length of the message:

从这个数组中，我们将能够通过将概率乘以信息的长度，计算出给定信息中字母的预期频率。

double[] expectedLettersFrequencies = Arrays.stream(englishLettersProbabilities)
  .map(probability -> probability * message.getLength())
  .toArray();

For example, in a message of length 100, we should expect the letter a to appear 7.3 times, and the letter e to appear 13 times.

例如，在长度为100的信息中，我们应该期望字母a出现7.3次，而字母e出现13次。

4.3. Calculate the Chi-squares

4.3.计算Chi-squares

Now, we’re going to calculate the Chi-squares of deciphered message letters distribution and standard English letters distribution.

现在，我们要计算破译的信息字母分布和标准英语字母分布的Chi-squares。

To achieve that, we’ll need to import the Apache Commons Math3 library that contains a utility class to compute Chi-squares:

为了实现这一目标，我们需要导入Apache Commons Math3库，该库包含一个用于计算Chi-squares的实用类。

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math3</artifactId>
    <version>3.6.1</version>
</dependency>

What we need to do now is to create an array that’ll contain the calculated Chi-squares for each offset between 0 and 25.

我们现在需要做的是创建一个数组，该数组将包含0到25之间每个偏移量的计算Chi-quares。

Thus, we’ll decipher the encrypted message using each offset, and then count the letters in that message.

因此，我们将使用每个偏移量来破译加密信息，然后计算该信息中的字母。

Finally, we’ll use the ChiSquareTest#chiSquare method to calculate the Chi-square between the expected and observed letters distribution:

最后，我们将使用ChiSquareTest#chiSquare方法来计算预期和观察到的字母分布之间的Chi-square。

double[] chiSquares = new double[26];

for (int offset = 0; offset < chiSquares.length; offset++) {
    String decipheredMessage = decipher(message, offset);
    long[] lettersFrequencies = observedLettersFrequencies(decipheredMessage);
    double chiSquare = new ChiSquareTest().chiSquare(expectedLettersFrequencies, lettersFrequencies);
    chiSquares[offset] = chiSquare;
}

return chiSquares;

The observedLettersFrequencies() method simply realizes a count of letters a to z in the passed message:

observedLettersFrequencies()方法简单地实现了对传递的消息中字母a到z的计数。

long[] observedLettersFrequencies(String message) {
    return IntStream.rangeClosed('a', 'z')
      .mapToLong(letter -> countLetter((char) letter, message))
      .toArray();
}

long countLetter(char letter, String message) {
    return message.chars()
      .filter(character -> character == letter)
      .count();
}

4.4. Find the Most Probable Offset

4.4.寻找最可能的偏移

Once all the Chi-squares calculated, we can return the offset matching the smallest Chi-square:

一旦所有的Chi-square计算出来，我们就可以返回与最小的Chi-square匹配的偏移量。

int probableOffset = 0;
for (int offset = 0; offset < chiSquares.length; offset++) {
    <span class="x x-first">log</span><span class="pl-k x">.</span><span class="x x-last">debug</span>(String.format("Chi-Square for offset %d: %.2f", offset, chiSquares[offset]));
    if (chiSquares[offset] < chiSquares[probableOffset]) {
        probableOffset = offset;
    }
}

return probableOffset;

Although it’s not necessary to enter the loop with offset 0 as we consider it to be the minimum before starting the loop, we do it to print its Chi-square value.

虽然没有必要用偏移量0进入循环，因为我们在开始循环前认为它是最小值，但我们这样做是为了打印它的Chi-square值。

Let’s try this algorithm on the message encrypted using offset 10:

让我们在使用偏移量10加密的信息上试试这个算法。

int offset = algorithm.breakCipher("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo");
assertThat(offset).isEqualTo(10);

assertThat(algorithm.decipher("ro dyvn wo s myevn xofob dokmr k vvkwk dy nbsfo", offset))
  .isEqualTo("he told me i could never teach a llama to drive");

As we can see, the method retrieves the correct offset, which can then be used to decipher the message an retrieve the original one.

我们可以看到，该方法检索到了正确的偏移量，然后可以用来破译信息并检索到原始信息。

Here are the different Chi-squares calculated for this particular break:

下面是为这个特定的休息时间计算的不同的Chi-quares。

Chi-Square for offset 0: 210.69
Chi-Square for offset 1: 327.65
Chi-Square for offset 2: 255.22
Chi-Square for offset 3: 187.12
Chi-Square for offset 4: 734.16
Chi-Square for offset 5: 673.68
Chi-Square for offset 6: 223.35
Chi-Square for offset 7: 111.13
Chi-Square for offset 8: 270.11
Chi-Square for offset 9: 153.26
Chi-Square for offset 10: 23.74
Chi-Square for offset 11: 643.14
Chi-Square for offset 12: 328.83
Chi-Square for offset 13: 434.19
Chi-Square for offset 14: 384.80
Chi-Square for offset 15: 1206.47
Chi-Square for offset 16: 138.08
Chi-Square for offset 17: 262.66
Chi-Square for offset 18: 253.28
Chi-Square for offset 19: 280.83
Chi-Square for offset 20: 365.77
Chi-Square for offset 21: 107.08
Chi-Square for offset 22: 548.81
Chi-Square for offset 23: 255.12
Chi-Square for offset 24: 458.72
Chi-Square for offset 25: 325.45

As we can see, the one for offset 10 is clearly smaller than the others.

正如我们所看到的，10号偏移量明显比其他的小。

5. Conclusion

5.总结

In this article, we covered the Caesar cipher. We learned how to cipher and decipher a message by shifting its letters by a given offset. We also learned how to break the cipher. And we saw all the Java implementations that allow us to do that.

在这篇文章中，我们介绍了凯撒密码。我们学习了如何通过给定的偏移量对信息的字母进行加密和解密。我们还学习了如何破解该密码。我们还看到了允许我们这样做的所有Java实现。

The code of this article can be found over on GitHub.

这篇文章的代码可以在GitHub上找到over。