Checking If a String Is a Repeated Substring – 检查一个字符串是否是一个重复的子串

最后修改: 2019年 7月 3日

中文/混合/英文(键盘快捷键:t)

1. Introduction

1.介绍

In this tutorial, we’ll show how we can check in Java if a String is a sequence of repeated substrings.

在本教程中,我们将展示如何在Java中检查一个字符串是否是一个重复的子字符串的序列。

2. The Problem

2.问题

Before we continue with the implementation, let’s set up some conditions. First, we’ll assume that our String has at least two characters.

在我们继续实施之前,让我们设置一些条件。首先,我们假设我们的String至少有两个字符。

Second, there’s at least one repetition of a substring.

第二,至少有一个子串的重复。

This is best illustrated with some examples by checking out a few repeated substrings:

这一点最好通过一些例子来说明,即查看一些重复的子字符串。

"aa"
"ababab"
"barrybarrybarry"

And a few non-repeated ones:

还有一些不重复的。

"aba"
"cbacbac"
"carlosxcarlosy"

We’ll now show a few solutions to the problem.

现在我们将展示一些解决这个问题的方法。

3. A Naive Solution

3.一个天真的解决方案

Let’s implement the first solution.

让我们来实施第一个解决方案。

The process is rather simple: we’ll check the String‘s length and eliminate the single character Strings at the very beginning.

这个过程相当简单:我们将检查String的长度并消除最开始的单字符String

Then, since the length of a substring can’t be larger than a half of the string’s length, we’ll iterate through the half of the String and create the substring in every iteration by appending the next character to the previous substring.

然后,由于子串的长度不能大于字符串长度的一半,我们将遍历String的一半,并在每次遍历中通过将下一个字符追加到上一个子串中来创建子串。

We’ll next remove those substrings from the original String and check if the length of the “stripped” one is zero. That would mean that it’s made only of its substrings:

接下来我们将从原始String中移除这些子串,并检查 “剥离 “后的长度是否为零。这将意味着它只由其子串组成。

public static boolean containsOnlySubstrings(String string) {

    if (string.length() < 2) {
        return false;
    }

    StringBuilder substr = new StringBuilder();
    for (int i = 0; i < string.length() / 2; i++) {
        substr.append(string.charAt(i));

        String clearedFromSubstrings 
          = string.replaceAll(substr.toString(), "");

        if (clearedFromSubstrings.length() == 0) {
            return true;
        }
    }

    return false;
}

Let’s create some Strings to test our method:

让我们创建一些Strings来测试我们的方法。

String validString = "aa";
String validStringTwo = "ababab";
String validStringThree = "baeldungbaeldung";

String invalidString = "aca";
String invalidStringTwo = "ababa";
String invalidStringThree = "baeldungnonrepeatedbaeldung";

And, finally, we can easily check its validity:

而且,最后,我们可以很容易地检查其有效性。

assertTrue(containsOnlySubstrings(validString));
assertTrue(containsOnlySubstrings(validStringTwo));
assertTrue(containsOnlySubstrings(validStringThree));

assertFalse(containsOnlySubstrings(invalidString));
assertFalse(containsOnlySubstrings(invalidStringTwo));
assertFalse(containsOnlySubstrings(invalidStringThree));

Although this solution works, it’s not very efficient since we iterate through half of the String and use replaceAll() method in every iteration.

虽然这个方案可行,但它的效率并不高,因为我们迭代了一半的String,并在每次迭代中使用replaceAll()方法。

Obviously, it comes with the cost regarding the performance. It’ll run in time O(n^2).

很明显,它是以性能为代价的。它的运行时间为O(n^2)

4. The Efficient Solution

4.高效的解决方案

Now, we’ll illustrate another approach.

现在,我们将说明另一种方法。

Namely, we should make use of the fact that a String is made of the repeated substrings if and only if it’s a nontrivial rotation of itself.

也就是说,我们应该利用这样一个事实:一个字符串是由重复的子串组成的,当且仅当它是自身的非线性旋转

The rotation here means that we remove some characters from the beginning of the String and put them at the end. For example, “eldungba” is the rotation of “baeldung”. If we rotate a String and get the original one, then we can apply this rotation over and over again and get the String consisting of the repeated substrings.

这里的旋转是指我们从String的开头去掉一些字符,把它们放在最后。例如,”eldungba “是 “baeldung “的旋转。如果我们旋转一个String并得到原始的String,那么我们可以反复应用这种旋转,得到由重复的子串组成的String

Next, we need to check if this is the case with our example. To accomplish this, we’ll make use of the theorem which says that if String A and String B have the same length, then we can say that A is a rotation of B if and only if A is a substring of BB. If we go with the example from the previous paragraph, we can confirm this theorem: baeldungbaeldung.

接下来,我们需要检查我们的例子是否属于这种情况。为了达到这个目的,我们将利用这个定理,即如果字符串A和字符串B具有相同的长度,那么我们可以说,当且仅当A是BB的一个子串时,A是B的旋转。如果我们用上一段的例子,我们可以证实这个定理。baeldungbaeldung

Since we know that our String A will always be a substring of AA, we then only need to check if the String A is a substring of AA excluding the first character:

既然我们知道我们的String A总是AA的子串,那么我们只需要检查String A是否是AA的子串,不包括第一个字符。

public static boolean containsOnlySubstringsEfficient(String string) {
    return ((string + string).indexOf(string, 1) != string.length());
}

We can test this method the same way as the previous one. This time, we have O(n) time complexity.

我们可以用与前一个方法相同的方式来测试这个方法。这一次,我们有O(n)时间复杂性。

We can find some useful theorems about the topic in String analysis research.

我们可以在String分析研究中找到一些关于该主题的有用定理。

5. Conclusion

5.总结

In this article, we illustrated two ways of checking if a String consists only of its substrings in Java.

在这篇文章中,我们说明了在Java中检查一个String是否只由其子串组成的两种方法。

All code samples used in the article are available over on GitHub.

文章中使用的所有代码样本都可以在GitHub上找到over