1. Introduction
In this tutorial, we’re going to focus on the performance aspect of the Java String API.
We’ll dig into String creation, conversion and modification operations to analyze the available options and compare their efficiency.
The suggestions we’re going to make won’t necessarily be the right fit for every application. But they do show how to win on performance when the application’s running time is critical.
2. Constructing a New String
As you know, in Java, Strings are immutable. So every time we construct or concatenate a String object, Java creates a new String – this might be especially costly if done in a loop.
2.1. Using Constructor
In most cases, we should avoid creating Strings using the constructor unless we know what we’re doing.
Let’s create a new String object inside the loop first using the new String() constructor, and then using the = operator with a literal.
To write our benchmark, we’ll use the JMH (Java Microbenchmark Harness) tool.
Our configuration:
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Measurement(batchSize = 10000, iterations = 10)
@Warmup(batchSize = 10000, iterations = 10)
public class StringPerformance {
}
Here, we’re using the SingleShotTime mode, which runs the method only once per iteration. As we want to measure the performance of String operations inside a loop, the @Measurement annotation provides a batchSize parameter for that.
It’s important to know that benchmarking loops directly in our tests may skew the results because of various optimizations applied by the JVM.
So we measure only the single operation and let JMH take care of the looping. Briefly speaking, JMH calls the benchmark method batchSize times per iteration.
Now, let’s add the first micro-benchmark:
@Benchmark
public String benchmarkStringConstructor() {
    return new String("baeldung");
}
@Benchmark
public String benchmarkStringLiteral() {
    return "baeldung";
}
In the first test, a new object is created in every iteration. In the second test, the object is created only once. For the remaining iterations, the same object is returned from the String constant pool.
Let’s run the tests with the iteration count set to 1,000,000 and see the results:
Benchmark                   Mode  Cnt   Score   Error  Units
benchmarkStringConstructor    ss   10  16.089 ± 3.355  ms/op
benchmarkStringLiteral        ss   10   9.523 ± 3.331  ms/op
From the Score values, we can clearly see that the difference is significant.
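The difference between the two benchmarks comes down to object identity. As a minimal sketch (the class and method names below are hypothetical and not part of the benchmark code), the following shows that identical literals share one pooled instance, while the constructor always allocates a new object:

```java
// Illustrative demo; StringIdentityDemo and its methods are hypothetical names.
final class StringIdentityDemo {

    // Two identical literals resolve to the same instance from the String pool.
    static boolean literalsShareInstance() {
        String first = "baeldung";
        String second = "baeldung";
        return first == second;
    }

    // The constructor allocates a fresh object even though the content is already pooled.
    static boolean constructorAllocatesNewInstance() {
        String literal = "baeldung";
        String constructed = new String("baeldung");
        return literal != constructed && literal.equals(constructed);
    }

    public static void main(String[] args) {
        System.out.println("literals share instance: " + literalsShareInstance());
        System.out.println("constructor allocates a new instance: " + constructorAllocatesNewInstance());
    }
}
```

Both methods return true, which is why the constructor benchmark pays for an allocation on every call.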
2.2. + Operator
Let’s have a look at a dynamic String concatenation example:
public static class StringPerformanceHints {
    String result = "";
    String baeldung = "baeldung";

    @Benchmark
    public String benchmarkStringDynamicConcat() {
        return result + baeldung;
    }
}
In our results, we want to see the average execution time. The output number format is set to milliseconds:
Benchmark                       1000    10,000
benchmarkStringDynamicConcat  47.331  4370.411
Now, let’s analyze the results. As we see, adding 1000 items to result takes 47.331 milliseconds. Increasing the number of iterations 10 times raises the running time to 4370.411 milliseconds, roughly a 92-fold increase.
In summary, the execution time grows quadratically, because every concatenation copies the whole accumulated String. Therefore, the complexity of dynamic concatenation in a loop of n iterations is O(n^2).
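To make the quadratic growth tangible, here is a small sketch that counts how many characters end up in newly built Strings when a piece is appended repeatedly with +. It is not a benchmark, and the class and method names are hypothetical:

```java
// Illustrative only; ConcatCostDemo is a hypothetical name.
final class ConcatCostDemo {

    // Every + in the loop builds a new String holding the old content plus the piece,
    // so the work per step is proportional to the length accumulated so far.
    static long charsCopiedByRepeatedConcat(String piece, int iterations) {
        String result = "";
        long copied = 0;
        for (int i = 0; i < iterations; i++) {
            result = result + piece;
            copied += result.length();
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(charsCopiedByRepeatedConcat("baeldung", 100));   // 40400
        System.out.println(charsCopiedByRepeatedConcat("baeldung", 1_000)); // 4004000
    }
}
```

Ten times more iterations produce about a hundred times more copied characters, which matches the O(n^2) behavior described above.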
2.3. String.concat()
One more way to concatenate Strings is by using the concat() method:
@Benchmark
public String benchmarkStringConcat() {
    return result.concat(baeldung);
}
The output time unit is milliseconds, and the iteration count is 100,000. The result table looks like this:
Benchmark              Mode  Cnt     Score     Error  Units
benchmarkStringConcat    ss   10  3403.146 ± 852.520  ms/op
2.4. String.format()
Another way to create strings is by using the String.format() method. Under the hood, it parses the format string on every call, and older JDK versions use regular expressions to do that.
Let’s write the JMH test case:
String formatString = "hello %s, nice to meet you";

@Benchmark
public String benchmarkStringFormat_s() {
    return String.format(formatString, baeldung);
}
Then we run it and see the results:
Number of Iterations     10,000  100,000  1,000,000
benchmarkStringFormat_s  17.181  140.456   1636.279  ms/op
Although the code with String.format() looks cleaner and more readable, we don’t win here in terms of performance.
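For a simple %s substitution, plain concatenation produces the same text without parsing a format string. The sketch below only checks that both variants are equivalent; the class and method names are hypothetical:

```java
// Illustrative only; FormatVsConcatDemo is a hypothetical name.
final class FormatVsConcatDemo {

    // Parses the format string on each call before substituting the argument.
    static String greetWithFormat(String name) {
        return String.format("hello %s, nice to meet you", name);
    }

    // Same result, built directly from its parts.
    static String greetWithConcat(String name) {
        return "hello " + name + ", nice to meet you";
    }

    public static void main(String[] args) {
        System.out.println(greetWithFormat("baeldung"));
        System.out.println(greetWithConcat("baeldung"));
    }
}
```

Which variant to prefer is a trade-off between readability and speed, so on hot paths it is worth measuring both.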
2.5. StringBuilder and StringBuffer
We already have a write-up explaining StringBuffer and StringBuilder. So here, we’ll show only extra information about their performance. StringBuilder uses a resizable array and an index that indicates the position of the last cell used in the array. When the array is full, it roughly doubles its size and copies all the characters into the new array.
Since resizing doesn’t occur very often, we can consider each append() operation as amortized O(1) constant time. Therefore, the whole process has O(n) complexity.
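The amortized argument can be illustrated with a simplified buffer that doubles its array when full and counts the characters copied during resizing. This is a sketch of the idea only, not the JDK implementation; the class name, the initial capacity of 16 and the exact growth rule are assumptions made for the example:

```java
import java.util.Arrays;

// Simplified model of a growable character buffer; DoublingBufferDemo is a hypothetical name.
final class DoublingBufferDemo {
    private char[] data = new char[16]; // assumed initial capacity
    private int length;
    private long copiedChars;

    // Appends one character, doubling the backing array when it is full.
    void append(char c) {
        if (length == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
            copiedChars += length; // characters moved into the new array
        }
        data[length++] = c;
    }

    long copiedChars() {
        return copiedChars;
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }

    public static void main(String[] args) {
        DoublingBufferDemo buffer = new DoublingBufferDemo();
        for (int i = 0; i < 1_000; i++) {
            buffer.append('x');
        }
        // 16 + 32 + 64 + 128 + 256 + 512 characters were copied in total
        System.out.println(buffer.copiedChars()); // 1008
    }
}
```

For 1,000 appends, only 1,008 characters are copied in total, so the copying work stays proportional to n rather than n^2.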
After modifying and running the dynamic concatenation test for StringBuffer and StringBuilder, we get:
Benchmark               Mode  Cnt  Score   Error  Units
benchmarkStringBuffer     ss   10  1.409 ± 1.665  ms/op
benchmarkStringBuilder    ss   10  1.200 ± 0.648  ms/op
Although the score difference is small and within the error margins, StringBuilder comes out slightly faster. Unlike StringBuffer, it doesn’t synchronize its methods.
Fortunately, in simple cases, we don’t need StringBuilder to join one String with another. Sometimes, static concatenation with + can actually replace StringBuilder. Under the hood, Java compilers up to JDK 8 translate such an expression into StringBuilder.append() calls, and newer compilers delegate to an invokedynamic-based strategy instead.
This means we get the performance benefit without writing the builder code ourselves.
3. Utility Operations
3.1. StringUtils.replace() vs String.replace()
Interestingly, the Apache Commons version for replacing a String can do considerably better than String’s own replace() method. The reason for this difference lies in their implementations. In JDK 8, String.replace() uses a regex Pattern to match the String.
In contrast, StringUtils.replace() relies on indexOf(), which is faster.
Now, it’s time for the benchmark tests:
@Benchmark
public String benchmarkStringReplace() {
    return longString.replace("average", " average !!!");
}
@Benchmark
public String benchmarkStringUtilsReplace() {
    return StringUtils.replace(longString, "average", " average !!!");
}
Setting the batchSize to 100,000, we present the results:
Benchmark                    Mode  Cnt  Score   Error  Units
benchmarkStringReplace         ss   10  6.233 ± 2.922  ms/op
benchmarkStringUtilsReplace    ss   10  5.355 ± 2.497  ms/op
Although the difference between the numbers isn’t too big, StringUtils.replace() has a better score. Of course, the numbers and the gap between them may vary depending on parameters like iteration count, String length and even JDK version.
With JDK 9+ versions (the tests above ran on JDK 10), both implementations have fairly equal results. Now, let’s downgrade the JDK version to 8 and run the tests again:
Benchmark                    Mode  Cnt   Score    Error  Units
benchmarkStringReplace         ss   10  48.061 ± 17.157  ms/op
benchmarkStringUtilsReplace    ss   10  14.478 ±  5.752  ms/op
The performance difference is huge now and confirms the explanation we gave at the beginning of this section.
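To show what an indexOf()-based literal replacement looks like, here is a minimal sketch. It is not the Apache Commons source; the class and method names are hypothetical, and an empty search string is simply left unhandled (the text is returned unchanged), which is an assumption of this example:

```java
// Illustrative only; LiteralReplaceDemo and replaceLiteral are hypothetical names.
final class LiteralReplaceDemo {

    // Replaces every occurrence of search in text without involving the regex engine.
    static String replaceLiteral(String text, String search, String replacement) {
        if (text.isEmpty() || search.isEmpty()) {
            return text;
        }
        int start = 0;
        int end = text.indexOf(search, start);
        if (end < 0) {
            return text; // nothing to replace, so no new String is built
        }
        StringBuilder sb = new StringBuilder(text.length() + replacement.length());
        while (end >= 0) {
            sb.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }
        sb.append(text, start, text.length()); // tail after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String longString = "Hello baeldung, I am a bit longer than other Strings in average";
        System.out.println(replaceLiteral(longString, "average", " average !!!"));
    }
}
```

For non-empty search strings, the result is the same as the one returned by String.replace().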
3.2. split()
Before we start, it’ll be useful to check out string splitting methods available in Java.
When there is a need to split a string with a delimiter, the first function that usually comes to mind is String.split(regex). However, it can bring performance issues, as it accepts a regex argument. Alternatively, we can use the StringTokenizer class to break the string into tokens.
Another option is Guava’s Splitter API. Finally, the good old indexOf() is also available to boost our application’s performance if we don’t need the functionality of regular expressions.
Now, it’s time to write the benchmark tests for the String.split() option:
String emptyString = " ";

@Benchmark
public String[] benchmarkStringSplit() {
    return longString.split(emptyString);
}
Pattern.split():
@Benchmark
public String[] benchmarkStringSplitPattern() {
    return spacePattern.split(longString, 0);
}
StringTokenizer:
List<String> stringTokenizer = new ArrayList<>();

@Benchmark
public List<String> benchmarkStringTokenizer() {
    StringTokenizer st = new StringTokenizer(longString);
    while (st.hasMoreTokens()) {
        stringTokenizer.add(st.nextToken());
    }
    return stringTokenizer;
}
String.indexOf():
List<String> stringSplit = new ArrayList<>();

@Benchmark
public List<String> benchmarkStringIndexOf() {
    int pos = 0, end;
    while ((end = longString.indexOf(' ', pos)) >= 0) {
        stringSplit.add(longString.substring(pos, end));
        pos = end + 1;
    }
    // add the last token, which has no trailing delimiter
    stringSplit.add(longString.substring(pos));
    return stringSplit;
}
Guava’s Splitter:
@Benchmark
public List<String> benchmarkGuavaSplitter() {
    return Splitter.on(" ").trimResults()
      .splitToList(longString);
}
Finally, we run and compare results for batchSize = 100,000:
Benchmark                    Mode  Cnt   Score   Error  Units
benchmarkGuavaSplitter         ss   10   4.008 ± 1.836  ms/op
benchmarkStringIndexOf         ss   10   1.144 ± 0.322  ms/op
benchmarkStringSplit           ss   10   1.983 ± 1.075  ms/op
benchmarkStringSplitPattern    ss   10  14.891 ± 5.678  ms/op
benchmarkStringTokenizer       ss   10   2.277 ± 0.448  ms/op
As we see, the benchmarkStringSplitPattern method, where we use the Pattern class, has the worst performance. So we can conclude that using the regex class with the split() method may slow the operation down several times.
Likewise, we notice that the examples using indexOf() and split() provide the fastest results.
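Before swapping one splitting approach for another, it helps to check that they return the same tokens for our input. The sketch below covers the JDK-only variants on a sample string; the class and method names are hypothetical. On single-spaced input all four agree, but with consecutive delimiters StringTokenizer skips the empty tokens that the other approaches keep:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Illustrative comparison; SplitApproachesDemo is a hypothetical name.
final class SplitApproachesDemo {
    private static final Pattern SPACE = Pattern.compile(" ");

    static List<String> withSplit(String text) {
        return Arrays.asList(text.split(" "));
    }

    static List<String> withPattern(String text) {
        return Arrays.asList(SPACE.split(text));
    }

    static List<String> withTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " ");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    static List<String> withIndexOf(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0, end;
        while ((end = text.indexOf(' ', pos)) >= 0) {
            tokens.add(text.substring(pos, end));
            pos = end + 1;
        }
        tokens.add(text.substring(pos)); // last token has no trailing delimiter
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "Hello baeldung, I am";
        System.out.println(withSplit(sample));
        System.out.println(withPattern(sample));
        System.out.println(withTokenizer(sample));
        System.out.println(withIndexOf(sample));
        // Consecutive spaces: split() keeps the empty token, the tokenizer drops it
        System.out.println(withSplit("a  b"));     // [a, , b]
        System.out.println(withTokenizer("a  b")); // [a, b]
    }
}
```

So a faster alternative is only a drop-in replacement when these edge cases don’t matter for the data at hand.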
3.3. Converting to String
In this section, we’re going to measure the runtime scores of string conversion. To be more specific, we’ll examine the Integer.toString() conversion method:
int sampleNumber = 100;

@Benchmark
public String benchmarkIntegerToString() {
    return Integer.toString(sampleNumber);
}
String.valueOf():
@Benchmark
public String benchmarkStringValueOf() {
    return String.valueOf(sampleNumber);
}
[some integer value] + "":
@Benchmark
public String benchmarkStringConvertPlus() {
    return sampleNumber + "";
}
String.format():
String formatDigit = "%d";

@Benchmark
public String benchmarkStringFormat_d() {
    return String.format(formatDigit, sampleNumber);
}
After running the tests, we’ll see the output for batchSize = 10,000:
Benchmark                   Mode  Cnt   Score    Error  Units
benchmarkIntegerToString      ss   10   0.953 ±  0.707  ms/op
benchmarkStringConvertPlus    ss   10   1.464 ±  1.670  ms/op
benchmarkStringFormat_d       ss   10  15.656 ±  8.896  ms/op
benchmarkStringValueOf        ss   10   2.847 ± 11.153  ms/op
After analyzing the results, we see that the test for Integer.toString() has the best score of 0.953 milliseconds. In contrast, the conversion that involves String.format("%d") has the worst performance.
That’s logical because parsing the format String is an expensive operation.
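All four conversions yield the same text for a plain int, so the choice is about cost and readability only. The sketch below verifies that. The class and method names are hypothetical, and Locale.ROOT is passed to String.format() as an assumption of this example, because %d output depends on the locale:

```java
import java.util.Locale;

// Illustrative only; IntToStringDemo is a hypothetical name.
final class IntToStringDemo {

    // Returns the result of each conversion approach for the same number.
    static String[] convertAllWays(int number) {
        return new String[] {
            Integer.toString(number),
            String.valueOf(number),                 // delegates to Integer.toString(int)
            number + "",
            String.format(Locale.ROOT, "%d", number) // parses the format string first
        };
    }

    public static void main(String[] args) {
        for (String converted : convertAllWays(100)) {
            System.out.println(converted);
        }
    }
}
```

Since String.valueOf(int) simply delegates to Integer.toString(int), the gap between those two scores, given the large error margin, is most likely measurement noise.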
3.4. Comparing Strings
Let’s evaluate different ways of comparing Strings. The iterations count is 100,000.
Here are our benchmark tests for the String.equals() operation:
@Benchmark
public boolean benchmarkStringEquals() {
    return longString.equals(baeldung);
}
String.equalsIgnoreCase():
@Benchmark
public boolean benchmarkStringEqualsIgnoreCase() {
    return longString.equalsIgnoreCase(baeldung);
}
String.matches():
@Benchmark
public boolean benchmarkStringMatches() {
    return longString.matches(baeldung);
}
String.compareTo():
@Benchmark
public int benchmarkStringCompareTo() {
    return longString.compareTo(baeldung);
}
Then we run the tests and display the results:
Benchmark                        Mode  Cnt    Score    Error  Units
benchmarkStringCompareTo           ss   10    2.561 ±  0.899  ms/op
benchmarkStringEquals              ss   10    1.712 ±  0.839  ms/op
benchmarkStringEqualsIgnoreCase    ss   10    2.081 ±  1.221  ms/op
benchmarkStringMatches             ss   10  118.364 ± 43.203  ms/op
As always, the numbers speak for themselves. matches() takes the longest time, as it uses a regex to check for equality.
In contrast, equals() and equalsIgnoreCase() are the best choices.
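Apart from speed, these methods don’t answer exactly the same question, which matters when picking one. A short sketch of the semantic differences (the class and method names are hypothetical):

```java
// Illustrative only; CompareSemanticsDemo is a hypothetical name.
final class CompareSemanticsDemo {

    // Exact, case-sensitive content comparison.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    // Content comparison that ignores case differences.
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Lexicographic comparison; zero means the contents are equal.
    static boolean sameByCompareTo(String a, String b) {
        return a.compareTo(b) == 0;
    }

    // The argument is interpreted as a regex that must match the whole text.
    static boolean matchesAsRegex(String text, String regex) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(sameContent("baeldung", "Baeldung"));      // false
        System.out.println(sameIgnoringCase("baeldung", "Baeldung")); // true
        System.out.println(sameByCompareTo("baeldung", "baeldung"));  // true
        System.out.println(matchesAsRegex("abc", "a.c"));             // true
        System.out.println(sameContent("abc", "a.c"));                // false
    }
}
```

In particular, matches() treats its argument as a pattern, so it isn’t a safe substitute for equals() when the argument contains regex metacharacters.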
3.5. String.matches() vs Precompiled Pattern
Now, let’s have a separate look at String.matches() and Matcher.matches(). The first one takes a regexp as an argument and compiles it before executing.
So every time we call String.matches(), it compiles the Pattern:
@Benchmark
public boolean benchmarkStringMatches() {
    return longString.matches(baeldung);
}
The second method reuses the Pattern object:
Pattern longPattern = Pattern.compile(longString);

@Benchmark
public boolean benchmarkPrecompiledMatches() {
    return longPattern.matcher(baeldung).matches();
}
And now the results:
Benchmark                    Mode  Cnt    Score    Error  Units
benchmarkPrecompiledMatches    ss   10   29.594 ± 12.784  ms/op
benchmarkStringMatches         ss   10  106.821 ± 46.963  ms/op
As we see, matching with precompiled regexp works about three times faster.
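In application code, the usual way to apply this is to compile the regex once into a constant and reuse it. A minimal sketch follows; the class name, the constant and the sample regex are hypothetical:

```java
import java.util.regex.Pattern;

// Illustrative only; PrecompiledPatternDemo is a hypothetical name.
final class PrecompiledPatternDemo {
    // Compiled once when the class is initialized and reused for every call.
    private static final Pattern LOWERCASE_WORD = Pattern.compile("[a-z]+");

    // Compiles the regex again on every call.
    static boolean matchesEachTime(String input) {
        return input.matches("[a-z]+");
    }

    // Reuses the precompiled Pattern and only creates a Matcher per call.
    static boolean matchesPrecompiled(String input) {
        return LOWERCASE_WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesEachTime("baeldung"));
        System.out.println(matchesPrecompiled("baeldung"));
    }
}
```

Pattern instances are immutable and safe to share between threads, so a static final field is a common place to keep them.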
3.6. Checking the Length
Finally, let’s compare the String.isEmpty() method:
@Benchmark
public boolean benchmarkStringIsEmpty() {
    return longString.isEmpty();
}
and the String.length() method:
@Benchmark
public boolean benchmarkStringLengthZero() {
    return longString.length() == 0;
}
First, we call them over the longString = "Hello baeldung, I am a bit longer than other Strings in average" String. The batchSize is 10,000:
Benchmark                  Mode  Cnt  Score   Error  Units
benchmarkStringIsEmpty       ss   10  0.295 ± 0.277  ms/op
benchmarkStringLengthZero    ss   10  0.472 ± 0.840  ms/op
Next, let’s set longString = "" (an empty string) and run the tests again:
Benchmark                  Mode  Cnt  Score   Error  Units
benchmarkStringIsEmpty       ss   10  0.245 ± 0.362  ms/op
benchmarkStringLengthZero    ss   10  0.351 ± 0.473  ms/op
As we notice, the benchmarkStringLengthZero() and benchmarkStringIsEmpty() methods have approximately the same score in both cases, and the gap stays within the error margins. Still, calling isEmpty() comes out slightly faster than checking if the string’s length is zero.
4. String Deduplication
Since JDK 8 (update 20), the string deduplication feature is available to reduce memory consumption. Simply put, this tool looks for Strings with the same contents and makes them share a single underlying character array, so only one copy of each distinct string value is kept.
Currently, there are two ways to handle String duplicates:
- using the String.intern() manually
- enabling string deduplication
Let’s have a closer look at each option.
4.1. String.intern()
Before jumping ahead, it will be useful to read about manual interning in our write-up. With String.intern(), we can manually place the reference of the String object in the global String pool.
Then, the JVM can return the reference when needed. From the point of view of performance, our application can hugely benefit from reusing the string references from the constant pool.
It’s important to know that the JVM String pool isn’t local to the thread. Each String that we add to the pool is available to other threads as well.
However, there are serious disadvantages as well:
- to maintain our application properly, we may need to set the -XX:StringTableSize JVM parameter to increase the pool size. The JVM needs a restart to change the pool size
- calling String.intern() manually is time-consuming. The total cost grows linearly, O(n), with the number of calls
- additionally, frequent calls on long String objects may cause memory problems
To have some proven numbers, let’s run a benchmark test:
@Benchmark
public String benchmarkStringIntern() {
    return baeldung.intern();
}
Additionally, the output scores are in milliseconds:
Benchmark               1000  10,000  100,000  1,000,000
benchmarkStringIntern  0.433   2.243   19.996    204.373
The column headers here represent the different iteration counts from 1000 to 1,000,000. For each iteration count, we have the test performance score. As we notice, the score grows roughly linearly with the number of iterations.
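To see what intern() actually gives us in return for that cost, the following sketch checks reference identity before and after interning. The class and method names are hypothetical:

```java
// Illustrative only; InternDemo is a hypothetical name.
final class InternDemo {

    // A constructed copy is a distinct object until it is interned.
    static boolean internReturnsPooledInstance() {
        String literal = "baeldung";
        String copy = new String("baeldung");
        return copy != literal && copy.intern() == literal;
    }

    // Strings built at runtime with equal content map to one pooled instance.
    static boolean internedDynamicStringsShareInstance() {
        String first = new StringBuilder("bael").append("dung-").append(42).toString();
        String second = new StringBuilder("bael").append("dung-").append(42).toString();
        return first != second && first.intern() == second.intern();
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledInstance());
        System.out.println(internedDynamicStringsShareInstance());
    }
}
```

Both methods return true: after interning, equal Strings resolve to a single shared reference, at the price of a pool lookup on every intern() call.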
4.2. Enable Deduplication Automatically
First of all, this option is a part of the G1 garbage collector. By default, this feature is disabled. So we need to enable it with the following JVM options:
-XX:+UseG1GC -XX:+UseStringDeduplication
It’s important to note that enabling this option doesn’t guarantee that String deduplication will happen. Also, it doesn’t process young Strings. To manage the minimal age of the Strings to process, the -XX:StringDeduplicationAgeThreshold=3 JVM option is available. Here, 3 is the default value.
5. Summary
In this tutorial, we tried to give some hints on using strings more efficiently in our daily coding life.
As a result, we can highlight some suggestions to boost our application performance:
- when concatenating strings, StringBuilder is the most convenient option that comes to mind. However, with small strings, the + operation has almost the same performance. Under the hood, the Java compiler may use the StringBuilder class to reduce the number of string objects
- to convert a value into a string, [some type].toString() (Integer.toString(), for example) works faster than String.valueOf(). Because that difference isn’t significant, we can freely use String.valueOf() to avoid a dependency on the input value type
- when it comes to string comparison, nothing beats String.equals() so far
- String deduplication improves performance in large, multi-threaded applications. But overusing String.intern() may cause serious memory leaks, slowing down the application
- for splitting strings, we should use indexOf() to win in performance. However, in some noncritical cases, the String.split() function might be a good fit
- matching against a precompiled Pattern instead of calling String.matches() improves performance significantly
- String.isEmpty() is faster than String.length() == 0
Also, keep in mind that the numbers we present here are just JMH benchmark results – so you should always test in the scope of your own system and runtime to determine the impact of these kinds of optimizations.
Finally, as always, the code used during the discussion can be found over on GitHub.