Guide to Java String Pool

Last modified: November 21, 2017

1. Overview

String is one of the most frequently used classes in the Java language.

In this quick article, we’ll explore the Java String Pool: the special memory region where the JVM stores Strings.

2. String Interning

Thanks to the immutability of Strings in Java, the JVM can optimize the amount of memory allocated for them by storing only one copy of each literal String in the pool. This process is called interning.

When we create a String variable and assign a literal value to it, the JVM searches the pool for a String of equal value.

If found, the JVM will simply return a reference to the pooled String, without allocating additional memory.

If not found, it’ll be added to the pool (interned) and its reference will be returned.

Let’s write a small test to verify this:

String constantString1 = "Baeldung";
String constantString2 = "Baeldung";
        
assertThat(constantString1)
  .isSameAs(constantString2);

3. Strings Allocated Using the Constructor

When we create a String via the new operator, the JVM will create a new object on the heap, outside the String pool.

Every String created like this will point to a different memory region with its own address.

Let’s see how this is different from the previous case:

String constantString = "Baeldung";
String newString = new String("Baeldung");
 
assertThat(constantString).isNotSameAs(newString);

4. String Literal vs String Object

When we create a String object using the new operator, it always creates a new object in heap memory. On the other hand, if we create an object using String literal syntax, e.g. “Baeldung”, it may return an existing object from the String pool if one already exists. Otherwise, it will create a new String object and put it in the String pool for future reuse.

At a high level, both are String objects, but the main difference is that the new operator always creates a new String object, whereas a String created from a literal is interned.

This becomes much clearer when we compare String objects created using a String literal and the new operator:

String first = "Baeldung";
String second = "Baeldung";
System.out.println(first == second); // true

In this example, the String objects will have the same reference.

Next, let’s create two different objects using new and check that they have different references:

String third = new String("Baeldung");
String fourth = new String("Baeldung");
System.out.println(third == fourth); // false

Similarly, when we use the == operator to compare a String literal with a String object created using the new operator, it will return false:

String fifth = "Baeldung";
String sixth = new String("Baeldung");
System.out.println(fifth == sixth); // false

In general, we should use the String literal notation when possible. It is easier to read and it gives the compiler a chance to optimize our code.
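
Since == only compares references, the comparisons above say nothing about whether the contents match. As a minimal sketch (the class and method names here are made up for illustration), we can put the reference check next to a content check with equals():

```java
class StringComparisonDemo {

    // Reference comparison: true only when both variables point at the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Content comparison: true whenever the character sequences match.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "Baeldung";
        String constructed = new String("Baeldung");

        System.out.println(sameReference(literal, constructed)); // false
        System.out.println(sameContent(literal, constructed));   // true
    }
}
```

Whatever way a String was created, equals() is the safe choice when we care about its value rather than its identity.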

5. Manual Interning

We can manually intern a String in the Java String Pool by calling the intern() method on the object we want to intern.

If the pool already contains an equal String, intern() returns the pooled reference. Otherwise, the String is added to the pool and a reference to it is returned.

Let’s create a test case for this:

String constantString = "interned Baeldung";
String newString = new String("interned Baeldung");

assertThat(constantString).isNotSameAs(newString);

String internedString = newString.intern();

assertThat(constantString)
  .isSameAs(internedString);
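
The same behavior can be observed without a test library. The following is a small sketch, not taken from the article's code base: the class name and the buildAtRuntime helper are hypothetical, and the helper simply assembles the value at runtime so that the result is not the pooled literal.

```java
class ManualInternDemo {

    // Builds the value at runtime, so the result is a fresh heap object
    // rather than the pooled literal.
    static String buildAtRuntime(String prefix, String suffix) {
        return new StringBuilder(prefix).append(suffix).toString();
    }

    public static void main(String[] args) {
        String literal = "interned Baeldung";
        String runtimeValue = buildAtRuntime("interned ", "Baeldung");

        System.out.println(literal == runtimeValue);          // false
        System.out.println(literal == runtimeValue.intern()); // true
    }
}
```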

6. Garbage Collection

Before Java 7, the JVM placed the Java String Pool in the PermGen space, which has a fixed size: it can’t be expanded at runtime and is not eligible for garbage collection.

The risk of interning Strings in the PermGen (instead of the Heap) is that the JVM can throw an OutOfMemoryError if we intern too many Strings.

From Java 7 onwards, the Java String Pool is stored in the Heap space, which is garbage collected by the JVM. The advantage of this approach is a reduced risk of OutOfMemoryError, because unreferenced Strings are removed from the pool, thereby releasing memory.

7. Performance and Optimizations

In Java 6, the only optimization we can perform is increasing the PermGen space during the program invocation with the MaxPermSize JVM option:

-XX:MaxPermSize=1G

In Java 7, we have more detailed options to examine and expand/reduce the pool size. Let’s see the two options for viewing the pool size:

-XX:+PrintFlagsFinal
-XX:+PrintStringTableStatistics

If we want to increase the pool size in terms of buckets, we can use the StringTableSize JVM option:

-XX:StringTableSize=4901

Prior to Java 7u40, the default pool size was 1009 buckets, but this value has changed in more recent Java versions. To be precise, the default pool size from Java 7u40 until Java 11 was 60013 buckets, and it has since increased to 65536.
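
To see which value applies to the JVM we are actually running, one option is to query the flag at runtime. The following is a minimal sketch, assuming a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; other JVMs may not provide this bean or flag, and the class name StringTableSizeProbe is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class StringTableSizeProbe {

    // Returns the effective -XX:StringTableSize value of the running JVM.
    // Assumption: a HotSpot-based JVM; elsewhere this lookup may fail.
    static long currentStringTableSize() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + currentStringTableSize());
    }
}
```

Running it with and without -XX:StringTableSize on the command line should show whether the option took effect.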

Note that increasing the pool size will consume more memory but has the advantage of reducing the time required to insert the Strings into the table.
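
To get a feel for why the number of buckets matters, here is a toy model of a bucketed hash table. It is only a sketch: it uses String.hashCode() and generated keys of the form "key-N", both of which are assumptions made for illustration, and HotSpot's real string table is implemented differently. The point is simply that the same number of entries produces much shorter chains when there are more buckets, so fewer entries have to be compared on each lookup or insert.

```java
class BucketChainSketch {

    // Distributes `entries` generated keys over `buckets` chains and
    // returns the length of the longest chain.
    static int longestChain(int entries, int buckets) {
        int[] chainLengths = new int[buckets];
        int longest = 0;
        for (int i = 0; i < entries; i++) {
            // floorMod keeps the bucket index non-negative for negative hash codes.
            int bucket = Math.floorMod(("key-" + i).hashCode(), buckets);
            longest = Math.max(longest, ++chainLengths[bucket]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int entries = 100_000;
        System.out.println("1009 buckets:  longest chain = " + longestChain(entries, 1009));
        System.out.println("60013 buckets: longest chain = " + longestChain(entries, 60013));
    }
}
```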

8. A Note About Java 9

Until Java 8, Strings were internally represented as an array of characters – char[], encoded in UTF-16, so that every character uses two bytes of memory.

Java 9 introduced a new representation called Compact Strings. A String is now backed by a byte[] together with an encoding flag, and the JVM chooses between Latin-1 (one byte per character) and UTF-16 (two bytes per character) depending on the stored content.

Since the new String representation uses the UTF-16 encoding only when necessary, Strings occupy significantly less heap memory, which in turn reduces Garbage Collector overhead on the JVM.
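
The saving can be estimated with a few lines of code. This is a rough sketch that does not inspect the JVM's internal fields: it only checks whether every character of a String fits into Latin-1 and derives the size of the character data from that, ignoring object headers. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class CompactStringSketch {

    // True when every character fits into a single Latin-1 byte.
    static boolean fitsInLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Estimated size of the character data alone under Compact Strings:
    // one byte per character for Latin-1 content, two bytes otherwise.
    static int estimatedPayloadBytes(String s) {
        return fitsInLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(estimatedPayloadBytes("Baeldung")); // 8 (16 as a char[] before Java 9)
        System.out.println(estimatedPayloadBytes("字符串"));    // 6 (UTF-16 is still required)
    }
}
```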

9. Conclusion

In this guide, we showed how the JVM and the Java compiler optimize memory allocations for String objects via the Java String Pool.

All code samples used in the article are available over on GitHub.
