1. Introduction
In this tutorial, we’ll discuss a few ways to create a mutable String in Java.
2. Immutability of Strings
Unlike strings in languages such as C or C++, Strings in Java are immutable: once a String object is created, its contents cannot be changed.
This immutability also means that any operation that appears to modify a String actually creates a new String in memory with the modified content and returns a reference to it. To work with mutable text data efficiently, Java provides library classes such as StringBuffer and StringBuilder.
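As a small illustration of this behavior, the sketch below appends text to a String and shows that the original object is left untouched. The class and method names are made up for this example.

```java
public class ImmutabilityDemo {

    // concat() does not change its receiver; it returns a new String
    public static String appendWorld(String s) {
        return s.concat(" World");
    }

    public static void main(String[] args) {
        String original = "Hello";
        String modified = appendWorld(original);

        System.out.println(original);             // Hello
        System.out.println(modified);             // Hello World
        System.out.println(original == modified); // false
    }
}
```

The variable original still refers to the unchanged object; only code that picks up the returned reference sees the new content.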
3. Mutable String Using Reflection
We can attempt to create a mutable String in Java by using the Reflection API. Reflection allows us to inspect and modify classes, objects, methods, and fields at runtime. While it is a very powerful tool, it should be used with caution, as it bypasses the usual access checks and can introduce bugs without any warning.
We can employ some of the API’s methods to overwrite the contents of a String, thereby making it mutable. Let’s start with two variables initialized from the same String literal. Because identical literals are interned, both variables refer to the same String object:
String myString = "Hello World";
String otherString = "Hello World";
Now, we call getDeclaredField() on the String class to obtain the Field instance for its internal value array, and we make the field accessible so that we can overwrite it:
Field f = String.class.getDeclaredField("value");
f.setAccessible(true);
f.set(myString, "Hi World".toCharArray());
When we replace the value of the first String and then print the second one, the mutated value appears:
System.out.println(otherString);
Hi World
We have therefore mutated a String, and every reference to this literal now sees the updated value of “Hi World”. This can introduce subtle bugs and break code far away from the place where the mutation happened. Java programs run on the assumption that Strings are immutable, and any deviation from that assumption may be catastrophic.
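It is also important to note that the above example relies on the internals of older JDKs and won’t work with newer Java releases. Since Java 9, String stores its content in a byte[] rather than a char[], so assigning a char[] to the value field fails. Since Java 16, the JDK internals are strongly encapsulated by default, and the setAccessible(true) call itself is rejected unless the java.lang package is explicitly opened to the calling code.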
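To check how a particular JDK reacts, we could wrap the reflective write in a small probe such as the sketch below. The class and method names are hypothetical, and the probe deliberately targets a String that is not an interned literal, so that a successful mutation cannot leak into other code. Which branch is taken depends on the Java version and on the JVM options in use.

```java
import java.lang.reflect.Field;

public class ReflectionProbe {

    // Tries to overwrite the internal value field of a String and
    // reports whether the running JDK allowed it.
    public static String probe() {
        // built from a char[], so this object is not an interned literal
        String target = new String("Hello World".toCharArray());
        try {
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set(target, "Hi World".toCharArray());
            return "mutated: " + target;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // e.g. the field is not accessible, or it is not a char[]
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```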
4. Charsets and Strings
4.1. Introduction to Charsets
The reflection-based approach has many disadvantages and is inconvenient to use. A different way of mutating a String is to implement a custom Charset for our program.
Computers understand man-made characters only by their numeric codes. A charset defines the mapping between characters and their binary representation. For example, ASCII defines 128 characters, each mapped to a 7-bit code. A standardized character encoding, based on a well-defined charset, ensures that text is interpreted correctly by digital systems worldwide.
Java provides extensive support for encodings and conversions. This includes US-ASCII, ISO-8859-1, UTF-8, and UTF-16, to name a few.
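To make the mapping idea concrete, the following sketch encodes the same one-character text with several of the built-in charsets and compares the number of bytes produced. The class and method names are assumptions made for this example; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // number of bytes each charset produces for the same text
    public static int[] sizes(String text) {
        return new int[] {
            text.getBytes(StandardCharsets.US_ASCII).length,
            text.getBytes(StandardCharsets.ISO_8859_1).length,
            text.getBytes(StandardCharsets.UTF_8).length,
            text.getBytes(StandardCharsets.UTF_16BE).length
        };
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent
        for (int n : sizes("\u00e9")) {
            System.out.println(n);
        }
    }
}
```

US-ASCII cannot represent this character, so getBytes() substitutes a single replacement byte; ISO-8859-1 needs one byte, while UTF-8 and UTF-16BE need two bytes each.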
4.2. Using a Charset
Let’s see an example of how we can use Charsets to encode and decode Strings. We’ll take a String containing non-ASCII characters and encode it using the UTF-8 charset. Then, we’ll decode the bytes back to the original input using the same charset.
Let’s start with the input String:
String inputString = "Hello, दुनिया";
We obtain the UTF-8 charset using the Charset.forName() method of java.nio.charset.Charset and then create an encoder from it:
Charset charset = Charset.forName("UTF-8");
CharsetEncoder encoder = charset.newEncoder();
The encoder object has an encode() method, which expects a CharBuffer object, a ByteBuffer object, and an endOfInput flag.
The CharBuffer is a buffer that holds character data. We can obtain one by wrapping the input String:
CharBuffer charBuffer = CharBuffer.wrap(inputString);
ByteBuffer byteBuffer = ByteBuffer.allocate(64);
We also allocate a ByteBuffer with a capacity of 64 bytes and then pass both buffers to the encode() method to encode the input String:
encoder.encode(charBuffer, byteBuffer, true);
The byteBuffer object now holds the encoded bytes. After flipping it with byteBuffer.flip() to prepare it for reading, we can decode its contents to reveal the original String again:
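private static String decodeString(ByteBuffer byteBuffer) {
    Charset charset = Charset.forName("UTF-8");
    CharsetDecoder decoder = charset.newDecoder();
    // output buffer; 50 chars is enough for the short inputs used here
    CharBuffer decodedCharBuffer = CharBuffer.allocate(50);
    // byteBuffer must already be flipped so that it is read from the start
    decoder.decode(byteBuffer, decodedCharBuffer, true);
    // switch the output buffer from writing to reading
    decodedCharBuffer.flip();
    return decodedCharBuffer.toString();
}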
The following test verifies that we can decode the String back to its original value:
String inputString = "hello दुनिया";
String result = ch.decodeString(ch.encodeString(inputString));
Assertions.assertEquals(inputString, result);
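The test above calls an encodeString() helper that is not shown. A minimal sketch of how the two helpers might fit together is given below, assuming static methods in a hypothetical CharsetRoundTrip class and the same buffer sizes as in the snippets above. It uses StandardCharsets.UTF_8 in place of Charset.forName("UTF-8"), which yields the same charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // hypothetical counterpart of decodeString(): String -> UTF-8 bytes
    public static ByteBuffer encodeString(String input) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer byteBuffer = ByteBuffer.allocate(64);
        encoder.encode(CharBuffer.wrap(input), byteBuffer, true);
        encoder.flush(byteBuffer);
        byteBuffer.flip(); // make the written bytes readable
        return byteBuffer;
    }

    // UTF-8 bytes -> String; expects a buffer that is ready for reading
    public static String decodeString(ByteBuffer byteBuffer) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        CharBuffer out = CharBuffer.allocate(50);
        decoder.decode(byteBuffer, out, true);
        decoder.flush(out);
        out.flip(); // make the decoded chars readable
        return out.toString();
    }

    public static void main(String[] args) {
        // "hello " followed by the Hindi word from the article, written with escapes
        String input = "hello \u0926\u0941\u0928\u093f\u092f\u093e";
        System.out.println(input.equals(decodeString(encodeString(input))));
    }
}
```

The fixed buffer sizes only suit short inputs; longer text would require checking the CoderResult for overflow and growing the buffers.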
4.3. Creating a Custom Charset
We can also define a custom Charset for our programs. To do this, we must extend Charset and provide concrete implementations of its abstract methods:
- contains() – this should report whether our charset contains the given charset
- newDecoder() – this should return a CharsetDecoder instance
- newEncoder() – this should return a CharsetEncoder instance
We start with an inline Charset definition by creating an anonymous subclass of Charset as follows:
private final Charset myCharset = new Charset("mycharset", null) {
    // implement methods
};
We have already seen that charsets rely heavily on CharBuffer objects when encoding and decoding characters. In our custom charset definition, we keep a shared reference to a CharBuffer that the rest of the program can use:
private final AtomicReference<CharBuffer> cbRef = new AtomicReference<>();
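Holding on to a CharBuffer matters because a buffer created with CharBuffer.wrap() does not copy the array it wraps. The sketch below, with made-up class and method names, shows that writing through such a buffer changes the underlying char[] directly.

```java
import java.nio.CharBuffer;

public class SharedBufferDemo {

    // Overwrites the first element through a wrapping buffer and
    // returns the resulting array content as a String.
    public static String overwriteFirst(char[] backing, char replacement) {
        CharBuffer view = CharBuffer.wrap(backing); // no copy is made
        view.put(0, replacement);                   // writes through to backing
        return new String(backing);
    }

    public static void main(String[] args) {
        char[] data = "Hello".toCharArray();
        System.out.println(overwriteFirst(data, 'J')); // Jello
    }
}
```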
Let’s now write simple inline implementations of the newEncoder() and newDecoder() methods to complete our Charset definition. In the decoder, we also store the output CharBuffer in the shared reference cbRef:
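@Override
public boolean contains(Charset cs) {
    // Charset also declares contains() as abstract; this charset contains no other charset
    return false;
}

@Override
public CharsetDecoder newDecoder() {
    return new CharsetDecoder(this, 1.0f, 1.0f) {
        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            // remember the output buffer so that we can write to it later
            cbRef.set(out);
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                // each byte maps directly to one char
                out.append((char) in.get());
            }
            return CoderResult.UNDERFLOW;
        }
    };
}

@Override
public CharsetEncoder newEncoder() {
    return new CharsetEncoder(this, 1.0f, 1.0f) {
        @Override
        protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                char currentChar = in.get();
                if (currentChar > 127) {
                    // step back so that the position points at the unmappable char
                    in.position(in.position() - 1);
                    return CoderResult.unmappableForLength(1);
                }
                out.put((byte) currentChar);
            }
            return CoderResult.UNDERFLOW;
        }
    };
}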
4.4. Mutating a String With Custom Charset
We have now completed our Charset definition, and we can use it in our program. Note that the shared cbRef reference is updated with the decoder’s output CharBuffer during decoding. This is the essential step towards mutating the String.
The String class in Java provides multiple constructors to create and initialize a String, and one of them takes a byte array and a Charset:
public String(byte[] bytes, Charset charset) {
this(bytes, 0, bytes.length, charset);
}
We use this constructor to create a String, and we pass our custom charset object myCharset to it:
public String createModifiableString(String s) {
    return new String(s.getBytes(), myCharset);
}
Now that we have our String, let’s try to mutate it by using the CharBuffer we captured:
public void modifyString() {
CharBuffer cb = cbRef.get();
cb.position(0);
cb.put("something");
}
Here, we overwrite the CharBuffer’s contents, starting at position 0. The buffer wraps the char[] that the decoder wrote into, and the charset captured a reference to it in the decodeLoop() method of the decoder. On JDK versions where the String constructor adopts that array as the String’s backing storage, as older releases such as Java 8 do, writing to the buffer changes the String as well. We can verify this by adding a test:
// the input has as many characters as "something", because the backing array cannot grow
String s = createModifiableString("Greetings");
Assert.assertEquals("Greetings", s);
modifyString();
Assert.assertEquals("something", s);
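Whether this trick still works depends on how the running JDK builds a String from decoded characters. The self-contained probe below is a sketch under stated assumptions: the class name, the charset name "x-probe", and the decode-only design are all made up for this example. It writes a replacement of the same length into the captured buffer and returns the String’s content afterwards, so the result shows whether the String on the current JDK was affected.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class CharsetMutationProbe {

    // last output buffer handed to the decoder
    private static final AtomicReference<CharBuffer> CB_REF = new AtomicReference<>();

    // hypothetical decode-only charset for this sketch
    private static final Charset PROBE = new Charset("x-probe", null) {
        @Override
        public boolean contains(Charset cs) {
            return false;
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) {
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    CB_REF.set(out); // capture the output buffer
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW;
                        }
                        out.put((char) in.get());
                    }
                    return CoderResult.UNDERFLOW;
                }
            };
        }

        @Override
        public CharsetEncoder newEncoder() {
            throw new UnsupportedOperationException("decode-only charset");
        }
    };

    // Returns the content of the String after the captured buffer was overwritten.
    public static String probe() {
        String s = new String("Hello".getBytes(StandardCharsets.US_ASCII), PROBE);
        CharBuffer cb = CB_REF.get();
        cb.position(0);
        cb.put("Jello"); // same length as "Hello", so it fits the buffer
        return s;
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

If the output is still the original text, the JDK copied or re-encoded the decoded characters, and the captured buffer no longer backs the String.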
5. Final Thoughts on String Mutation
We have seen a few ways to mutate a String. String mutation is controversial in the Java world, mainly because almost all Java programs assume that Strings are immutable.
However, we often need to work with text that changes, which is why Java provides the StringBuffer and StringBuilder classes. These classes represent mutable sequences of characters and are therefore easy to modify. Using them is the best and most efficient way of working with mutable character sequences.
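As a brief illustration, the sketch below edits text in place with a StringBuilder. The class and method names are chosen for this example; the methods called on StringBuilder are part of the standard API.

```java
public class BuilderDemo {

    // Applies a few in-place edits to a single StringBuilder instance.
    public static String edit() {
        StringBuilder sb = new StringBuilder("Hello World");
        sb.replace(0, 5, "Hi"); // "Hi World"
        sb.setCharAt(3, 'w');   // "Hi world"
        sb.append('!');         // "Hi world!"
        sb.insert(0, ">> ");    // ">> Hi world!"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit()); // >> Hi world!
    }
}
```

StringBuffer offers the same operations with synchronized methods, so StringBuilder is usually preferred when the object is confined to a single thread.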
6. Conclusion
In this article, we looked into mutable Strings and ways of mutating a String. We also saw why these techniques are fragile and why there is no straightforward, reliable way to mutate a String.
As usual, the code for this article is available over on GitHub.