1. Overview
In this quick article, we’ll explore a fundamental class in Java – the StringTokenizer.
2. StringTokenizer
The StringTokenizer class helps us split Strings into multiple tokens.
StreamTokenizer provides similar functionality, but the tokenization method of StringTokenizer is much simpler than the one used by the StreamTokenizer class. Methods of StringTokenizer do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.
The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
3. Using the StringTokenizer
The simplest use of StringTokenizer is to split a String on a set of specified delimiters.
In this quick example, we’re going to split the argument String and add the tokens into a list:
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public List<String> getTokens(String str) {
    List<String> tokens = new ArrayList<>();
    // Split the input on commas
    StringTokenizer tokenizer = new StringTokenizer(str, ",");
    while (tokenizer.hasMoreElements()) {
        tokens.add(tokenizer.nextToken());
    }
    return tokens;
}
Notice how we're breaking the String into tokens based on the delimiter ','. Then, in the loop, we add each token to the ArrayList using the tokens.add() method.
For example, if a user gives the input "Welcome,to,baeldung.com", this method should return a list containing the three tokens "Welcome", "to" and "baeldung.com".
3.1. Java 8 Approach
Since StringTokenizer implements the Enumeration<Object> interface, we can use it with Java's Collections utility class.
If we consider the earlier example, we can retrieve the same set of tokens using the Collections.list() method and the Stream API:
public List<String> getTokensWithCollection(String str) {
    return Collections.list(new StringTokenizer(str, ",")).stream()
      .map(token -> (String) token)
      .collect(Collectors.toList());
}
Here, we are passing the StringTokenizer itself as a parameter in the Collections.list() method.
Note that since StringTokenizer implements Enumeration<Object>, the elements are typed as Object, so we need to cast each token to String. The tokens themselves are always Strings; to obtain Integer or Float values, we would parse each token rather than cast it.
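As a minimal sketch of parsing rather than casting, the example below converts comma-separated tokens to Integer values. The class name NumericTokensExample and the helper getIntTokens are hypothetical and only serve the illustration; the input is assumed to contain valid integers.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;

class NumericTokensExample {

    // Hypothetical helper: tokenizes on "," and parses each token as an Integer.
    static List<Integer> getIntTokens(String str) {
        return Collections.list(new StringTokenizer(str, ",")).stream()
          .map(token -> Integer.valueOf((String) token)) // cast to String, then parse
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(getIntTokens("1,2,3")); // prints [1, 2, 3]
    }
}
```

Casting a token directly to Integer would compile but fail at runtime with a ClassCastException, which is why the sketch goes through Integer.valueOf().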
3.2. Variants of StringTokenizer
Besides the two-argument constructor used above, StringTokenizer provides two other constructors: StringTokenizer(String str) and StringTokenizer(String str, String delim, boolean returnDelims):
StringTokenizer(String str, String delim, boolean returnDelims) takes an extra boolean input. If the boolean value is true, StringTokenizer returns each delimiter character as a token of its own instead of skipping it.
StringTokenizer(String str) is a shortcut: it internally calls the three-argument constructor with the hard-coded delimiters " \t\n\r\f" and the boolean value false.
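The two constructor variants can be compared side by side with a small sketch. The class name ConstructorVariantsExample and the helper drain are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ConstructorVariantsExample {

    // Hypothetical helper: drains a tokenizer into a list for easy comparison.
    static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // returnDelims = true: each "," comes back as its own token
        System.out.println(drain(new StringTokenizer("a,b,c", ",", true)));
        // prints [a, ,, b, ,, c]

        // Single-argument constructor: splits on space, tab, newline, CR and form feed
        System.out.println(drain(new StringTokenizer("a b\tc\nd")));
        // prints [a, b, c, d]
    }
}
```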
3.3. Token Customization
StringTokenizer also comes with an overloaded nextToken() method which takes a String of delimiters as input. This String replaces the tokenizer's current set of delimiters and stays in effect for subsequent calls, so the rest of the input is tokenized with the new set.
For example, we can pass "e" to the nextToken() method to break the string on the delimiter 'e':
tokens.add(tokenizer.nextToken("e"));
Since the delimiters passed to nextToken() replace the original set, ',' no longer separates tokens. Hence, for the string "Hello,baeldung.com" we get the following tokens:
H
llo,ba
ldung.com
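The following sketch contrasts two delimiter arguments. Per the StringTokenizer Javadoc, nextToken(String delim) switches the tokenizer to the new delimiter set, so passing "e" alone stops splitting on ',', while passing ",e" splits on both characters. The class name NextTokenDelimiterExample and the two-argument getTokens helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class NextTokenDelimiterExample {

    // Hypothetical helper: starts with "," and reads every token via nextToken(delim).
    static List<String> getTokens(String str, String delim) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(str, ",");
        while (tokenizer.hasMoreTokens()) {
            // The first call replaces "," with delim for the rest of the input
            tokens.add(tokenizer.nextToken(delim));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(getTokens("Hello,baeldung.com", "e"));
        // prints [H, llo,ba, ldung.com]
        System.out.println(getTokens("Hello,baeldung.com", ",e"));
        // prints [H, llo, ba, ldung.com]
    }
}
```

To split on both the comma and 'e', the delimiter String needs to contain both characters.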
3.4. Token Length
To count the number of tokens that are still available, we can call the countTokens() method on the StringTokenizer:
int tokenLength = tokenizer.countTokens();
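Because countTokens() reports how many more times nextToken() can be called, the value shrinks as tokens are consumed. Here is a minimal sketch of that behavior; the class name CountTokensExample and the helper remainingAfter are hypothetical.

```java
import java.util.StringTokenizer;

class CountTokensExample {

    // Hypothetical helper: consumes `consumed` tokens, then reports how many remain.
    static int remainingAfter(String str, int consumed) {
        StringTokenizer tokenizer = new StringTokenizer(str, ",");
        for (int i = 0; i < consumed; i++) {
            tokenizer.nextToken();
        }
        return tokenizer.countTokens();
    }

    public static void main(String[] args) {
        String str = "Welcome,to,baeldung.com";
        System.out.println(remainingAfter(str, 0)); // prints 3
        System.out.println(remainingAfter(str, 1)); // prints 2
        System.out.println(remainingAfter(str, 3)); // prints 0
    }
}
```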
3.5. Reading From CSV File
Now, let’s try using StringTokenizer in a real use case.
A common scenario is reading data from a CSV file and parsing it based on a user-supplied delimiter.
Using StringTokenizer, we can easily get there:
public List<String> getTokensFromFile(String path, String delim) {
    List<String> tokens = new ArrayList<>();
    String currLine = "";
    StringTokenizer tokenizer;
    // Load the file from the classpath (src/main/resources)
    try (BufferedReader br = new BufferedReader(
      new InputStreamReader(Application.class.getResourceAsStream(
        "/" + path)))) {
        // Tokenize each line with the given delimiter
        while ((currLine = br.readLine()) != null) {
            tokenizer = new StringTokenizer(currLine, delim);
            while (tokenizer.hasMoreElements()) {
                tokens.add(tokenizer.nextToken());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return tokens;
}
Here, the function takes two arguments: the name of a CSV file, which is read from the resources folder (src -> main -> resources), and a delimiter.
Based on these two arguments, the CSV data is read line by line, and each line gets tokenized using StringTokenizer.
For example, suppose we've put the following content in the CSV:
1|IND|India
2|MY|Malaysia
3|AU|Australia
Hence, the following tokens should be generated:
1
IND
India
2
MY
Malaysia
3
AU
Australia
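The same line-by-line logic can be tried without a file on the classpath. The sketch below is a hypothetical variant that reads from any Reader (here a StringReader); the class name LineTokenizerExample and the method getTokensFromReader are assumptions made for this illustration and are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class LineTokenizerExample {

    // Hypothetical variant of getTokensFromFile that accepts any Reader.
    static List<String> getTokensFromReader(Reader reader, String delim) {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                StringTokenizer tokenizer = new StringTokenizer(line, delim);
                while (tokenizer.hasMoreTokens()) {
                    tokens.add(tokenizer.nextToken());
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers do not silently get a partial list
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String data = "1|IND|India\n2|MY|Malaysia\n3|AU|Australia";
        System.out.println(getTokensFromReader(new StringReader(data), "|"));
        // prints [1, IND, India, 2, MY, Malaysia, 3, AU, Australia]
    }
}
```

One caveat worth keeping in mind: StringTokenizer treats consecutive delimiters as a single separator, so a line such as "1||India" yields only the tokens "1" and "India", and the empty field is dropped.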
3.6. Testing
Now, let’s create a quick test case:
public class TokenizerTest {

    private MyTokenizer myTokenizer = new MyTokenizer();
    private List<String> expectedTokensForString = Arrays.asList(
      "Welcome", "to", "baeldung.com");
    private List<String> expectedTokensForFile = Arrays.asList(
      "1", "IND", "India",
      "2", "MY", "Malaysia",
      "3", "AU", "Australia");

    @Test
    public void givenString_thenGetListOfString() {
        String str = "Welcome,to,baeldung.com";
        List<String> actualTokens = myTokenizer.getTokens(str);

        assertEquals(expectedTokensForString, actualTokens);
    }

    @Test
    public void givenFile_thenGetListOfString() {
        List<String> actualTokens = myTokenizer.getTokensFromFile(
          "data.csv", "|");

        assertEquals(expectedTokensForFile, actualTokens);
    }
}
4. Conclusion
In this quick tutorial, we looked at some practical examples of using the core Java StringTokenizer.
As always, the full source code is available over on GitHub.