Read a File Into a Map in Java

Last modified: March 1, 2022

1. Overview

We know a Map holds key-value pairs in Java. Sometimes, we may want to load a text file’s content and convert it into a Java Map.

In this quick tutorial, let’s explore how we can achieve it.

2. Introduction to the Problem

Since Map stores key-value entries, the file should follow a specific format if we would like to import a file’s content to a Java Map object.

An example file may explain it quickly:

$ cat theLordOfRings.txt
title:The Lord of the Rings: The Return of the King
director:Peter Jackson
actor:Sean Astin
actor:Ian McKellen
Gandalf and Aragorn lead the World of Men against Sauron's
army to draw his gaze from Frodo and Sam as they approach Mount Doom with the One Ring.

As we can see in the theLordOfRings.txt file, if we consider the colon character as the delimiter, most lines follow the pattern “KEY:VALUE”, such as “director:Peter Jackson”.

Therefore, we can read each line, parse the key and value, and put them in a Map object.

However, there are some special cases we need to take care of:

  • Values containing the delimiter – Value shouldn’t be truncated. For example, the first line “title:The Lord of the Rings: The Return …
  • Duplicated keys – Depending on the requirement, we can choose one of three strategies: overwriting the existing value, discarding the later one, or aggregating the values into a List. For example, we have two “actor” keys in the file.
  • Lines that don’t follow the “KEY:VALUE” pattern – Such lines should be skipped. For instance, see the last two lines in the file.
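
The first and third cases can be sketched in a few lines. This is only an illustration under our own assumptions: the class SplitLimitDemo and the helper parseLine are hypothetical names, and the sample lines are copied from the file above. It relies on String.split with a limit of 2, which splits on the first colon only.

```java
import java.util.Arrays;
import java.util.List;

class SplitLimitDemo {

    // Hypothetical helper: returns {key, value} for a "KEY:VALUE" line,
    // or null when the line contains no colon at all.
    static String[] parseLine(String line) {
        // A limit of 2 yields at most two parts, so colons inside the value are kept.
        String[] parts = line.split(":", 2);
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "title:The Lord of the Rings: The Return of the King",
            "actor:Sean Astin",
            "Gandalf and Aragorn lead the World of Men against Sauron's");

        for (String line : lines) {
            String[] kv = parseLine(line);
            // Lines without a delimiter are reported and skipped.
            System.out.println(kv == null ? "skipped: " + line : kv[0] + " -> " + kv[1]);
        }
    }
}
```

With this helper, the title line keeps its full value after the first colon, and the free-text line is skipped.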

Next, let’s read this file and store it in a Java Map object.

3. The DupKeyOption Enum

As we’ve discussed, we’ll have three options for the duplicated keys case: overwriting, discarding, and aggregating.

Moreover, if we use the overwriting or discarding option, the returned Map will be of type Map<String, String>. However, if we would like to aggregate values for duplicate keys, we’ll get the result as Map<String, List<String>>.

So, let’s first explore the overwriting and discarding scenarios. In the end, we’ll discuss the aggregating option in a standalone section.

To make our solution flexible, let’s create an enum class so that we can pass the option as a parameter to our solution methods:

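enum DupKeyOption {
    OVERWRITE, // a later entry replaces the value already stored for the key
    DISCARD    // a later entry is ignored, so the first value wins
}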

4. Using the BufferedReader and FileReader Classes

We can combine BufferedReader and FileReader to read content from a file line by line.

4.1. Creating the byBufferedReader Method

Let’s create a method based on BufferedReader and FileReader:

public static Map<String, String> byBufferedReader(String filePath, DupKeyOption dupKeyOption) {
    HashMap<String, String> map = new HashMap<>();
    String line;
    try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
        while ((line = reader.readLine()) != null) {
            String[] keyValuePair = line.split(":", 2);
            if (keyValuePair.length > 1) {
                String key = keyValuePair[0];
                String value = keyValuePair[1];
                if (DupKeyOption.OVERWRITE == dupKeyOption) {
                    map.put(key, value);
                } else if (DupKeyOption.DISCARD == dupKeyOption) {
                    map.putIfAbsent(key, value);
                }
            } else {
                System.out.println("No Key:Value found in line, ignoring: " + line);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return map;
}

The byBufferedReader method accepts two parameters: the input file path, and the dupKeyOption object that decides how to handle entries with duplicated keys.

As the code above shows, we’ve defined a BufferedReader object to read lines from the given input file. Then, we parse and handle each line in a while loop. Let’s walk through and understand how it works:

  • We create a BufferedReader object and use try-with-resources to ensure the reader object gets closed automatically
  • We use the split method with the limit parameter to keep the value part as it is if it contains colon characters
  • Then an if check filters out the line that doesn’t match the “KEY:VALUE” pattern
  • In case there are duplicate keys, if we would like to take the “overwrite” strategy, we can simply call map.put(key, value)
  • Otherwise, calling the putIfAbsent method lets us ignore later entries with duplicated keys
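
To see the difference between the last two points in isolation, here is a minimal sketch that works on in-memory entries instead of a file. The class DupKeyDemo, the method load, and the boolean overwrite flag are hypothetical names used only for this illustration; the flag stands in for the DupKeyOption parameter.

```java
import java.util.HashMap;
import java.util.Map;

class DupKeyDemo {

    // Hypothetical helper: loads {key, value} pairs into a map.
    // overwrite == true mimics OVERWRITE, false mimics DISCARD.
    static Map<String, String> load(String[][] entries, boolean overwrite) {
        Map<String, String> map = new HashMap<>();
        for (String[] entry : entries) {
            if (overwrite) {
                map.put(entry[0], entry[1]);          // the last value wins
            } else {
                map.putIfAbsent(entry[0], entry[1]);  // the first value wins
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] entries = {
            {"actor", "Sean Astin"},
            {"actor", "Ian McKellen"}
        };
        System.out.println("overwrite: " + load(entries, true).get("actor"));
        System.out.println("discard:   " + load(entries, false).get("actor"));
    }
}
```

With the two “actor” entries from the sample file, the overwrite variant ends up with “Ian McKellen” and the discard variant keeps “Sean Astin”.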

Next, let’s test if the method works as expected.

4.2. Testing the Solution

Before we write the corresponding test method, let’s initialize two map objects containing the expected entries:

private static final Map<String, String> EXPECTED_MAP_DISCARD = Stream.of(new String[][]{
    {"title", "The Lord of the Rings: The Return of the King"},
    {"director", "Peter Jackson"},
    {"actor", "Sean Astin"}
  }).collect(Collectors.toMap(data -> data[0], data -> data[1]));

private static final Map<String, String> EXPECTED_MAP_OVERWRITE = Stream.of(new String[][]{
...
    {"actor", "Ian McKellen"}
  }).collect(Collectors.toMap(data -> data[0], data -> data[1]));

As we can see, we’ve initialized two Map objects to help with test assertions. One is for the case where we discard duplicate keys, and the other is for when we overwrite them.

Next, let’s test our method to see if we can get the expected Map objects:

@Test
public void givenInputFile_whenInvokeByBufferedReader_shouldGetExpectedMap() {
    Map<String, String> mapOverwrite = FileToHashMap.byBufferedReader(filePath, FileToHashMap.DupKeyOption.OVERWRITE);
    assertThat(mapOverwrite).isEqualTo(EXPECTED_MAP_OVERWRITE);

    Map<String, String> mapDiscard = FileToHashMap.byBufferedReader(filePath, FileToHashMap.DupKeyOption.DISCARD);
    assertThat(mapDiscard).isEqualTo(EXPECTED_MAP_DISCARD);
}

If we give it a run, the test passes. So, we’ve solved the problem.

5. Using Java Stream

Stream has been around since Java 8. Also, the Files.lines method can conveniently return a Stream object containing all lines in a file.

Now, let’s create a method using Stream to solve the problem:

public static Map<String, String> byStream(String filePath, DupKeyOption dupKeyOption) {
    Map<String, String> map = new HashMap<>();
    try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
        lines.filter(line -> line.contains(":"))
            .forEach(line -> {
                String[] keyValuePair = line.split(":", 2);
                String key = keyValuePair[0];
                String value = keyValuePair[1];
                if (DupKeyOption.OVERWRITE == dupKeyOption) {
                    map.put(key, value);
                } else if (DupKeyOption.DISCARD == dupKeyOption) {
                    map.putIfAbsent(key, value);
                }
            });
    } catch (IOException e) {
        e.printStackTrace();
    }
    return map;
}

As the code above shows, the main logic is quite similar to our byBufferedReader method. Let’s walk through it quickly:

  • We’re still using try-with-resources on the Stream object since the Stream object contains a reference to the open file. We should close the file by closing the stream.
  • The filter method skips all lines that don’t follow the “KEY:VALUE” pattern.
  • The forEach method does pretty much the same as the while block in the byBufferedReader solution.
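
As a side note, the same result can be produced without mutating a map inside forEach, by collecting with the three-argument Collectors.toMap and letting its merge function resolve duplicate keys. This is an alternative sketch, not the byStream method above: the class ToMapMergeDemo and the boolean overwrite flag are hypothetical, and the input is an in-memory Stream rather than Files.lines so the example stays self-contained.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapMergeDemo {

    // Hypothetical variant: overwrite == true mimics OVERWRITE, false mimics DISCARD.
    static Map<String, String> toMap(Stream<String> lines, boolean overwrite) {
        return lines
            .filter(line -> line.contains(":"))   // skip lines without the delimiter
            .map(line -> line.split(":", 2))      // split on the first colon only
            .collect(Collectors.toMap(
                kv -> kv[0],
                kv -> kv[1],
                // The merge function is called only when a key repeats.
                (first, second) -> overwrite ? second : first));
    }

    public static void main(String[] args) {
        String[] sample = {
            "director:Peter Jackson",
            "actor:Sean Astin",
            "actor:Ian McKellen"
        };
        System.out.println(toMap(Stream.of(sample), true));
        System.out.println(toMap(Stream.of(sample), false));
    }
}
```

Without a merge function, the two-argument Collectors.toMap throws an IllegalStateException on a duplicate key, which is why the three-argument overload is used here.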

Finally, let’s test the byStream solution:

@Test
public void givenInputFile_whenInvokeByStream_shouldGetExpectedMap() {
    Map<String, String> mapOverwrite = FileToHashMap.byStream(filePath, FileToHashMap.DupKeyOption.OVERWRITE);
    assertThat(mapOverwrite).isEqualTo(EXPECTED_MAP_OVERWRITE);

    Map<String, String> mapDiscard = FileToHashMap.byStream(filePath, FileToHashMap.DupKeyOption.DISCARD);
    assertThat(mapDiscard).isEqualTo(EXPECTED_MAP_DISCARD);
}

When we execute the test, it passes as well.

6. Aggregating Values by Keys

So far, we’ve seen the solutions to the overwriting and discarding scenarios. But, as we’ve discussed, if it’s required, we can also aggregate values by keys. Thus, in the end, we’ll have a Map object of the type Map<String, List<String>>. Now, let’s build a method to realize this requirement:

public static Map<String, List<String>> aggregateByKeys(String filePath) {
    Map<String, List<String>> map = new HashMap<>();
    try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
        lines.filter(line -> line.contains(":"))
          .forEach(line -> {
              String[] keyValuePair = line.split(":", 2);
              String key = keyValuePair[0];
              String value = keyValuePair[1];
              if (map.containsKey(key)) {
                  map.get(key).add(value);
              } else {
                  map.put(key, Stream.of(value).collect(Collectors.toList()));
              }
          });
    } catch (IOException e) {
        e.printStackTrace();
    }
    return map;
}

We’ve used the Stream approach to read all lines in the input file. The implementation is pretty straightforward. Once we’ve parsed the key and value from an input line, we check if the key already exists in the result map object. If it does exist, we append the value to the existing list. Otherwise, we initialize a List containing the current value as the single element: Stream.of(value).collect(Collectors.toList()). 

It’s worth mentioning that we shouldn’t initialize the List using Collections.singletonList(value) or List.of(value). This is because both Collections.singletonList and List.of (Java 9+) methods return an immutable List. That is to say, if the same key comes again, we cannot append the value to the list.
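
The following sketch illustrates the immutability point and shows one way to sidestep it. It is an alternative under our own assumptions, not the aggregateByKeys method above: the class AggregateDemo and the helpers aggregate and canAppend are hypothetical, and Map.computeIfAbsent with a fresh ArrayList replaces the containsKey check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AggregateDemo {

    // Hypothetical variant: groups values by key using a mutable ArrayList per key.
    static Map<String, List<String>> aggregate(List<String> lines) {
        Map<String, List<String>> map = new HashMap<>();
        for (String line : lines) {
            String[] kv = line.split(":", 2);
            if (kv.length < 2) {
                continue; // not a "KEY:VALUE" line
            }
            // Creates the list on first sight of the key, then appends in both cases.
            map.computeIfAbsent(kv[0], key -> new ArrayList<>()).add(kv[1]);
        }
        return map;
    }

    // Returns false when the list rejects add(), as immutable lists do.
    static boolean canAppend(List<String> list) {
        try {
            list.add("another value");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList("actor:Sean Astin", "actor:Ian McKellen")));
        System.out.println("singletonList appendable: " + canAppend(Collections.singletonList("Sean Astin")));
        System.out.println("ArrayList appendable:     " + canAppend(new ArrayList<>()));
    }
}
```

Because every list is created as an ArrayList, appending a value for a repeated key always succeeds.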

Next, let’s test our method to see if it does the job. As usual, we create the expected result first:

private static final Map<String, List<String>> EXPECTED_MAP_AGGREGATE = Stream.of(new String[][]{
      {"title", "The Lord of the Rings: The Return of the King"},
      {"director", "Peter Jackson"},
      {"actor", "Sean Astin", "Ian McKellen"}
  }).collect(Collectors.toMap(arr -> arr[0], arr -> Arrays.asList(Arrays.copyOfRange(arr, 1, arr.length))));

Then, the test method itself is pretty simple:

@Test
public void givenInputFile_whenInvokeAggregateByKeys_shouldGetExpectedMap() {
    Map<String, List<String>> mapAgg = FileToHashMap.aggregateByKeys(filePath);
    assertThat(mapAgg).isEqualTo(EXPECTED_MAP_AGGREGATE);
}

The test passes if we give it a run. It means our solution works as expected.

7. Conclusion

In this article, we’ve learned two approaches to read content from a text file and save it in a Java Map object: using the BufferedReader class and using Stream.

Further, we’ve addressed implementing three strategies to handle duplicate keys: overwriting, discarding, and aggregating.

As always, the full version of the code is available over on GitHub.
