The Trie Data Structure in Java – Java中的Trie数据结构

最后修改: 2018年 1月 23日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

Data structures represent a crucial asset in computer programming, and knowing when and why to use them is very important.

数据结构代表了计算机编程中的一项重要资产,知道何时和为何使用它们是非常重要的。

This article is a brief introduction to trie (pronounced “try”) data structure, its implementation and complexity analysis.

本文简要介绍了trie(读作 “try”)数据结构、其实现和复杂性分析。

2. Trie

2.Trie

A trie is a discrete data structure that’s not quite well-known or widely-mentioned in typical algorithm courses, but nevertheless an important one.

trie是一种离散的数据结构,在典型的算法课程中不太知名,也没有被广泛提及,但却是一种重要的数据结构。

A trie (also known as a digital tree) and sometimes even radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree structure, which takes advantage of the keys that it stores – usually strings.

trie(也被称为数字树),有时甚至是radix树或前缀树(因为它们可以通过前缀进行搜索),是一种有序的树形结构,它利用了它所存储的键–通常是字符串。

A node’s position in the tree defines the key with which that node is associated, which makes tries different in comparison to binary search trees, in which a node stores a key that corresponds only to that node.

一个节点在树中的位置定义了与该节点相关的键,这使得tries与二进制搜索树不同,在二进制搜索树中,一个节点存储一个仅与该节点对应的键。

All descendants of a node have a common prefix of a String associated with that node, whereas the root is associated with an empty String.

一个节点的所有后代都有一个与该节点相关的String的共同前缀,而根则与一个空的String相关。

Here we have a preview of TrieNode that we will be using in our implementation of the Trie:

这里我们有一个TrieNode的预览,我们将在实现Trie时使用:

public class TrieNode {
    private HashMap<Character, TrieNode> children;
    private String content;
    private boolean isWord;
    
   // ...
}

There may be cases when a trie is a binary search tree, but in general, these are different. Both binary search trees and tries are trees, but each node in binary search trees always has two children, whereas tries’ nodes, on the other hand, can have more.

可能有这样的情况:一个 trie 是一棵二进制搜索树,但一般来说,这些是不同的。二进制搜索树和tries都是树,但二进制搜索树的每个节点总是有两个子节点,而tries的节点则可以有更多的子节点。

In a trie, every node (except the root node) stores one character or a digit. By traversing the trie down from the root node to a particular node n, a common prefix of characters or digits can be formed which is shared by other branches of the trie as well.

在 trie 中,每个节点(根节点除外)都存储一个字符或一个数字。通过从根节点向下遍历三联体到一个特定的节点n,可以形成一个字符或数字的共同前缀,该前缀也被三联体的其他分支所共享。

By traversing up the trie from a leaf node to the root node, a String or a sequence of digits can be formed.

通过从叶子节点到根节点向上遍历三角形,可以形成一个String或一个数字序列。

Here is the Trie class, which represents an implementation of the trie data structure:

这里是Trie类,它代表了trie数据结构的一个实现。

public class Trie {
    private TrieNode root;
    //...
}

3. Common Operations

3.常见的操作

Now, let’s see how to implement basic operations.

现在,让我们看看如何实现基本操作。

3.1. Inserting Elements

3.1.插入元素

The first operation that we’ll describe is the insertion of new nodes.

我们要描述的第一个操作是插入新节点。

Before we start the implementation, it’s important to understand the algorithm:

在我们开始实施之前,了解该算法是很重要的。

  1. Set a current node as a root node
  2. Set the current letter as the first letter of the word
  3. If the current node has already an existing reference to the current letter (through one of the elements in the “children” field), then set current node to that referenced node. Otherwise, create a new node, set the letter equal to the current letter, and also initialize current node to this new node
  4. Repeat step 3 until the key is traversed

The complexity of this operation is O(n), where n represents the key size.

该操作的复杂度为O(n),其中n代表密钥大小。

Here is the implementation of this algorithm:

下面是这个算法的实现。

public void insert(String word) {
    TrieNode current = root;

    for (char l: word.toCharArray()) {
        current = current.getChildren().computeIfAbsent(l, c -> new TrieNode());
    }
    current.setEndOfWord(true);
}

Now let’s see how we can use this method to insert new elements in a trie:

现在让我们看看如何使用这个方法在一个 trie 中插入新元素。

private Trie createExampleTrie() {
    Trie trie = new Trie();

    trie.insert("Programming");
    trie.insert("is");
    trie.insert("a");
    trie.insert("way");
    trie.insert("of");
    trie.insert("life");

    return trie;
}

We can test that trie has already been populated with new nodes from the following test:

我们可以从下面的测试中检验trie是否已经被填充了新的节点。

@Test
public void givenATrie_WhenAddingElements_ThenTrieNotEmpty() {
    Trie trie = createTrie();

    assertFalse(trie.isEmpty());
}

3.2. Finding Elements

3.2.寻找元素

Let’s now add a method to check whether a particular element is already present in a trie:

现在让我们添加一个方法来检查一个特定的元素是否已经存在于一个 trie 中。

  1. Get children of the root
  2. Iterate through each character of the String
  3. Check whether that character is already a part of a sub-trie. If it isn’t present anywhere in the trie, then stop the search and return false
  4. Repeat the second and the third step until there isn’t any character left in the String. If the end of the String is reached, return true

The complexity of this algorithm is O(n), where n represents the length of the key.

该算法的复杂度为O(n),其中n代表密钥的长度。

Java implementation can look like:

Java的实现可以看起来像。

public boolean find(String word) {
    TrieNode current = root;
    for (int i = 0; i < word.length(); i++) {
        char ch = word.charAt(i);
        TrieNode node = current.getChildren().get(ch);
        if (node == null) {
            return false;
        }
        current = node;
    }
    return current.isEndOfWord();
}

And in action:

并在行动中。

@Test
public void givenATrie_WhenAddingElements_ThenTrieContainsThoseElements() {
    Trie trie = createExampleTrie();

    assertFalse(trie.containsNode("3"));
    assertFalse(trie.containsNode("vida"));
    assertTrue(trie.containsNode("life"));
}

3.3. Deleting an Element

3.3.删除一个元素

Aside from inserting and finding an element, it’s obvious that we also need to be able to delete elements.

除了插入和寻找一个元素外,很明显,我们还需要能够删除元素。

For the deletion process, we need to follow the steps:

对于删除过程,我们需要遵循以下步骤。

  1. Check whether this element is already part of the trie
  2. If the element is found, then remove it from the trie

The complexity of this algorithm is O(n), where n represents the length of the key.

这个算法的复杂度是O(n),其中n代表钥匙的长度。

Let’s have a quick look at the implementation:

让我们快速看一下实施情况。

public void delete(String word) {
    delete(root, word, 0);
}

private boolean delete(TrieNode current, String word, int index) {
    if (index == word.length()) {
        if (!current.isEndOfWord()) {
            return false;
        }
        current.setEndOfWord(false);
        return current.getChildren().isEmpty();
    }
    char ch = word.charAt(index);
    TrieNode node = current.getChildren().get(ch);
    if (node == null) {
        return false;
    }
    boolean shouldDeleteCurrentNode = delete(node, word, index + 1) && !node.isEndOfWord();

    if (shouldDeleteCurrentNode) {
        current.getChildren().remove(ch);
        return current.getChildren().isEmpty();
    }
    return false;
}

And in action:

并在行动中。

@Test
void whenDeletingElements_ThenTreeDoesNotContainThoseElements() {
    Trie trie = createTrie();

    assertTrue(trie.containsNode("Programming"));
 
    trie.delete("Programming");
    assertFalse(trie.containsNode("Programming"));
}

4. Conclusion

4.结论

In this article, we’ve seen a brief introduction to trie data structure and its most common operations and their implementation.

在这篇文章中,我们已经看到了对Trie数据结构的简单介绍,以及它最常见的操作和它们的实现。

The full source code for the examples shown in this article can be found over on GitHub.

本文所示示例的完整源代码可以在GitHub上找到over。