Validating Input with Finite Automata in Java – 用Java中的有限自动机验证输入

最后修改: 2017年 3月 28日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

If you’ve studied CS, you’ve undoubtedly taken a course about compilers or something similar; in these classes, the concept of Finite Automaton (also known as Finite State Machine) is taught. This is a way of formalizing the grammar rules of languages.

如果你学过CS,你无疑上过一门关于编译器或类似的课程;在这些课程中,有限自动机(也称为有限状态机)的概念被教授。这是一种将语言的语法规则形式化的方法。

You can read more about the subject here and here.

你可以在这里阅读更多关于这个主题的信息。

So how can this forgotten concept be helpful to us, high-level programmers, who don’t need to worry about building a new compiler?

那么,这个被遗忘的概念对我们这些不需要担心建立新的编译器的高级程序员有什么帮助呢?

Well, it turns out that the concept can simplify a lot of business scenarios and give us the tools to reason about complex logic.

好吧,事实证明,这个概念可以简化很多业务场景,并给我们提供了推理复杂逻辑的工具。

As a quick example, we can also validate input without an external, third-party library.

作为一个快速的例子,我们也可以在没有外部第三方库的情况下验证输入。

2. The Algorithm

2.算法

In a nutshell, such a machine declares states and ways of getting from one state to another. If you put a stream through it, you can validate its format with the following algorithm (pseudocode):

简而言之,这样的机器声明了状态和从一个状态到另一个状态的方法。如果你把一个流通过它,你可以用下面的算法(伪代码)来验证其格式。

for (char c in input) {
    if (automaton.accepts(c)) {
        automaton.switchState(c);
        input.pop(c);
    } else {
        break;
    }
}
if (automaton.canStop() && input.isEmpty()) {
    print("Valid");
} else {
    print("Invalid");
}

We say the automaton “accepts” the given char if there is any arrow going from the current state, which has the char on it. Switching states means that a pointer is followed and the current state is replaced with the state that the arrow points to.

我们说自动机 “接受 “给定的字符,如果有任何从当前状态出发的箭头,上面有该字符。切换状态意味着指针被跟踪,当前状态被箭头所指向的状态所取代。

Finally, when the loop is over, we check if the automaton “can stop” (the current state is double-circled) and that input has been exhausted.

最后,当循环结束时,我们检查自动机是否 “可以停止”(当前状态是双圈的),并且输入已经用尽。

3. An Example

3.一个例子

Let’s write a simple validator for a JSON object, to see the algorithm in action. Here is the automaton that accepts an object:

让我们为JSON对象写一个简单的验证器,来看看这个算法的作用。这里是接受一个对象的自动机。

json dfa 2

Note that value can be one of the following: string, integer, boolean, null or another JSON object. For the sake of brevity, in our example, we’ll consider only strings.

请注意,值可以是以下之一:字符串、整数、布尔值、空或另一个JSON对象。为了简洁起见,在我们的例子中,我们只考虑字符串。

3.1. The Code

3.1.准则》

Implementing a finite state machine is quite straightforward. We have the following:

实现一个有限的状态机是非常简单的。我们有以下内容。

public interface FiniteStateMachine {
    FiniteStateMachine switchState(CharSequence c);
    boolean canStop();
}
 
interface State {
    State with(Transition tr);
    State transit(CharSequence c);
    boolean isFinal();
}
 
interface Transition {
    boolean isPossible(CharSequence c);
    State state();
}

The relations between them are:

它们之间的关系是。

  • The state machine has one current State and tells us if it can stop or not (if the state is final or not)
  • A State has a list of Transitions which could be followed (outgoing arrows)
  • A Transition tells us if the character is accepted and gives us the next State
publi class RtFiniteStateMachine implements FiniteStateMachine {

    private State current;

    public RtFiniteStateMachine(State initial) {
        this.current = initial;
    }

    public FiniteStateMachine switchState(CharSequence c) {
        return new RtFiniteStateMachine(this.current.transit(c));
    }

    public boolean canStop() {
        return this.current.isFinal();
    }
}

Note that the FiniteStateMachine implementation is immutable. This is mainly so a single instance of it can be used multiple times.

请注意,FiniteStateMachine实现是不可变的。这主要是为了让它的一个实例可以被多次使用。

Following, we have the implementation RtState. The with(Transition) method returns the instance after the transition is added, for fluency. A State also tells us if it’s final (double-circled) or not.

接下来,我们有一个实现RtStatewith(Transition)方法在添加了过渡后返回实例,以保证流畅性。一个State还告诉我们它是否是最终的(双圈)。

public class RtState implements State {

    private List<Transition> transitions;
    private boolean isFinal;

    public RtState() {
        this(false);
    }
    
    public RtState(boolean isFinal) {
        this.transitions = new ArrayList<>();
        this.isFinal = isFinal;
    }

    public State transit(CharSequence c) {
        return transitions
          .stream()
          .filter(t -> t.isPossible(c))
          .map(Transition::state)
          .findAny()
          .orElseThrow(() -> new IllegalArgumentException("Input not accepted: " + c));
    }

    public boolean isFinal() {
        return this.isFinal;
    }

    @Override
    public State with(Transition tr) {
        this.transitions.add(tr);
        return this;
    }
}

And finally, RtTransition which checks the transition rule and can give the next State:

最后,RtTransition检查过渡规则,并能给出下一个State

public class RtTransition implements Transition {

    private String rule;
    private State next;
    public State state() {
        return this.next;
    }

    public boolean isPossible(CharSequence c) {
        return this.rule.equalsIgnoreCase(String.valueOf(c));
    }

    // standard constructors
}

With this implementation, you should be able to build any state machine. The algorithm described at the beginning is as straightforward as:

有了这个实现,你应该能够建立任何状态机。一开始描述的算法就很直接了当。

String json = "{\"key\":\"value\"}";
FiniteStateMachine machine = this.buildJsonStateMachine();
for (int i = 0; i < json.length(); i++) {
    machine = machine.switchState(String.valueOf(json.charAt(i)));
}
 
assertTrue(machine.canStop());

Check the test class RtFiniteStateMachineTest to see the buildJsonStateMachine() method. Note that it adds a few more states than the image above, to also catch the quotes that surround the Strings properly.

查看测试类RtFiniteStateMachineTest,看看buildJsonStateMachine()方法。请注意,它比上面的图片多加了一些状态,以正确捕捉环绕字符串的引号。

4. Conclusion

4.总结

Finite automata are great tools that you can use in validating structured data.

有限自动机是很好的工具,你可以在验证结构化数据时使用。

However, they’re not widely known because they can get complicated when it comes to complex input (since a transition can be used for only one character). Nevertheless, they are great when it comes to checking a simple set of rules.

然而,它们并不广为人知,因为当涉及到复杂的输入时,它们会变得很复杂(因为一个过渡可以只用于一个字符)。尽管如此,当涉及到检查一套简单的规则时,它们是很好的。

Finally, if you want to do some more complicated work using finite state machines, StatefulJ and squirrel are two libraries worth looking into.

最后,如果你想使用有限状态机做一些更复杂的工作,StatefulJsquirrel是两个值得研究的库。

You can check code samples on GitHub.

你可以在GitHub上查看代码样本