Java with ANTLR

Last modified: June 26, 2018

1. Overview

In this tutorial, we’ll do a quick overview of the ANTLR parser generator and show some real-world applications.

2. ANTLR

ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text.

It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them.

It’s often used to build tools and frameworks. For example, Hibernate uses ANTLR for parsing and processing HQL queries and Elasticsearch uses it for Painless.

And Java is just one binding. ANTLR also offers bindings for C#, Python, JavaScript, Go, C++ and Swift.

3. Configuration

First of all, let’s start by adding antlr4-runtime to our pom.xml:

<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.7.1</version>
</dependency>

And also the antlr4-maven-plugin:

<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-maven-plugin</artifactId>
    <version>4.7.1</version>
    <executions>
        <execution>
            <goals>
                <goal>antlr4</goal>
            </goals>
        </execution>
    </executions>
</plugin>

It’s the plugin’s job to generate code from the grammars we specify.

4. How Does It Work?

Basically, when we want to create the parser by using the ANTLR Maven plugin, we need to follow three simple steps:

  • prepare a grammar file
  • generate sources
  • create the listener

So, let’s see these steps in action.

5. Using an Existing Grammar

Let’s first use ANTLR to analyze code for methods with bad casing:

public class SampleClass {
 
    public void DoSomethingElse() {
        //...
    }
}

Simply put, we’ll validate that all method names in our code start with a lowercase letter.

5.1. Prepare a Grammar File

What’s nice is that there are already several grammar files out there that can suit our purposes.

Let’s use the Java8.g4 grammar file which we found in ANTLR’s GitHub grammar repo.

We can create the src/main/antlr4 directory and download it there.

5.2. Generate Sources

ANTLR works by generating Java code corresponding to the grammar files that we give it, and the Maven plugin makes it easy:

mvn package

By default, this will generate several files under the target/generated-sources/antlr4 directory:

  • Java8.interp
  • Java8Listener.java
  • Java8BaseListener.java
  • Java8Lexer.java
  • Java8Lexer.interp
  • Java8Parser.java
  • Java8.tokens
  • Java8Lexer.tokens

Notice that the names of those files are based on the name of the grammar file.

We’ll need the Java8Lexer and the Java8Parser files later when we test. For now, though, we need the Java8BaseListener for creating our UppercaseMethodListener.

5.3. Creating UppercaseMethodListener

Based on the Java8 grammar that we used, Java8BaseListener has several methods that we can override, each one corresponding to a rule in the grammar file.

For example, the grammar defines the method name and parameter list like so:

methodDeclarator
	:	Identifier '(' formalParameterList? ')' dims?
	;

And so Java8BaseListener has a method enterMethodDeclarator which will be invoked each time this pattern is encountered.

So, let’s override enterMethodDeclarator, pull out the Identifier, and perform our check:

public class UppercaseMethodListener extends Java8BaseListener {

    private List<String> errors = new ArrayList<>();

    // ... getter for errors
 
    @Override
    public void enterMethodDeclarator(Java8Parser.MethodDeclaratorContext ctx) {
        TerminalNode node = ctx.Identifier();
        String methodName = node.getText();

        if (Character.isUpperCase(methodName.charAt(0))) {
            String error = String.format("Method %s is uppercased!", methodName);
            errors.add(error);
        }
    }
}

5.4. Testing

Now, let’s do some testing. First, we construct the lexer:

String javaClassContent = "public class SampleClass { void DoSomething(){} }";
Java8Lexer java8Lexer = new Java8Lexer(CharStreams.fromString(javaClassContent));

Then, we instantiate the parser:

CommonTokenStream tokens = new CommonTokenStream(java8Lexer);
Java8Parser parser = new Java8Parser(tokens);
ParseTree tree = parser.compilationUnit();

And then, the walker and the listener:

ParseTreeWalker walker = new ParseTreeWalker();
UppercaseMethodListener listener = new UppercaseMethodListener();

Lastly, we tell ANTLR to walk through our sample class:

walker.walk(listener, tree);

assertThat(listener.getErrors().size(), is(1));
assertThat(listener.getErrors().get(0),
  is("Method DoSomething is uppercased!"));
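To make the enter/exit callbacks less opaque, here is a JDK-only sketch of the listener pattern that a tree walker follows: enter a node, visit its children in order, then exit it. The names WalkerSketch, Node, Listener, and trace are hypothetical stand-ins for illustration; they are not ANTLR classes, and the real ParseTreeWalker handles more cases (terminal and error nodes, for instance).

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch of the listener pattern; every name here is a
// hypothetical stand-in, not part of the ANTLR runtime.
final class WalkerSketch {

    // A tree node labelled with the grammar rule that produced it.
    static final class Node {
        final String rule;
        final List<Node> children = new ArrayList<>();

        Node(String rule, Node... kids) {
            this.rule = rule;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Callbacks fired by the walker, analogous to enterXxx/exitXxx methods.
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    // Depth-first walk: enter the node, walk its children in order, then exit.
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    // Records the order in which the callbacks fire for a given tree.
    static List<String> trace(Node root) {
        List<String> events = new ArrayList<>();
        walk(new Listener() {
            @Override
            public void enter(Node node) {
                events.add("enter " + node.rule);
            }

            @Override
            public void exit(Node node) {
                events.add("exit " + node.rule);
            }
        }, root);
        return events;
    }

    public static void main(String[] args) {
        Node tree = new Node("classDeclaration",
            new Node("methodDeclarator"),
            new Node("methodBody"));
        trace(tree).forEach(System.out::println);
    }
}
```

Because a listener only reacts to events, it can collect state (such as our list of errors) without knowing how the tree is traversed.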

6. Building Our Grammar

Now, let’s try something just a little bit more complex, like parsing log files:

2018-May-05 14:20:18 INFO some error occurred
2018-May-05 14:20:19 INFO yet another error
2018-May-05 14:20:20 INFO some method started
2018-May-05 14:20:21 DEBUG another method started
2018-May-05 14:20:21 DEBUG entering awesome method
2018-May-05 14:20:24 ERROR Bad thing happened

Because we have a custom log format, we’re going to first need to create our own grammar.

6.1. Prepare a Grammar File

First, let’s see if we can create a mental map of what each log line looks like in our file.

<datetime> <level> <message>

Or if we go one more level deep, we might say:

<datetime> := <year><dash><month><dash><day> …

And so on. It’s important to consider this so we can decide at what level of granularity we want to parse the text.
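As a rough illustration of the coarsest level of granularity, the sketch below splits one line into the three fields of the mental map using only the JDK. The class and method names (LogLineSketch, splitFields) are hypothetical, and it assumes the fields are separated by single spaces exactly as in the sample log; it is not a substitute for the grammar we are about to write.

```java
// Stdlib-only sketch: splits one log line into <datetime>, <level>, <message>.
// LogLineSketch and splitFields are hypothetical names used for illustration.
final class LogLineSketch {

    static String[] splitFields(String line) {
        // Split on the first three spaces only, so the message keeps its spaces.
        String[] parts = line.split(" ", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Unexpected log line: " + line);
        }
        // The date and the time together form the <datetime> field.
        String datetime = parts[0] + " " + parts[1];
        return new String[] { datetime, parts[2], parts[3] };
    }

    public static void main(String[] args) {
        String[] fields = splitFields("2018-May-05 14:20:24 ERROR Bad thing happened");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

Going one level deeper (year, month, day) would mean splitting the datetime field further, which is exactly the decision the grammar has to encode.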

A grammar file is basically a set of lexer and parser rules. Simply put, lexer rules describe how characters are grouped into tokens, while parser rules describe how those tokens combine into larger structures.

Let’s start by defining fragments, which are reusable building blocks for lexer rules:

fragment DIGIT : [0-9];
fragment TWODIGIT : DIGIT DIGIT;
fragment LETTER : [A-Za-z];

Next, let’s define the remaining lexer rules:

DATE : TWODIGIT TWODIGIT '-' LETTER LETTER LETTER '-' TWODIGIT;
TIME : TWODIGIT ':' TWODIGIT ':' TWODIGIT;
TEXT   : LETTER+ ;
CRLF : '\r'? '\n' | '\r';
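To get a feel for what these lexer rules accept, we can write rough regular-expression equivalents in plain Java. This is only an analogy: ANTLR does not compile rules to java.util.regex patterns, and the class name TokenShapeSketch and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

// Regex approximations of the lexer rules above, for illustration only.
// TokenShapeSketch and its methods are hypothetical names.
final class TokenShapeSketch {

    // DATE : TWODIGIT TWODIGIT '-' LETTER LETTER LETTER '-' TWODIGIT;
    static final Pattern DATE = Pattern.compile("[0-9]{4}-[A-Za-z]{3}-[0-9]{2}");

    // TIME : TWODIGIT ':' TWODIGIT ':' TWODIGIT;
    static final Pattern TIME = Pattern.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}");

    // TEXT : LETTER+ ;
    static final Pattern TEXT = Pattern.compile("[A-Za-z]+");

    // CRLF : '\r'? '\n' | '\r';
    static final Pattern CRLF = Pattern.compile("\\r?\\n|\\r");

    static boolean isDate(String s) {
        return DATE.matcher(s).matches();
    }

    static boolean isTime(String s) {
        return TIME.matcher(s).matches();
    }

    static boolean isText(String s) {
        return TEXT.matcher(s).matches();
    }

    static boolean isLineBreak(String s) {
        return CRLF.matcher(s).matches();
    }
}
```

Note that these rules only check the shape of a token: a string such as 9999-Xyz-99 still has the shape of a DATE, so validating the actual date is left to the code that consumes the parse tree.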

With these building blocks in place, we can build parser rules for the basic structure:

log : entry+;
entry : timestamp ' ' level ' ' message CRLF;

And then we’ll add the details for timestamp:

timestamp : DATE ' ' TIME;

For level:

level : 'ERROR' | 'INFO' | 'DEBUG';

And for message:

message : (TEXT | ' ')+;

And that’s it! Our grammar is ready to use. We will put it under the src/main/antlr4 directory as before.

6.2. Generate Sources

Recall that this is just a quick mvn package, and that this will create several files like LogBaseListener, LogParser, and so on, based on the name of our grammar.

6.3. Create Our Log Listener

Now, we are ready to implement our listener, which we’ll ultimately use to parse a log file into Java objects.

So, let’s start with a simple model class for the log entry:

public class LogEntry {

    private LogLevel level;
    private String message;
    private LocalDateTime timestamp;
   
    // getters and setters
}

Now, we need to subclass LogBaseListener as before:

public class LogListener extends LogBaseListener {

    private List<LogEntry> entries = new ArrayList<>();
    private LogEntry current;

current will hold onto the current log line, which we can reinitialize each time we enter an entry rule, again based on our grammar:

    @Override
    public void enterEntry(LogParser.EntryContext ctx) {
        this.current = new LogEntry();
    }

Next, we’ll use enterTimestamp, enterLevel, and enterMessage for setting the appropriate LogEntry properties:

    @Override
    public void enterTimestamp(LogParser.TimestampContext ctx) {
        this.current.setTimestamp(
          LocalDateTime.parse(ctx.getText(), DEFAULT_DATETIME_FORMATTER));
    }
    
    @Override
    public void enterMessage(LogParser.MessageContext ctx) {
        this.current.setMessage(ctx.getText());
    }

    @Override
    public void enterLevel(LogParser.LevelContext ctx) {
        this.current.setLevel(LogLevel.valueOf(ctx.getText()));
    }
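The listener methods above refer to DEFAULT_DATETIME_FORMATTER and LogLevel without showing them. A minimal sketch of how they might be defined, assuming the timestamp format of the sample log; the pattern string, the choice of Locale.ENGLISH, the enum constants, and the holder class LogSupportSketch are assumptions rather than the article's actual code:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical support types for the listener; names and values are assumptions.
final class LogSupportSketch {

    // The three levels that the grammar's level rule accepts.
    enum LogLevel { DEBUG, INFO, ERROR }

    // Matches timestamps such as "2018-May-05 14:20:24". An explicit English
    // locale keeps the month abbreviation parseable on any machine.
    static final DateTimeFormatter DEFAULT_DATETIME_FORMATTER =
        DateTimeFormatter.ofPattern("yyyy-MMM-dd HH:mm:ss", Locale.ENGLISH);

    // Mirrors what enterTimestamp does with ctx.getText().
    static LocalDateTime parseTimestamp(String text) {
        return LocalDateTime.parse(text, DEFAULT_DATETIME_FORMATTER);
    }

    // Mirrors what enterLevel does with ctx.getText().
    static LogLevel parseLevel(String text) {
        return LogLevel.valueOf(text);
    }
}
```

Since the parser rules spell out the space between DATE and TIME as a token, ctx.getText() on the timestamp context includes it, which is why the pattern contains a space as well.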

And finally, let’s use the exitEntry method to add our completed LogEntry to the list:

    @Override
    public void exitEntry(LogParser.EntryContext ctx) {
        this.entries.add(this.current);
    }

    // getter for entries
}

Note, by the way, that our LogListener isn’t thread-safe!

6.4. Testing

And now we can test again as we did last time:

@Test
public void whenLogContainsOneErrorLogEntry_thenOneErrorIsReturned()
  throws Exception {
 
    String logLine = "2018-May-05 14:20:24 ERROR Bad thing happened\n";

    // instantiate the lexer, the parser, and the walker
    LogListener listener = new LogListener();
    walker.walk(listener, logParser.log());
    LogEntry entry = listener.getEntries().get(0);
 
    assertThat(entry.getLevel(), is(LogLevel.ERROR));
    assertThat(entry.getMessage(), is("Bad thing happened"));
    assertThat(entry.getTimestamp(), is(LocalDateTime.of(2018,5,5,14,20,24)));
}

7. Conclusion

In this article, we focused on how to create a custom parser for our own language using ANTLR.

We also saw how to use existing grammar files and apply them for very simple tasks like code linting.

As always, all the code used here can be found over on GitHub.
