Batch Processing with Spring Cloud Data Flow

Last modified: September 25, 2016


1. Overview

In the first article of the series, we introduced Spring Cloud Data Flow's architectural components and how to use it to create a streaming data pipeline.

As opposed to a stream pipeline, where an unbounded amount of data is processed, a batch process makes it easy to create short-lived services where tasks are executed on demand.

2. Local Data Flow Server and Shell

The Local Data Flow Server is a component that is responsible for deploying applications, while the Data Flow Shell allows us to perform DSL commands needed for interacting with a server.

In the previous article, we used Spring Initializr to set them both up as Spring Boot applications.

After adding the @EnableDataFlowServer annotation to the server’s main class and the @EnableDataFlowShell annotation to the shell’s main class respectively, they are ready to be launched by performing:

mvn spring-boot:run
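As a reminder, the two main classes (which live in separate projects) look roughly like this — the class names here are illustrative, and yours may differ:

```java
// Server project: @EnableDataFlowServer turns this Spring Boot app
// into a Local Data Flow Server listening on port 9393
@EnableDataFlowServer
@SpringBootApplication
public class DataFlowServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(DataFlowServerApplication.class, args);
    }
}

// Shell project: @EnableDataFlowShell turns this Spring Boot app
// into an interactive shell client for the server
@EnableDataFlowShell
@SpringBootApplication
public class DataFlowShellApplication {
    public static void main(String[] args) {
        SpringApplication.run(DataFlowShellApplication.class, args);
    }
}
```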

The server will boot up on port 9393 and a shell will be ready to interact with it from the prompt.

You can refer to the previous article for the details on how to obtain and use a Local Data Flow Server and its shell client.

3. The Batch Application

As with the server and the shell, we can use Spring Initializr to set up a root Spring Boot batch application.

After reaching the website, simply choose a Group and an Artifact name, and select Cloud Task from the dependencies search box.

Once this is done, click on the Generate Project button to start downloading the Maven artifact.

The artifact comes preconfigured and with basic code. Let’s see how to edit it in order to build our batch application.

3.1. Maven Dependencies

First of all, let’s add a couple of Maven dependencies. As this is a batch application, we need to import libraries from the Spring Batch Project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

Also, as Spring Cloud Task uses a relational database to store the results of an executed task, we need to add a dependency on an RDBMS driver:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
</dependency>

We’ve chosen to use the H2 in-memory database provided by Spring. This gives us a simple method of bootstrapping development. However, in a production environment, you’ll want to configure your own DataSource.

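For instance, a production setup might point Spring Boot at an external database through application.properties, along with the corresponding driver dependency. The values below are placeholders, not part of this example:

```properties
# Hypothetical production settings - replace with your own database details
spring.datasource.url=jdbc:mysql://db-host:3306/taskdb
spring.datasource.username=task_user
spring.datasource.password=secret
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
```

With such properties in place, Spring Boot auto-configures the DataSource, and Spring Cloud Task records its executions there instead of in the in-memory H2 database.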
Keep in mind that artifacts’ versions will be inherited from Spring Boot’s parent pom.xml file.

3.2. Main Class

The key to enabling the desired functionality is to add the @EnableTask and @EnableBatchProcessing annotations to the Spring Boot main class. These class-level annotations tell Spring Cloud Task to bootstrap everything:

@EnableTask
@EnableBatchProcessing
@SpringBootApplication
public class BatchJobApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchJobApplication.class, args);
    }
}

3.3. Job Configuration

Lastly, let’s configure a job – in this case a simple print of a String to a log file:

@Configuration
public class JobConfiguration {

    private static Log logger
      = LogFactory.getLog(JobConfiguration.class);

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job job() {
        // A single-step job whose tasklet simply writes one line to the log
        return jobBuilderFactory.get("job")
          .start(stepBuilderFactory.get("jobStep1")
          .tasklet(new Tasklet() {
            
              @Override
              public RepeatStatus execute(StepContribution contribution, 
                ChunkContext chunkContext) throws Exception {
                
                logger.info("Job was run");
                return RepeatStatus.FINISHED;
              }
        }).build()).build();
    }
}
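Since Tasklet has a single abstract method, the anonymous class above could equally be written as a lambda on Java 8+ — an equivalent, more compact form of the same bean:

```java
@Bean
public Job job() {
    // Same single-step job as above, with the tasklet as a lambda
    return jobBuilderFactory.get("job")
      .start(stepBuilderFactory.get("jobStep1")
        .tasklet((contribution, chunkContext) -> {
            logger.info("Job was run");
            return RepeatStatus.FINISHED;
        }).build())
      .build();
}
```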

Details on how to configure and define a job are outside the scope of this article. For more information, you can see our Introduction to Spring Batch article.

Finally, our application is ready. Let's install it into our local Maven repository. To do this, cd into the project's root directory and issue the command:

mvn clean install

Now it’s time to put the application inside the Data Flow Server.

4. Registering the Application

To register the application within the App Registry we need to provide a unique name, an application type, and a URI that can be resolved to the app artifact.

Go to the Spring Cloud Data Flow Shell and issue the command from the prompt:

app register --name batch-job --type task 
  --uri maven://com.baeldung.spring.cloud:batch-job:jar:0.0.1-SNAPSHOT

5. Creating a Task

A task definition can be created using the command:

task create myjob --definition batch-job

This creates a new task with the name myjob pointing to the previously registered batch-job application.

A listing of the current task definitions can be obtained using the command:

task list

6. Launching a Task

To launch a task we can use the command:

task launch myjob

Once the task is launched the state of the task is stored in a relational DB. We can check the status of our task executions with the command:

task execution list

7. Reviewing the Result

In this example, the job simply prints a string in a log file. The log files are located within the directory displayed in the Data Flow Server’s log output.

To see the result we can tail the log:

tail -f PATH_TO_LOG\spring-cloud-dataflow-2385233467298102321\myjob-1472827120414\myjob
[...] --- [main] o.s.batch.core.job.SimpleStepHandler: Executing step: [jobStep1]
[...] --- [main] o.b.spring.cloud.JobConfiguration: Job was run
[...] --- [main] o.s.b.c.l.support.SimpleJobLauncher:
  Job: [SimpleJob: [name=job]] completed with the following parameters: 
    [{}] and the following status: [COMPLETED]

8. Conclusion

In this article, we have shown how to deal with batch processing through the use of Spring Cloud Data Flow.

The example code can be found in the GitHub project.
