Java-R Integration

Last modified: April 27, 2020

1. Overview

R is a popular programming language for statistics. Because it offers a wide variety of functions and packages, it's common to need to embed R code in other languages.

In this article, we’ll take a look at some of the most common ways of integrating R code into Java.

2. R Script

For our project, we'll start by implementing a very simple R function that takes a vector as input and returns the mean of its values. We'll define it in a dedicated file, script.R:

customMean <- function(vector) {
    mean(vector)
}
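
It can help to have an expected value to compare each integration against. One way is to compute the same mean in plain Java. The class below, MeanReference, is a hypothetical name and is not part of any library discussed here. It is a sketch for cross-checking results, and it mirrors R's behaviour of returning NaN for an empty vector:

```java
import java.util.Arrays;

// Hypothetical plain-Java reference for customMean, used only to cross-check
// the values returned by the R integrations in this tutorial.
public class MeanReference {

    public static double mean(int[] values) {
        if (values.length == 0) {
            // R's mean() of an empty vector returns NaN, so we mirror that here
            return Double.NaN;
        }
        // Sum as long to avoid int overflow on large inputs
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        // prints "[1, 2, 3, 4] -> 2.5"
        System.out.println(Arrays.toString(values) + " -> " + mean(values));
    }
}
```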

Throughout this tutorial, we’ll use a Java helper method to read this file and return its content as a String:

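public static String getMeanScriptContent() throws IOException, URISyntaxException {
    URI rScriptUri = RUtils.class.getClassLoader().getResource("script.R").toURI();
    Path inputScript = Paths.get(rScriptUri);
    // Close the file stream, and keep line breaks so multi-line R code stays valid
    try (Stream<String> lines = Files.lines(inputScript)) {
        return lines.collect(Collectors.joining("\n"));
    }
}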
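
Paths.get(uri) only works when script.R is a plain file on disk. Once the script is packaged inside a JAR, the resource URI uses the jar: scheme, and Paths.get typically throws FileSystemNotFoundException. The sketch below shows one alternative: it reads the resource as a stream instead. ScriptLoader is a hypothetical class name, not part of the tutorial's code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper: reads an R script from the classpath as a stream,
// which works both for exploded directories and for resources inside a JAR.
public class ScriptLoader {

    public static String load(String resourceName) {
        InputStream in = ScriptLoader.class.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + resourceName);
        }
        return read(in);
    }

    // Joins lines with "\n" so multi-line R code (comments, several statements) stays valid
    public static String read(InputStream in) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this variant, a call such as ScriptLoader.load("script.R") would replace the URI and Path handling in the helper.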

Now, let’s take a look at the different options we have to invoke this function from Java.

3. RCaller

The first library we're going to consider is RCaller, which executes code by spawning a dedicated R process on the local machine.

Since RCaller is available from Maven Central, we can just include it in our pom.xml:

<dependency>
    <groupId>com.github.jbytecode</groupId>
    <artifactId>RCaller</artifactId>
    <version>3.0</version>
</dependency>

Next, let's write a method that computes the mean of our values using our original R script:

public double mean(int[] values) throws IOException, URISyntaxException {
    String fileContent = RUtils.getMeanScriptContent();
    RCode code = RCode.create();
    code.addRCode(fileContent);
    code.addIntArray("input", values);
    code.addRCode("result <- customMean(input)");
    RCaller caller = RCaller.create(code, RCallerOptions.create());
    caller.runAndReturnResult("result");
    return caller.getParser().getAsDoubleArray("result")[0];
}

In this method, we're mainly using two objects:

  • RCode, which represents our code context, including our function, its input, and an invocation statement
  • RCaller, which lets us run our code and get the result back

Note that RCaller isn't suitable for small, frequent computations, because every run spends time starting a new R process. This is a noticeable drawback.

Also, RCaller requires R to be installed on the local machine.
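
One way to soften the process start-up cost is to batch several computations into a single R run, so the process starts only once. The minimal sketch below uses a hypothetical BatchScript helper that builds one R statement applying customMean to several input vectors. We assume the resulting string could be passed to a single RCode.addRCode call, after the script itself has been added, in place of the one-vector statement. The whole result vector would then come back through getAsDoubleArray("result"). That wiring is not shown here:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: builds a single R statement that evaluates customMean
// for several inputs in one R run instead of one run per input.
public class BatchScript {

    public static String build(int[][] inputs) {
        String calls = Arrays.stream(inputs)
                .map(BatchScript::toRVector)
                .map(vector -> "customMean(" + vector + ")")
                .collect(Collectors.joining(", "));
        return "result <- c(" + calls + ")";
    }

    // Renders an int array as an R integer vector literal, e.g. c(1L, 2L, 3L)
    public static String toRVector(int[] values) {
        return Arrays.stream(values)
                .mapToObj(v -> v + "L")
                .collect(Collectors.joining(", ", "c(", ")"));
    }

    public static void main(String[] args) {
        // prints: result <- c(customMean(c(1L, 2L, 3L)), customMean(c(4L, 5L)))
        System.out.println(build(new int[][] {{1, 2, 3}, {4, 5}}));
    }
}
```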

4. Renjin

Renjin is another popular solution in the R integration landscape. It's widely adopted, and it also offers enterprise support.

Adding Renjin to our project is slightly more involved, since we have to add the Mulesoft repository along with the Maven dependency:

<repositories>
    <repository>
        <id>mulesoft</id>
        <name>Mulesoft Repository</name>
        <url>https://repository.mulesoft.org/nexus/content/repositories/public/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.renjin</groupId>
        <artifactId>renjin-script-engine</artifactId>
        <version>RELEASE</version>
    </dependency>
</dependencies>

Once again, let's build a Java wrapper for our R function:

public double mean(int[] values) throws IOException, URISyntaxException, ScriptException {
    RenjinScriptEngine engine = new RenjinScriptEngine();
    String meanScriptContent = RUtils.getMeanScriptContent();
    engine.put("input", values);
    engine.eval(meanScriptContent);
    DoubleArrayVector result = (DoubleArrayVector) engine.eval("customMean(input)");
    return result.asReal();
}

As we can see, the concept is very similar to RCaller, although less verbose, since we can invoke functions directly by name using the eval method.

The main advantage of Renjin is that it doesn’t require an R installation as it uses a JVM-based interpreter. However, Renjin is currently not 100% compatible with GNU R.
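
Since renjin-script-engine is a JSR-223 script engine, it can also be obtained through the standard javax.script API instead of instantiating RenjinScriptEngine directly. The sketch below makes two assumptions. First, Renjin registers itself under the engine name "Renjin", as its documentation shows. Second, RenjinLookup is a hypothetical class name. Without Renjin on the classpath, the lookup simply comes back empty:

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper: obtains Renjin through the standard JSR-223 registry,
// so calling code depends only on javax.script, not on Renjin classes.
public class RenjinLookup {

    public static Optional<ScriptEngine> findREngine() {
        // Assumption: renjin-script-engine registers itself under the name "Renjin"
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName("Renjin"));
    }

    public static void main(String[] args) throws ScriptException {
        Optional<ScriptEngine> engine = findREngine();
        if (engine.isPresent()) {
            // Prints the R object returned by evaluating the expression
            System.out.println(engine.get().eval("mean(c(1, 2, 3, 4))"));
        } else {
            System.out.println("Renjin is not on the classpath");
        }
    }
}
```

This keeps Renjin-specific types out of the calling code, at the cost of working with plain Object results from eval.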

5. Rserve

The libraries we have reviewed so far are good choices for running code locally. But what if we want to have multiple clients invoking our R script? That’s where Rserve comes into play, letting us run R code on a remote machine through a TCP server.

Setting up Rserve involves installing the Rserve package and then, from the R console, starting a server that loads our script:

> install.packages("Rserve")
...
> library("Rserve")
> Rserve(args = "--RS-source ~/script.R")
Starting Rserve...

Next, we include Rserve in our project by adding the Maven dependency, as usual:

<dependency>
    <groupId>org.rosuda.REngine</groupId>
    <artifactId>Rserve</artifactId>
    <version>1.8.1</version>
</dependency>

Finally, let's wrap our R function in a Java method. Here we'll use an RConnection object, which connects to 127.0.0.1:6311 by default when no server address is given:

public double mean(int[] values) throws REngineException, REXPMismatchException {
    RConnection c = new RConnection();
    try {
        c.assign("input", values);
        return c.eval("customMean(input)").asDouble();
    } finally {
        // Release the server-side session once we have the result
        c.close();
    }
}
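
Creating an RConnection fails if no Rserve instance is listening, so it can be useful to check the server first, for example in a health check. The minimal sketch below uses only java.net.Socket, and RserveProbe is a hypothetical class name. Note that it only proves that something accepts TCP connections on that port, not that the listener is Rserve:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe: checks whether a TCP server accepts connections at host:port.
// It does not speak the Rserve protocol, so it cannot confirm the server is Rserve.
public class RserveProbe {

    public static final String DEFAULT_HOST = "127.0.0.1";
    public static final int DEFAULT_PORT = 6311; // RConnection's default port

    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unknown
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isReachable(DEFAULT_HOST, DEFAULT_PORT, 500);
        System.out.println("Rserve port reachable: " + up);
    }
}
```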

6. FastR

The last library we're going to talk about is FastR, a high-performance R implementation built on GraalVM. At the time of this writing, FastR is only available on Linux and Darwin (macOS) x64 systems.

To use it, we first need to install GraalVM from the official website. After that, we install FastR itself using the GraalVM Component Updater (gu) and then run the configuration script that ships with it:

$ bin/gu install R
...
$ languages/R/bin/configure_fastr

This time, our code will depend on Polyglot, the GraalVM API for embedding different guest languages in Java. Since Polyglot is a general-purpose API, we have to specify the language of the code we want to run. We'll also use R's c function to convert our input into a vector:

public double mean(int[] values) throws IOException, URISyntaxException {
    String meanScriptContent = RUtils.getMeanScriptContent();
    // Context is AutoCloseable; try-with-resources releases the R engine when we're done
    try (Context polyglot = Context.newBuilder().allowAllAccess(true).build()) {
        polyglot.eval("R", meanScriptContent);
        Value rBindings = polyglot.getBindings("R");
        // R's c() turns the Java array into an R vector
        Value rInput = rBindings.getMember("c").execute(values);
        return rBindings.getMember("customMean").execute(rInput).asDouble();
    }
}

When following this approach, keep in mind that it tightly couples our code to GraalVM. To learn more about GraalVM, check out our article on the Graal Java JIT Compiler.

7. Conclusion

In this article, we went through some of the most popular technologies for integrating R into Java. To sum up:

  • RCaller is easier to integrate since it’s available on Maven Central
  • Renjin offers enterprise support and doesn’t require R to be installed on the local machine but it’s not 100% compatible with GNU R
  • Rserve can be used to execute R code on a remote server
  • FastR allows seamless integration with Java but makes our code dependent on GraalVM and isn't available for every OS

As always, all the code used in this tutorial is available over on GitHub.