1. Overview
In this tutorial, we’ll learn how to generate mock data for different purposes. We’ll see how to use Datafaker and review several examples.
2. History
Datafaker is a modern fork of Javafaker. It was migrated to Java 8 and received improvements that increase the library’s performance. The API, however, stayed more or less the same, so projects that previously used Javafaker should have no problems migrating to Datafaker. All the examples provided in the Java Faker article will work with version 1.6.0 of Datafaker.
The current Datafaker API is compatible with Javafaker. Therefore, this article concentrates only on the differences and the improvements.
First, let’s add the datafaker Maven dependency to the project:
<dependency>
    <groupId>net.datafaker</groupId>
    <artifactId>datafaker</artifactId>
    <version>1.6.0</version>
</dependency>
3. Providers
One of the most important parts of Datafaker is its providers: a set of special classes that make data generation more convenient. It’s important to note that these classes are backed by yml files containing the underlying data. Faker methods and expressions use these files directly or indirectly to generate data. In the next sections, we’ll become more familiar with how these methods and directives work.
4. Additional Patterns for Data Generation
Datafaker, like Javafaker, supports generating values based on a provided pattern. Datafaker introduced additional functionality with the templatify, examplify, options, date, csv, and json directives.
4.1. Templatify
The templatify directive takes several arguments. The first is the base String. The second is the character to replace in that string. The rest are the replacement options, which are picked randomly:
import net.datafaker.Faker;

public class Templatify {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Expression: " + getExpression());
        System.out.println("Expression with a placeholder: " + getExpressionWithPlaceholder());
    }

    // Replaces every 't' in "test" with a random pick of 'j' or 'r'.
    static String getExpression() {
        return faker.expression("#{templatify 'test','t','j','r'}");
    }

    // '#' marks the only position that gets replaced.
    static String getExpressionWithPlaceholder() {
        return faker.expression("#{templatify '#ight', '#', 'f', 'l', 'm', 'n'}");
    }
}
Although we can use a base string without placeholders, it might produce undesirable results because every occurrence of the character in the string is replaced. Instead, we can introduce a placeholder: a character that appears only in specific places of the base string. In the case above, the result was:
Expression: resj
Expression with a placeholder: night
If the character to replace appears in multiple places, each occurrence is randomized separately. Using Strings as replacements is possible, but the documentation doesn’t mention this explicitly, so it’s better to use it with caution.
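To make the replacement rule concrete, here is a minimal plain-Java sketch of the idea behind templatify. It is not Datafaker’s implementation: the class TemplatifySketch and its templatify method are hypothetical and only mimic the behavior described above, where every occurrence of the chosen character gets its own random pick from the options.

```java
import java.util.Random;

class TemplatifySketch {

    // Hypothetical helper: replaces each occurrence of 'placeholder' in 'base'
    // with an independently chosen random element of 'options'.
    static String templatify(String base, char placeholder, Random random, String... options) {
        StringBuilder result = new StringBuilder();
        for (char c : base.toCharArray()) {
            if (c == placeholder) {
                result.append(options[random.nextInt(options.length)]);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Both 't' characters are replaced, so the word itself changes shape.
        System.out.println(templatify("test", 't', random, "j", "r"));
        // Only the '#' position varies; "ight" stays intact.
        System.out.println(templatify("#ight", '#', random, "f", "l", "m", "n"));
    }
}
```

The second call shows why a dedicated placeholder character is usually the safer choice: the fixed part of the string can’t be altered by accident.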
4.2. Examplify
This directive generates a random value based on the provided example. It replaces lowercase and uppercase characters with random characters of the respective case. The same goes for digits. Special characters are left untouched, which helps to create formatted strings:
import net.datafaker.Faker;

public class Examplify {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Expression: " + getExpression());
        System.out.println("Number expression: " + getNumberExpression());
    }

    // Letters keep their case; spaces stay where they are.
    static String getExpression() {
        return faker.expression("#{examplify 'Cat in the Hat'}");
    }

    // Digits are replaced by digits; the dashes are preserved.
    static String getNumberExpression() {
        return faker.expression("#{examplify '123-123-123'}");
    }
}
An example of the output:
Expression: Lvo lw ero Qkd
Number expression: 707-657-434
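The character-class rule behind examplify can be sketched in a few lines of plain Java. This is only an illustration under the assumption that the directive treats ASCII letters and digits as described above; the class ExamplifySketch and its method are hypothetical and not part of Datafaker.

```java
import java.util.Random;

class ExamplifySketch {

    // Hypothetical helper: keeps the "shape" of the example while randomizing its content.
    static String examplify(String example, Random random) {
        StringBuilder result = new StringBuilder(example.length());
        for (char c : example.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                result.append((char) ('a' + random.nextInt(26)));   // lowercase stays lowercase
            } else if (c >= 'A' && c <= 'Z') {
                result.append((char) ('A' + random.nextInt(26)));   // uppercase stays uppercase
            } else if (c >= '0' && c <= '9') {
                result.append((char) ('0' + random.nextInt(10)));   // digit stays a digit
            } else {
                result.append(c);                                   // spaces, dashes, etc. are kept
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println(examplify("Cat in the Hat", random));
        System.out.println(examplify("123-123-123", random));
    }
}
```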
4.3. Regexify
The regexify directive is a more flexible way of creating formatted String values. We can use it as an expression or call the regexify method directly on the Faker object:
import net.datafaker.Faker;

public class Regexify {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Expression: " + getExpression());
        System.out.println("Regexify with a method: " + getMethodExpression());
    }

    // The directive form: the pattern is passed inside an expression.
    static String getExpression() {
        return faker.expression("#{regexify '(hello|bye|hey)'}");
    }

    // The method form: the pattern is passed directly.
    static String getMethodExpression() {
        return faker.regexify("[A-D]{4,10}");
    }
}
Possible output:
Expression: bye
Regexify with a method: DCCC
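Because regexify output is driven by a pattern, the same pattern can double as a check in tests. The sketch below uses only java.util.regex; the sample values are hard-coded from the output above rather than produced by Faker, the class name RegexifyCheck is hypothetical, and it assumes the pattern is simple enough to mean the same thing to both Datafaker and java.util.regex.

```java
import java.util.regex.Pattern;

class RegexifyCheck {

    // Returns true when the whole generated value matches the pattern it was generated from.
    static boolean conforms(String regex, String generated) {
        return Pattern.matches(regex, generated);
    }

    public static void main(String[] args) {
        System.out.println(conforms("(hello|bye|hey)", "bye"));  // prints true
        System.out.println(conforms("[A-D]{4,10}", "DCCC"));     // prints true
        System.out.println(conforms("[A-D]{4,10}", "DCE"));      // prints false: too short, and 'E' is out of range
    }
}
```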
4.4. Options
The options.option directive picks a random option from a provided list. The same result can be achieved with regexify, but because this is such a common case, a separate directive makes sense:
import net.datafaker.Faker;

public class Option {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("First expression: " + getFirstExpression());
        System.out.println("Second expression: " + getSecondExpression());
        System.out.println("Third expression: " + getThirdExpression());
    }

    static String getFirstExpression() {
        return faker.expression("#{options.option 'Hi','Hello','Hey'}");
    }

    static String getSecondExpression() {
        return faker.expression("#{options.option '1','2','3','4','*'}");
    }

    // The same choice as the first expression, written with regexify.
    static String getThirdExpression() {
        return faker.expression("#{regexify '(Hi|Hello|Hey)'}");
    }
}
The output of the code above:
First expression: Hey
Second expression: 4
Third expression: Hello
If the list of options is large, it makes sense to create a custom provider for the randomized values instead.
4.5. CSV
As its name suggests, this directive creates CSV-formatted data. However, using it can be confusing because, under the hood, two overloaded methods with quite different signatures handle this directive:
import net.datafaker.Faker;

public class Csv {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("First expression:\n" + getFirstExpression());
        System.out.println("Second expression:\n" + getSecondExpression());
    }

    // Row count first, then alternating column names and value expressions.
    static String getFirstExpression() {
        String firstExpressionString
          = "#{csv '4','name_column','#{Name.first_name}','last_name_column','#{Name.last_name}'}";
        return faker.expression(firstExpressionString);
    }

    // Separator, quote character, header flag, and row count come before the columns.
    static String getSecondExpression() {
        String secondExpressionString
          = "#{csv ',','\"','true','4','name_column','#{Name.first_name}','last_name_column','#{Name.last_name}'}";
        return faker.expression(secondExpressionString);
    }
}
The directives above use the expressions #{Name.first_name} and #{Name.last_name}. The next sections will explain the usage of these expressions.
The values after the csv directive in the expression are mapped to the parameters of the mentioned methods. The documentation for these methods provides additional information. However, parsing these directives can sometimes cause problems, and in that case it’s better to call the methods directly. The code above will produce the following output:
First expression:
"name_column","last_name_column"
"Riley","Spinka"
"Lindsay","O'Conner"
"Sid","Rogahn"
"Prince","Wiegand"
Second expression:
"name_column","last_name_column"
"Jen","Schinner"
"Valeria","Walter"
"Mikki","Effertz"
"Deon","Bergnaum"
This is a convenient way to programmatically generate mock data for use outside the application.
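To illustrate how the arguments of the first csv expression map onto the output, here is a plain-Java sketch of the idea: a row count plus ordered pairs of column name and value source. It is not Datafaker’s implementation. CsvSketch and its csv method are hypothetical, and the suppliers return fixed values where the real expression would evaluate #{Name.first_name} and #{Name.last_name}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.Supplier;

class CsvSketch {

    // Hypothetical helper: one header line, then 'limit' rows built from the column suppliers.
    static String csv(int limit, Map<String, Supplier<String>> columns) {
        StringBuilder out = new StringBuilder();
        StringJoiner header = new StringJoiner(",");
        for (String name : columns.keySet()) {
            header.add(quote(name));
        }
        out.append(header).append('\n');
        for (int i = 0; i < limit; i++) {
            StringJoiner row = new StringJoiner(",");
            for (Supplier<String> value : columns.values()) {
                row.add(quote(value.get()));
            }
            out.append(row).append('\n');
        }
        return out.toString();
    }

    // Wraps a value in double quotes, doubling any quotes it already contains.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, Supplier<String>> columns = new LinkedHashMap<>();
        columns.put("name_column", () -> "Riley");        // stands in for #{Name.first_name}
        columns.put("last_name_column", () -> "Spinka");  // stands in for #{Name.last_name}
        System.out.print(csv(4, columns));
    }
}
```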
4.6. JSON
Another popular and widely used format is JSON. Datafaker can generate data in JSON format using expressions:
import net.datafaker.Faker;

public class Json {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println(getExpression());
    }

    // The nested json directives double their single quotes ('') inside the outer directive.
    static String getExpression() {
        return faker.expression(
          "#{json 'person'," + "'#{json ''first_name'',''#{Name.first_name}'',''last_name'',''#{Name.last_name}''}'," +
          "'address'," + "'#{json ''country'',''#{Address.country}'',''city'',''#{Address.city}''}'}");
    }
}
The code above produces the following output:
{"person": {"first_name": "Dorian", "last_name": "Simonis"}, "address": {"country": "Cameroon", "city": "South Ernestine"}}
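The doubled single quotes in the nested json expression are easy to get wrong by hand. As a sketch, a small hypothetical helper can assemble the expression string for us. It assumes, based only on the example above, that one level of nesting is written by doubling the single quotes of the inner directive; JsonExpressionSketch is not part of Datafaker and only builds the String that would be passed to faker.expression.

```java
import java.util.StringJoiner;

class JsonExpressionSketch {

    // Hypothetical helper: wraps each argument in single quotes and doubles any quotes inside it.
    static String json(String... keysAndValues) {
        StringJoiner arguments = new StringJoiner(",", "#{json ", "}");
        for (String argument : keysAndValues) {
            arguments.add("'" + argument.replace("'", "''") + "'");
        }
        return arguments.toString();
    }

    public static void main(String[] args) {
        String person = json("first_name", "#{Name.first_name}", "last_name", "#{Name.last_name}");
        String address = json("country", "#{Address.country}", "city", "#{Address.city}");
        // Prints the same expression string that is written by hand in the example above.
        System.out.println(json("person", person, "address", address));
    }
}
```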
4.7. Method Invocations
In fact, all the expressions are just method invocations with the method name and parameters passed as a String. Thus, all the directives above mirror methods with the same names. However, sometimes it’s more convenient to use plain text to create mock data:
import net.datafaker.Faker;

public class MethodInvocation {

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Name from a method: " + getNameFromMethod());
        System.out.println("Name from an expression: " + getNameFromExpression());
    }

    static String getNameFromMethod() {
        return faker.name().firstName();
    }

    // The mixed-case "first_Name" is deliberate: method names in expressions are case insensitive.
    static String getNameFromExpression() {
        return faker.expression("#{Name.first_Name}");
    }
}
Now it’s clear that the expressions with the csv and json directives used method invocations internally. This way, we can call any data generation method on the Faker object. Although the method names are case insensitive and allow variations in format, it’s better to check the documentation of the version in use to verify this.
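One way to picture case-insensitive, format-tolerant method names is a reflective lookup that normalizes names before comparing them. The sketch below is purely illustrative and makes no claim about how Datafaker resolves expressions: ExpressionLookupSketch, SampleName, and the normalization rule (drop underscores, lowercase) are all assumptions made for the example.

```java
import java.lang.reflect.Method;
import java.util.Locale;

class ExpressionLookupSketch {

    // Stand-in for a provider such as faker.name(); it returns fixed values.
    public static class SampleName {
        public String firstName() { return "Ada"; }
        public String lastName() { return "Lovelace"; }
    }

    // Assumed normalization: "first_name", "first_Name" and "firstName" all become "firstname".
    static String normalize(String name) {
        return name.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // Finds a public no-argument method whose normalized name matches and invokes it.
    static Object invoke(Object provider, String methodName) {
        String wanted = normalize(methodName);
        for (Method method : provider.getClass().getMethods()) {
            if (method.getParameterCount() == 0 && normalize(method.getName()).equals(wanted)) {
                try {
                    return method.invoke(provider);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Could not invoke " + method.getName(), e);
                }
            }
        }
        throw new IllegalArgumentException("No method matching " + methodName);
    }

    public static void main(String[] args) {
        SampleName name = new SampleName();
        System.out.println(invoke(name, "first_name"));  // prints Ada
        System.out.println(invoke(name, "first_Name"));  // prints Ada
        System.out.println(invoke(name, "lastName"));    // prints Lovelace
    }
}
```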
Additionally, it’s possible to pass parameters to a method in an expression. We partially saw this in the formats of the regexify and templatify directives. Even though it can be cumbersome and error-prone in some cases, sometimes this is the most convenient way to interact with Faker:
import java.time.Duration;

import net.datafaker.Faker;

public class MethodInvocationWithParams {

    public static final int MIN = 1;
    public static final int MAX = 10;
    public static final String UNIT = "SECONDS";

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println("Duration from the method: " + getDurationFromMethod());
        System.out.println("Duration from the expression: " + getDurationFromExpression());
    }

    // The method call returns a typed Duration.
    static Duration getDurationFromMethod() {
        return faker.date().duration(MIN, MAX, UNIT);
    }

    // The expression passes the same parameters as text and returns a String.
    static String getDurationFromExpression() {
        return faker.expression("#{date.duration '1', '10', 'SECONDS'}");
    }
}
One of the shortcomings of expressions is that they return a String object, which limits the operations we can perform on the returned value. The code above produces this output:
Duration from the method: PT6S
Duration from the expression: PT4S
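When a typed value is needed after all, the String can often be parsed back. The output above is in the ISO-8601 form that java.time.Duration prints, so Duration.parse can read it. A minimal sketch, assuming the expression result always has this form; the class name DurationFromExpression is hypothetical and the input is hard-coded from the output above:

```java
import java.time.Duration;

class DurationFromExpression {

    // Converts the textual result of an expression back into a Duration.
    static Duration parse(String expressionResult) {
        return Duration.parse(expressionResult);
    }

    public static void main(String[] args) {
        Duration duration = parse("PT4S");
        System.out.println(duration.getSeconds());    // prints 4
        System.out.println(duration.plusSeconds(6));  // prints PT10S
    }
}
```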
5. Collections
Collections allow the creation of lists with mocked data. The elements can be of different types, and the collection is parametrized by the most specific common type: a parent of all the classes in the collection. Let’s geek out a bit and generate a list of characters from “Star Wars” and “Star Trek”:
import java.util.List;

import net.datafaker.Faker;

public class Collection {

    public static final int MIN = 1;
    public static final int MAX = 100;

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println(getFictionalCharacters());
    }

    // Both suppliers return String, so the result is a List<String>.
    static List<String> getFictionalCharacters() {
        return faker.collection(
            () -> faker.starWars().character(),
            () -> faker.starTrek().character())
          .len(MIN, MAX)
          .generate();
    }
}
As a result, we got the following list:
[Luke Skywalker, Wesley Crusher, Jean-Luc Picard, Greedo, Hikaru Sulu, William T. Riker]
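The mechanics of such a collection can be pictured with a plain-Java sketch: pick a random length between the bounds, then fill each slot from one of the suppliers. This is an illustration rather than Datafaker’s implementation. CollectionSketch is hypothetical, and both the inclusive upper bound and the per-element random choice of supplier are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

class CollectionSketch {

    // Hypothetical helper: builds a list of random length whose elements come from random suppliers.
    @SafeVarargs
    static <T> List<T> generate(Random random, int minLen, int maxLen, Supplier<? extends T>... suppliers) {
        int size = minLen + random.nextInt(maxLen - minLen + 1);  // assumed: both bounds inclusive
        List<T> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            Supplier<? extends T> supplier = suppliers[random.nextInt(suppliers.length)];
            result.add(supplier.get());
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed names stand in for faker.starWars().character() and faker.starTrek().character().
        List<String> characters = generate(new Random(), 1, 100,
            () -> "Luke Skywalker",
            () -> "Jean-Luc Picard");
        System.out.println(characters);
    }
}
```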
Because both suppliers in our collection return String values, the resulting list is parametrized by String. Let’s check the situation where we mix different types of data:
import java.io.Serializable;
import java.util.List;

import net.datafaker.Faker;

public class MixedCollection {

    public static final int MIN = 1;
    public static final int MAX = 20;

    private static final Faker faker = new Faker();

    public static void main(String[] args) {
        System.out.println(getMixedCollection());
    }

    // The suppliers return different types, so the list is typed by their common parent.
    static List<? extends Serializable> getMixedCollection() {
        return faker.collection(
            () -> faker.date().birthday(),
            () -> faker.name().fullName())
          .len(MIN, MAX)
          .generate();
    }
}
In this case, the most specific common type for String and Timestamp is Serializable. The output will be the following:
[1964-11-09 15:16:43.0, Devora Stamm DVM, 1980-01-11 15:18:00.0, 1989-04-28 05:13:54.0,
2004-09-06 17:11:49.0, Irving Turcotte, Sherita Durgan I, 2004-03-08 00:45:57.0, 1979-08-25 22:48:50.0,
Manda Hane, Latanya Hegmann, 1991-05-29 12:07:23.0, 1989-06-26 12:40:44.0, Kevin Quigley]
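A list typed as Serializable tells us little about its elements, so code that consumes it has to recover the concrete types itself. The sketch below shows that with instanceof checks on hard-coded sample values taken from the output above; MixedTypesSketch is a hypothetical class and doesn’t call Datafaker.

```java
import java.io.Serializable;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

class MixedTypesSketch {

    // Sample values standing in for faker.date().birthday() and faker.name().fullName().
    static List<Serializable> mixed() {
        List<Serializable> values = new ArrayList<>();
        values.add(Timestamp.valueOf("1964-11-09 15:16:43"));
        values.add("Devora Stamm DVM");
        return values;
    }

    // Recovers the concrete type of an element before using it.
    static String describe(Serializable value) {
        if (value instanceof Timestamp) {
            return "birthday: " + value;
        } else if (value instanceof String) {
            return "name: " + value;
        }
        return "unknown: " + value;
    }

    public static void main(String[] args) {
        for (Serializable value : mixed()) {
            System.out.println(describe(value));
        }
    }
}
```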
6. Conclusion
Datafaker is a new, improved version of Javafaker. This article covered the functionality introduced in Datafaker 1.6.0, which provides new ways of generating data. However, there’s more to learn about this library, so it’s worth referring to the official documentation and the GitHub repository for more information about Datafaker’s functionality and features.
As always, the code presented in the article is available over on GitHub.