1. Introduction
In this article, we’ll create a simple neural network with the deeplearning4j (dl4j) library, a modern and powerful tool for machine learning.
Before we get started, note that this guide doesn’t require a profound knowledge of linear algebra, statistics, machine learning theory, or the many other topics that a well-grounded ML engineer needs.
2. What Is Deep Learning?
Neural networks are computational models that consist of interconnected layers of nodes.
Nodes are neuron-like processors of numeric data. They take data from their inputs, apply some weights and functions to these data and send the results to outputs. Such a network can be trained with examples of the source data.
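To make the idea of a node concrete, here is a minimal sketch in plain Java. It is not deeplearning4j code: the class NeuronSketch, the method activate, and the example weights are hypothetical names and values chosen only for illustration. The node multiplies each input by a weight, adds a bias, and passes the sum through a function (here the hyperbolic tangent).

```java
// Hypothetical illustration of a single node; not part of deeplearning4j.
final class NeuronSketch {

    // Computes tanh(bias + sum of inputs[i] * weights[i]).
    static double activate(double[] inputs, double[] weights, double bias) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("inputs and weights must have the same length");
        }
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return Math.tanh(sum);
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, -1.0, 2.0};   // arbitrary example inputs
        double[] weights = {0.1, 0.4, -0.3};  // arbitrary example weights
        System.out.println(activate(inputs, weights, 0.0));
    }
}
```

Training, in this picture, amounts to finding values for weights and bias that make the outputs useful.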
Training essentially means saving some numeric state (weights) in the nodes, which later affects the computation. Training examples may contain data items with features and certain known classes of these items (for instance, “this set of 16×16 pixels contains a hand-written letter ‘a’”).
After training is finished, a neural network can derive information from new data, even if it has not seen these particular data items before. A well-modeled and well-trained network can recognize images, hand-written letters, and speech, process statistical data to produce results for business intelligence, and much more.
Deep neural networks became possible in recent years with the advance of high-performance and parallel computing. Such networks differ from simple neural networks in that they consist of multiple intermediate (or hidden) layers. This structure allows networks to process data in far more complicated ways (recursive, recurrent, convolutional, etc.) and to extract much more information from it.
3. Setting Up the Project
To use the library, we need at least Java 7. Also, due to some native components, it only works with a 64-bit JVM.
Before starting with the guide, let’s check that these requirements are met:
$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
First, let’s add the required libraries to our Maven pom.xml file. We’ll extract the version of the library to a property entry (for the latest version of the libraries, check out the Maven Central repository):
<properties>
    <dl4j.version>0.9.1</dl4j.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
</dependencies>
Note that the nd4j-native-platform dependency is one of several available implementations.
It relies on native libraries available for many different platforms (macOS, Windows, Linux, Android, etc.). We could also switch the backend to nd4j-cuda-8.0-platform if we wanted to execute computations on a graphics card that supports the CUDA programming model.
4. Preparing the Data
4.1. Preparing the DataSet File
We’ll write the “Hello World” of machine learning: classification of the iris flower data set. This is a set of data gathered from the flowers of different species (Iris setosa, Iris versicolor, and Iris virginica).
These species differ in the lengths and widths of their petals and sepals. It’d be hard to write a precise algorithm that classifies an input data item (i.e., determines which species a particular flower belongs to). But a well-trained neural network can classify it quickly and with few mistakes.
We’re going to use a CSV version of this data, where columns 0..3 contain the different features of the species and column 4 contains the class of the record, or the species, coded with a value 0, 1 or 2:
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
…
7.0,3.2,4.7,1.4,1
6.4,3.2,4.5,1.5,1
6.9,3.1,4.9,1.5,1
…
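To illustrate the column layout described above, the following plain-Java sketch splits one row of the file into its four features and its class label. It does not use datavec; the class IrisRowSketch and its methods are hypothetical helpers written only for this illustration, and FEATURES_COUNT is assumed to be 4, matching the column description.

```java
import java.util.Arrays;

// Hypothetical helper showing the row layout; not part of datavec or deeplearning4j.
final class IrisRowSketch {

    static final int FEATURES_COUNT = 4; // columns 0..3 hold the features

    // Returns the four numeric features of a CSV row.
    static double[] features(String line) {
        String[] cells = line.split(",");
        double[] result = new double[FEATURES_COUNT];
        for (int i = 0; i < FEATURES_COUNT; i++) {
            result[i] = Double.parseDouble(cells[i].trim());
        }
        return result;
    }

    // Returns the class label stored in column 4 (0, 1 or 2).
    static int label(String line) {
        String[] cells = line.split(",");
        return Integer.parseInt(cells[FEATURES_COUNT].trim());
    }

    public static void main(String[] args) {
        String row = "5.1,3.5,1.4,0.2,0";
        System.out.println(Arrays.toString(features(row)) + " -> class " + label(row));
    }
}
```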
4.2. Vectorizing and Reading the Data
We encode the class with a number because neural networks work with numbers. Transforming real-world data items into series of numbers (vectors) is called vectorization; deeplearning4j uses the datavec library to do this.
First, let’s use this library to read the file with the vectorized data. When creating the CSVRecordReader, we can specify the number of lines to skip (for instance, if the file has a header line) and the separator symbol (in our case, a comma):
try (RecordReader recordReader = new CSVRecordReader(0, ',')) {
    recordReader.initialize(new FileSplit(
      new ClassPathResource("iris.txt").getFile()));
    // …
}
To iterate over the records, we can use any of the multiple implementations of the DataSetIterator interface. The datasets can be quite massive, and the ability to page or cache the values could come in handy.
But our small dataset contains only 150 records, so let’s read all the data into memory at once with a call to iterator.next().
We also specify the index of the class column, which in our case is the same as the feature count (4), and the total number of classes (3).
Also, note that we need to shuffle the dataset to get rid of the class ordering in the original file.
We specify a constant random seed (42) instead of the default System.currentTimeMillis() call so that the results of the shuffling are always the same. This allows us to get stable results each time we run the program:
DataSetIterator iterator = new RecordReaderDataSetIterator(
recordReader, 150, FEATURES_COUNT, CLASSES_COUNT);
DataSet allData = iterator.next();
allData.shuffle(42);
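The effect of a fixed seed can be sketched with the standard library alone. The example below is an assumption-laden illustration: it uses java.util.Random rather than the library's own random number generator, so the resulting order will not match what allData.shuffle(42) produces. It only demonstrates that the same seed yields the same permutation on every run. SeededShuffleSketch and shuffledIndices are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of seeded shuffling; not the deeplearning4j implementation.
final class SeededShuffleSketch {

    // Returns the indices 0..count-1 in an order determined only by the seed.
    static List<Integer> shuffledIndices(int count, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    public static void main(String[] args) {
        List<Integer> first = shuffledIndices(150, 42L);
        List<Integer> second = shuffledIndices(150, 42L);
        // Same seed, same permutation.
        System.out.println(first.equals(second));
    }
}
```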
4.3. Normalizing and Splitting
Another thing we should do with the data before training is to normalize it. Normalization is a two-phase process:
- gathering some statistics about the data (fit)
- changing (transform) the data in some way to make it uniform
Normalization may differ for different types of data.
For instance, if we want to process images of various sizes, we should first collect the size statistics and then scale the images to a uniform size.
But for numbers, normalization usually means rescaling each feature so that it has zero mean and unit variance (standardization). The NormalizerStandardize class can help us with that:
DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(allData);
normalizer.transform(allData);
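The two phases can be sketched for a single feature column in plain Java. This is an illustration under stated assumptions, not the library's implementation: it uses the population standard deviation (dividing by n), and the article does not say which variant NormalizerStandardize uses. StandardizeSketch and its methods are hypothetical names.

```java
// Hypothetical illustration of fit/transform for one column; not deeplearning4j code.
final class StandardizeSketch {

    // "fit", part 1: the mean of the column.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // "fit", part 2: the population standard deviation (assumption: divide by n).
    static double stdDev(double[] values, double mean) {
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        return Math.sqrt(squares / values.length);
    }

    // "transform": shift by the mean and scale by the standard deviation.
    // A constant column (stdDev == 0) is not handled in this sketch.
    static double[] transform(double[] values, double mean, double stdDev) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (values[i] - mean) / stdDev;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] petalLengths = {1.4, 1.4, 1.3, 4.7, 4.5, 4.9}; // third column of the sample rows
        double m = mean(petalLengths);
        double s = stdDev(petalLengths, m);
        System.out.println(java.util.Arrays.toString(transform(petalLengths, m, s)));
    }
}
```

After the transform, the column has a mean of zero and a standard deviation of one, so features measured on different scales contribute comparably.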
Now that the data is prepared, we need to split the set into two parts.
The first part will be used in a training session. We’ll use the second part of the data (which the network will not see at all during training) to test the trained network.
This allows us to verify that the classification works correctly. We’ll take 65% of the data (0.65) for training and leave the remaining 35% for testing:
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.65);
DataSet trainingData = testAndTrain.getTrain();
DataSet testData = testAndTrain.getTest();
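As a quick check of the arithmetic, the sketch below computes the sizes of the two parts for 150 records. It assumes that the training count is the fraction of the total truncated to a whole number, which is consistent with the 53 test examples that appear in the evaluation output later in the article. SplitSketch and trainCount are hypothetical names.

```java
// Hypothetical illustration of the split sizes; not the deeplearning4j implementation.
final class SplitSketch {

    // Assumption: the training count is the fraction of the total, truncated.
    static int trainCount(int total, double fraction) {
        return (int) (total * fraction);
    }

    public static void main(String[] args) {
        int total = 150;
        int train = trainCount(total, 0.65);
        int test = total - train;
        System.out.println(train + " training examples, " + test + " test examples");
    }
}
```

With 150 records this yields 97 examples for training and 53 for testing.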
5. Preparing the Network Configuration
5.1. Fluent Configuration Builder
Now we can build a configuration of our network with a fancy fluent builder:
MultiLayerConfiguration configuration
  = new NeuralNetConfiguration.Builder()
    .iterations(1000)
    .activation(Activation.TANH)
    .weightInit(WeightInit.XAVIER)
    .learningRate(0.1)
    .regularization(true).l2(0.0001)
    .list()
    .layer(0, new DenseLayer.Builder().nIn(FEATURES_COUNT).nOut(3).build())
    .layer(1, new DenseLayer.Builder().nIn(3).nOut(3).build())
    .layer(2, new OutputLayer.Builder(
      LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation(Activation.SOFTMAX)
        .nIn(3).nOut(CLASSES_COUNT).build())
    .backprop(true).pretrain(false)
    .build();
Even with this simplified fluent way of building a network model, there’s a lot to digest and a lot of parameters to tweak. Let’s break this model down.
5.2. Setting Network Parameters
The iterations() builder method specifies the number of optimization iterations.
Iterative optimization means performing multiple passes over the training set until the network converges to a good result.
Usually, when training on real and large datasets, we use multiple epochs (complete passes of data through the network) and one iteration for each epoch. But since our initial dataset is small, we’ll use one epoch and multiple iterations.
The activation() method sets the activation function, which runs inside a node to determine its output.
The simplest activation function would be linear, f(x) = x. But it turns out that only non-linear functions allow networks to solve complex tasks by using a few nodes.
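One way to see why a linear activation is limiting: stacking layers that use f(x) = x adds no expressive power, because two linear layers always collapse into a single linear one. The sketch below shows this with one scalar input and scalar weights, which is a deliberate simplification. LinearCollapseSketch and its parameter names are hypothetical and not part of deeplearning4j.

```java
// Hypothetical illustration with scalar "layers"; not deeplearning4j code.
final class LinearCollapseSketch {

    // Two layers with the linear activation f(x) = x.
    static double twoLinearLayers(double x, double w1, double b1, double w2, double b2) {
        double hidden = w1 * x + b1;
        return w2 * hidden + b2;
    }

    // The equivalent single layer: weight w2 * w1, bias w2 * b1 + b2.
    static double singleLinearLayer(double x, double w1, double b1, double w2, double b2) {
        return (w2 * w1) * x + (w2 * b1 + b2);
    }

    // With tanh between the layers, the result is no longer a straight line in x.
    static double twoTanhLayers(double x, double w1, double b1, double w2, double b2) {
        return Math.tanh(w2 * Math.tanh(w1 * x + b1) + b2);
    }

    public static void main(String[] args) {
        for (double x = -2.0; x <= 2.0; x += 1.0) {
            System.out.println(x
                + " linear x2: " + twoLinearLayers(x, 0.5, 0.1, -1.5, 0.3)
                + " linear x1: " + singleLinearLayer(x, 0.5, 0.1, -1.5, 0.3)
                + " tanh x2: " + twoTanhLayers(x, 0.5, 0.1, -1.5, 0.3));
        }
    }
}
```

The first two columns agree for every x (up to floating-point rounding), while the tanh version bends, which is what lets additional layers contribute something new.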
There are lots of different activation functions available, which we can look up in the org.nd4j.linalg.activations.Activation enum. We could also write our own activation function if needed. But we’ll use the provided hyperbolic tangent (tanh) function.
The weightInit() method specifies one of the many ways to set up the initial weights for the network. Well-chosen initial weights can profoundly affect the results of the training. Without going too much into the math, let’s set it to a form of Gaussian distribution (WeightInit.XAVIER), as this is usually a good choice for a start.
All other weight initialization methods can be looked up in the org.deeplearning4j.nn.weights.WeightInit enum.
The learning rate is a crucial parameter that profoundly affects the ability of the network to learn.
We could spend a lot of time tweaking this parameter in a more complex case. But for our simple task, we’ll use a fairly large value of 0.1 and set it up with the learningRate() builder method.
One of the problems with training neural networks is overfitting, when a network “memorizes” the training data.
This happens when the network sets excessively high weights while fitting the training data and then produces bad results on any other data.
To solve this issue, we’re going to set up L2 regularization with the line .regularization(true).l2(0.0001). Regularization “penalizes” the network for overly large weights and helps prevent overfitting.
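The penalty idea can be sketched in a few lines of plain Java. This shows one common form of L2 regularization, in which the coefficient times the sum of squared weights is added to the loss. Some formulations include an extra factor of 1/2, and the article does not show which exact form the library uses, so treat the formula as an assumption. L2PenaltySketch and its methods are hypothetical names.

```java
// Hypothetical illustration of an L2 penalty; not the deeplearning4j implementation.
final class L2PenaltySketch {

    // Assumed form: lambda * sum(w^2). Larger weights cost more.
    static double l2Penalty(double[] weights, double lambda) {
        double sumOfSquares = 0.0;
        for (double w : weights) {
            sumOfSquares += w * w;
        }
        return lambda * sumOfSquares;
    }

    // The quantity the optimizer minimizes: data loss plus the penalty.
    static double regularizedLoss(double dataLoss, double[] weights, double lambda) {
        return dataLoss + l2Penalty(weights, lambda);
    }

    public static void main(String[] args) {
        double lambda = 0.0001; // same coefficient as in the configuration
        System.out.println(l2Penalty(new double[] {0.1, -0.2, 0.3}, lambda));
        System.out.println(l2Penalty(new double[] {10.0, -20.0, 30.0}, lambda));
    }
}
```

Because the penalty grows with the square of each weight, the optimizer only keeps a large weight when it reduces the data loss by more than it costs.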
5.3. Building Network Layers
Next, we create a network of dense (also known as fully connected) layers.
The first layer should have the same number of inputs as there are feature columns in the training data (4).
The second dense layer will contain three nodes. We can vary this value, but its number of inputs has to match the number of outputs of the previous layer.
The final output layer should contain the number of nodes matching the number of classes (3). The structure of the network is shown in the picture:
After successful training, we’ll have a network that receives four values via its inputs and sends a signal to one of its three outputs. This is a simple classifier.
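The output layer in the configuration uses the softmax activation, which turns the three raw output values into probabilities that sum to one; the predicted class is the output with the highest probability. The plain-Java sketch below illustrates that last step. SoftmaxSketch, softmax, and argMax are hypothetical names and not the library's API.

```java
// Hypothetical illustration of softmax and class selection; not deeplearning4j code.
final class SoftmaxSketch {

    // Converts raw scores into probabilities that sum to 1.
    // Subtracting the maximum first avoids overflow in Math.exp.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] result = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            result[i] = Math.exp(scores[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    // Index of the largest value, i.e. the predicted class.
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] rawOutputs = {0.2, 2.5, -1.0}; // arbitrary example scores for classes 0, 1, 2
        double[] probabilities = softmax(rawOutputs);
        System.out.println("predicted class: " + argMax(probabilities));
    }
}
```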
Finally, to finish building the network, we set up back propagation (one of the most effective training methods) and disable pre-training with the line .backprop(true).pretrain(false).
6. Creating and Training a Network
Now let’s create a neural network from the configuration, initialize and run it:
MultiLayerNetwork model = new MultiLayerNetwork(configuration);
model.init();
model.fit(trainingData);
Now we can test the trained model by using the rest of the dataset and verify the results with evaluation metrics for three classes:
INDArray output = model.output(testData.getFeatureMatrix());
Evaluation eval = new Evaluation(3);
eval.eval(testData.getLabels(), output);
If we now print out the eval.stats(), we’ll see that our network is pretty good at classifying iris flowers, although it did mistake class 1 for class 2 three times.
Examples labeled as 0 classified by model as 0: 19 times
Examples labeled as 1 classified by model as 1: 16 times
Examples labeled as 1 classified by model as 2: 3 times
Examples labeled as 2 classified by model as 2: 15 times
==========================Scores========================================
# of classes: 3
Accuracy: 0.9434
Precision: 0.9444
Recall: 0.9474
F1 Score: 0.9411
Precision, recall & F1: macro-averaged (equally weighted avg. of 3 classes)
========================================================================
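The scores in this report can be recomputed by hand from the four count lines. The sketch below does so in plain Java, assuming the usual definitions: accuracy is the share of correct predictions, precision and recall are computed per class, and the reported figures are the unweighted averages over the three classes. ConfusionMetricsSketch and its methods are hypothetical names, and the matrix holds the counts from the output above (rows are actual classes, columns are predicted classes).

```java
// Hypothetical recomputation of the reported scores; not the Evaluation class itself.
final class ConfusionMetricsSketch {

    // Share of examples on the diagonal (actual class == predicted class).
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int actual = 0; actual < m.length; actual++) {
            for (int predicted = 0; predicted < m.length; predicted++) {
                total += m[actual][predicted];
                if (actual == predicted) {
                    correct += m[actual][predicted];
                }
            }
        }
        return (double) correct / total;
    }

    // Of everything predicted as class c, how much really was class c.
    // A class that is never predicted is not handled in this sketch.
    static double precision(int[][] m, int c) {
        int predictedAsC = 0;
        for (int[] row : m) {
            predictedAsC += row[c];
        }
        return (double) m[c][c] / predictedAsC;
    }

    // Of everything that really was class c, how much was predicted as class c.
    static double recall(int[][] m, int c) {
        int actuallyC = 0;
        for (int count : m[c]) {
            actuallyC += count;
        }
        return (double) m[c][c] / actuallyC;
    }

    static double f1(int[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[][] m = {
            {19, 0, 0},  // actual class 0
            {0, 16, 3},  // actual class 1: three examples predicted as class 2
            {0, 0, 15}   // actual class 2
        };
        double p = 0.0, r = 0.0, f = 0.0;
        for (int c = 0; c < m.length; c++) {
            p += precision(m, c) / m.length;
            r += recall(m, c) / m.length;
            f += f1(m, c) / m.length;
        }
        System.out.println("accuracy=" + accuracy(m) + " precision=" + p + " recall=" + r + " f1=" + f);
    }
}
```

Rounded to four decimal places, the values are 0.9434, 0.9444, 0.9474 and 0.9411, matching the report: 50 of the 53 test examples are classified correctly, and the three mistakes lower the recall of class 1 and the precision of class 2.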
The fluent configuration builder allows us to add or modify layers of the network quickly, or tweak some other parameters to see if our model can be improved.
7. Conclusion
In this article, we’ve built a simple yet powerful neural network by using the deeplearning4j library.
As always, the source code for the article is available over on GitHub.