1. Overview

In this tutorial, we’ll build and train a convolutional neural network model using the Deeplearning4j library in Java.

For more information on setting up the library, please refer to our guide on Deeplearning4j.

2. Image Classification

2.1. Problem Statement

Suppose we have a set of images. Each image shows an object of a particular class, and each object belongs to exactly one class from a known set. The problem, then, is to build a model that can recognize the class of the object in a given image.

For example, let’s say we have a set of images showing ten different hand gestures. We build a model and train it to classify them. After training, we can pass in other images and classify the hand gestures they show. Of course, each gesture must belong to one of the known classes.

2.2. Image Representation

In computer memory, an image can be represented as a matrix of numbers. Each number is a pixel value, which ranges from 0 to 255 for 8-bit images.

A grayscale image is a 2D matrix. Similarly, an RGB image is a 3D matrix with width, height, and depth dimensions, where the depth holds the three color channels.

As we can see, an image is just a set of numbers. Therefore, we can build multi-layer network models and train them to classify images.
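
To make the matrix representation concrete, here is a minimal sketch in plain Java. The class name, the tiny 2x2 image, the [row][column][channel] index order, and the channel-averaging conversion are all illustrative assumptions; none of this is part of the Deeplearning4j API.

```java
// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class ImageRepresentationSketch {

    // Converts an RGB image indexed as [row][column][channel] into a
    // grayscale image indexed as [row][column] by averaging the channels.
    static int[][] toGrayscale(int[][][] rgb) {
        int[][] gray = new int[rgb.length][rgb[0].length];
        for (int row = 0; row < rgb.length; row++) {
            for (int col = 0; col < rgb[row].length; col++) {
                int[] pixel = rgb[row][col];
                gray[row][col] = (pixel[0] + pixel[1] + pixel[2]) / 3;
            }
        }
        return gray;
    }

    public static void main(String[] args) {
        // A 2x2 RGB image: every pixel holds three values in the range 0..255.
        int[][][] rgb = {
            {{255, 0, 0}, {0, 255, 0}},
            {{0, 0, 255}, {255, 255, 255}}
        };
        int[][] gray = toGrayscale(rgb);
        System.out.println("RGB shape: " + rgb.length + "x" + rgb[0].length + "x" + rgb[0][0].length);
        System.out.println("Grayscale shape: " + gray.length + "x" + gray[0].length);
    }
}
```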

3. Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a multi-layer network model that has a specific structure. The structure of a CNN may be divided into two blocks: convolutional layers and fully connected (or dense) layers. Let’s look at each of them.

3.1. Convolutional Layer

Each convolutional layer is a set of square matrices called kernels. We need them to perform convolution on the input image. Their number and size may vary depending on the dataset. We mostly use 3×3 or 5×5 kernels, and rarely 7×7 ones. The exact size and number are selected by trial and error.

In addition, we initialize the values in the kernel matrices randomly at the beginning of training. These values are the weights of the network.

To perform convolution, we use the kernel as a sliding window. We multiply the kernel weights by the corresponding image pixels and compute the sum. Then we move the kernel to cover the next chunk of the image. The stride is the number of pixels the kernel moves at each step, both to the right and down. The padding is the number of extra pixels, usually zeros, added around the border of the image so that the kernel can also cover its edges. As a result, we get values that will be used in further computations.
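
The sliding-window computation can be sketched in plain Java as follows. This is a minimal illustration, assuming a single-channel image, a square kernel, and zero padding; the class and method names are hypothetical, and Deeplearning4j performs the equivalent operation internally on its own array type. As in most deep learning libraries, the kernel is applied without flipping.

```java
import java.util.Arrays;

// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class ConvolutionSketch {

    // Slides a square kernel over a single-channel image.
    // stride: how many pixels the window moves per step (right and down).
    // padding: how many rows/columns of zeros surround the image.
    static double[][] convolve(double[][] image, double[][] kernel, int stride, int padding) {
        int k = kernel.length;
        int outRows = (image.length - k + 2 * padding) / stride + 1;
        int outCols = (image[0].length - k + 2 * padding) / stride + 1;
        double[][] output = new double[outRows][outCols];

        for (int outRow = 0; outRow < outRows; outRow++) {
            for (int outCol = 0; outCol < outCols; outCol++) {
                double sum = 0.0;
                for (int i = 0; i < k; i++) {
                    for (int j = 0; j < k; j++) {
                        int row = outRow * stride + i - padding;
                        int col = outCol * stride + j - padding;
                        boolean inside = row >= 0 && row < image.length
                            && col >= 0 && col < image[0].length;
                        // Positions in the padded border contribute zero.
                        if (inside) {
                            sum += image[row][col] * kernel[i][j];
                        }
                    }
                }
                output[outRow][outCol] = sum;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] image = {
            {1, 2, 3, 0},
            {4, 5, 6, 1},
            {7, 8, 9, 2},
            {1, 0, 1, 3}
        };
        // A simple 3x3 kernel that responds to horizontal intensity changes.
        double[][] kernel = {
            {1, 0, -1},
            {1, 0, -1},
            {1, 0, -1}
        };
        // Stride 1, no padding: a 4x4 input yields a 2x2 output.
        System.out.println(Arrays.deepToString(convolve(image, kernel, 1, 0)));
        // prints [[-6.0, 12.0], [-4.0, 7.0]]
    }
}
```

With a padding of 1, the same call returns a 4x4 result, so the output keeps the size of the input.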

In short, this layer gives us a convolved image. Some of its values might be less than zero, which usually means they are less important than the others. That is why we apply the ReLU function, which replaces negative values with zero and so simplifies the computations that follow.
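
ReLU itself is a one-line function. The sketch below, with hypothetical class and method names, applies it to every value of a convolved image held in a plain Java array.

```java
// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class ReluSketch {

    // ReLU keeps positive values and replaces negative ones with zero.
    static double relu(double value) {
        return Math.max(0.0, value);
    }

    // Applies ReLU to every element of a feature map, returning a new array.
    static double[][] relu(double[][] featureMap) {
        double[][] result = new double[featureMap.length][];
        for (int row = 0; row < featureMap.length; row++) {
            result[row] = new double[featureMap[row].length];
            for (int col = 0; col < featureMap[row].length; col++) {
                result[row][col] = relu(featureMap[row][col]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] convolved = {{-6.0, 12.0}, {-4.0, 7.0}};
        double[][] activated = relu(convolved);
        System.out.println(java.util.Arrays.deepToString(activated));
        // prints [[0.0, 12.0], [0.0, 7.0]]
    }
}
```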

3.2. Subsampling Layer

The subsampling (or pooling) layer usually follows a convolutional layer. After the convolution, we have a lot of computed values, and our task is to keep the most valuable among them.

The approach is to apply a sliding window to the convolved image. At each step, we choose the maximum value in a square window of a predefined size, usually between 2×2 and 5×5 pixels. This variant is called max pooling. As a result, we have fewer values to process, which reduces the computations in the following layers.
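
Max pooling over a 2D array can be sketched like this. It is a minimal illustration without padding; the class and method names are hypothetical and the sample values are made up.

```java
import java.util.Arrays;

// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class MaxPoolingSketch {

    // Keeps the maximum of each window x window region, moving by stride.
    static double[][] maxPool(double[][] input, int window, int stride) {
        int outRows = (input.length - window) / stride + 1;
        int outCols = (input[0].length - window) / stride + 1;
        double[][] output = new double[outRows][outCols];

        for (int outRow = 0; outRow < outRows; outRow++) {
            for (int outCol = 0; outCol < outCols; outCol++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < window; i++) {
                    for (int j = 0; j < window; j++) {
                        max = Math.max(max, input[outRow * stride + i][outCol * stride + j]);
                    }
                }
                output[outRow][outCol] = max;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] input = {
            {1, 3, 2, 4},
            {5, 6, 1, 0},
            {7, 2, 9, 8},
            {0, 1, 3, 4}
        };
        // A 2x2 window with stride 2 turns 16 values into 4.
        System.out.println(Arrays.deepToString(maxPool(input, 2, 2)));
        // prints [[6.0, 4.0], [7.0, 9.0]]
    }
}
```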

3.3. Dense Layer

A dense (or fully connected) layer consists of multiple neurons, each connected to every output of the previous layer. We need this layer to perform classification. There may be two or more such consecutive layers. Importantly, the last layer must have a size equal to the number of classes.

The output of the network is the probability of the image belonging to each of the classes. To turn the outputs of the last layer into probabilities, we’ll use the Softmax activation function.
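
As a rough illustration of what Softmax does, the sketch below converts a vector of raw scores into probabilities that sum to one. The class and method names are hypothetical, and subtracting the maximum score is a common numerical-stability trick rather than something the tutorial prescribes.

```java
// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class SoftmaxSketch {

    // Converts raw scores into probabilities that sum to 1.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double score : scores) {
            max = Math.max(max, score);
        }
        double[] probabilities = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the maximum avoids overflow in Math.exp.
            probabilities[i] = Math.exp(scores[i] - max);
            sum += probabilities[i];
        }
        for (int i = 0; i < probabilities.length; i++) {
            probabilities[i] /= sum;
        }
        return probabilities;
    }

    // Returns the index of the most probable class.
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probabilities = softmax(new double[] {1.0, 2.0, 3.0});
        System.out.println("Predicted class: " + argMax(probabilities));
    }
}
```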

3.4. Optimization Techniques

To perform training, we need to optimize the weights. Remember, we initialize them randomly. A neural network is one big function with many unknown parameters, which are our weights.

When we pass an image to the network, it gives us an answer. In supervised learning, we also know the actual answer, which is the true class. We can then build a loss function that measures how far the network’s answer is from the true one. Our goal is to minimize this loss function. If we succeed, our model is well trained.

To minimize the function, we have to update the weights of the network. To do that, we compute the partial derivative of the loss function with respect to each of these unknown parameters. Then we can update each weight.

Because the derivative tells us the slope, we know whether to increase or decrease each weight to move toward a local minimum of the loss function. This iterative process is called gradient descent. Backpropagation is the algorithm that computes these derivatives, propagating them from the end of the network back to its beginning.
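
The update rule is easiest to see on a function of a single weight. The sketch below is a toy example under stated assumptions: the loss (w - 3)², the learning rate of 0.1, and the class and method names are all made up for illustration, and a real network applies the same rule to every weight at once.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class GradientDescentSketch {

    // Repeatedly moves the weight against the slope of the loss function.
    static double minimize(DoubleUnaryOperator derivative, double start,
                           double learningRate, int steps) {
        double weight = start;
        for (int step = 0; step < steps; step++) {
            weight -= learningRate * derivative.applyAsDouble(weight);
        }
        return weight;
    }

    public static void main(String[] args) {
        // Toy loss (w - 3)^2 has derivative 2 * (w - 3) and its minimum at w = 3.
        DoubleUnaryOperator derivative = w -> 2.0 * (w - 3.0);
        double weight = minimize(derivative, 0.0, 0.1, 100);
        System.out.println("Weight after training: " + weight);
    }
}
```

When the weight is below 3, the derivative is negative and the update increases it; above 3, the derivative is positive and the update decreases it.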

In this tutorial, we’ll use the Stochastic Gradient Descent (SGD) optimization algorithm. The main idea is that we randomly choose a batch of training images at each step, compute the gradients for that batch using backpropagation, and update the weights.
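
The batching idea can be sketched on a toy problem with a single weight. Everything here is an assumption made for illustration: the data follows y = 2x, the loss is the mean squared error, the names are hypothetical, and batches are drawn by shuffling the sample indices once per epoch, which is one common way to pick random batches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class SgdSketch {

    // Fits y = w * x with mini-batch SGD on the mean squared error.
    static double train(double[] xs, double[] ys, int batchSize, int epochs,
                        double learningRate, long seed) {
        Random random = new Random(seed);
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < xs.length; i++) {
            indices.add(i);
        }

        double weight = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            // A new random order each epoch gives random batches.
            Collections.shuffle(indices, random);
            for (int start = 0; start < indices.size(); start += batchSize) {
                int end = Math.min(start + batchSize, indices.size());
                double gradient = 0.0;
                for (int index : indices.subList(start, end)) {
                    double error = weight * xs[index] - ys[index];
                    gradient += 2.0 * error * xs[index];
                }
                gradient /= (end - start);
                // One weight update per batch, not per epoch.
                weight -= learningRate * gradient;
            }
        }
        return weight;
    }

    public static void main(String[] args) {
        double[] xs = new double[10];
        double[] ys = new double[10];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = (i + 1) / 10.0;
            ys[i] = 2.0 * xs[i];
        }
        System.out.println("Learned weight: " + train(xs, ys, 2, 200, 0.1, 1611L));
    }
}
```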

3.5. Evaluation Metrics

Finally, after training the network, we need to get information about how well our model performs.

The most commonly used metric is accuracy, which is the ratio of correctly classified images to all images. Recall, precision, and F1 score are important metrics for image classification as well.
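
For reference, these metrics can be computed from the true and predicted labels as in the sketch below. The names and sample labels are hypothetical, precision and recall are computed for one class at a time, and returning 0.0 when a denominator is zero is a simplifying assumption.

```java
// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class MetricsSketch {

    // Share of samples whose predicted class equals the true class.
    static double accuracy(int[] actual, int[] predicted) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    // Of all samples predicted as targetClass, the share that really belong to it.
    static double precision(int[] actual, int[] predicted, int targetClass) {
        int truePositives = 0;
        int predictedPositives = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == targetClass) {
                predictedPositives++;
                if (actual[i] == targetClass) {
                    truePositives++;
                }
            }
        }
        return predictedPositives == 0 ? 0.0 : (double) truePositives / predictedPositives;
    }

    // Of all samples that belong to targetClass, the share that was found.
    static double recall(int[] actual, int[] predicted, int targetClass) {
        int truePositives = 0;
        int actualPositives = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == targetClass) {
                actualPositives++;
                if (predicted[i] == targetClass) {
                    truePositives++;
                }
            }
        }
        return actualPositives == 0 ? 0.0 : (double) truePositives / actualPositives;
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        int[] actual = {0, 0, 1, 1, 2, 2};
        int[] predicted = {0, 1, 1, 1, 2, 0};
        double p = precision(actual, predicted, 1);
        double r = recall(actual, predicted, 1);
        System.out.println("Accuracy: " + accuracy(actual, predicted));
        System.out.println("Class 1 precision: " + p + ", recall: " + r + ", F1: " + f1(p, r));
    }
}
```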

4. Dataset Preparation

In this section, we’ll prepare the images. Let’s use the CIFAR10 dataset built into Deeplearning4j in this tutorial. We’ll create iterators to access the images:

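import org.deeplearning4j.datasets.iterator.impl.CifarDataSetIterator;

public class CifarDatasetService implements IDataSetService {

    // Iterators over the training and test portions of the dataset
    private CifarDataSetIterator trainIterator;
    private CifarDataSetIterator testIterator;

    public CifarDatasetService() {
        // Arguments: batch size, number of images to use, and whether to load the training set
        trainIterator = new CifarDataSetIterator(trainBatch, trainImagesNum, true);
        testIterator = new CifarDataSetIterator(testBatch, testImagesNum, false);
    }

    // other methods and fields declaration

}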

We can choose some parameters ourselves. The trainBatch and testBatch values are the numbers of images per training and evaluation step, respectively. The trainImagesNum and testImagesNum values are the numbers of images used for training and testing. One epoch lasts trainImagesNum / trainBatch steps. So, 2048 training images with a batch size of 32 gives 2048 / 32 = 64 steps per epoch.
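
The same arithmetic as a small helper, in case the number of images is not a multiple of the batch size. The class and method names are hypothetical, and counting a final partial batch as one extra step is an assumption; whether an iterator actually produces such a batch depends on its implementation.

```java
// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class EpochStepsSketch {

    // Ceiling division: a final, smaller batch counts as one more step (assumption).
    static int stepsPerEpoch(int imagesNum, int batchSize) {
        return (imagesNum + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(stepsPerEpoch(2048, 32)); // prints 64
        System.out.println(stepsPerEpoch(2050, 32)); // prints 65
    }
}
```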

5. Convolutional Neural Network in Deeplearning4j

5.1. Building the Model

Next, let’s build our CNN model from scratch. To do so, we’ll use convolutional, subsampling (pooling), and fully connected (dense) layers:

MultiLayerConfiguration configuration = new NeuralNetConfiguration.Builder()
  // fixed seed, so that the random weight initialization is reproducible
  .seed(1611)
  .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
  .learningRate(properties.getLearningRate())
  .regularization(true)
  .updater(properties.getOptimizer())
  .list()
  // three convolution + pooling stages, followed by the dense part
  .layer(0, conv5x5())
  .layer(1, pooling2x2Stride2())
  .layer(2, conv3x3Stride1Padding2())
  .layer(3, pooling2x2Stride1())
  .layer(4, conv3x3Stride1Padding1())
  .layer(5, pooling2x2Stride1())
  .layer(6, dense())
  .pretrain(false)
  .backprop(true)
  // the input type describes the shape of the images fed to the first layer
  .setInputType(dataSetService.inputType())
  .build();

network = new MultiLayerNetwork(configuration);

Here we specify the learning rate, the update algorithm, the input type of our model, and the layered architecture. We can experiment with these settings: train several models with different architectures and training parameters, compare the results, and choose the best model.
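
When experimenting with the architecture, it helps to track how each layer changes the size of the image. The sketch below applies the usual output-size formula, (input - kernel + 2 × padding) / stride + 1, to the layer stack above. It rests on several assumptions: the input is a 32x32 CIFAR10 image, kernel sizes, strides, and paddings are read off the helper method names, conv5x5() is taken to use stride 1 and no padding, and the pooling layers are taken to use no padding. The real helper methods may be configured differently.

```java
// Illustrative sketch in plain Java; not part of the Deeplearning4j API.
class FeatureMapSizeSketch {

    // Output width (or height) of a convolution or pooling layer.
    static int outputSize(int inputSize, int kernelSize, int stride, int padding) {
        return (inputSize - kernelSize + 2 * padding) / stride + 1;
    }

    public static void main(String[] args) {
        int size = 32; // CIFAR10 images are 32x32 pixels
        size = outputSize(size, 5, 1, 0); // conv5x5 (stride 1, padding 0: assumed)
        System.out.println("after conv5x5: " + size);                // 28
        size = outputSize(size, 2, 2, 0); // pooling2x2Stride2
        System.out.println("after pooling2x2Stride2: " + size);      // 14
        size = outputSize(size, 3, 1, 2); // conv3x3Stride1Padding2
        System.out.println("after conv3x3Stride1Padding2: " + size); // 16
        size = outputSize(size, 2, 1, 0); // pooling2x2Stride1
        System.out.println("after pooling2x2Stride1: " + size);      // 15
        size = outputSize(size, 3, 1, 1); // conv3x3Stride1Padding1
        System.out.println("after conv3x3Stride1Padding1: " + size); // 15
        size = outputSize(size, 2, 1, 0); // pooling2x2Stride1
        System.out.println("after pooling2x2Stride1: " + size);      // 14
    }
}
```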

5.2. Training the Model

Then, we’ll train the built model. This can be done in a few lines of code:

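public void train() {
    // Initialize the weights before the first training pass
    network.init();
    // One full pass over the training iterator per epoch
    IntStream.range(1, epochsNum + 1).forEach(epoch -> {
        network.fit(dataSetService.trainIterator());
    });
}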

The number of epochs is a parameter that we can specify ourselves. Since we have a small dataset, several hundred epochs will be enough.

5.3. Evaluating the Model

Finally, we can evaluate the trained model. The Deeplearning4j library makes this easy:

public Evaluation evaluate() {
    return network.evaluate(dataSetService.testIterator());
}

Evaluation is an object that contains the metrics computed for the trained model: accuracy, precision, recall, and F1 score. Moreover, it has a readable printable form:

==========================Scores=====================
# of classes: 11
Accuracy: 0,8406
Precision: 0,7303
Recall: 0,6820
F1 Score: 0,6466
=====================================================

6. Conclusion

In this tutorial, we’ve learned about the architecture of CNN models, optimization techniques, and evaluation metrics. Furthermore, we’ve implemented the model using the Deeplearning4j library in Java.

As usual, code for this example is available over on GitHub.
