1. Introduction
Logistic regression is an important instrument in the machine learning (ML) practitioner’s toolbox.
In this tutorial, we’ll explore the main idea behind logistic regression.
First, let’s start with a brief overview of ML paradigms and algorithms.
2. Overview
ML allows us to solve problems that we can formulate in human-friendly terms. However, this may be a challenge for us software developers: we’re accustomed to addressing problems that we can formulate in computer-friendly terms. For example, as human beings, we can easily detect the objects in a photo or establish the mood of a phrase. How could we formulate such a problem for a computer?
To come up with a solution, ML has a special stage called training. During this stage, we feed the input data to our algorithm so that it tries to come up with an optimal set of parameters (the so-called weights). The more input data we feed to the algorithm, the more precise predictions we may expect from it.
Training is a part of an iterative ML workflow:
We start with acquiring data. Often, the data comes from different sources, so we have to bring it to the same format. We should also make sure that the data set fairly represents the domain of study. If the model has never been trained on red apples, it can hardly recognize them.
Next, we should build a model that consumes the data and is able to make predictions. In ML, there are no pre-defined models that work well in all situations.
When searching for the right model, it may easily happen that we build a model, train it, look at its predictions, and then discard it because we’re not happy with them. In this case, we should step back, build another model, and repeat the process.
3. ML Paradigms
In ML, based on what kind of input data we have at our disposal, we may single out three main paradigms:
- supervised learning (image classification, object recognition, sentiment analysis)
- unsupervised learning (anomaly detection)
- reinforcement learning (game strategies)
The case that we’re going to describe in this tutorial belongs to supervised learning.
4. ML Toolbox
In ML, there is a set of tools that we can apply when building a model. Let’s mention some of them:
- Linear regression
- Logistic regression
- Neural networks
- Support Vector Machine
- k-Nearest Neighbours
We may combine several tools when building a model with high predictive accuracy. In fact, for this tutorial, our model will use logistic regression and neural networks.
5. ML Libraries
Even though Java is not the most popular language for prototyping ML models, it has a reputation as a reliable tool for creating robust software in many areas including ML. Therefore, we may find ML libraries written in Java.
In this context, we may mention the de-facto standard library TensorFlow, which has a Java version as well. Another one worth mentioning is a deep learning library called Deeplearning4j. This is a very powerful tool, and we’re going to use it in this tutorial.
6. Logistic Regression on Digit Recognition
The main idea of logistic regression is to build a model that predicts the labels of the input data as precisely as possible.
We train the model until the so-called loss function or objective function reaches some minimal value. The loss function depends on the actual model predictions and expected ones (the labels of the input data). Our goal is to minimize the divergence of actual model predictions and the expected ones.
If we are not happy with that minimum value, we should build another model and perform the training again.
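To make the idea of minimizing a loss more tangible, here is a minimal sketch of a logistic regression with a single feature, trained by plain gradient descent on the log loss. It is a toy illustration in plain Java rather than the Deeplearning4j code used in this tutorial; the class TinyLogisticRegression, its methods, the four data points, and the learning rate are all assumptions made for the example:

```java
// Illustrative sketch only: a one-feature binary logistic regression trained
// with plain gradient descent. All names here are hypothetical and this is
// not the Deeplearning4j API.
final class TinyLogisticRegression {
    double weight = 0.0; // single weight, since there is a single feature
    double bias = 0.0;

    // The logistic (sigmoid) function squeezes any score into (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Predicted probability that input x has label 1.
    double predict(double x) {
        return sigmoid(weight * x + bias);
    }

    // Mean log loss (cross-entropy) between predictions and 0/1 labels.
    double loss(double[] xs, int[] labels) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double p = predict(xs[i]);
            sum -= labels[i] == 1 ? Math.log(p) : Math.log(1.0 - p);
        }
        return sum / xs.length;
    }

    // One pass over the data: move the parameters against the loss gradient.
    void step(double[] xs, int[] labels, double learningRate) {
        double gradWeight = 0.0;
        double gradBias = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double error = predict(xs[i]) - labels[i];
            gradWeight += error * xs[i];
            gradBias += error;
        }
        weight -= learningRate * gradWeight / xs.length;
        bias -= learningRate * gradBias / xs.length;
    }

    public static void main(String[] args) {
        double[] xs = {0.0, 1.0, 2.0, 3.0};
        int[] labels = {0, 0, 1, 1};
        TinyLogisticRegression model = new TinyLogisticRegression();
        System.out.println("loss before training: " + model.loss(xs, labels));
        for (int i = 0; i < 2000; i++) {
            model.step(xs, labels, 0.5);
        }
        System.out.println("loss after training: " + model.loss(xs, labels));
        System.out.println("P(label = 1 | x = 2.0): " + model.predict(2.0));
    }
}
```

On this small, cleanly separable data set, the loss shrinks as the loop runs, and the model ends up assigning a probability above 0.5 to the inputs labeled 1.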
In order to see logistic regression in action, we illustrate it on the recognition of handwritten digits. This problem has already become a classic one. The Deeplearning4j library has a series of realistic examples that show how to use its API. The code-related part of this tutorial is heavily based on MNIST Classifier.
6.1. Input Data
As the input data, we use the well-known MNIST database of handwritten digits. It consists of 28×28 pixel grey-scale images. Each image has a natural label, which is the digit that the image represents.
In order to estimate how well the model that we’re going to build performs on unseen data, we split the input data into training and test sets:
DataSetIterator train = new RecordReaderDataSetIterator(...);
DataSetIterator test = new RecordReaderDataSetIterator(...);
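MNIST is distributed with separate training and test parts, so here the split comes with the data. When a data set arrives as a single collection, a hold-out split can be sketched as follows. The class HoldOutSplit, the 80/20 ratio, and the seed are hypothetical choices for illustration and not part of the Deeplearning4j API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: a generic hold-out split, not the Deeplearning4j
// data pipeline. The class and method names are hypothetical.
final class HoldOutSplit {
    // Shuffles a copy of the samples with a fixed seed and returns
    // [training set, test set], with trainFraction of the data in the first.
    static <T> List<List<T>> split(List<T> samples, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(samples);
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> samples = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            samples.add(i);
        }
        List<List<Integer>> parts = split(samples, 0.8, 42L);
        System.out.println("train size: " + parts.get(0).size());
        System.out.println("test size: " + parts.get(1).size());
    }
}
```

Shuffling before cutting keeps both parts representative of the whole collection, and the fixed seed makes the split reproducible between runs.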
Once we have the input images labeled and split into the two sets, the “data elaboration” stage is over and we may move on to “model building”.
6.2. Model Building
As we’ve mentioned, there are no models that work well in every situation. Nevertheless, after many years of research in ML, scientists have found models that perform very well in recognizing handwritten digits. Here, we use the so-called LeNet-5 model.
LeNet-5 is a neural network that consists of a series of layers that transform the 28×28 pixel image into a ten-dimensional vector.
The ten-dimensional output vector contains the probabilities that the input image’s label is 0, 1, 2, and so on.
For example, if the output vector has the following form:
{0.1, 0.0, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1, 0.1, 0.0}
it means that the probability of the input image being a zero is 0.1, being a one is 0, being a two is 0.3, and so on. We see that the maximal probability (0.3) sits at index 2, counting from zero, and therefore corresponds to label 2.
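Picking the predicted label from such a vector amounts to finding the index of its largest component. A minimal sketch in plain Java, with the hypothetical helper argMax, applied to the example vector above:

```java
// Illustrative sketch only: picking the most probable label from an output
// vector. The class and method names are hypothetical.
final class PredictedLabel {
    // Returns the index of the largest value; for digit recognition the
    // index is the predicted digit.
    static int argMax(double[] probabilities) {
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] output = {0.1, 0.0, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1, 0.1, 0.0};
        System.out.println("predicted digit: " + argMax(output)); // prints "predicted digit: 2"
    }
}
```

Since indices start at zero, the third component of the vector belongs to the digit 2.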
Let’s dive into details of model building. We omit Java-specific details and concentrate on ML concepts.
We set up the model by creating a MultiLayerNetwork object:
MultiLayerNetwork model = new MultiLayerNetwork(config);
To its constructor, we pass a MultiLayerConfiguration object. This is the very object that describes the geometry of the neural network. In order to define the network geometry, we should define every layer.
Let’s show how we do this for the first and the second layers:
// First layer: 5x5 convolution kernels with stride 1, producing 20 feature maps
ConvolutionLayer layer1 = new ConvolutionLayer
    .Builder(5, 5).nIn(channels)
    .stride(1, 1)
    .nOut(20)
    .activation(Activation.IDENTITY)
    .build();
// Second layer: 2x2 max pooling with stride 2
SubsamplingLayer layer2 = new SubsamplingLayer
    .Builder(SubsamplingLayer.PoolingType.MAX)
    .kernelSize(2, 2)
    .stride(2, 2)
    .build();
We see that the layers’ definitions contain a considerable number of ad-hoc parameters, which significantly impact the performance of the whole network. This is exactly where our ability to find a good model among all possible ones becomes crucial.
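To make the second layer less abstract, here is a sketch of what max pooling with a 2×2 kernel and a stride of 2 does to a single feature map: every non-overlapping 2×2 block is replaced by its largest value, which halves the width and the height. The class MaxPooling and the sample 4×4 input are assumptions for illustration; the library implementation additionally handles batches and channels:

```java
// Illustrative sketch only: 2x2 max pooling with stride 2 on one feature map.
// Names are hypothetical; a real library also handles batches and channels.
final class MaxPooling {
    // Assumes the input height and width are even.
    static double[][] pool2x2(double[][] input) {
        int outRows = input.length / 2;
        int outCols = input[0].length / 2;
        double[][] output = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double top = Math.max(input[2 * r][2 * c], input[2 * r][2 * c + 1]);
                double bottom = Math.max(input[2 * r + 1][2 * c], input[2 * r + 1][2 * c + 1]);
                output[r][c] = Math.max(top, bottom);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        double[][] featureMap = {
            {1, 3, 2, 0},
            {4, 2, 1, 1},
            {0, 0, 5, 6},
            {1, 2, 7, 8}
        };
        double[][] pooled = pool2x2(featureMap);
        System.out.println(java.util.Arrays.deepToString(pooled)); // prints "[[4.0, 2.0], [2.0, 8.0]]"
    }
}
```

The kernel size and the stride are exactly the kind of parameters mentioned above: changing either one changes the shape of everything the following layers receive.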
Now, we are ready to construct the MultiLayerConfiguration object:
MultiLayerConfiguration config = new NeuralNetConfiguration.Builder()
    // preparation steps
    .list()
    .layer(layer1)
    .layer(layer2)
    // other layers and final steps
    .build();
This is the object that we pass to the MultiLayerNetwork constructor.
6.3. Training
The model that we constructed contains 431080 parameters, or weights. We’re not going to give the exact calculation of this number here, but to get a feeling for the sizes involved, note that the first layer alone produces 24x24x20 = 11520 output values for every image. For a single-channel input, its own share of the weights is much smaller: 5x5x20 = 500 kernel weights plus 20 biases, that is, 520.
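As a sanity check on the sizes involved, the standard arithmetic for a convolution layer without padding can be sketched in a few lines. The helper names are hypothetical, and the single input channel is an assumption based on the grey-scale input; the kernel size, stride, and number of output maps come from the definition of layer1:

```java
// Illustrative sketch only: standard size arithmetic for a convolution layer
// without padding. Names are hypothetical; a single-channel (grayscale) input
// is an assumption.
final class ConvLayerSizes {
    // Output width (or height) of a convolution without padding.
    static int outputSize(int inputSize, int kernelSize, int stride) {
        return (inputSize - kernelSize) / stride + 1;
    }

    // Trainable parameters: one kernel per (input channel, output channel)
    // pair, plus one bias per output channel.
    static int parameterCount(int kernelHeight, int kernelWidth, int inChannels, int outChannels) {
        return kernelHeight * kernelWidth * inChannels * outChannels + outChannels;
    }

    public static void main(String[] args) {
        int side = outputSize(28, 5, 1);              // (28 - 5) / 1 + 1 = 24
        int activations = side * side * 20;           // 24 * 24 * 20 = 11520
        int parameters = parameterCount(5, 5, 1, 20); // 5 * 5 * 1 * 20 + 20 = 520
        System.out.println(side + " " + activations + " " + parameters); // prints "24 11520 520"
    }
}
```

Under these assumptions, 11520 is the number of values the first layer outputs per image, while the layer itself holds 520 trainable parameters.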
The training stage is as simple as:
model.fit(train);
Initially, the 431080 parameters have random values, but during training they acquire values that determine the model’s performance. We may then evaluate the model’s predictiveness on the test set:
Evaluation eval = model.evaluate(test);
logger.info(eval.stats());
The LeNet-5 model achieves quite a high accuracy of almost 99% after just a single pass over the training data (an epoch). If we want to achieve higher accuracy, we can run more epochs using a plain for-loop:
for (int i = 0; i < epochs; i++) {
    model.fit(train);
    train.reset();
    test.reset();
}
6.4. Prediction
Now that we’ve trained the model and are happy with its predictions on the test data, we can try the model on some completely new input. To this end, let’s create a new class MnistPrediction in which we’ll load an image from a file that we select from the filesystem:
INDArray image = new NativeImageLoader(height, width, channels).asMatrix(file);
new ImagePreProcessingScaler(0, 1).transform(image);
The variable image now contains our picture reduced to a 28×28 grayscale image. We can feed it to our model:
INDArray output = model.output(image);
The variable output will contain the probabilities of the image being a zero, a one, a two, and so on.
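Looking back at the loading step, ImagePreProcessingScaler(0, 1) is there to bring the raw pixel values into the range from 0 to 1 before they reach the network. Assuming 8-bit pixels with values from 0 to 255, this kind of rescaling can be sketched in plain Java as follows (the class PixelScaler is hypothetical and not the library’s implementation):

```java
// Illustrative sketch only: linear rescaling of 8-bit pixel values into a
// target range. Names are hypothetical and this is not the library's code.
final class PixelScaler {
    static final double MAX_PIXEL = 255.0; // assumes 8-bit grayscale pixels

    // Maps each pixel from [0, 255] to [min, max].
    static double[] scale(int[] pixels, double min, double max) {
        double[] scaled = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            scaled[i] = min + (pixels[i] / MAX_PIXEL) * (max - min);
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] scaled = scale(new int[] {0, 51, 255}, 0.0, 1.0);
        System.out.println(java.util.Arrays.toString(scaled)); // prints "[0.0, 0.2, 1.0]"
    }
}
```

A new image should go through the same scaling that was applied to the training images, since the weights were fitted to inputs in that range.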
Let’s now play a little bit: write a digit 2, digitize this image, and feed it to the model. We may get something like this:
As we see, the component with maximal value 0.99 has index two. It means that the model has correctly recognized our handwritten digit.
7. Conclusion
In this tutorial, we described the general concepts of machine learning. We illustrated these concepts with a logistic regression example that we applied to handwritten digit recognition.
As always, we may find the corresponding code snippets on our GitHub repository.