How to add proxy support to Jsoup?

Last modified: May 4, 2020

1. Overview

In this tutorial, we’ll take a look at how to add proxy support to Jsoup.

2. Common Reasons To Use a Proxy

There are two main reasons we might want to use a proxy with Jsoup.

2.1. Usage Behind an Organization Proxy

It’s common for organizations to have proxies controlling Internet access. If we use Jsoup to reach an external URL from behind such a proxy without configuring it, we’ll typically get an exception:

java.net.SocketTimeoutException: connect timed out

When we see this error, we need to set a proxy for Jsoup before trying to access any URL outside of the network.

2.2. Preventing IP Blocking

Another common reason to use a proxy with Jsoup is to prevent websites from blocking IP addresses.

In other words, using a proxy (or multiple rotating proxies) allows us to fetch and parse HTML more reliably, reducing the chance that our code stops working because our IP address has been blocked or banned.
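
To make the idea of rotating proxies concrete, here is a minimal sketch of a round-robin selector. ProxyRotator and its http helper are hypothetical names introduced for illustration; they aren't part of Jsoup, and the proxy addresses we pass in are placeholders:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Jsoup): hands out proxies in round-robin order.
class ProxyRotator {
    private final List<Proxy> proxies;
    private final AtomicInteger counter = new AtomicInteger();

    ProxyRotator(List<Proxy> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        // Defensive copy so later changes to the caller's list don't affect us.
        this.proxies = new ArrayList<>(proxies);
    }

    // Returns the next proxy, wrapping around at the end of the list.
    Proxy next() {
        int index = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }

    // Convenience factory for an HTTP proxy at the given host and port.
    static Proxy http(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }
}
```

Each request could then pick a different proxy by passing rotator.next() to the Connection's proxy(java.net.Proxy) method. A real setup would likely also need to skip proxies that have stopped responding, which this sketch doesn't attempt.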

3. Setup

When using Maven, we need to add the Jsoup dependency to our pom.xml:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

In Gradle, we have to declare our dependency in build.gradle:

implementation 'org.jsoup:jsoup:1.13.1'

4. Adding Proxy Support Through Host and Port Properties

Adding proxy support to Jsoup is simple. All we need to do is call the proxy(String, int) method when building the Connection object:

Jsoup.connect("https://spring.io/blog")
  .proxy("127.0.0.1", 1080)
  .get();

Here we set the HTTP proxy to use for this request, with the first argument representing the proxy hostname and the second the proxy port.
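
In practice, we often don't want to hard-code the host and port. As a rough sketch, we could keep them in a single "host:port" setting and split it before calling proxy(String, int). ProxyAddress is a hypothetical helper invented for this example, and the "host:port" format is our own assumption (it doesn't handle bracketed IPv6 literals):

```java
// Hypothetical helper (not part of Jsoup): holds a proxy host and port
// parsed from a "host:port" string such as "127.0.0.1:1080".
class ProxyAddress {
    final String host;
    final int port;

    private ProxyAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static ProxyAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        // Reject input with no colon, an empty host, or an empty port.
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new ProxyAddress(host, port);
    }
}
```

With a parsed ProxyAddress named address, the call from the example above would become .proxy(address.host, address.port).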

5. Adding Proxy Support Through Proxy Object

Alternatively, we can pass a java.net.Proxy instance to the proxy(java.net.Proxy) method of the Connection object:

Proxy proxy = new Proxy(Proxy.Type.HTTP, 
  new InetSocketAddress("127.0.0.1", 1080));

Jsoup.connect("https://spring.io/blog")
  .proxy(proxy)
  .get();

This method takes a Proxy object consisting of a proxy type, typically HTTP, and an InetSocketAddress, a class that wraps the proxy’s hostname and port.
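
Because the proxy type is part of the Proxy object, the same approach extends to the other types defined by java.net.Proxy.Type, such as SOCKS. The sketch below uses only the JDK; ProxyFactory is a hypothetical helper name, the address is a placeholder, and we're assuming (rather than having verified against every Jsoup version) that a SOCKS proxy passed this way is honored:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper (not part of Jsoup): builds java.net.Proxy instances.
class ProxyFactory {

    // Wraps the host and port in an InetSocketAddress and pairs it with the given type.
    static Proxy create(Proxy.Type type, String host, int port) {
        return new Proxy(type, new InetSocketAddress(host, port));
    }

    public static void main(String[] args) {
        Proxy httpProxy = create(Proxy.Type.HTTP, "127.0.0.1", 1080);
        Proxy socksProxy = create(Proxy.Type.SOCKS, "127.0.0.1", 1080);

        // Print the type and address of each proxy for inspection.
        System.out.println(httpProxy.type() + " " + httpProxy.address());
        System.out.println(socksProxy.type() + " " + socksProxy.address());
    }
}
```

Either object can be handed to proxy(java.net.Proxy) exactly as in the example above.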

6. Conclusion

In this tutorial, we’ve explored two different ways of adding proxy support to Jsoup.

First, we learned how to do it with the proxy(String, int) method, which takes the proxy host and port. Second, we learned how to achieve the same result by passing a Proxy object as the parameter.

As always, the code samples are available over on GitHub.
