Get Domain Name From Given URL in Java

Last modified: May 23, 2022

1. Overview

In this short article, we’ll take a look at different ways to get a domain name from a given URL in Java.

2. What Is a Domain Name?

Simply put, a domain name is a human-readable string that points to an IP address. It is part of the Uniform Resource Locator (URL). Using the domain name, users can reach a specific website through client software such as a web browser.

A domain name usually consists of two or three parts, each separated by a dot.

Starting from the end, the domain name may include:

  • top-level domain (e.g., com in baeldung.com),
  • second-level domain (e.g., co in google.co.uk or baeldung in baeldung.com),
  • third-level domain (e.g., google in google.co.uk).
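
The level numbering above can be sketched by splitting a host on its dots and reading the labels from the end. The DomainLevels class and levelsFromEnd() method are hypothetical names, and the sketch counts levels purely by position: it knows nothing about public suffixes such as co.uk.

```java
import java.util.ArrayList;
import java.util.List;

public class DomainLevels {

    // Splits a host on dots and returns its labels from the last one to the
    // first, so index 0 is the top-level domain, index 1 the second level, etc.
    static List<String> levelsFromEnd(String host) {
        String[] labels = host.split("\\.");
        List<String> levels = new ArrayList<>();
        for (int i = labels.length - 1; i >= 0; i--) {
            levels.add(labels[i]);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<String> levels = levelsFromEnd("google.co.uk");
        // Prints "level 1: uk", then "level 2: co", then "level 3: google"
        for (int i = 0; i < levels.size(); i++) {
            System.out.println("level " + (i + 1) + ": " + levels.get(i));
        }
    }
}
```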

Domain names need to follow the rules and procedures specified by the Domain Name System (DNS).
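
As a rough illustration of those rules, the classic host name syntax (RFC 1035, relaxed by RFC 1123) limits each label to 1 to 63 letters, digits, or hyphens, with no leading or trailing hyphen, and the whole name to 253 characters. The sketch below checks only those constraints. HostNameRules is a hypothetical class; it rejects a trailing root dot and does not handle internationalized names.

```java
import java.util.regex.Pattern;

public class HostNameRules {

    // One label: 1 to 63 letters, digits or hyphens, starting and ending with a letter or digit
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    // Simplified host name check; split(..., -1) keeps empty labels so that
    // "a..b" or a trailing dot are rejected rather than silently ignored
    static boolean isValidHostName(String host) {
        if (host.isEmpty() || host.length() > 253) {
            return false;
        }
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidHostName("www.baeldung.com")); // true
        System.out.println(isValidHostName("-bad-.com"));        // false: label starts with a hyphen
        System.out.println(isValidHostName("double..dot.com"));  // false: empty label
    }
}
```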

3. Using the URI Class

Let’s see how to extract the domain name from a URL using the java.net.URI class. The URI class provides the getHost() method, which returns the host component of the URL:

URI uri = new URI("https://www.baeldung.com/domain");
String host = uri.getHost();
assertEquals("www.baeldung.com", host);
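
A self-contained variant of this snippet, using URI.create() so that parse errors surface as an unchecked IllegalArgumentException, also shows two behaviors worth knowing: getHost() leaves out user info and port, and it returns null when the string has no scheme, because such a string is parsed as a relative path. The hostOf() helper is a hypothetical name.

```java
import java.net.URI;

public class UriHostExample {

    // Returns the host component, or null when the string has no authority
    // part (for example, when the scheme and "//" are missing)
    static String hostOf(String url) {
        // URI.create wraps URISyntaxException in an unchecked IllegalArgumentException
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        // User info, port, path and query are not part of the host
        System.out.println(hostOf("https://user@www.baeldung.com:8080/domain?q=1")); // www.baeldung.com
        // Without a scheme, the whole string is treated as a relative path
        System.out.println(hostOf("www.baeldung.com/domain")); // null
    }
}
```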

The host includes the sub-domain (www here) along with the rest of the domain name, that is, the third- (if any), second-, and top-level domains.

Therefore, to get the domain name itself, we need to remove the sub-domain from the host:

String domainName = host.startsWith("www.") ? host.substring(4) : host;
assertEquals("baeldung.com", domainName);
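
The startsWith("www.") check is case-sensitive, while getHost() keeps the case used in the URL. A minimal sketch, assuming we only ever want to strip a single leading www. label, lower-cases the host first; stripWww() is a hypothetical helper. The second call shows that any other sub-domain is left untouched.

```java
import java.net.URI;
import java.util.Locale;

public class WwwStripper {

    // Hypothetical helper: extracts the host, lower-cases it (host names are
    // case-insensitive) and removes one leading "www." label if present
    static String stripWww(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in: " + url);
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    public static void main(String[] args) {
        System.out.println(stripWww("https://WWW.Baeldung.com/domain"));  // baeldung.com
        System.out.println(stripWww("https://jira.baeldung.com/browse")); // jira.baeldung.com
    }
}
```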

However, in some cases, we can't get the domain name this way. The approach above only works when we know the exact sub-domain to strip (here, www.); for an arbitrary URL such as https://jira.baeldung.com, the URI class alone can't tell us which labels form the sub-domain.
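
To see why an unknown sub-domain is a problem, consider a common shortcut that is not part of this article's approach: keeping only the last two labels of the host. The hypothetical lastTwoLabels() below works for jira.baeldung.com but returns co.uk for www.google.co.uk, because it has no way of knowing that co.uk is a public suffix.

```java
public class LastTwoLabels {

    // Naive heuristic: keep only the last two dot-separated labels of the host
    static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastTwoLabels("jira.baeldung.com")); // baeldung.com
        System.out.println(lastTwoLabels("www.google.co.uk"));  // co.uk, not google.co.uk
    }
}
```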

4. Using the InternetDomainName Class from the Guava Library

Now we’ll see how to get the domain name using the Guava library and the InternetDomainName class.

The InternetDomainName class provides the topPrivateDomain() method, which returns the part of the given domain name that is one level beneath the public suffix. In other words, the method returns the public suffix (for example, com or co.uk) plus the single label directly in front of it, such as baeldung.com or google.co.uk.
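
To make "one level beneath the public suffix" concrete, here is a simplified sketch that is not Guava's implementation: it uses a tiny hard-coded set of suffixes in place of the full Public Suffix List that Guava relies on, finds the longest suffix that matches, and keeps one more label. TopPrivateDomainSketch and its suffix set are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Set;

public class TopPrivateDomainSketch {

    // A tiny, hard-coded stand-in for the Public Suffix List (illustration only)
    private static final Set<String> PUBLIC_SUFFIXES = Set.of("com", "uk", "co.uk");

    // Returns the public suffix plus one more label, or null when the host is
    // itself a public suffix or ends in no known suffix
    static String topPrivateDomain(String host) {
        if (PUBLIC_SUFFIXES.contains(host)) {
            return null;
        }
        String[] labels = host.split("\\.");
        // i = 1 yields the longest candidate suffix, so the first hit is the longest match
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (PUBLIC_SUFFIXES.contains(suffix)) {
                return labels[i - 1] + "." + suffix;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(topPrivateDomain("www.baeldung.com")); // baeldung.com
        System.out.println(topPrivateDomain("www.google.co.uk")); // google.co.uk
    }
}
```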

First, we need to extract the host from the given URL, which we can do with the URI class:

String urlString = "https://www.baeldung.com/java-tutorial";
URI uri = new URI(urlString);
String host = uri.getHost();

Next, let's get the domain name using the InternetDomainName class and its topPrivateDomain() method:

InternetDomainName internetDomainName = InternetDomainName.from(host).topPrivateDomain(); 
String domainName = internetDomainName.toString(); 
assertEquals("baeldung.com", domainName);

Unlike URI.getHost(), topPrivateDomain() omits the sub-domain from the returned value.

Lastly, we can also strip the public suffix (com here, or co.uk for google.co.uk) to keep only the name itself:

String publicSuffix = internetDomainName.publicSuffix().toString();
String name = domainName.substring(0, domainName.lastIndexOf("." + publicSuffix));
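
The substring() call above assumes the domain actually ends with "." plus the suffix; should it not, lastIndexOf() returns -1 and substring(0, -1) throws a StringIndexOutOfBoundsException. A slightly more defensive sketch of the same step, working on plain strings with a hypothetical stripPublicSuffix() helper, checks with endsWith() first.

```java
public class SuffixStripper {

    // Removes "." + publicSuffix from the end of domainName, failing with a
    // descriptive IllegalArgumentException when the suffix is not there
    static String stripPublicSuffix(String domainName, String publicSuffix) {
        String dottedSuffix = "." + publicSuffix;
        if (!domainName.endsWith(dottedSuffix)) {
            throw new IllegalArgumentException(domainName + " does not end with " + dottedSuffix);
        }
        return domainName.substring(0, domainName.length() - dottedSuffix.length());
    }

    public static void main(String[] args) {
        System.out.println(stripPublicSuffix("google.co.uk", "co.uk")); // google
        System.out.println(stripPublicSuffix("baeldung.com", "com"));   // baeldung
    }
}
```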

Finally, assuming these steps are wrapped in a getName() method of a domainNameClient object that takes a host, we can write a test to check the functionality:

assertEquals("baeldung", domainNameClient.getName("jira.baeldung.com"));
assertEquals("google", domainNameClient.getName("www.google.co.uk"));

We can see that both the sub-domain and the public suffix are removed from the result.

5. Using a Regular Expression

Obtaining the domain name using regular expressions can be challenging. For instance, if we don’t know the exact sub-domain value, we cannot determine what word (if any) should be extracted from the given URL.

On the other hand, if we know the sub-domain value, we can use a regular expression to strip it from the URL, together with the scheme and the path:

String url = "https://www.baeldung.com/domain";
String domainName = url.replaceAll("http(s)?://|www\\.|/.*", "");
assertEquals("baeldung.com", domainName);
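
This replaceAll() expression can also remove text it shouldn't: the www\\. alternative matches anywhere, so https://awww.com/ becomes acom, and a port survives, so https://www.baeldung.com:443/x becomes baeldung.com:443. A sketch of an alternative, assuming the URL starts with an optional http or https scheme, anchors the pattern at the start and captures the host instead of deleting everything else. It still ignores user info and IPv6 literals, and RegexHost is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexHost {

    // Anchored at the start: optional scheme, optional "www." label, then the
    // host up to the first '/', ':', '?' or '#'
    private static final Pattern HOST =
            Pattern.compile("^(?:https?://)?(?:www\\.)?([^/:?#]+)", Pattern.CASE_INSENSITIVE);

    static String domainOf(String url) {
        Matcher m = HOST.matcher(url);
        if (!m.find()) {
            throw new IllegalArgumentException("No host in: " + url);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(domainOf("https://www.baeldung.com/domain"));    // baeldung.com
        System.out.println(domainOf("https://www.baeldung.com:443/x?y=1")); // baeldung.com
        System.out.println(domainOf("https://awww.com/"));                  // awww.com
    }
}
```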

6. Conclusion

In this article, we looked at several ways to extract the domain name from a given URL. As always, the source code for the examples is available over on GitHub.
