Email Validation in Java

Last modified: September 28, 2021

1. Overview

In this tutorial, we’ll learn how to validate email addresses in Java using regular expressions.

2. Email Validation in Java

Email validation is required in nearly every application that has user registration in place.

An email address is divided into three main parts: the local part, an @ symbol, and a domain. For example, if “username@domain.com” is an email, then:

  • local part = username
  • @ = @
  • domain = domain.com
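
As a minimal sketch of the split described above, the parts can be pulled apart with plain string operations. The EmailParts class and its split method are hypothetical names used only for illustration; they aren't part of the article's code, and the sketch assumes the last @ in the string is the separator.

```java
/** Hypothetical helper for illustration: splits an address at its last '@'. */
final class EmailParts {
    final String localPart;
    final String domain;

    private EmailParts(String localPart, String domain) {
        this.localPart = localPart;
        this.domain = domain;
    }

    static EmailParts split(String emailAddress) {
        int at = emailAddress.lastIndexOf('@');
        // Reject a missing '@', an empty local part, or an empty domain.
        if (at <= 0 || at == emailAddress.length() - 1) {
            throw new IllegalArgumentException(
                "Expected text on both sides of '@': " + emailAddress);
        }
        return new EmailParts(
            emailAddress.substring(0, at),
            emailAddress.substring(at + 1));
    }
}
```

Calling EmailParts.split("username@domain.com") yields the local part username and the domain domain.com. This only separates the parts; it doesn't check which characters they contain.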

It can take a lot of effort to validate an email address through string manipulation techniques, as we typically need to count and check all the character types and lengths. But in Java, by using a regular expression, it can be much easier.

As we know, a regular expression is a sequence of characters to match patterns. In the following sections, we’ll see how email validation can be performed by using several different regular expression methods.

3. Simple Regular Expression Validation

The simplest regular expression to validate an email address is ^(.+)@(\S+)$.

It only checks that the address contains an @ symbol with at least one character before it and at least one non-whitespace character after it. If so, the validation returns true; otherwise, it returns false. This regular expression doesn’t check the contents of the local part or the domain.

For example, according to this regular expression, username@domain.com will pass the validation, but username#domain.com will fail the validation.

Let’s define a simple helper method to match the regex pattern:

public static boolean patternMatches(String emailAddress, String regexPattern) {
    return Pattern.compile(regexPattern)
      .matcher(emailAddress)
      .matches();
}

We’ll also write the code to validate the email address using this regular expression:

@Test
public void testUsingSimpleRegex() {
    emailAddress = "username@domain.com";
    regexPattern = "^(.+)@(\\S+)$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}

The absence of the @ symbol in the email address will also fail the validation.
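
To make the limits of this pattern concrete, here is a small sketch that compiles it once and reuses it. SimpleRegexDemo is a hypothetical class name, not part of the article’s code.

```java
import java.util.regex.Pattern;

final class SimpleRegexDemo {
    // Same simple pattern as above, compiled once and reused across calls.
    private static final Pattern SIMPLE = Pattern.compile("^(.+)@(\\S+)$");

    static boolean isValid(String emailAddress) {
        return SIMPLE.matcher(emailAddress).matches();
    }
}
```

With this sketch, username@domain.com passes and username#domain.com fails, but so does almost anything else with an @ in the middle: "user name@x" and "!!!@???" are both accepted, while "@domain.com" and "username@" are rejected because one side of the @ is empty.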

4. Strict Regular Expression Validation

Now let’s write a stricter regular expression that checks both the local part and the domain part of the email:

^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$

This regex imposes the following restrictions on the local part of the email address:

  • It allows numeric values from 0 to 9.
  • Both uppercase and lowercase letters from a to z are allowed.
  • Underscore “_”, hyphen “-”, and dot “.” are allowed.
  • Dot isn’t allowed at the start and end of the local part.
  • Consecutive dots aren’t allowed.
  • For the local part, a maximum of 64 characters are allowed.

Restrictions for the domain part in this regular expression include:

  • It allows numeric values from 0 to 9.
  • We allow both uppercase and lowercase letters from a to z.
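  • Hyphen “-” and dot “.” aren’t allowed at the end of the domain part, and a hyphen isn’t allowed at the start. Note that [^-] matches any character other than a hyphen, so the pattern doesn’t otherwise restrict the first character of the domain.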
  • No consecutive dots.

We’ll also write the code to test out this regular expression:

@Test
public void testUsingStrictRegex() {
    emailAddress = "username@domain.com";
    regexPattern = "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@" 
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}

Some of the email addresses that pass this validation are:

  • username@domain.com
  • user.name@domain.com
  • user-name@domain.com
  • username@domain.co.in
  • user_name@domain.com

Here’s a short list of some email addresses that this validation rejects:

  • username.@domain.com
  • .user.name@domain.com
  • user-name@domain.com.
  • username@.com
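
The two lists above can be checked with a short sketch. StrictRegexDemo is a hypothetical class name; the pattern is the strict regex from this section, unchanged.

```java
import java.util.regex.Pattern;

final class StrictRegexDemo {
    // Strict pattern from this section, compiled once.
    private static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
            + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValid(String emailAddress) {
        return STRICT.matcher(emailAddress).matches();
    }
}
```

Every address in the first list returns true from isValid, and every address in the second list returns false. The lookahead at the start also rejects a local part longer than 64 characters.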

5. Regular Expression for Validating Non-Latin or Unicode Email Addresses

The regex from the previous section works well for email addresses written in Latin letters, but it won’t work for non-Latin email addresses.

So we’ll write a regular expression that can validate Unicode characters as well:

^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$

We can use this regex to validate Unicode or non-Latin email addresses, since \p{L} matches a letter from any language.

As we can see, this regex is similar to the strict regex that we built in the previous section, except that we replaced the “A-Za-z” part with “\\p{L}”. This enables support for Unicode characters.

Let’s check this regex by writing the test:

@Test
public void testUsingUnicodeRegex() {
    emailAddress = "用户名@领域.电脑";
    regexPattern = "^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@" 
        + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}

This regex not only takes a stricter approach to validating email addresses, but also supports non-Latin characters.
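
As a rough comparison, the sketch below runs the same inputs through the ASCII-only pattern from section 4 and the \p{L} pattern from this section. UnicodeRegexDemo is a hypothetical class name, and the non-ASCII inputs in the usage note are assumed to be written with Unicode escapes so the source file stays ASCII-only.

```java
import java.util.regex.Pattern;

final class UnicodeRegexDemo {
    // Strict pattern from section 4: accepts ASCII letters only.
    private static final Pattern ASCII_ONLY = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
            + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    // Pattern from this section: \p{L} matches a letter in any script.
    private static final Pattern UNICODE = Pattern.compile(
        "^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@"
            + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$");

    static boolean matchesAsciiOnly(String emailAddress) {
        return ASCII_ONLY.matcher(emailAddress).matches();
    }

    static boolean matchesUnicode(String emailAddress) {
        return UNICODE.matcher(emailAddress).matches();
    }
}
```

For an address such as "jos\u00e9@dominio.es" (josé@dominio.es), matchesUnicode returns true and matchesAsciiOnly returns false. A plain ASCII address such as username@domain.com passes both.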

6. Regular Expression by RFC 5322 for Email Validation

Instead of writing a custom regex to validate email addresses, we can use one based on the RFC standards.

RFC 5322, which supersedes RFC 2822 (itself the successor to RFC 822), defines the syntax of email addresses, and a simple regular expression can be built from the characters it permits.

Let’s check it out:

^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$

As we can see, it’s a very simple regex that accepts all the characters RFC 5322 permits in an unquoted local part, along with the dot.

Note that this character class includes the pipe character (|) and the single quote ('). These can present a potential SQL injection risk when passed from the client side to the server, so we should remove them from the character class if we want the validation to reject them.

Let’s write the code to validate an email with this regex:

@Test
public void testUsingRFC5322Regex() {
    emailAddress = "username@domain.com";
    regexPattern = "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
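
To see how permissive this pattern is, the following sketch tries a few edge cases. Rfc5322RegexDemo is a hypothetical class name; the pattern is the one from the test above.

```java
import java.util.regex.Pattern;

final class Rfc5322RegexDemo {
    // Same pattern as in the test above, compiled once.
    private static final Pattern RFC_5322_BASED = Pattern.compile(
        "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$");

    static boolean isValid(String emailAddress) {
        return RFC_5322_BASED.matcher(emailAddress).matches();
    }
}
```

Because the dot is just another member of both character classes, this pattern accepts ".username@domain.com", "user..name@domain.com", and "username@domain..com". It rejects addresses that contain a space or lack an @. The later sections tighten the rules for dots and for the top-level domain.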

7. Regular Expression to Check Characters in the Top-Level Domain

We’ve written regex to verify the email address’ local and domain parts. Now we’ll also write a regex that checks the top-level domain of the email.

The below regular expression validates the top-level domain part of the email address:

^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$

This regex checks that the domain consists of one or more dot-separated labels followed by a top-level domain of two to six letters.

We’ll also write some code to verify the email address by using this regex:

@Test
public void testTopLevelDomain() {
    emailAddress = "username@domain.com";
    regexPattern = "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*" 
        + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
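
The effect of the {2,6} bound is easiest to see with a few inputs. The sketch below uses a hypothetical TopLevelDomainDemo class wrapped around the same pattern.

```java
import java.util.regex.Pattern;

final class TopLevelDomainDemo {
    // Same pattern as in the test above: the address must end in 2 to 6 letters.
    private static final Pattern TLD_CHECK = Pattern.compile(
        "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*"
            + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    static boolean isValid(String emailAddress) {
        return TLD_CHECK.matcher(emailAddress).matches();
    }
}
```

Here "username@domain.co.in" and "username@domain.museum" pass, while "username@domain" (no dot), "username@domain.c" (too short), "username@domain.c0m" (digit in the top-level domain), and "username@domain.technology" (too long) fail. Some real top-level domains are longer than six letters, so the upper bound may need adjusting for a given application.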

8. Regular Expression to Restrict Consecutive, Trailing, and Leading Dots

Now let’s write a regex that will restrict the usage of dots in the email addresses:

^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$

The above regular expression restricts consecutive, leading, and trailing dots. Thus, an email can contain more than one dot, but not consecutively, in both the local and domain parts.

Let’s take a look at the code:

@Test
public void testRestrictDots() {
    emailAddress = "username@domain.com";
    regexPattern = "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@" 
        + "[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
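
A quick sketch of the dot rules in action, using a hypothetical DotRestrictionDemo class and the same pattern:

```java
import java.util.regex.Pattern;

final class DotRestrictionDemo {
    // Same pattern as in the test above: every dot must sit between two
    // non-dot characters, in both the local part and the domain.
    private static final Pattern DOT_RESTRICTED = Pattern.compile(
        "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@"
            + "[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$");

    static boolean isValid(String emailAddress) {
        return DOT_RESTRICTED.matcher(emailAddress).matches();
    }
}
```

With this pattern, "user.name@domain.com" passes, while "user..name@domain.com", ".username@domain.com", "username.@domain.com", "username@domain..com", and "username@domain.com." all fail. It doesn’t require a top-level domain, so "username@domain" also passes.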

9. OWASP Validation Regular Expression

The OWASP validation regex repository provides this regular expression for email validation:

^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$

This regex also supports most of the validations in the standard email structure.

Let’s verify the email address by using the below code:

@Test
public void testOwaspValidation() {
    emailAddress = "username@domain.com";
    regexPattern = "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
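
As a sketch of what this pattern accepts and rejects, the hypothetical OwaspRegexDemo class below wraps the same regex from the test above.

```java
import java.util.regex.Pattern;

final class OwaspRegexDemo {
    // Same pattern as in the test above, compiled once.
    private static final Pattern OWASP = Pattern.compile(
        "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*"
            + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$");

    static boolean isValid(String emailAddress) {
        return OWASP.matcher(emailAddress).matches();
    }
}
```

The local part allows a smaller set of special characters than the earlier patterns, so "user+tag@domain.com" passes but "user#name@domain.com" fails. Consecutive dots are rejected, and the top-level domain must be two to seven letters long.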

10. Gmail Special Case for Emails

There’s one special case that applies only to the Gmail domain: it permits the + character in the local part of the email. For the Gmail domain, the two email addresses username+something@gmail.com and username@gmail.com are the same.

Similarly, user+name@gmail.com is delivered to user@gmail.com.

We must implement a slightly different regex that will pass the email validation for this special case as well:

^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$

Let’s write an example to test this use case:

@Test
public void testGmailSpecialCase() {
    emailAddress = "username+something@domain.com";
    regexPattern = "^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@" 
        + "[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
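
The difference from the strict regex in section 4 can be sketched side by side. GmailPlusDemo is a hypothetical class name; both patterns are copied from the tests in this article.

```java
import java.util.regex.Pattern;

final class GmailPlusDemo {
    // Strict pattern from section 4: '+' is not in the local-part class.
    private static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
            + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    // Pattern from this section: '+' is added to the character classes.
    private static final Pattern PLUS_AWARE = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@"
            + "[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$");

    static boolean matchesStrict(String emailAddress) {
        return STRICT.matcher(emailAddress).matches();
    }

    static boolean matchesPlusAware(String emailAddress) {
        return PLUS_AWARE.matcher(emailAddress).matches();
    }
}
```

For "username+something@gmail.com", matchesStrict returns false and matchesPlusAware returns true. Both accept "username@gmail.com". Keeping the hyphen as the last character inside each class avoids it being read as a range operator.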

11. Apache Commons Validator for Email

The Apache Commons Validator is a validation package that contains standard validation rules. So by importing this package, we can apply email validation.

We can use the EmailValidator class to validate the email, which uses RFC 822 standards. This Validator contains a mixture of custom code and regular expressions to validate an email. It not only supports the special characters, but also supports the Unicode characters we’ve discussed.

Let’s add the commons-validator dependency to our project:

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>${validator.version}</version>
</dependency>

Now we can validate email addresses using the below code:

@Test
public void testUsingEmailValidator() {
    emailAddress = "username@domain.com";
    assertTrue(EmailValidator.getInstance()
      .isValid(emailAddress));
}

12. Which Regex Should I Use?

In this article, we’ve looked at a variety of solutions using regex for email address validation. Obviously, determining which solution we should use depends on how strict we want our validation to be, and our exact requirements.

For example, we can use the simple regex from section 3 if we just need to check for the presence of an @ symbol in an email. However, for more detailed validation, we can opt for the stricter regex from section 6, based on the RFC 5322 standard.

Finally, if we’re dealing with Unicode characters in an email, we can go for the regex solution provided in section 5.
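
One way to keep this choice explicit in code is to name each pattern and pick one per use case. The sketch below is only an illustration: the EmailCheck enum and its constant names are hypothetical, and the patterns are the ones from sections 3, 6, and 5.

```java
import java.util.regex.Pattern;

/** Hypothetical enum for illustration: one named, precompiled pattern per use case. */
enum EmailCheck {
    SIMPLE("^(.+)@(\\S+)$"),
    RFC_5322_BASED("^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$"),
    UNICODE("^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@"
        + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$");

    private final Pattern pattern;

    EmailCheck(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    boolean accepts(String emailAddress) {
        return pattern.matcher(emailAddress).matches();
    }
}
```

For example, EmailCheck.SIMPLE.accepts("user name@x") returns true, while EmailCheck.RFC_5322_BASED.accepts("user name@x") returns false because of the space. An address made of non-Latin letters passes only EmailCheck.UNICODE of the latter two.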

13. Conclusion

In this article, we learned various ways to validate email addresses in Java using regular expressions.

The complete code for this article is available over on GitHub.
