1. Introduction
Regular expressions are a powerful tool for matching various kinds of patterns when used appropriately.
In this article, we’ll use the java.util.regex package to determine whether a given String contains a valid date.
For an introduction to regular expressions, refer to our Guide To Java Regular Expressions API.
2. Date Format Overview
We’re going to define a valid date in relation to the international Gregorian calendar. Our format will follow the general pattern: YYYY-MM-DD.
Let’s also account for leap years, which are years that contain February 29th. In the Gregorian calendar, a year is a leap year if its number is divisible by 4, except for years divisible by 100, unless they are also divisible by 400.
In all other cases, we’ll call a year regular.
Examples of valid dates:
- 2017-12-31
- 2020-02-29
- 2400-02-29
Examples of invalid dates:
- 2017/12/31: incorrect token delimiter
- 2018-1-1: missing leading zeroes
- 2018-04-31: wrong days count for April
- 2100-02-29: 2100 is divisible by 100 but not by 400, so it isn’t a leap year and February is limited to 28 days
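The leap-year rule described above can be written directly as a boolean expression. The following is a minimal sketch; the class name LeapYearRule and the method isLeapYear are illustrative names chosen for this example and are not part of the article's project.

```java
// Illustrative helper; the class and method names are assumptions for this sketch.
class LeapYearRule {

    // Divisible by 4, except years divisible by 100, unless also divisible by 400.
    static boolean isLeapYear(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }

    public static void main(String[] args) {
        // Years taken from the valid and invalid examples above.
        for (int year : new int[] {2017, 2020, 2100, 2400}) {
            System.out.println(year + " leap? " + isLeapYear(year));
        }
    }
}
```

Of the years listed in the examples, only 2020 and 2400 satisfy the rule, which is why February 29th is valid for them and invalid for 2100.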
3. Implementing a Solution
Since we’re going to match a date using regular expressions, let’s first sketch out an interface DateMatcher, which provides a single matches method:
public interface DateMatcher {
    boolean matches(String date);
}
We’ll present the implementation step by step below, building toward the complete solution at the end.
3.1. Matching the Broad Format
We’ll start by creating a very simple prototype handling the format constraints of our matcher:
import java.util.regex.Pattern;

class FormattedDateMatcher implements DateMatcher {
    // Four digits, a dash, two digits, a dash, two digits
    private static final Pattern DATE_PATTERN = Pattern.compile(
      "^\\d{4}-\\d{2}-\\d{2}$");

    @Override
    public boolean matches(String date) {
        return DATE_PATTERN.matcher(date).matches();
    }
}
Here we’re specifying that a valid date must consist of three groups of digits separated by dashes. The first group has four digits, and the remaining two groups have two digits each.
Matching dates: 2017-12-31, 2018-01-31, 0000-00-00, 1029-99-72
Non-matching dates: 2018-01, 2018-01-XX, 2020/02/29
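One detail worth noting: Matcher.matches() succeeds only when the entire input matches, so the ^ and $ anchors are redundant with it, though harmless. They start to matter if the same pattern is later used with find(). The sketch below illustrates the difference; the class name FormatCheckDemo and the unanchored variant of the pattern are assumptions made for this example.

```java
import java.util.regex.Pattern;

// Illustrative demo; the class name and the UNANCHORED pattern are assumptions.
class FormatCheckDemo {
    static final Pattern ANCHORED = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}$");
    static final Pattern UNANCHORED = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    public static void main(String[] args) {
        String embedded = "due 2017-12-31!";

        // matches() requires the whole input to match, with or without anchors.
        System.out.println(UNANCHORED.matcher(embedded).matches()); // false

        // find() searches anywhere in the input, so only the anchors reject it.
        System.out.println(UNANCHORED.matcher(embedded).find());    // true
        System.out.println(ANCHORED.matcher(embedded).find());      // false

        // A format-only check still accepts impossible dates.
        System.out.println(ANCHORED.matcher("1029-99-72").matches()); // true
    }
}
```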
3.2. Matching the Specific Date Format
Now that we’ve matched the general date format, we need to constrain it further so that each token falls within a sensible range.
Our second example therefore checks the ranges of the date tokens in addition to the formatting constraint. For simplicity, we’ve restricted our interest to the years 1900 – 2999:
^((19|2[0-9])[0-9]{2})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$
Here we’ve introduced three groups of integer ranges that need to match:
- (19|2[0-9])[0-9]{2} covers a restricted range of years by matching a number that starts with 19 or 2X, followed by any two digits
- 0[1-9]|1[012] matches a month number in the range 01-12
- 0[1-9]|[12][0-9]|3[01] matches a day number in the range 01-31
Matching dates: 1900-01-01, 2205-02-31, 2999-12-31
Non-matching dates: 1899-12-31, 2018-05-35, 2018-13-05, 3000-01-01, 2018-01-XX
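Because the pattern uses capturing groups, the individual tokens can be read back after a successful match. The following is a small sketch of that idea; the class name RangeCheckDemo and the helper tokens are hypothetical names introduced for this example. It also shows that a range check alone still accepts a date such as 2205-02-31.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo; the class and method names are assumptions for this sketch.
class RangeCheckDemo {
    static final Pattern RANGE_PATTERN = Pattern.compile(
        "^((19|2[0-9])[0-9]{2})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$");

    // Returns {year, month, day} if the input passes the range check, or null otherwise.
    static int[] tokens(String date) {
        Matcher m = RANGE_PATTERN.matcher(date);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is the full year, group 2 its two-digit prefix,
        // and groups 3 and 4 are the month and the day.
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("2205-02-31"))); // [2205, 2, 31]
        System.out.println(Arrays.toString(tokens("2018-13-05"))); // null
    }
}
```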
3.3. Matching February 29th
In order to match leap years correctly we must first identify when we have encountered a leap year, and then make sure that we accept February 29th as a valid date for those years.
Since our restricted range contains too many leap years to list individually, we’ll use divisibility rules to filter them:
- If the number formed by the last two digits in a number is divisible by 4, the original number is divisible by 4
- If the last two digits of the number are 00, the number is divisible by 100
Here is a solution:
^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$
The pattern consists of the following parts:
- 2000|2400|2800 matches the leap years divisible by 400 within the restricted range of 1900-2999
- (19|2[0-9])(0[48]|[2468][048]|[13579][26]) matches the whitelisted years that are divisible by 4 but not by 100
- -02-29 matches February 29th
Matching dates: 2020-02-29, 2024-02-29, 2400-02-29
Non-matching dates: 2019-02-29, 2100-02-29, 3200-02-29, 2020/02/29
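One way to gain confidence in the whitelist is to compare it with the JDK's own leap-year check for every year in the restricted range. The sketch below assumes Java 8 or later for java.time.Year; the class name LeapDayPatternCheck and the method mismatches are illustrative.

```java
import java.time.Year;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative check; the class and method names are assumptions for this sketch.
class LeapDayPatternCheck {
    static final Pattern LEAP_DAY = Pattern.compile(
        "^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$");

    // Collects the years in 1900-2999 on which the pattern and Year.isLeap disagree.
    static List<Integer> mismatches() {
        List<Integer> result = new ArrayList<>();
        for (int year = 1900; year <= 2999; year++) {
            boolean regexSaysLeap = LEAP_DAY.matcher(year + "-02-29").matches();
            if (regexSaysLeap != Year.isLeap(year)) {
                result.add(year);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Years where the pattern disagrees: " + mismatches());
    }
}
```

An empty list means the pattern accepts February 29th for exactly the leap years in the range.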
3.4. Matching General Days of February
As well as matching February 29th in leap years, we also need to match all other days of February (1 – 28) in all years:
^(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))$
Matching dates: 2018-02-01, 2019-02-13, 2020-02-25
Non-matching dates: 2000-02-30, 2400-02-62, 2018/02/28
3.5. Matching 31-Day Months
January, March, May, July, August, October, and December each have 31 days, so we accept day numbers from 01 to 31 for them:
^(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))$
Matching dates: 2018-01-31, 2021-07-31, 2022-08-31
Non-matching dates: 2018-01-32, 2019-03-64, 2018/01/31
3.6. Matching 30-Day Months
April, June, September, and November each have 30 days, so we accept day numbers from 01 to 30 for them:
^(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30))$
Matching dates: 2018-04-30, 2019-06-30, 2020-09-30
Non-matching dates: 2018-04-31, 2019-06-31, 2018/04/30
3.7. Gregorian Date Matcher
Now we can combine all of the patterns above into a single matcher to have a complete GregorianDateMatcher satisfying all of the constraints:
import java.util.regex.Pattern;

class GregorianDateMatcher implements DateMatcher {
    // Branches: Feb 29 in leap years, Feb 1-28, 31-day months, 30-day months
    private static final Pattern DATE_PATTERN = Pattern.compile(
      "^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$"
      + "|^(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))$"
      + "|^(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))$"
      + "|^(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30))$");

    @Override
    public boolean matches(String date) {
        return DATE_PATTERN.matcher(date).matches();
    }
}
We’ve used the alternation character “|” to match at least one of the four branches. Thus, a valid February date matches either the first branch (February 29th of a leap year) or the second branch (any day from 1 to 28). Dates in the remaining months match the third and fourth branches.
We’ve favored readability over brevity in this pattern, so feel free to experiment with shortening it.
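As one possible experiment, the year prefix and the anchors can be shared across the three day-range branches, and non-capturing groups can replace the capturing ones. The COMPACT pattern below is a hypothetical variant written for this sketch, not the article's pattern; the class name CompactGregorianPattern and the helper sameVerdict are likewise illustrative.

```java
import java.util.regex.Pattern;

// Illustrative comparison; COMPACT and all names here are assumptions for this sketch.
class CompactGregorianPattern {
    // The four-branch pattern from the article, kept for comparison.
    static final Pattern ORIGINAL = Pattern.compile(
        "^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$"
        + "|^(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))$"
        + "|^(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))$"
        + "|^(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30))$");

    // Shorter variant: one shared year prefix and a single pair of anchors.
    static final Pattern COMPACT = Pattern.compile(
        "^(?:(?:19|2[0-9])[0-9]{2}-(?:"
        + "(?:0[13578]|1[02])-(?:0[1-9]|[12][0-9]|3[01])"
        + "|(?:0[469]|11)-(?:0[1-9]|[12][0-9]|30)"
        + "|02-(?:0[1-9]|1[0-9]|2[0-8]))"
        + "|(?:2000|2400|2800|(?:19|2[0-9])(?:0[48]|[2468][048]|[13579][26]))-02-29)$");

    // True if both patterns accept the input or both reject it.
    static boolean sameVerdict(String date) {
        return ORIGINAL.matcher(date).matches() == COMPACT.matcher(date).matches();
    }

    public static void main(String[] args) {
        for (String date : new String[] {"2020-02-29", "2100-02-29", "2018-04-31", "2017-12-31"}) {
            System.out.println(date + " -> same verdict: " + sameVerdict(date));
        }
    }
}
```

Whether the shorter form is easier to maintain is a judgment call; the four separate branches are longer but map one-to-one onto the sections above.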
At this point, we’ve satisfied all the constraints we introduced at the beginning.
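To check that claim mechanically, the combined pattern can be compared with the calendar rules in java.time over a grid of candidate strings that includes out-of-range years, months, and days. This is a sketch under the assumption that Java 8 or later is available; the class name GregorianPatternCrossCheck and its methods are illustrative.

```java
import java.time.YearMonth;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative cross-check; the class and method names are assumptions for this sketch.
class GregorianPatternCrossCheck {
    static final Pattern DATE_PATTERN = Pattern.compile(
        "^((2000|2400|2800|(19|2[0-9])(0[48]|[2468][048]|[13579][26]))-02-29)$"
        + "|^(((19|2[0-9])[0-9]{2})-02-(0[1-9]|1[0-9]|2[0-8]))$"
        + "|^(((19|2[0-9])[0-9]{2})-(0[13578]|10|12)-(0[1-9]|[12][0-9]|3[01]))$"
        + "|^(((19|2[0-9])[0-9]{2})-(0[469]|11)-(0[1-9]|[12][0-9]|30))$");

    // True if the tokens form a real calendar date within the years 1900-2999.
    static boolean isRealDate(int year, int month, int day) {
        if (year < 1900 || year > 2999 || month < 1 || month > 12 || day < 1) {
            return false;
        }
        return day <= YearMonth.of(year, month).lengthOfMonth();
    }

    // Counts the candidates on which the pattern and java.time disagree.
    static int countDisagreements() {
        int disagreements = 0;
        for (int year = 1890; year <= 3010; year++) {
            for (int month = 0; month <= 13; month++) {
                for (int day = 0; day <= 32; day++) {
                    String candidate =
                        String.format(Locale.ROOT, "%04d-%02d-%02d", year, month, day);
                    boolean regexAccepts = DATE_PATTERN.matcher(candidate).matches();
                    if (regexAccepts != isRealDate(year, month, day)) {
                        disagreements++;
                    }
                }
            }
        }
        return disagreements;
    }

    public static void main(String[] args) {
        System.out.println("Disagreements: " + countDisagreements());
    }
}
```

The grid only covers well-formed YYYY-MM-DD strings; malformed inputs such as 2018-1-1 or 2017/12/31 are rejected by the format itself.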
3.8. Note on Performance
Complex regular expressions may significantly affect the performance of the execution flow. The primary purpose of this article was not to present an efficient way of testing whether a string belongs to the set of all possible dates.
Consider using LocalDate.parse(), available since Java 8, if a reliable and fast approach to validating a date is needed.
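A minimal sketch of that approach follows. The class name IsoDateCheck and the method isValid are hypothetical names for this example. Note one difference from the regex: LocalDate.parse(CharSequence) uses the ISO_LOCAL_DATE format and is not limited to the years 1900 – 2999, so a range check would have to be added separately if that restriction matters.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Illustrative validator; the class and method names are assumptions for this sketch.
class IsoDateCheck {

    // True if the input parses as an ISO-8601 calendar date in the form YYYY-MM-DD.
    static boolean isValid(String date) {
        try {
            LocalDate.parse(date);
            return true;
        } catch (DateTimeParseException e) {
            // Thrown for malformed input and for dates that do not exist.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String date : new String[] {"2020-02-29", "2100-02-29", "2018-04-31", "2018-1-1"}) {
            System.out.println(date + " valid? " + isValid(date));
        }
    }
}
```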
4. Conclusion
In this article, we’ve learned how to use regular expressions to match strictly formatted Gregorian calendar dates by adding rules for the format, the token ranges, and the lengths of the months.
All the code presented in this article is available over on GitHub. This is a Maven-based project, so it should be easy to import and run as is.