HomeGuides
Log In
Guides

RegEx Rules

Java Regular Expressions, also known as RegEx, are sequences of characters that form a search pattern. A RegEX can be used to perform text searches as it defines a set of strings, usually united for a given purpose, i.e. describe what you are searching for.

In the following paragraphs we have included essential RegEx rules that you might find useful when compiling your RegEx.
For a complete RegEx syntax refer to the class Pattern of Oracle documentation available at this link: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Character Classes

[abc] Matches a or b, or c.
[^abc] Negation, matches everything except a or b, or c.
[a-c] Range, matches a or b, or c.
[a-c[f-h]] Union, matches a, b, c, f, g, h.
[a-c&&[b-c]] Intersection, matches b or c.
[a-c&&[^b-c]] Subtraction, matches a.

Predefined Character Classes

. Any character.
\d A digit: [0-9].
\D A non-digit: [^0-9].
\s A whitespace character: [\t\n\xOB\f\r].
\S A non-whitespace character: [^\s].
\w A word character: [a-zA-Z_0-9].
\W A non-word character: [^\w].

Boundary Matches

^ The beginning of a line.
$ The end of a line.
\b A word boundary.
\B A non-word boundary.
\A The beginnng of the input.
\G The end of the previous match.
\Z The end of the input but for the final terminator, if any.
\z The end of the input.

Pattern Flags

Pattern.CASE_INSENSITIVE Enables case-insensitive matching.
Pattern.COMMENTS Whitespace and comments starting with # are ignored until the end of a line.
Pattern.MULTILINE One expression can match multiple lines.
Pattern.UNIX_LINES Only the '\n' line terminator is recognized in the behavior of ., ^, and $.

Useful Java Classes and Methods

PATTERN

A pattern is a compiler representation of a regular expression.

Pattern compile(String regex) Compiles the given regular expression into a pattern.
Pattern compile(String regex, int flags) Compiles the given regular expression into a pattern with the given flags.
boolean matches(String regex) Tells whether or not this string matches the given regular expression.
String[] split(CharSequence input) Splits the given input sequence around matches of this pattern.
String quote(String s) Returns a literal pattern String for the specified String.
Predicate<String> asPredicate() Creates a predicate which can be used to match a string.

MATCHER

An engine that performs match operations on a character sequence by interpreting a pattern.

boolean matches() Attempts to match the entire region against the pattern.
boolean find() Attempts to find the next subsequence of the input sequence that matches the pattern.
int start() Returns the start index of the previous match.
int end() Returns the offset after the last character matched.

Quantifiers

Greedy Matches the longest matching group.
Reluctant Matches the shortest group.
Possessive Longest match or bust (no backoff).
Greedy Reluctant Possessive Description
X? X?? X?+ X, once or not at all.
X* X*? X*+ X, zero or more times.
X+ X+? X++ X, one or more times.
X{n} X{n}? X{n}+ X, exactly n times.
X{n,} X{n,}? X{n,}+ X, at least n times.
X{n,m} X{n,m}? X{n,m}+ X, at least n but not more than m times.

Groups & Backreferences

A group is a captured subsequence of characters which may be used later in the expression with a backreference.

(...) Defines a group.
\N Refers to a matched group.
(\d\d) A group of two digits.
(\d\d)/\1 Two digits repeated twice. \1 Refers to the matched group.

Logical Operations

XY X then Y.
X|Y X or Y.