Java Regex Tutorial

A regular expression (regex) is a search pattern for strings. Using a regex, we can find a single match or multiple matches. We can look for any kind of match in a string, e.g. a single character, a fixed string, or a complex pattern of characters such as an email address, SSN, or domain name.

1. Regular expressions

Regular expressions are the key to powerful, flexible, and efficient text processing. They allow you to describe and parse text. Regular expressions can add, remove, isolate, and generally fold, spindle, and mutilate all kinds of text and data.

1.1. Metacharacters and literals

Full regular expressions are composed of two types of characters.

  • The special characters (like the * wildcard in a filename pattern such as *.txt) are called metacharacters.
  • The rest are called literals, or normal text characters.

Regular expressions gain their usefulness from the expressive power that metacharacters provide. We can think of literal text as the words and metacharacters as the grammar. The words are combined with the grammar according to a set of rules to create an expression that communicates an idea.
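
As a small illustration of the difference, the sketch below treats `*` first as a metacharacter and then, escaped, as a literal. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

public class LiteralVsMetaDemo {

    // Returns true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // 'a', 'b' and 'c' are literals; '*' is a metacharacter meaning
        // "zero or more of the preceding element".
        System.out.println(found("ab*c", "ac"));      // true
        System.out.println(found("ab*c", "abbbc"));   // true

        // Escaped with a backslash, '*' becomes a literal asterisk.
        System.out.println(found("ab\\*c", "abbbc")); // false
        System.out.println(found("ab\\*c", "ab*c"));  // true
    }
}
```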

1.2. Java Regex Example

Let’s see a quick Java example that uses a regex, for reference.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(String[] args)
    {
        Pattern pattern = Pattern.compile("Alex|Brian");
        Matcher matcher = pattern.matcher("Generally, Alex and Brian share a great bonding.");

        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(" - " + matcher.group());
        }
    }
}

Program output.

Start index: 11 End index: 15  - Alex
Start index: 20 End index: 25  - Brian

2. Regex Metacharacters

Let’s explore the commonly used metacharacters to understand them better.

2.1. Start and End of the Line

The start and end of a line are represented by the '^' (caret) and '$' (dollar) signs. The caret and dollar are special in that they match a position in the line rather than any actual text characters. In Java, they anchor to the start and end of the whole input by default; with the Pattern.MULTILINE flag, they also match at the start and end of each line.

For example, the regular expression “cat” finds ‘cat’ anywhere in the string, but “^cat” matches only if ‘cat’ is at the beginning of the line, as in words like ‘category’ or ‘catalogue’.

Similarly, “cat$” matches only if ‘cat’ is at the end of the line, as in words like ‘scat’.
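
The anchors can be tried out with a short sketch like the following. The helper name `found` is only illustrative. The last two lines show the effect of the `Pattern.MULTILINE` flag on `^`.

```java
import java.util.regex.Pattern;

public class AnchorDemo {

    // Returns true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("cat", "concatenate"));  // true: 'cat' occurs in the middle
        System.out.println(found("^cat", "concatenate")); // false: not at the start
        System.out.println(found("^cat", "category"));    // true
        System.out.println(found("cat$", "scat"));        // true
        System.out.println(found("cat$", "category"));    // false: not at the end

        // By default '^' only matches at the very start of the input.
        String twoLines = "dog\ncat";
        System.out.println(found("^cat", twoLines));      // false
        System.out.println(Pattern.compile("^cat", Pattern.MULTILINE)
                .matcher(twoLines).find());               // true
    }
}
```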

2.2. Character Classes

The regular-expression construct "[···]", usually called a character class, lets us list the characters we want to allow at that point in the match. Character classes are useful for tolerating spelling variations, as a spell-checker might.

For example, while “e” matches just an e, and “a” matches just an a, the regular expression [ea] matches either. So sep[ea]r[ea]te matches “separate” as well as the misspellings “seperate”, “separete”, and “seperete”.

Another example is allowing a word’s first letter to be capitalized, e.g. [Ss]mith matches both smith and Smith.

Similarly, <[hH][123456]> matches all opening heading tags, i.e. <h1> through <h6> in either case.
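
A minimal sketch of these character classes in Java is shown below. The class and method names are hypothetical. `matches()` is used so that the whole word or tag has to fit the pattern.

```java
import java.util.regex.Pattern;

public class CharClassDemo {

    private static final Pattern SEPARATE = Pattern.compile("sep[ea]r[ea]te");
    private static final Pattern HEADING = Pattern.compile("<[hH][123456]>");

    // True for "separate" and its e/a spelling variants.
    static boolean isSeparateVariant(String word) {
        return SEPARATE.matcher(word).matches();
    }

    // True for opening heading tags <h1>..<h6> in either case.
    static boolean isHeadingTag(String tag) {
        return HEADING.matcher(tag).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSeparateVariant("separate")); // true
        System.out.println(isSeparateVariant("seperate")); // true
        System.out.println(isSeparateVariant("sepirate")); // false: 'i' is not in [ea]
        System.out.println(isHeadingTag("<h1>"));          // true
        System.out.println(isHeadingTag("<H6>"));          // true
        System.out.println(isHeadingTag("<h7>"));          // false
    }
}
```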

2.2.1. Range of characters

A dash "-" indicates a range of characters. <[hH][1-6]> is equivalent to <[hH][123456]>. Other useful character ranges are [0-9] and [a-z], which match digits and English lowercase letters.

We can specify multiple ranges in a single construct, e.g. [0123456789abcdefABCDEF] can be written as [0-9a-fA-F]. Note that the order in which the ranges are given doesn’t matter.

Note that a dash is a metacharacter only within a character class; elsewhere it matches a normal dash character. Also, if it is the first character listed in the class, it can’t possibly indicate a range, so it is not a metacharacter in that case.
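
Here is a rough sketch of ranges in use, with illustrative method names. It checks for hexadecimal strings and shows a dash that is listed first in a class and so is treated literally.

```java
import java.util.regex.Pattern;

public class RangeDemo {

    // One or more hexadecimal digits, written with three ranges.
    private static final Pattern HEX = Pattern.compile("[0-9a-fA-F]+");

    // Digits and dashes only; the leading '-' is a literal dash.
    private static final Pattern DIGITS_AND_DASHES = Pattern.compile("[-0-9]+");

    static boolean isHex(String s) {
        return HEX.matcher(s).matches();
    }

    static boolean isDigitsAndDashes(String s) {
        return DIGITS_AND_DASHES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHex("1a2B3c"));               // true
        System.out.println(isHex("xyz"));                  // false
        System.out.println(isDigitsAndDashes("555-1234")); // true
        System.out.println(isDigitsAndDashes("555 1234")); // false: space is not listed
    }
}
```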

2.2.2. Negated character classes

If a caret (^) is the first character in a character class, the class is negated and matches any character that isn’t listed, e.g. [^1-6] matches a character that’s not 1 through 6.
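
A common use of a negated class is stripping out everything that is not wanted. The following sketch uses hypothetical helper names. It keeps only digits and checks whether a string contains anything outside 1 through 6.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {

    private static final Pattern NOT_DIGIT = Pattern.compile("[^0-9]");
    private static final Pattern NOT_1_TO_6 = Pattern.compile("[^1-6]");

    // Removes every character that is not a digit.
    static String digitsOnly(String s) {
        return NOT_DIGIT.matcher(s).replaceAll("");
    }

    // True if the input contains at least one character other than 1..6.
    static boolean hasCharOutside1To6(String s) {
        return NOT_1_TO_6.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 123-4567"));  // 5551234567
        System.out.println(hasCharOutside1To6("123456"));  // false
        System.out.println(hasCharOutside1To6("1237"));    // true: '7' is outside the range
    }
}
```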

2.3. Matching Any Character with Dot

The metacharacter '.' is shorthand for a character class that matches any character. In Java, the dot does not match line terminators unless the Pattern.DOTALL flag is set. Note that a dot is not a metacharacter when it is used within a character class; there it is a simple character only.

For example, 06.24.2019 will match 06/24/2019, 06-24-2019, or 06.24.2019, but 06[.]24[.]2019 will match only 06.24.2019.
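
The date example can be checked with a short sketch (the helper name `fullMatch` is an assumption for illustration). The last two lines show that, by default, Java's dot does not match a newline.

```java
import java.util.regex.Pattern;

public class DotDemo {

    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped dots match any separator character.
        System.out.println(fullMatch("06.24.2019", "06/24/2019"));         // true
        System.out.println(fullMatch("06.24.2019", "06-24-2019"));         // true

        // Inside a character class the dot is a literal.
        System.out.println(fullMatch("06[.]24[.]2019", "06/24/2019"));     // false
        System.out.println(fullMatch("06[.]24[.]2019", "06.24.2019"));     // true

        // The dot skips line terminators unless DOTALL is set.
        System.out.println(fullMatch("a.b", "a\nb"));                      // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL)
                .matcher("a\nb").matches());                               // true
    }
}
```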

2.4. Matching Alternation – any one of several sub-expressions

The pipe symbol '|' allows you to combine multiple expressions into a single expression that matches any of the individual ones.

For example, “Alex” and “Brian” are separate expressions, but "Alex|Brian" is one expression that matches either one.

Like the dot, the pipe is not a metacharacter when it is used within a character class. Within a character class, it is a simple character only.

For example, to match the words “First” or “1st”, we can write the regex “(First|1st)” or, factoring out the common ending, "(Fir|1)st".
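
The two forms of the alternation can be compared with a sketch like this one (names are illustrative only). The final check shows the pipe acting as a plain character inside a class.

```java
import java.util.regex.Pattern;

public class AlternationDemo {

    // True if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Both spellings of the regex accept the same two words.
        for (String regex : new String[] {"(First|1st)", "(Fir|1)st"}) {
            System.out.println(matchesWhole(regex, "First")); // true
            System.out.println(matchesWhole(regex, "1st"));   // true
            System.out.println(matchesWhole(regex, "Firs"));  // false
        }

        // Inside a character class, '|' is just one of the listed characters.
        System.out.println(matchesWhole("[a|b]", "|"));       // true
    }
}
```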

3. Java Regex API

Java has a built-in API (java.util.regex) for working with regular expressions. We do not need any third-party library to run a regex against a string in Java.

The Java regex API provides one interface and three classes:

  • Pattern – A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression.
    Pattern p = Pattern.compile("abc");
    Matcher m = p.matcher("abcabcabcd");
    boolean b = m.matches(); // false: matches() requires the entire input to match; m.find() would return true
    
  • Matcher – This class provides methods that perform match operations.
  • MatchResult (interface) – It is the result of a match operation. It contains query methods used to determine the results of a match against a regular expression.
  • PatternSyntaxException – It is an unchecked exception thrown to indicate a syntax error in a regular-expression pattern.
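
To see how these types fit together, here is a minimal sketch (class and helper names are hypothetical). It contrasts the three match operations of `Matcher` on the pattern "abc", reads a `MatchResult`, and catches a `PatternSyntaxException` for a malformed pattern.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class MatchModesDemo {

    // Returns false instead of throwing when the regex has a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("abc");
        String input = "abcabcabcd";

        System.out.println(p.matcher(input).matches());   // false: whole input must match
        System.out.println(p.matcher(input).lookingAt()); // true: input starts with "abc"
        System.out.println(p.matcher(input).find());      // true: "abc" occurs somewhere

        Matcher m = p.matcher(input);
        if (m.find()) {
            MatchResult result = m.toMatchResult();       // snapshot of the first match
            System.out.println(result.start() + "-" + result.end()); // 0-3
        }

        System.out.println(isValidRegex("[abc"));         // false: unclosed character class
        System.out.println(isValidRegex("[abc]"));        // true
    }
}
```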

Let’s look at these classes and their important methods in more detail.

3.1. Pattern class

It represents the compiled representation of a regular expression. To use Java regex API, we must compile the regular expression to this class.

After compilation, its instance can be used to create a Matcher object that can match lines/strings against the regular expression.

Note that many matchers can share the same pattern. State information during processing is kept inside the Matcher instance.

Instances of this class are immutable and are safe for use by multiple concurrent threads.

  • Predicate<String> asPredicate() – Creates a predicate (Java 8 and later) that tests whether this pattern is found in a given string.
  • static Pattern compile(String regex) – Compiles the given regular expression into a pattern.
  • static Pattern compile(String regex, int flags) – Compiles the given regular expression into a pattern with the given flags.
  • int flags() – Returns this pattern’s match flags.
  • Matcher matcher(CharSequence input) – Creates a matcher that will match the given input against this pattern.
  • static boolean matches(String regex, CharSequence input) – Compiles the given regular expression and attempts to match the entire input against it.
  • String pattern() – Returns the regular expression from which this pattern was compiled.
  • static String quote(String s) – Returns a literal pattern String for the specified String, so that any metacharacters in it are matched as plain text.
  • String[] split(CharSequence input) – Splits the given input sequence around matches of this pattern.
  • String[] split(CharSequence input, int limit) – Splits the given input sequence around matches of this pattern; the limit controls how many times the pattern is applied and therefore the length of the resulting array.
  • Stream<String> splitAsStream(CharSequence input) – Creates a stream from the given input sequence around matches of this pattern.
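
A few of these methods are sketched below on a comma-separated line. The class name, the `COMMA` pattern, and the helper methods are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternMethodsDemo {

    // A comma with optional surrounding whitespace.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    static String[] splitCsv(String line) {
        return COMMA.split(line);
    }

    // Keeps only the values that consist entirely of digits.
    static List<String> onlyNumbers(List<String> values) {
        Predicate<String> allDigits = Pattern.compile("^[0-9]+$").asPredicate();
        return values.stream().filter(allDigits).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCsv("a , b,c")));        // [a, b, c]
        System.out.println(Arrays.toString(COMMA.split("a,b,c", 2)));    // [a, b,c]
        System.out.println(COMMA.splitAsStream("x, y")
                .collect(Collectors.toList()));                          // [x, y]
        System.out.println(onlyNumbers(Arrays.asList("12", "x1", "007"))); // [12, 007]

        // quote() makes '+' a literal instead of a quantifier.
        System.out.println(Pattern.matches(Pattern.quote("1+1"), "1+1")); // true
        System.out.println(Pattern.matches("1+1", "1+1"));                // false

        Pattern p = Pattern.compile("abc", Pattern.CASE_INSENSITIVE);
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0);  // true
        System.out.println(p.pattern());                                  // abc
    }
}
```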

3.2. Matcher class

It is the main class that performs match operations on a string/line by interpreting a Pattern. Once created, a matcher can be used to perform the different kinds of match operations.

This class also defines methods for replacing matched sub-sequences with new strings whose contents can, if desired, be computed from the match result.

Instances of this class are not thread-safe.

  • boolean find() – Attempts to find the next subsequence of the input that matches the pattern. Calling it repeatedly iterates over all occurrences in the text.
  • boolean find(int start) – Resets the matcher and searches for the next occurrence in the text starting from the given index.
  • int start() – Returns the start index of the previous match, e.g. one found using the find() method.
  • int end() – Returns the end index of the previous match, e.g. one found using the find() method. It is the index of the character just after the last matched character.
  • int groupCount() – Returns the number of capturing groups in this matcher’s pattern.
  • String group() – Returns the input subsequence matched by the previous match.
  • boolean matches() – Attempts to match the entire region against the pattern.
  • boolean lookingAt() – Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
  • static String quoteReplacement(String s) – Returns a literal replacement String for the specified String.
  • Matcher reset() – Resets this matcher.
  • MatchResult toMatchResult() – Returns the match state of this matcher as a MatchResult.
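
The following sketch exercises several of these methods. The `DATE` pattern, the sample text, and the helper names are assumptions made for this example. It loops with `find()`, reads capturing groups, and replaces matches using group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {

    // Three capturing groups: year, month, day.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Rewrites every yyyy-MM-dd date in the text as dd/MM/yyyy.
    static String toDayFirst(String text) {
        return DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    // Counts the non-overlapping date matches in the text.
    static int countDates(String text) {
        Matcher m = DATE.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Released 2019-06-24, patched 2019-07-01.";
        Matcher m = DATE.matcher(text);
        System.out.println(m.groupCount());     // 3: capturing groups in the pattern

        while (m.find()) {
            // group() is the whole match; group(1) is the first capturing group.
            System.out.println(m.group() + " year=" + m.group(1)
                    + " start=" + m.start() + " end=" + m.end());
        }

        System.out.println(countDates(text));   // 2
        System.out.println(toDayFirst(text));   // Released 24/06/2019, patched 01/07/2019.

        // quoteReplacement() keeps '$' from being read as a group reference.
        String price = Pattern.compile("X").matcher("price: X")
                .replaceAll(Matcher.quoteReplacement("$5"));
        System.out.println(price);              // price: $5
    }
}
```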

4. Java Regex Examples

The examples below show how regular expressions can be used to solve specific problems in applications.

Regular Expression for Email Address

Learn to match email addresses using regular expressions in Java

^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$
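
As a rough sketch, the email regex above can be wrapped in a small validator (the class and method names are hypothetical). Note that this pattern only checks the permitted characters around the `@`; it does not verify that the domain exists or has a valid top-level domain.

```java
import java.util.regex.Pattern;

public class EmailCheckDemo {

    // The regex from above; it needs no extra escaping in a Java string.
    private static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$");

    static boolean isValidEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("user@domain.com"));        // true
        System.out.println(isValidEmail("user.name@domain.co.in")); // true
        System.out.println(isValidEmail("user#domain.com"));        // false: no '@'
        System.out.println(isValidEmail("@domain.com"));            // false: empty local part
    }
}
```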

Regular Expression for Password Validation

Learn to match passwords using regular expressions in Java

((?=.*[a-z])(?=.*\\d)(?=.*[@#$%])(?=.*[A-Z]).{6,16})
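
Here is a minimal sketch of a password check built on lookaheads, with hypothetical class and method names. It assumes the digit lookahead is written as `\\d` in the Java string, so the rules are: at least one lowercase letter, one digit, one of `@#$%`, one uppercase letter, and a length of 6 to 16 characters.

```java
import java.util.regex.Pattern;

public class PasswordCheckDemo {

    // Each (?=...) lookahead checks one rule without consuming input;
    // .{6,16} then enforces the overall length.
    private static final Pattern PASSWORD = Pattern.compile(
            "((?=.*[a-z])(?=.*\\d)(?=.*[@#$%])(?=.*[A-Z]).{6,16})");

    static boolean isValidPassword(String s) {
        return PASSWORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPassword("Passw0rd@"));  // true
        System.out.println(isValidPassword("password1@")); // false: no uppercase letter
        System.out.println(isValidPassword("Password@"));  // false: no digit
        System.out.println(isValidPassword("Pw1@"));       // false: shorter than 6
    }
}
```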

Regular Expression for Trademark Symbol

Learn to match the trademark symbol using regular expressions in Java

\u2122

Regular Expression for Any Currency Symbol

Learn to match currency symbols using regular expressions in Java

\\p{Sc}

Regular Expression for Any Character in “Greek Extended” or Greek script

Learn to match characters in Greek Extended and Greek script using regular expressions in Java

\\p{InGreek} and \\p{InGreekExtended}
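
The three Unicode-related patterns above (trademark symbol, currency symbols, and the Greek blocks) can be tried together in one sketch. Method names are illustrative, and non-ASCII characters are written as Unicode escapes so that the source file encoding does not matter.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {

    private static final Pattern TRADEMARK = Pattern.compile("\u2122");
    private static final Pattern CURRENCY = Pattern.compile("\\p{Sc}");
    private static final Pattern GREEK = Pattern.compile("\\p{InGreek}");
    private static final Pattern GREEK_EXTENDED = Pattern.compile("\\p{InGreekExtended}");

    static boolean hasTrademark(String s) {
        return TRADEMARK.matcher(s).find();
    }

    static boolean hasCurrencySymbol(String s) {
        return CURRENCY.matcher(s).find();
    }

    static boolean hasGreek(String s) {
        return GREEK.matcher(s).find() || GREEK_EXTENDED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasTrademark("Java\u2122"));       // true
        System.out.println(hasCurrencySymbol("Price: $100")); // true
        System.out.println(hasCurrencySymbol("\u20AC 5"));    // true: euro sign
        System.out.println(hasCurrencySymbol("100"));         // false
        System.out.println(hasGreek("\u03B1"));               // true: alpha, Greek block
        System.out.println(hasGreek("\u1F00"));               // true: Greek Extended block
        System.out.println(hasGreek("a"));                    // false: Latin letter
    }
}
```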

Regular Expression for North American Phone Numbers

Learn to match North American phone numbers using regular expressions in Java

^\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})$
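
Because this regex captures the area code, exchange, and line number in three groups, it can also be used to normalize a number into one format. The sketch below assumes a hypothetical `normalize` helper and a target format of `(123) 456-7890`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneFormatDemo {

    private static final Pattern PHONE = Pattern.compile(
            "^\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})$");

    // Returns the number as "(AAA) EEE-LLLL", or empty if it does not match.
    static Optional<String> normalize(String raw) {
        Matcher m = PHONE.matcher(raw);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("(" + m.group(1) + ") " + m.group(2) + "-" + m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(normalize("1234567890").orElse("invalid"));     // (123) 456-7890
        System.out.println(normalize("123-456-7890").orElse("invalid"));   // (123) 456-7890
        System.out.println(normalize("(123) 456-7890").orElse("invalid")); // (123) 456-7890
        System.out.println(normalize("12345").orElse("invalid"));          // invalid
    }
}
```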

Regular Expression for International Phone Numbers

Learn to match international phone numbers using regular expressions in Java

^\\+(?:[0-9] ?){6,14}[0-9]$

Regular Expression for Date Formats

Learn to match date formats using regular expressions in Java

^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$
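
A quick sketch with an illustrative helper name shows what this pattern accepts. It checks only the shape of the text (one or two digits, a slash, one or two digits, a slash, and a two- or four-digit year), so an impossible date such as 39/39/2019 still passes and needs a separate check.

```java
import java.util.regex.Pattern;

public class DateFormatDemo {

    private static final Pattern DATE = Pattern.compile(
            "^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$");

    static boolean looksLikeDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("1/1/19"));     // true
        System.out.println(looksLikeDate("01/01/2019")); // true
        System.out.println(looksLikeDate("1/1/201"));    // false: year must be 2 or 4 digits
        System.out.println(looksLikeDate("2019-01-01")); // false: wrong separators
        System.out.println(looksLikeDate("39/39/2019")); // true: format only, not a real date
    }
}
```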

Regular Expression for Social Security Numbers (SSN)

Learn to match SSNs using regular expressions in Java

^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$
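
The negative lookaheads `(?!...)` in this pattern reject specific digit groups before the digits are consumed. A minimal sketch (hypothetical class and method names) shows which inputs each part rules out.

```java
import java.util.regex.Pattern;

public class SsnCheckDemo {

    private static final Pattern SSN = Pattern.compile(
            "^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$");

    static boolean isValidSsn(String s) {
        return SSN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidSsn("123-45-6789")); // true
        System.out.println(isValidSsn("000-45-6789")); // false: area 000 is rejected
        System.out.println(isValidSsn("666-45-6789")); // false: area 666 is rejected
        System.out.println(isValidSsn("900-45-6789")); // false: first digit must be 0-8
        System.out.println(isValidSsn("123-00-6789")); // false: group 00 is rejected
        System.out.println(isValidSsn("123-45-0000")); // false: serial 0000 is rejected
        System.out.println(isValidSsn("123456789"));   // false: dashes are required
    }
}
```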

Regular Expression for International Standard Book Number (ISBNs)

Learn to match ISBNs using regular expressions in Java

^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})
[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)
(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$

Regular Expression for US Postal Zip Codes

Learn to match US postal codes using regular expressions in Java

^[0-9]{5}(?:-[0-9]{4})?$

Regular Expression for Canadian Postal Zip Codes

Learn to match Canadian postal codes using regular expressions in Java

^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$

Regular Expression for U.K. Postal Codes (Postcodes)

Learn to match U.K. postal codes using regular expressions in Java

^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$

Regular Expression for Credit Card Numbers

Learn to match credit card numbers using regular expressions in Java

^(?:(?:4[0-9]{12}(?:[0-9]{3})?)|
		(?:5[1-5][0-9]{14})|
		(?:6(?:011|5[0-9]{2})[0-9]{12})|
		(?:3[47][0-9]{13})|
		(?:3(?:0[0-5]|[68][0-9])?[0-9]{11})|
		(?:(?:2131|1800|35[0-9]{3})[0-9]{11}))$

More Regular Expression Examples

Match Start or End of String (Line Anchors)
Match any character or set of characters

Drop me your questions related to this Java regex tutorial in the comments.

Happy Learning !!



3 thoughts on “Java Regex Tutorial”

  1. Hi Lokesh,
    I’ve been following your blog for last few months and no doubt, it’s really help me to get confidence in Java; I got a question here few day’s back I had an interview where they had asked me to write a Java function with help of Regular Expression to calculate number of redundant brackets i.e. ‘(‘ in a mathematical formula. Please consider the following test scenarios : ((a+b) * c) count should be 1; (a+b) * c count is 0.

  2. Hi,

    Please include regular expressions for IPv4, IPv6 and MAC Address validation.

    Thanks
    Pramod
