Java regex word boundary – match specific word or contain word

In this Java regex word boundary example, we will learn to match a specific word in a string. For example, we will match “java” in “java is object oriented language”, but not “javap” in “javap is another tool in JDK bundle”.

1. Java regex word boundary matchers

Boundary matchers match positions rather than characters: the beginning or end of a line or of the input, the edge of a word, or the end of the previous match. They do not consume any characters. Instead, they match at certain positions, effectively anchoring the regular expression match at those positions.

The following table lists and explains all the boundary matchers.

Boundary token   Description
^                The beginning of a line
$                The end of a line
\b               A word boundary
\B               A non-word boundary
\A               The beginning of the input
\G               The end of the previous match
\Z               The end of the input but for the final terminator, if any
\z               The end of the input
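In Java, ^ only behaves as a line anchor when Pattern.MULTILINE is set; by default it matches only at the very beginning of the input. The sketch below contrasts ^ with and without MULTILINE, \A, \Z versus \z, and \G on small sample strings. The class name BoundaryMatchersDemo and the helper starts() are illustrative only, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatchersDemo {

    // Illustrative helper: collects the start index of every match of regex
    // (compiled with the given flags) in input.
    static List<Integer> starts(String regex, int flags, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "java one\njava two\n";

        // Without MULTILINE, ^ only matches at the very beginning of the input.
        System.out.println(starts("^java", 0, text));                    // [0]
        // With MULTILINE, ^ also matches right after a line terminator.
        System.out.println(starts("^java", Pattern.MULTILINE, text));    // [0, 9]
        // \A always means the beginning of the input, even in MULTILINE mode.
        System.out.println(starts("\\Ajava", Pattern.MULTILINE, text));  // [0]

        // \Z can match just before the final line terminator; \z only at the absolute end.
        System.out.println(starts("two\\Z", 0, text));                   // [14]
        System.out.println(starts("two\\z", 0, text));                   // []

        // \G anchors each match at the end of the previous one,
        // so only the leading run of "ab" pairs is found.
        System.out.println(starts("\\Gab", 0, "ababxab"));               // [0, 2]
    }
}
```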

2. Java regex to match a specific word

Solution Regex : \bword\b

The regular expression token "\b" is called a word boundary. It matches at the start or the end of a word. By itself, it results in a zero-length match.

Strictly speaking, “\b” matches in these three positions:

  • Before the first character in the data, if the first character is a word character
  • After the last character in the data, if the last character is a word character
  • Between two characters in the data, where one is a word character and the other is not a word character
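To see these three rules in action, the sketch below compiles “\b” on its own and records every index where it matches in the short string “java is”. Because “\b” is zero-width, each match has start() equal to end(). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryPositions {

    // Illustrative helper: returns every index at which a zero-width pattern matches.
    static List<Integer> positions(String zeroWidthRegex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(zeroWidthRegex).matcher(input);
        while (m.find()) {
            // Zero-length match: m.start() == m.end().
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // 0: before the first word character, 4: between "a" and " ",
        // 5: between " " and "i", 7: after the last word character.
        System.out.println(positions("\\b", "java is")); // [0, 4, 5, 7]
    }
}
```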

To run a “specific word only” search using a regular expression, simply place the word between two word boundaries.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryExample {
    public static void main(String[] args) {
        String data1 = "Today, java is object oriented language";

        // Match "java" only as a whole word
        String regex = "\\bjava\\b";

        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(data1);
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
    }
}

Output:

Start index: 7 End index: 11 java

Please note that matching the above regex against “Also, javap is another tool in JDK bundle” produces no match, because “java” is followed by the word character “p”, so there is no word boundary after it.
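If the word to search for comes from user input, it is safer to escape it before placing it between the boundaries. The sketch below wraps this in a hypothetical helper, containsWholeWord(), that uses Pattern.quote(). It also shows a few edge cases: punctuation such as “-” or “.” counts as a boundary, while “_” and digits are word characters and do not. It assumes the search word begins and ends with word characters ([a-zA-Z0-9_]); otherwise “\b” is not positioned where you would expect.

```java
import java.util.regex.Pattern;

public class WholeWordSearch {

    // Hypothetical helper: true if 'word' occurs in 'text' as a whole word, ignoring case.
    // Pattern.quote() escapes any regex metacharacters in 'word'.
    // Assumes 'word' starts and ends with word characters.
    static boolean containsWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWholeWord("Today, java is object oriented language", "java"));   // true
        System.out.println(containsWholeWord("Also, javap is another tool in JDK bundle", "java")); // false
        System.out.println(containsWholeWord("java-based tools", "java")); // true: '-' is not a word character
        System.out.println(containsWholeWord("I like Java.", "java"));     // true: '.' is not a word character
        System.out.println(containsWholeWord("java_home", "java"));        // false: '_' is a word character
        System.out.println(containsWholeWord("java8", "java"));            // false: digits are word characters
    }
}
```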

3. Java regex to match a word with non-boundaries – contain word example

Suppose you want to match “java” so that it also matches words like “javap”, “myjava” or “myjavaprogram”, i.e. “java” can appear anywhere inside a longer word: at the start with extra characters after it, at the end with extra characters before it, or in the middle of a long word.

"\B" matches at every position in the subject text where "\b" does not match, i.e. at every position that is not at the start or end of a word.
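One way to check that “\B” is the exact complement of “\b” is to collect the positions where each one matches in the same string and confirm that together they cover every index once. This is a sketch; the class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryComplement {

    // Illustrative helper: returns every index at which a zero-width pattern matches.
    static List<Integer> positions(String zeroWidthRegex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(zeroWidthRegex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "java is";
        List<Integer> wordBoundaries = positions("\\b", s); // [0, 4, 5, 7]
        List<Integer> nonBoundaries = positions("\\B", s);  // [1, 2, 3, 6]
        System.out.println(wordBoundaries + " " + nonBoundaries);

        // Together they cover every position 0..s.length() exactly once.
        List<Integer> all = new ArrayList<>(wordBoundaries);
        all.addAll(nonBoundaries);
        Collections.sort(all);
        System.out.println(all); // [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```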

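To match such words, use the regex below. The first alternative matches “word” when it is preceded by a word character; the second matches it when it is followed by a word character.

Solution Regex : \Bword|word\B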

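import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContainWordExample {
    public static void main(String[] args) {
        String data1 = "Searching in words : javap myjava myjavaprogram";

        // Match "java" only when a word character precedes or follows it
        String regex = "\\Bjava|java\\B";

        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(data1);
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
    }
}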

Output:

Start index: 21 End index: 25 java
Start index: 29 End index: 33 java
Start index: 36 End index: 40 java

Please note that this regex will not match the standalone word “java” from the first example, i.e. “Today, java is object oriented language”, because there “java” has a word boundary at both its start and its end, so neither “\B” alternative can match.
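The regex above returns only the “java” part of each word. If you want the whole word that contains it, one possible extension (an assumption, not part of the original solution) is to surround the alternation with “\w*” on both sides. The sketch below does that and shows that the standalone “java” is still skipped; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContainingWords {

    // Hypothetical extension of "\Bjava|java\B": the surrounding \w* makes each
    // match cover the full word that contains "java".
    static final Pattern CONTAINS_JAVA =
            Pattern.compile("\\w*(?:\\Bjava|java\\B)\\w*", Pattern.CASE_INSENSITIVE);

    static List<String> containingWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = CONTAINS_JAVA.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The standalone "java" is skipped; only words with extra characters are returned.
        System.out.println(containingWords("Searching in words : java javap myjava myjavaprogram"));
        // [javap, myjava, myjavaprogram]
    }
}
```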

4. Java regex to match a word irrespective of boundaries

This is the simplest use case. You want to match “java” in all four places in the string “Searching in words : java javap myjava myjavaprogram”. To do so, use the word itself without any boundary matchers.

Solution Regex : word

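import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnyPositionExample {
    public static void main(String[] args) {
        String data1 = "Searching in words : java javap myjava myjavaprogram";

        // Plain word, no boundary matchers: matches "java" wherever it occurs
        String regex = "java";

        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(data1);
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
    }
}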

Output:

Start index: 21 End index: 25 java
Start index: 26 End index: 30 java
Start index: 34 End index: 38 java
Start index: 41 End index: 45 java
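Because this approach hands the search term straight to the regex engine, a term containing regex metacharacters is not searched literally. As an illustrative sketch (class and method names are hypothetical), the block below searches for “c++”. Unquoted, Java parses it as the possessive quantifier c++ (one or more “c”), so it also matches the “c” in “c#”; wrapping the term with Pattern.quote() restores a literal search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralWordSearch {

    // Illustrative helper: start index of every match of pattern in text.
    static List<Integer> starts(Pattern pattern, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "c++ and c# and c++";

        // Unquoted: "c++" means one or more 'c' (possessive), so "c#" matches too.
        System.out.println(starts(Pattern.compile("c++"), text));                 // [0, 8, 15]
        // Quoted: the term is matched literally.
        System.out.println(starts(Pattern.compile(Pattern.quote("c++")), text));  // [0, 15]
    }
}
```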

That’s all for this Java regex contain word example, covering boundary and non-boundary matches of a specific word using Java regular expressions.

Happy Learning !!



HowToDoInJava

A blog about Java and its related technologies, the best practices, algorithms, interview questions, scripting languages, and Python.