Sometimes we need to pick out the lines in a log that start or end with a certain word. In this Java regex word boundary tutorial, we will learn to create a regex that matches lines that either start or end with a certain word.
1. Boundary Matchers
Boundary matchers are special characters or sequences used in regular expressions (regex) to match specific positions within a string or text. They do not consume any characters; instead, they match positions or boundaries between characters, anchoring the regular expression match at those positions.
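To connect this with the log-filtering use case, here is a minimal sketch that keeps only the lines that start or end with a given word. The class name LogLineFilter, the helper linesStartingOrEndingWith, and the sample log lines are illustrative assumptions. The sketch also assumes the word consists of word characters only, so that \b behaves as expected next to it.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LogLineFilter {

    // Hypothetical helper: keeps the lines that start OR end with the given whole word.
    static List<String> linesStartingOrEndingWith(List<String> lines, String word) {
        String w = Pattern.quote(word); // treat the word literally
        // ^word\b : the word at the start of the line, followed by a word boundary
        // \bword$ : the word at the end of the line, preceded by a word boundary
        Pattern p = Pattern.compile("^" + w + "\\b|\\b" + w + "$");
        return lines.stream()
                .filter(line -> p.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "ERROR disk full",
                "retrying after ERROR",
                "ERRORS were ignored",
                "no ERROR here today");
        // Prints the first two lines only
        linesStartingOrEndingWith(log, "ERROR").forEach(System.out::println);
    }
}
```

Each line is matched on its own here, so ^ and $ already mean the start and end of that line and no flag is needed. The \b on the inner side of the word is what rejects “ERRORS”.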
The following table lists and explains all the boundary matchers.
Matcher | Description | Example Expression | Matches / Does Not Match |
---|---|---|---|
^ | The beginning of a line (only the beginning of the input unless Pattern.MULTILINE is set) | ^Hello | Matches lines starting with “Hello” |
$ | The end of a line (only the end of the input, or just before its final line terminator, unless Pattern.MULTILINE is set) | world$ | Matches lines ending with “world” |
\b | A word boundary | \bcat\b | Matches the whole word “cat”. Does not match inside “catch” or “category” |
\B | A non-word boundary | \Bcat\B | Matches the “cat” inside “concatenate”. Does not match the whole word “cat”, nor “catch” or “category”, where “cat” begins at a word boundary |
\A | The beginning of the input | \AHello | Matches input starting with “Hello” |
\G | The end of the previous match (the beginning of the input on the first attempt) | \Gword | Matches “word” only immediately after the previous match |
\Z | The end of the input but for the final line terminator, if any | world\Z | Matches “world” at the end of the input, with or without one trailing line terminator |
\z | The end of the input | world\z | Matches “world” only at the very end of the input; a trailing line terminator prevents the match |
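The difference between \b and \B is easiest to check against a few words. The following is a small sketch; the class name WordBoundaryCheck, the helper found, and the sample words are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

public class WordBoundaryCheck {

    // Returns true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"cat", "catch", "concatenate", "bobcat"};
        for (String s : samples) {
            // One row per word: does \bcat\b match, does \Bcat\B match
            System.out.printf("%-12s whole word: %-5b inside a word: %b%n",
                    s, found("\\bcat\\b", s), found("\\Bcat\\B", s));
        }
    }
}
```

Only “cat” satisfies \bcat\b, and only “concatenate” satisfies \Bcat\B. In “catch” and “bobcat”, one side of “cat” sits on a word boundary and the other does not, so neither pattern matches.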
2. Regex Boundary Matcher Example
The following Java code example demonstrates the use of each boundary matcher symbol in regular expressions.
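The Pattern.MULTILINE flag enables multiline mode, in which ^ and $ match at the beginning and end of each line instead of only at the beginning and end of the whole input. It has no effect on \A, \Z, or \z.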
import java.util.regex.*;

public class BoundaryMatcherExample {

    public static void main(String[] args) {
        // Define a multiline string using text block
        String input = """
            Hello world
            Goodbye world
            Catch a cat
            catamaran
            """;

        // Define a StringBuilder to capture the output
        StringBuilder output = new StringBuilder();

        // ^ - Beginning of Line
        Pattern beginningPattern = Pattern.compile("^Hello", Pattern.MULTILINE);
        Matcher beginningMatcher = beginningPattern.matcher(input);
        while (beginningMatcher.find()) {
            output.append("Match found (Beginning of Line): ").append(beginningMatcher.group()).append("\n");
        }

        // $ - End of Line
        Pattern endPattern = Pattern.compile("world$", Pattern.MULTILINE);
        Matcher endMatcher = endPattern.matcher(input);
        while (endMatcher.find()) {
            output.append("Match found (End of Line): ").append(endMatcher.group()).append("\n");
        }

        // \b - Word Boundary
        Pattern wordBoundaryPattern = Pattern.compile("\\bcat\\b", Pattern.MULTILINE);
        Matcher wordBoundaryMatcher = wordBoundaryPattern.matcher(input);
        while (wordBoundaryMatcher.find()) {
            output.append("Match found (Word Boundary): ").append(wordBoundaryMatcher.group()).append("\n");
        }

        // \B - Non-Word Boundary
        Pattern nonWordBoundaryPattern = Pattern.compile("\\Bcat\\B", Pattern.MULTILINE);
        Matcher nonWordBoundaryMatcher = nonWordBoundaryPattern.matcher(input);
        while (nonWordBoundaryMatcher.find()) {
            output.append("Match found (Non-Word Boundary): ").append(nonWordBoundaryMatcher.group()).append("\n");
        }

        // \A - Beginning of Input
        Pattern beginningInputPattern = Pattern.compile("\\AHello");
        Matcher beginningInputMatcher = beginningInputPattern.matcher(input);
        while (beginningInputMatcher.find()) {
            output.append("Match found (Beginning of Input): ").append(beginningInputMatcher.group()).append("\n");
        }

        // \G - End of Previous Match
        Pattern endPreviousPattern = Pattern.compile("\\Goo");
        Matcher endPreviousMatcher = endPreviousPattern.matcher(input);
        while (endPreviousMatcher.find()) {
            output.append("Match found (End of Previous Match): ").append(endPreviousMatcher.group()).append("\n");
        }

        // \Z - End of Input (excluding final line terminator)
        Pattern endInputPattern = Pattern.compile("world\\Z", Pattern.MULTILINE);
        Matcher endInputMatcher = endInputPattern.matcher(input);
        while (endInputMatcher.find()) {
            output.append("Match found (End of Input excluding final line terminator): ").append(endInputMatcher.group()).append("\n");
        }

        // \z - End of Input
        Pattern endInputAbsolutePattern = Pattern.compile("world\\z", Pattern.MULTILINE);
        Matcher endInputAbsoluteMatcher = endInputAbsolutePattern.matcher(input);
        while (endInputAbsoluteMatcher.find()) {
            output.append("Match found (End of Input): ").append(endInputAbsoluteMatcher.group()).append("\n");
        }

        // Print the captured output
        System.out.println(output.toString());
    }
}
The program output:
Match found (Beginning of Line): Hello
Match found (End of Line): world
Match found (End of Line): world
Match found (Word Boundary): cat
Match found (Beginning of Input): Hello
This output lists every match that each pattern finds in the multiline input string. Four of the patterns find nothing in this input. \Bcat\B fails because every lowercase “cat” here begins at a word boundary (“Catch” has an uppercase “C”, and matching is case-sensitive). \Goo fails because \G anchors the first attempt at the beginning of the input, which starts with “Hello”. world\Z and world\z fail because the input ends with “catamaran” followed by a line terminator, not with “world”.
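The \G, \Z, and \z matchers are easier to observe on small inputs chosen for them. The following is a minimal sketch; the class name InputAnchorSketch, the helper findAll, and the sample strings are illustrative assumptions, not part of the program above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InputAnchorSketch {

    // Collects every match of the regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // \G: each match must begin exactly where the previous one ended,
        // so matching stops at the "-" and the final "ab" is never reached.
        System.out.println(findAll("\\Gab", "ababab-ab"));        // [ab, ab, ab]

        // \Z tolerates one trailing line terminator; \z does not.
        System.out.println(findAll("world\\Z", "Hello world\n")); // [world]
        System.out.println(findAll("world\\z", "Hello world\n")); // []
        System.out.println(findAll("world\\z", "Hello world"));   // [world]
    }
}
```

This is why \Z is usually the safer choice for input read from files, which often ends with a line terminator, while \z suits cases where the match must reach the very last character.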
Let me know your thoughts on this Java regex word boundary example.
Happy Learning !!
References: Java regex docs