Sometimes we need to pick out the lines in a log that start or end with a certain word. In this Java regex word boundary tutorial, we will learn to create regular expressions that match a word only when it appears at the start or the end of the content, or at the start or the end of each line.
Table of Contents
1. Boundary matchers
2. Match word at the start of content
3. Match word at the end of content
4. Match word at the start of line
5. Match word at the end of line
1. Boundary matchers
Boundary matchers help to find a particular word, but only if it appears at a particular position, such as the beginning or end of a line or of the whole input. They do not match any characters. Instead, they match at certain positions, effectively anchoring the regular expression match at those positions.
The following table lists and explains all the boundary matchers.
Boundary token | Description |
---|---|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
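The word boundary token "\b" from the table can be illustrated with a minimal sketch. The class WordBoundaryDemo and the helper countWholeWord are hypothetical names used only for this illustration. The helper counts how often a word occurs as a whole word, so "end" inside "weekend" is not counted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original tutorial.
class WordBoundaryDemo {

    // Counts occurrences of `word` that stand alone as a whole word.
    // Assumes `word` starts and ends with a word character (letter, digit or underscore).
    static int countWholeWord(String text, String word) {
        // Pattern.quote protects against regex metacharacters inside the word.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "end of the weekend, the end";
        // "end" at index 0 and the final "end" are whole words; "weekend" is skipped.
        System.out.println(countWholeWord(text, "end")); // prints 2
    }
}
```

Unlike the anchors used in the rest of this tutorial, "\b" does not care where in the line the word appears. It only checks that a word character meets a non-word character (or the edge of the input) at that position.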
2. Java regex word boundary – Match word at the start of content
The anchor "\A" always matches at the very start of the whole text, before the first character. That is the only place where it matches. Place "\A" at the start of your regular expression to test whether the content begins with the text you want to match.
The "A" must be uppercase. Alternatively, you can use "^" as well, as long as multi-line mode is not turned on. In multi-line mode, "^" also matches at the start of every line.
^wordToSearch OR \AwordToSearch
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String content = "begin here to start, and go there to end\n"
            + "come here to begin, and end there to finish\n"
            + "begin here to start, and go there to end";

    String regex = "^begin";
    // OR
    // String regex = "\\Abegin";

    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(content);

    // Print the position and the text of every match
    while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        System.out.print(" End index: " + matcher.end() + " ");
        System.out.println(matcher.group());
    }

Output:

    Start index: 0 End index: 5 begin
3. Java regex word boundary – Match word at the end of content
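The anchors "\Z" and "\z" match at the end of the content. "\z" matches only at the very end, after the last character. "\Z" also matches just before a final line terminator, if the content ends with one. Place "\Z" or "\z" at the end of your regular expression to test whether the content ends with the text you want to match.
Alternatively, you can use "$" as well. Outside multi-line mode, it behaves like "\Z".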
wordToSearch$ OR wordToSearch\Z
    String content = "begin here to start, and go there to end\n"
            + "come here to begin, and end there to finish\n"
            + "begin here to start, and go there to end";

    String regex = "end$";
    // OR
    // String regex = "end\\Z";

    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(content);

    while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        System.out.print(" End index: " + matcher.end() + " ");
        System.out.println(matcher.group());
    }

Output:

    Start index: 122 End index: 125 end
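The three end anchors only differ when the content finishes with a line terminator, which is common for log files. The following sketch shows that difference. The class EndAnchorDemo and the helper foundAtEnd are hypothetical names introduced for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original tutorial.
class EndAnchorDemo {

    // Returns true if `regex` finds a match anywhere in `content`.
    static boolean foundAtEnd(String regex, String content) {
        return Pattern.compile(regex).matcher(content).find();
    }

    public static void main(String[] args) {
        // Note the trailing line terminator after "end".
        String content = "go there to end\n";

        System.out.println(foundAtEnd("end$", content));   // true: $ tolerates a final terminator
        System.out.println(foundAtEnd("end\\Z", content)); // true: \Z tolerates a final terminator
        System.out.println(foundAtEnd("end\\z", content)); // false: \z needs the very end of the input
    }
}
```

If a trailing line break should not count as part of the content, "$" or "\Z" is the safer choice. Use "\z" when the word must be the very last thing in the input.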
4. Java regex word boundary – Match word at the start of line
You can use "(?m)" to turn on "multi-line" mode to match a word at the start of every line. Passing the Pattern.MULTILINE flag to Pattern.compile() has the same effect.
(?m)^wordToSearch
    String content = "begin here to start, and go there to end\n"
            + "come here to begin, and end there to finish\n"
            + "begin here to start, and go there to end";

    String regex = "(?m)^begin";

    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(content);

    while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        System.out.print(" End index: " + matcher.end() + " ");
        System.out.println(matcher.group());
    }

Output:

    Start index: 0 End index: 5 begin
    Start index: 85 End index: 90 begin
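Multi-line mode changes "^" but not "\A". As a small sketch of that difference, the hypothetical helper countMatches below (in a hypothetical class StartAnchorDemo) counts the matches of each pattern against the same three-line input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original tutorial.
class StartAnchorDemo {

    // Counts how many times `regex` matches in `content`.
    static int countMatches(String regex, String content) {
        Matcher matcher = Pattern.compile(regex).matcher(content);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "begin one\nbegin two\nbegin three";

        // In multi-line mode, ^ matches at the start of every line.
        System.out.println(countMatches("(?m)^begin", content));   // prints 3

        // \A is unaffected by multi-line mode and matches only at the start of the input.
        System.out.println(countMatches("(?m)\\Abegin", content)); // prints 1
    }
}
```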
5. Java regex word boundary – Match word at the end of line
You can use "(?m)" to turn on "multi-line" mode to match a word at the end of every line.
(?m)wordToSearch$
    String content = "begin here to start, and go there to end\n"
            + "come here to begin, and end there to finish\n"
            + "begin here to start, and go there to end";

    String regex = "(?m)end$";

    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(content);

    while (matcher.find()) {
        System.out.print("Start index: " + matcher.start());
        System.out.print(" End index: " + matcher.end() + " ");
        System.out.println(matcher.group());
    }

Output:

    Start index: 37 End index: 40 end
    Start index: 122 End index: 125 end
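Putting the start-of-line and end-of-line anchors together, here is one possible sketch of the log filtering use case from the introduction. The class LogLineFilter, the method filter and the sample log text are all made up for this illustration. The method returns whole lines that either start with one word or end with another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original tutorial.
class LogLineFilter {

    // Returns every line that starts with `startWord` OR ends with `endWord`.
    // Assumes both words begin and end with a word character.
    static List<String> filter(String content, String startWord, String endWord) {
        String regex = "^" + Pattern.quote(startWord) + "\\b.*$"    // line starts with startWord
                + "|^.*\\b" + Pattern.quote(endWord) + "$";         // line ends with endWord
        Pattern pattern = Pattern.compile(regex,
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

        Matcher matcher = pattern.matcher(content);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            // "." does not match a line terminator, so each match is a single line.
            lines.add(matcher.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample log for illustration only.
        String log = "ERROR disk full\n"
                + "INFO started\n"
                + "WARN retry failed\n"
                + "INFO job failed";

        for (String line : filter(log, "ERROR", "failed")) {
            System.out.println(line);
        }
    }
}
```

With the sample log above, the sketch prints the first, third and fourth lines and skips "INFO started". A line that satisfies both conditions is reported once, because the first alternative already consumes the whole line.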
Let me know your thoughts on this Java regex word boundary example.
Happy Learning !!