In this Java XPath tutorial, we will learn what XPath is, what the XPath data types are, and how to write XPath expressions that retrieve information from an XML file or document. This information can be XML elements, attributes, text, or even comments.
We will use the following XML, saved as inventory.xml, to evaluate the XPath expressions in this tutorial.
<?xml version="1.0" encoding="utf-8" ?>
<inventory>
<!--Test is test comment-->
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<book year="1995">
<title>Zodiac</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553573862</isbn>
<price>7.50</price>
</book>
</inventory>
The following table lists a few XPath expressions for quick reference:
Description | XPath Expression |
---|---|
Get book titles written after 2001 | //book[@year>2001]/title/text() |
Get book titles cheaper than 8 dollars | //book[price<8]/title/text() |
Get the title of the first book | //book[1]/title/text() |
Get all writers | //book/author/text() |
Count all book titles | count(//book/title) |
Get book titles with writer name starting with Neal | //book[starts-with(author,'Neal')]/title/text() |
Get book titles with writer name containing Niven | //book[contains(author,'Niven')]/title/text() |
Get count of book titles written by Neal Stephenson | count(//book[author='Neal Stephenson']) |
Get book titles written by Neal Stephenson | //book[author='Neal Stephenson']/title/text() |
1. What is XPath?
XPath is a syntax for addressing parts of an XML document. With XPath, we can refer to an element, any attribute of an element, all elements that contain some specific text, and many other combinations. XSLT stylesheets use XPath expressions in the match and select attributes of various elements to indicate how a document should be transformed.
XPath is also useful when testing web services that exchange XML, for example to retrieve and validate values in API responses.
XPath syntax is similar to what we already know: it mixes basic programming-language expressions (such as $x*6) with Unix-like path expressions (such as /inventory/book/author).
In addition to the basic syntax, XPath provides a set of useful functions (such as count() or contains()), similar to utility function calls, that allow searching for various data fragments inside the document.
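Here is a minimal sketch of these functions in Java. It assumes a trimmed copy of the sample inventory held in a string (publisher and isbn left out), and the class name XPathFunctionsDemo is only illustrative. XPath.evaluate(String, InputSource) parses the input and returns the result converted to a string. The last two queries show a pitfall worth knowing: when a function such as contains() receives a node-set, XPath 1.0 looks only at the string value of the first node in that set.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathFunctionsDemo {
    // Trimmed copy of the sample inventory (publisher and isbn omitted, no whitespace between tags).
    static final String INVENTORY = "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<price>7.50</price></book>"
        + "</inventory>";

    // Parses INVENTORY and returns the expression's result converted to a string.
    static String eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(INVENTORY)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(//book)"));                                   // 3
        System.out.println(eval("contains(//book[2]/title, 'Tower')"));              // true
        System.out.println(eval("count(//book[starts-with(author, 'Neal')])"));      // 2
        // "author" is converted to the string value of its FIRST node only,
        // so for the second book only "Larry Niven" is tested:
        System.out.println(eval("count(//book[contains(author, 'Pournelle')])"));    // 0
        // A nested predicate tests every author of the book:
        System.out.println(eval("count(//book[author[contains(., 'Pournelle')]])")); // 1
    }
}
```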
2. XPath Data Model
XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model (DOM) tree, so if you're familiar with the DOM, you will quickly understand how to build basic XPath expressions.
There are seven kinds of nodes in the XPath data model:
- The root node (Only one per document)
- Element nodes
- Attribute nodes
- Text nodes
- Comment nodes
- Processing instruction nodes
- Namespace nodes
2.1. Root Node
The root node is the XPath node that contains the entire document. In our example, the root node contains the <inventory> element; note that the root node is not <inventory> itself but its parent. In an XPath expression, the root node is specified with a single slash (/).
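The difference between the root node and the <inventory> element can be checked from Java. In this sketch (the class name and the inline XML are illustrative), evaluating "/" with XPathConstants.NODE returns the DOM Document node, while name(/*) returns the name of the document element, the single element child of the root node.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootNodeDemo {
    static final String XML =
        "<inventory><book year=\"2000\"><title>Snow Crash</title></book></inventory>";

    // Parses XML and evaluates the expression with the Document as the context node.
    static Object eval(String expression, QName returnType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // "/" selects the root node, which the Java API returns as the DOM Document node
        Node root = (Node) eval("/", XPathConstants.NODE);
        System.out.println(root.getNodeType() == Node.DOCUMENT_NODE); // true

        // <inventory> is the root node's child (the document element), not the root itself
        System.out.println(eval("name(/*)", XPathConstants.STRING));  // inventory
    }
}
```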
2.2. Element Nodes
Every element in the original XML document is represented by an XPath element node. For example, the following are element nodes in our sample XML:
- inventory
- book
- title
- author
- publisher
- isbn
- price
2.3. Attribute Nodes
An element node is the parent of one attribute node for each attribute the element has in the XML source document, although XPath does not count attribute nodes among the element's children. Attribute nodes describe properties of a particular element.
For example, in our XML fragment, year is an attribute node.
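As a small sketch (the class name and the trimmed XML string are assumptions for illustration), the @ abbreviation selects the attribute nodes themselves. Note the difference from //book[@year], which selects the book elements that have a year attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeNodesDemo {
    // Trimmed copy of the sample inventory: only year and title are kept.
    static final String XML = "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title></book>"
        + "<book year=\"2005\"><title>Burning Tower</title></book>"
        + "<book year=\"1995\"><title>Zodiac</title></book>"
        + "</inventory>";

    // Selects the year attribute nodes (not their book elements).
    static NodeList yearAttributes() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//book/@year", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed", e);
        }
    }

    // Collects the value of each selected attribute node, in document order.
    static List<String> yearValues() {
        NodeList attrs = yearAttributes();
        List<String> years = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            years.add(attr.getNodeValue());
        }
        return years;
    }

    public static void main(String[] args) {
        System.out.println(yearAttributes().item(0).getNodeType() == Node.ATTRIBUTE_NODE); // true
        System.out.println(yearValues()); // [2000, 2005, 1995]
    }
}
```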
2.4. Text Nodes
Text nodes are refreshingly simple: they contain the text of an element. If the original text in the XML document contained entity or character references, they are resolved before the XPath text node is created.
A text node is text, pure and simple, and it always holds as much contiguous text as possible, so the next or previous sibling of a text node is never another text node.
For example, the element content in our XML, such as "Snow Crash" and "Neal Stephenson", is stored in text nodes. Attribute values such as 2000 belong to attribute nodes, not text nodes.
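The sketch below selects a text node with text() and reads the string value of an element whose content contains the &amp; entity reference. The class name is illustrative, and "Fish &amp; Chips" is a made-up title added only to show entity resolution.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodesDemo {
    // "Fish &amp; Chips" is a made-up title used only to show entity resolution.
    static final String XML = "<inventory>"
        + "<book><title>Snow Crash</title></book>"
        + "<book><title>Fish &amp; Chips</title></book>"
        + "</inventory>";

    // Parses XML and evaluates the expression with the Document as the context node.
    static Object eval(String expression, QName returnType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // text() selects the text node child of <title>, not the element itself
        Node text = (Node) eval("//book[1]/title/text()", XPathConstants.NODE);
        System.out.println(text.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(text.getNodeValue());                  // Snow Crash
        // The &amp; reference is already resolved to "&" in the string value
        System.out.println(eval("string(//book[2]/title)", XPathConstants.STRING)); // Fish & Chips
    }
}
```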
2.5. Comment Nodes
A comment node is also very simple: it contains some text. Every comment in the source document becomes a comment node. The text of the comment node contains everything inside the comment, except the opening <!-- and the closing -->.
For example:
<!--Test is test comment-->
2.6. Processing Instruction Nodes
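A processing instruction node has two parts: a name (the target, returned by the name() function) and a string value. The string value is everything after the name and the whitespace that follows it, up to but not including the ?> that closes the processing instruction.
For example, in the following processing instruction the name is xml-stylesheet and the string value is type="text/xsl" href="style.xsl":
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
Note that the XML declaration at the top of our document (<?xml version="1.0" encoding="utf-8"?>) looks like a processing instruction but is not one, so it does not appear as a node in the XPath data model.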
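A quick way to see which nodes a parser actually exposes is to count them. The following minimal sketch (the class name and the xml-stylesheet instruction are illustrative assumptions) parses a document that begins with an XML declaration followed by an xml-stylesheet processing instruction, and shows that only the latter is a processing-instruction node, together with its name and string value.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ProcessingInstructionDemo {
    // An XML declaration followed by a real processing instruction.
    static final String XML = "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/xsl\" href=\"style.xsl\"?>"
        + "<inventory/>";

    // Parses XML and returns the expression's result as a string.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Only xml-stylesheet is a PI node; the XML declaration is not part of the tree
        System.out.println(eval("count(//processing-instruction())"));  // 1
        System.out.println(eval("name(//processing-instruction())"));   // xml-stylesheet
        System.out.println(eval("string(//processing-instruction())")); // type="text/xsl" href="style.xsl"
    }
}
```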
2.7. Namespace Nodes
Namespace nodes are almost never used in XSLT stylesheets; they exist primarily for the XSLT processor's benefit.
Remember that a namespace declaration (such as xmlns:auth="http://www.authors.net"), even though it is syntactically an attribute in the XML source, becomes a namespace node, not an attribute node.
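In Java code, namespaces mostly matter through name matching. With a namespace-aware parser, an unprefixed name in an XPath 1.0 expression matches only elements in no namespace, so a document that uses a default namespace needs a prefix bound through javax.xml.namespace.NamespaceContext, or a local-name() test. The following sketch assumes a hypothetical namespace URI (http://example.com/inventory) and a hypothetical prefix inv; neither comes from the sample XML, and the class name is illustrative.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Hypothetical namespace URI, used only for this illustration.
    static final String NS = "http://example.com/inventory";
    // Same shape as the sample, but every element is in the default namespace NS.
    static final String XML = "<inventory xmlns=\"" + NS + "\">"
        + "<book year=\"2000\"><title>Snow Crash</title></book></inventory>";

    // Binds the hypothetical prefix "inv" to NS; other prefixes map to no namespace.
    static final class InventoryNamespaces implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "inv".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return NS.equals(namespaceURI) ? "inv" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return NS.equals(namespaceURI)
                ? Collections.singletonList("inv").iterator()
                : Collections.<String>emptyIterator();
        }
    }

    // Parses XML namespace-aware and evaluates the expression, optionally with the prefix mapping.
    static String eval(String expression, boolean usePrefixes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            if (usePrefixes) {
                xpath.setNamespaceContext(new InventoryNamespaces());
            }
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names match only elements in NO namespace, so nothing is found
        System.out.println("[" + eval("/inventory/book/title", false) + "]"); // []
        System.out.println(eval("/inv:inventory/inv:book/inv:title", true));  // Snow Crash
        // The unprefixed year attribute is in no namespace, so it needs no prefix
        System.out.println(eval("/inv:inventory/inv:book/@year", true));      // 2000
        // local-name() ignores the namespace entirely
        System.out.println(eval("//*[local-name()='title']", false));         // Snow Crash
    }
}
```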
3. XPath Data Types
XPath 1.0 defines four data types (node-set, boolean, number and string); the Java API adds a fifth result type, a single node. An expression evaluated in Java may therefore return one of the following:
- node-set – Represents a set of nodes. The set can be empty or contain any number of nodes.
- node (Java only) – Represents a single node: the first node the expression selects, or null if it selects nothing.
- boolean – Represents the value true or false. Be aware that the strings "true" and "false" have no special meaning in XPath: any non-empty string converts to true, so boolean('false') is true.
- number – Represents a floating-point number. XPath and XSLT 1.0 have no integer type: all numbers are 64-bit IEEE 754 double-precision values, the same representation as Java's double. In addition to ordinary numbers, there are five special values: positive and negative infinity, positive and negative zero, and NaN, the special symbol for anything that is not a number.
- string – Represents zero or more characters, as defined in the XML specification.
Apart from node-sets, these data types are simple, and converting between them is usually straightforward. We won't discuss them in more detail here; instead, we'll discuss data types and conversions as we need them for specific tasks.
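In the Java API, the requested type is passed as one of the XPathConstants values (NODESET, NODE, BOOLEAN, NUMBER, STRING), and the result comes back as a NodeList, Node, Boolean, Double or String respectively. This sketch (illustrative class name, trimmed copy of the sample inventory) requests each type and also shows a few of the conversions mentioned above.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DataTypesDemo {
    // Trimmed copy of the sample inventory: only title and price are kept.
    static final String INVENTORY = "<inventory>"
        + "<book><title>Snow Crash</title><price>14.95</price></book>"
        + "<book><title>Burning Tower</title><price>5.99</price></book>"
        + "<book><title>Zodiac</title><price>7.50</price></book>"
        + "</inventory>";

    // Parses INVENTORY and evaluates the expression, asking for the given result type.
    static Object eval(String expression, QName returnType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(INVENTORY)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        NodeList titles = (NodeList) eval("//book/title", XPathConstants.NODESET);   // node-set -> NodeList
        Node first = (Node) eval("//book/title", XPathConstants.NODE);               // first node -> Node
        Boolean cheap = (Boolean) eval("//book[price < 6]", XPathConstants.BOOLEAN); // non-empty set -> true
        Double count = (Double) eval("count(//book)", XPathConstants.NUMBER);        // number -> Double
        String title = (String) eval("//book/title", XPathConstants.STRING);         // string value of first node

        System.out.println(titles.getLength());     // 3
        System.out.println(first.getTextContent()); // Snow Crash
        System.out.println(cheap);                  // true
        System.out.println(count);                  // 3.0
        System.out.println(title);                  // Snow Crash

        // Conversions worth knowing
        System.out.println(eval("boolean('false')", XPathConstants.BOOLEAN)); // true (non-empty string)
        System.out.println(eval("number('abc')", XPathConstants.NUMBER));     // NaN
        System.out.println(eval("1 div 0", XPathConstants.NUMBER));           // Infinity
        System.out.println(eval("//magazine", XPathConstants.NODE));          // null (nothing selected)
    }
}
```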
4. XPath Syntax
XPath location paths use a syntax similar to UNIX file-system paths, combined with predicates, functions and operators.
4.1. Selecting Nodes with XPath
Expression | Description |
---|---|
nodename | Selects all child elements named nodename |
/ | Selects from the root node |
// | Selects matching nodes at any depth; at the start of an expression it searches the whole document from the root node, while .// searches below the current node |
. | Selects the current node |
.. | Selects the parent of the current node |
@ | Selects attributes |
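The abbreviations in this table are relative to the current (context) node, which in Java is the object passed to evaluate(). The following sketch, with an illustrative class name and a compact copy of the sample inventory, uses the second book element as the current node. The last query shows that a leading // still searches from the root node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ContextNodeDemo {
    // Compact copy of the sample inventory (no whitespace between tags).
    static final String INVENTORY = "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553380958</isbn><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><publisher>Pocket</publisher>"
        + "<isbn>0743416910</isbn><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553573862</isbn><price>7.50</price></book>"
        + "</inventory>";

    // Evaluates the expression with the second <book> element as the current node.
    static String evalFromSecondBook(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(INVENTORY)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node book = (Node) xpath.evaluate("/inventory/book[2]", doc, XPathConstants.NODE);
            return xpath.evaluate(expression, book);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evalFromSecondBook("title"));            // Burning Tower
        System.out.println(evalFromSecondBook("@year"));            // 2005
        System.out.println(evalFromSecondBook("name(.)"));          // book
        System.out.println(evalFromSecondBook("name(..)"));         // inventory
        System.out.println(evalFromSecondBook("count(.//author)")); // 2 (below the current node)
        System.out.println(evalFromSecondBook("count(//author)"));  // 4 (a leading // starts at the root)
    }
}
```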
4.2. Using Predicates with XPath
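Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets after a step, for example //book[1] (the first book in our sample), //book[last()] (the last book), //book[@year>2001] (books published after 2001) and //book[price<8] (books cheaper than 8 dollars).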
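Here is a short sketch with a few predicates on the sample data, including a subtle one: //book/author[1] applies the position to the author children of each book, while (//book/author)[1] applies it to the whole selected node-set. The class name and the compact copy of the inventory are illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class PredicatesDemo {
    // Compact copy of the sample inventory (no whitespace between tags).
    static final String INVENTORY = "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553380958</isbn><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><publisher>Pocket</publisher>"
        + "<isbn>0743416910</isbn><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553573862</isbn><price>7.50</price></book>"
        + "</inventory>";

    // Parses INVENTORY and returns the expression's result as a string.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(INVENTORY)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // [1] applies to the author children of EACH book: one per book
        System.out.println(eval("count(//book/author[1])"));               // 3
        // Parentheses apply [1] to the whole node-set: one author in total
        System.out.println(eval("count((//book/author)[1])"));             // 1
        System.out.println(eval("(//book/author)[1]"));                    // Neal Stephenson
        System.out.println(eval("//book[last()]/title"));                  // Zodiac
        // Stacked predicates are applied one after another
        System.out.println(eval("//book[price < 8][@year < 2000]/title")); // Zodiac
    }
}
```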
4.3. Reaching Unknown Nodes with XPath
XPath wildcards can be used to select unknown XML elements.
Wildcard | Description |
---|---|
* | Matches any element node |
@* | Matches any attribute node |
node() | Matches any node of any kind |
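The following sketch counts what the wildcards match in a compact copy of the sample inventory (class name illustrative). Because the string has no whitespace between tags, /inventory/node() here matches only the comment and the three book elements; against the indented inventory.xml it would also match whitespace-only text nodes.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class WildcardsDemo {
    // Compact copy of the sample inventory, including the comment (no whitespace between tags).
    static final String INVENTORY = "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553380958</isbn><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><publisher>Pocket</publisher>"
        + "<isbn>0743416910</isbn><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553573862</isbn><price>7.50</price></book>"
        + "</inventory>";

    // Parses INVENTORY and returns the expression's result as a string.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(INVENTORY)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("count(/inventory/*)"));       // 3: element children only
        System.out.println(eval("count(/inventory/node())"));  // 4: the comment plus three books
        System.out.println(eval("count(//book[2]/*)"));        // 6: title, 2 authors, publisher, isbn, price
        System.out.println(eval("count(//@*)"));               // 3: the year attributes
        System.out.println(eval("name(//book[1]/*[last()])")); // price
    }
}
```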
4.4. XPath Axes
An axis defines a node-set relative to the current node. XPath 1.0 defines the following axes:
AxisName | Result |
---|---|
ancestor | Selects all ancestors (parent, grandparent, etc.) of the current node |
ancestor-or-self | Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself |
attribute | Selects all attributes of the current node |
child | Selects all children of the current node |
descendant | Selects all descendants (children, grandchildren, etc.) of the current node |
descendant-or-self | Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself |
following | Selects everything in the document after the closing tag of the current node |
following-sibling | Selects all siblings after the current node |
namespace | Selects all namespace nodes of the current node |
parent | Selects the parent of the current node |
preceding | Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes |
preceding-sibling | Selects all siblings before the current node |
self | Selects the current node |
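Each location step can name its axis explicitly with the axis::node-test syntax; the short forms used elsewhere are abbreviations (title means child::title, @year means attribute::year, and .. means parent::node()). This illustrative sketch runs a few explicit axes against a compact copy of the sample inventory.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class AxesDemo {
    // Compact copy of the sample inventory (no whitespace between tags).
    static final String INVENTORY = "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553380958</isbn><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><publisher>Pocket</publisher>"
        + "<isbn>0743416910</isbn><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553573862</isbn><price>7.50</price></book>"
        + "</inventory>";

    // Parses INVENTORY and returns the expression's result as a string.
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(INVENTORY)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // ancestor: climb from the Zodiac title to its book, then read the year attribute
        System.out.println(eval("//title[. = 'Zodiac']/ancestor::book/@year"));                // 1995
        // preceding-sibling: earlier children of the same parent
        System.out.println(eval("//author[. = 'Jerry Pournelle']/preceding-sibling::author")); // Larry Niven
        System.out.println(eval("count(//book[1]/following-sibling::book)"));                 // 2
        // preceding: everything before the node in document order, except its ancestors
        System.out.println(eval("count(//book[3]/preceding::book)"));                         // 2
        System.out.println(eval("count(//book[2]/descendant::*)"));                           // 6
        System.out.println(eval("name(//isbn/parent::*)"));                                   // book
    }
}
```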
4.5. XPath Operators
Below is a list of operators that can be used in XPath expressions:
Operator | Description | Example | Return value |
---|---|---|---|
\| | Computes the union of two node-sets | //book \| //cd | Returns a node-set with all book and cd elements |
+ | Addition | 6 + 4 | 10 |
- | Subtraction | 6 - 4 | 2 |
* | Multiplication | 6 * 4 | 24 |
div | Division | 8 div 4 | 2 |
= | Equal | price=9.80 | true if price is 9.80, false if price is 9.90 |
!= | Not equal | price!=9.80 | true if price is 9.90, false if price is 9.80 |
< | Less than | price<9.80 | true if price is 9.00, false if price is 9.80 |
<= | Less than or equal to | price<=9.80 | true if price is 9.00, false if price is 9.90 |
> | Greater than | price>9.80 | true if price is 9.90, false if price is 9.80 |
>= | Greater than or equal to | price>=9.80 | true if price is 9.90, false if price is 9.70 |
or | or | price=9.80 or price=9.70 | true if price is 9.80, false if price is 9.50 |
and | and | price>9.00 and price<9.90 | true if price is 9.80, false if price is 8.50 |
mod | Modulus (division remainder) | 5 mod 2 | 1 |
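These operators can be tried directly from Java. In this sketch (illustrative class name, compact copy of the inventory), numeric results are requested as XPathConstants.NUMBER and therefore come back as Double. One caution: - is also a legal character in XML names, so write subtraction with spaces (price - 1); price-1 is parsed as a single element name.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class OperatorsDemo {
    // Compact copy of the sample inventory (no whitespace between tags).
    static final String INVENTORY = "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553380958</isbn><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><publisher>Pocket</publisher>"
        + "<isbn>0743416910</isbn><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553573862</isbn><price>7.50</price></book>"
        + "</inventory>";

    // Parses INVENTORY and evaluates the expression, asking for the given result type.
    static Object eval(String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(INVENTORY)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("6 + 4", XPathConstants.NUMBER));   // 10.0
        System.out.println(eval("6 - 4", XPathConstants.NUMBER));   // 2.0
        System.out.println(eval("6 * 4", XPathConstants.NUMBER));   // 24.0
        System.out.println(eval("8 div 4", XPathConstants.NUMBER)); // 2.0
        System.out.println(eval("5 mod 2", XPathConstants.NUMBER)); // 1.0
        // Union: 3 titles plus 4 authors
        System.out.println(eval("count(//title | //author)", XPathConstants.NUMBER)); // 7.0
        System.out.println(eval("//book[price > 9.00 and price < 15]/title", XPathConstants.STRING)); // Snow Crash
        System.out.println(eval("count(//book[price = 7.50 or price = 5.99])", XPathConstants.NUMBER)); // 2.0
    }
}
```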
5. How to Evaluate XPath Expressions
We begin by creating a DOM model of the XML document:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // keep namespace information; XPath name tests depend on it
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("inventory.xml");
Next, we get an XPath object from XPathFactory and compile the expression into an XPathExpression:
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
Finally, we call XPathExpression.evaluate(), passing the Document and the expected result type (XPathConstants.NODESET here), and iterate over the returned NodeList:
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
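If you do not need the DOM afterwards, XPathExpression.evaluate(InputSource, QName) can parse the input itself. The sketch below compiles the expression once and reuses it for two inputs; the class name, the second document and its "Example Title" are made up for illustration. Keep in mind that XPath and XPathExpression objects are not thread-safe, so a compiled expression should not be used from several threads at the same time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CompiledExpressionDemo {
    // Trimmed copy of the sample inventory: only year and title are kept.
    static final String INVENTORY = "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title></book>"
        + "<book year=\"2005\"><title>Burning Tower</title></book>"
        + "<book year=\"1995\"><title>Zodiac</title></book>"
        + "</inventory>";

    // Compiles the expression once, then lets XPathExpression parse each input itself.
    static List<String> titlesAfter2001(String... xmlDocuments) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
            List<String> titles = new ArrayList<>();
            for (String xml : xmlDocuments) {
                NodeList nodes = (NodeList) expr.evaluate(
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    titles.add(nodes.item(i).getNodeValue());
                }
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // A second, made-up document to show the compiled expression being reused
        String other = "<inventory><book year=\"2010\"><title>Example Title</title></book></inventory>";
        System.out.println(titlesAfter2001(INVENTORY, other)); // [Burning Tower, Example Title]
    }
}
```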
6. Complete Example
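The following XPathExample class evaluates the XPath expressions from the table at the start of the article, along with a few more, and prints the results. It expects the sample XML to be saved as inventory.xml in the current working directory.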
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class XPathExample {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // keep namespace information; XPath name tests depend on it
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("inventory.xml");
//Create XPath
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
System.out.println("1) Get book titles written after 2001");
XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("2) Get book titles written before 2001");
expr = xpath.compile("//book[@year<2001]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("3) Get book titles cheaper than 8 dollars");
expr = xpath.compile("//book[price<8]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("4) Get book titles costlier than 8 dollars");
expr = xpath.compile("//book[price>8]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("5) Get book titles added in first node");
expr = xpath.compile("//book[1]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("6) Get book title added in last node");
expr = xpath.compile("//book[last()]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("7) Get all writers");
expr = xpath.compile("//book/author/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("8) Count all books titles");
expr = xpath.compile("count(//book/title)");
result = expr.evaluate(doc, XPathConstants.NUMBER);
Double count = (Double) result;
System.out.println(count.intValue());
System.out.println("9) Get book titles with writer name start with Neal");
// Select the title text directly instead of relying on the child-node index of <title>
expr = xpath.compile("//book[starts-with(author,'Neal')]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("10) Get book titles with writer name containing Niven");
// Select the title text directly instead of relying on the child-node index of <title>
expr = xpath.compile("//book[contains(author,'Niven')]/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("11) Get book titles written by Neal Stephenson");
expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getNodeValue());
}
System.out.println("12) Get count of book titles written by Neal Stephenson");
expr = xpath.compile("count(//book[author='Neal Stephenson'])");
result = expr.evaluate(doc, XPathConstants.NUMBER);
count = (Double) result;
System.out.println(count.intValue());
System.out.println("13) Reading comment node");
expr = xpath.compile("//inventory/comment()");
result = expr.evaluate(doc, XPathConstants.STRING);
String comment = (String) result;
System.out.println(comment);
}
}
Program output:
1) Get book titles written after 2001
Burning Tower
2) Get book titles written before 2001
Snow Crash
Zodiac
3) Get book titles cheaper than 8 dollars
Burning Tower
Zodiac
4) Get book titles costlier than 8 dollars
Snow Crash
5) Get book titles added in first node
Snow Crash
6) Get book title added in last node
Zodiac
7) Get all writers
Neal Stephenson
Larry Niven
Jerry Pournelle
Neal Stephenson
8) Count all books titles
3
9) Get book titles with writer name start with Neal
Snow Crash
Zodiac
10) Get book titles with writer name containing Niven
Burning Tower
11) Get book titles written by Neal Stephenson
Snow Crash
Zodiac
12) Get count of book titles written by Neal Stephenson
2
13) Reading comment node
Test is test comment
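The complete example repeats the same compile, evaluate and loop pattern for every query. A small helper removes that repetition. The sketch below (class and method names are illustrative) also binds the author name through an XPathVariableResolver and refers to it as $author in the expression, instead of concatenating input into the expression string. Since author = $author compares against every author child of a book, co-authors are matched as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathHelperDemo {
    // Compact copy of the sample inventory (no whitespace between tags).
    static final String INVENTORY = "<inventory><!--Test is test comment-->"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553380958</isbn><price>14.95</price></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author><publisher>Pocket</publisher>"
        + "<isbn>0743416910</isbn><price>5.99</price></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author>"
        + "<publisher>Spectra</publisher><isbn>0553573862</isbn><price>7.50</price></book>"
        + "</inventory>";

    // Parses an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the node value of every node the expression selects, in document order.
    static List<String> values(XPath xpath, Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    // Titles by the given author; the name is supplied as the XPath variable $author.
    static List<String> titlesBy(Document doc, String author) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> "author".equals(name.getLocalPart()) ? author : null);
        return values(xpath, doc, "//book[author = $author]/title/text()");
    }

    public static void main(String[] args) {
        Document doc = parse(INVENTORY);
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(values(xpath, doc, "//book[@year<2001]/title/text()")); // [Snow Crash, Zodiac]
        System.out.println(values(xpath, doc, "//book[price>8]/title/text()"));    // [Snow Crash]
        System.out.println(titlesBy(doc, "Neal Stephenson"));                      // [Snow Crash, Zodiac]
        System.out.println(titlesBy(doc, "Jerry Pournelle"));                      // [Burning Tower]
    }
}
```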
We hope this XPath tutorial has been informative and helps you evaluate XPath expressions in Java. If you have any suggestions, please leave a comment.
Happy Learning !!
I have an XML file from which I am reading some data, but I need to use the substring method on it.
XML:
Test1
TEST2 Complete
As in the example above, I have written Java code and successfully read the TEST1 node value.
But I want to read TEST2 and need only the "Complete" value. How can I use substring while parsing the XML node in Java?
Could you please help by providing the correct XPath expression using the XPath substring functions?
Thanks!
Hi Lokesh,
This post has been really helpful! Thank you!
I have a question. Let's say I want to query for a specific year and, within that year, publisher=xyz. What would the expression look like?
For example, here is my XML. I am interested in querying for id=0 and ci_id=415bf14c9322c8c08dbbbb0c6c4d4425.
Query_3
4
0
license
415bf14c9322c8c08dbbbb0c6c4d4425
1
interface
4cfcf4e61f350bdba27593ebe64ecaf2
1
interface
4e70e7cbd34836e6ba3b0f0aa7124ac8
Thank you!
Hi, I'm really new to XPath. I would like to know if it's possible to write one javax.xml.xpath.XPathExpression to obtain the value of the tag.
I have tried these expressions, with no results:
String expression = "/autorizacion/comprobante/*/*/factura/infoFactura/identificacionComprador";
String expression = "/*//identificacionComprador/text()";
The java function for compiling is this:
Thanks in advance for any help or guidance.
We cannot read data inside a CDATA section using a normal XPath path expression. The data is plain text, not DOM nodes, so we must use string functions.
This code works:
Read More : SO Thread
Helpful Blog For Me Thanks For Sharing This!!!!!
Hi, I want to traverse the XML file given below and store the IP addresses and application names in an array. Please help me with how to traverse it.
App1
App2
App3
App4
App12
App22
App32
App42
Desired output is:
IP address:192.168.10.10
Applications running on the 192.168.10.10 device:
App1
App2
App3
App4
Similarly for the next device.
Thanks for the clear explanation. It's good.
Can you let me know how to return a particular node as an XML response?
From your example XML file, if I pass the year value "2005", it should send me that particular node as an XML response.
Good Day Lokesh,
Your assistance would be highly appreciated.
This is my XML document:
634.0
12.0
2670.0
3/31/2016 12:00:00 AM
8/31/2018 12:00:00 AM
1350.0
00
I have tried many ways to extract the inner text of the XML, with no success. Could you please provide a solution for reading the XML above?
Thank you very much.
You will get all information here : https://howtodoinjava.com/java/xml/read-xml-dom-parser-example/
Hi Lokesh, I want to call a web service with an XPath expression included in the URL. Could you help me with how to pass only part of the XML in the URL?
e.g my url is :: ‘localhost.localdomain’]/deviceconfig/system&element=myneme20.20.20.20
Hi Lokesh,
Is it possible to sort the values using xpath? For example I want to retrieve title of all books in sorted order.
Thanks,
Minu
Hi, my question is whether we can write a program to find the paths in any given XML file. For example, take any XML file and find all possible paths in it.
There is no readymade API for this. You have to iterate through XML tree and find all yourself.
Thanks for the wonderful article. Can you tell me the XPath expression to select all nodes where the 'year' attribute is specified? I tried using //book[@year] but it doesn't work.
//book/@year
It looks like my example got garbled and lost its tags. The inline tags are meant to be [b] tags
Post your code inside [xml] … [/xml] tags.
Thanks, I will try that. Crossing my fingers that the code shows. The gist: how do you also get inline tags to show in the results? Suppose one of the examples had inlines, like this
—edited—
And you want the results to look like this
—edited—
at any rate, I think the intent of my question is clear?
Thanks in advance!
So far, I am able to do this much. I know it does not exactly solve your problem, but be sure that I will put more effort onto this.
Very nice! But what about elements with nested elements? What if one of your examples had an inline element like this
Burning MY Tower
Larry Niven
Jerry Pournelle
Pocket
0743416910
5.99
And you wanted output to include the inline like this
//1) Get book titles written after 2001
Burning MY Tower
Can this be done without some complicated serialization? I've tried JAXB and gotten lost in complexity. Is there an easier way?
Can you explain why we must call setNamespaceAware(true)?
I am not able to use @XmlPath in JAXB.
Any suggestions, please?
Can you please provide me the package xml
It’s only class in the package.. :-)
then what was the need of mentioning package xml;
:-) It’s just package name I created for writing the sample code. Usually I copy whole class file code and paste as it is in tutorial as well.
I am able to run your code using your sample XML, but when I try to run it with my own XML response, the code does not work. I debugged it and found that the problem lies with the namespace. Can you please help me handle namespaces in your code?
Any update on the namespace issue I am facing?
You should use XPath local-name() like this:
xpath.compile("//*[local-name()='title']/text()");
https://docs.oracle.com/cd/E35413_01/doc.722/e35419/dev_xpath_functions.htm#autoId6
A bit confusing. Can you please help me apply this to my XPath? Here it is:
//ns1:inventory/book[1]/title/text()
Hey just solved it, thanks a lot for your help, ultimately I did it with namespace.
I would like to get in touch with you, if I require any help in future, can u please drop me your mail id here, or else you can drop me a mail at basujaydeep[at]yahoo[dot]com
Post your question here if you ever need my help. It will help others as well.
Thank you very much, very good.
I noticed that it is only a small step from here to manipulating XML using XPath expressions. I tried setTextContent after navigating to a node, and it worked as expected!
Hi,
if you like, here is an excerpt from my code
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlInput)));
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
XPathExpression xpathExpression = xpath.compile(xpathDefinition);
Node node = (Node) xpathExpression.evaluate(doc, XPathConstants.NODE);
node.setTextContent(value);
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult res = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, res);
String xmlOutput = writer.toString();
I had the need to transform the document to an XML string …
Cheers
Thorsten