In this Java XPath tutorial, we will learn what the XPath library is, what the XPath data types are, and how to write XPath expressions that retrieve information from an XML file or document. This information can be XML nodes, XML attributes, or even comments.
We will use the following XML document, saved as inventory.xml, to evaluate the XPath expressions in this tutorial.
```xml
<?xml version="1.0" encoding="utf-8" ?>
<inventory>
    <!--Test is test comment-->
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>
    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>
    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>
</inventory>
```
The following table lists a few XPath expressions for quick reference:
Description | XPath Expression |
---|---|
Get book titles written after 2001 | //book[@year>2001]/title/text() |
Get book titles cheaper than 8 dollars | //book[price<8]/title/text() |
Get the title of the first book | //book[1]/title/text() |
Get all writers | //book/author/text() |
Count all book titles | count(//book/title) |
Get titles of books whose author's name starts with Neal | //book[starts-with(author,'Neal')]/title/text() |
Get titles of books whose author's name contains Niven | //book[contains(author,'Niven')]/title/text() |
Count the books written by Neal Stephenson | count(//book[author='Neal Stephenson']) |
Get book titles written by Neal Stephenson | //book[author='Neal Stephenson']/title/text() |
1. What is XPath?
XPath is a syntax used to describe parts of an XML document. With XPath, we can refer to an element, any attribute of an element, all elements that contain specific text, and many other combinations. An XSLT stylesheet uses XPath expressions in the match and select attributes of various elements to indicate how a document should be transformed.
XPath is also useful when testing web services that exchange XML, for retrieving and validating values in the API responses.
XPath uses a syntax similar to what we already know. It is a mix of basic programming-language expressions (such as $x*6) and Unix-like path expressions (such as /inventory/book/author).
In addition to the basic syntax, XPath provides a set of useful functions (such as count() or contains()), similar to utility function calls, that allow searching for various data fragments inside the document.
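To make this mix concrete, here is a minimal sketch using the standard javax.xml.xpath API. It assumes an abbreviated copy of the sample inventory inlined as a string (so no inventory.xml file is needed), and the class and method names (XPathSyntaxDemo, timesSix and so on) are hypothetical. The variable $x is bound through an XPathVariableResolver, which is how the Java API supplies values for XPath variables.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathSyntaxDemo {
    // Abbreviated copy of the sample inventory (publisher, isbn and price omitted).
    static final String XML = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
            + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author></book>"
            + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author></book>"
            + "</inventory>";

    // Parses an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Programming-language style: arithmetic on the variable $x.
    static double timesSix(double x) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $x to the Java value; any other variable name stays unbound (null).
        xpath.setXPathVariableResolver(name -> "x".equals(name.getLocalPart()) ? x : null);
        return (Double) xpath.evaluate("$x*6", parse(XML), XPathConstants.NUMBER);
    }

    // Unix-like path: the author of the second book.
    static String secondAuthor() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/inventory/book[2]/author", parse(XML));
    }

    // Functions: count() wrapped around a predicate that uses contains().
    static double countNealBooks() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate("count(//book[contains(author, 'Neal')])",
                parse(XML), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timesSix(7));      // 42.0
        System.out.println(secondAuthor());   // Larry Niven
        System.out.println(countNealBooks()); // 2.0
    }
}
```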
2. XPath Data Model
XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model (DOM) tree, so if you're familiar with the DOM, you will easily understand how to build basic XPath expressions.
There are seven kinds of nodes in the XPath data model:
- The root node (Only one per document)
- Element nodes
- Attribute nodes
- Text nodes
- Comment nodes
- Processing instruction nodes
- Namespace nodes
2.1. Root Node
The root node is the XPath node that contains the entire document. In our example, the root node contains the <inventory> element. In an XPath expression, the root node is specified with a single slash (/).
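A small sketch, assuming an abbreviated inline copy of the sample and a hypothetical RootNodeDemo class, shows that / selects the root node itself (a DOM Document in Java), while /* selects its single element child, <inventory>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootNodeDemo {
    // Abbreviated copy of the sample inventory.
    static final String XML = "<inventory><!--Test is test comment-->"
            + "<book year=\"2000\"><title>Snow Crash</title></book></inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first node matched by the expression, or null if none matches.
    static Node select(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate(expression, parse(XML), XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // "/" is the root node itself: a DOM Document, not an element.
        System.out.println(select("/").getNodeType() == Node.DOCUMENT_NODE); // true
        // "/*" is the root node's element child, i.e. <inventory>.
        System.out.println(select("/*").getNodeName()); // inventory
        // An absolute path starts at the root node and names <inventory> explicitly.
        System.out.println(select("/inventory/book/title").getTextContent()); // Snow Crash
    }
}
```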
2.2. Element Nodes
Every element in the original XML document is represented by an XPath element node.
For example, the following are element nodes in our sample XML:
- inventory
- book
- title
- author
- publisher
- isbn
- price
2.3. Attribute Nodes
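An element node is the parent of one attribute node for each attribute it carries in the XML source document. These nodes describe properties of that particular element. Note that although the element is the parent of its attribute nodes, the attribute nodes are not children of the element: they are selected with the attribute axis (@), not the child axis.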
For example, year is an attribute node of each book element in our XML.
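The following sketch (hypothetical class name, abbreviated inline XML) selects the year attribute nodes and shows that, in the XPath data model, the book element is the parent of its attribute even though the attribute is not one of its children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeNodeDemo {
    // Abbreviated copy of the sample inventory (only titles and years kept).
    static final String XML = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title></book>"
            + "<book year=\"2005\"><title>Burning Tower</title></book>"
            + "<book year=\"1995\"><title>Zodiac</title></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Values of all attribute nodes selected by //book/@year.
    static List<String> years() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList attrs = (NodeList) xpath.evaluate("//book/@year", parse(XML), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            values.add(attrs.item(i).getNodeValue());
        }
        return values;
    }

    // The parent of an attribute node is the element that carries it.
    static String parentOfYear() throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate("name(//book[1]/@year/..)", parse(XML));
    }

    // The attribute is not a child: node() on the child axis finds only <title>.
    static double childCountOfFirstBook() throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(//book[1]/node())", parse(XML), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(years());                 // [2000, 2005, 1995]
        System.out.println(parentOfYear());          // book
        System.out.println(childCountOfFirstBook()); // 1.0
    }
}
```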
2.4. Text Nodes
Text nodes are refreshingly simple. They contain text from an element. If the original text in the XML document contained entity or character references, they are resolved before the XPath text node is created.
The text node is text, pure and simple. A text node always contains as much contiguous text as possible, so the node immediately before or after a text node is never another text node.
For example, the element values in our XML, such as “Snow Crash” and “Neal Stephenson”, are text nodes.
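As a quick check of reference resolution, the sketch below parses a hypothetical title containing the entity reference &amp; and a publisher containing the character reference &#39;. Both references are resolved in the resulting text, and the title still yields a single text node. The class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextNodeDemo {
    // Hypothetical book: &amp; is an entity reference, &#39; a character reference.
    static final String XML = "<inventory><book>"
            + "<title>Fish &amp; Chips</title><publisher>O&#39;Reilly</publisher>"
            + "</book></inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // String value of the first node the expression selects.
    static String text(String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, parse(XML));
    }

    static double number(String expression) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(text("//title/text()"));          // Fish & Chips
        System.out.println(text("//publisher/text()"));      // O'Reilly
        System.out.println(number("count(//title/text())")); // 1.0
    }
}
```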
2.5. Comment Nodes
A comment node is also very simple: it contains some text. Every comment in the source document becomes a comment node. The text of the comment node contains everything inside the comment, except the opening <!-- and the closing -->.
For example:
<!--Test is test comment-->
2.6. Processing Instruction Nodes
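A processing instruction node has two parts: a name (the target, returned by the name() function) and a string value. The string value is everything after the name and the white space that follows it, not including the ?> that closes the processing instruction.
For example, in <?xml-stylesheet type="text/xsl" href="inventory.xsl"?> the name is xml-stylesheet and the string value is type="text/xsl" href="inventory.xsl".
Note that the XML declaration at the top of our sample (<?xml version="1.0" encoding="utf-8" ?>) looks like a processing instruction but is not one, so it does not appear in the XPath data model.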
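The sketch below reads the name and string value of a processing instruction and confirms that the XML declaration is not counted as one. The stylesheet name inventory.xsl and the class and helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ProcessingInstructionDemo {
    // The XML declaration plus a hypothetical xml-stylesheet processing instruction.
    static final String XML = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<?xml-stylesheet type=\"text/xsl\" href=\"inventory.xsl\"?>"
            + "<inventory/>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static String string(String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, parse(XML));
    }

    static double number(String expression) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // Only xml-stylesheet is counted; the XML declaration is not a node.
        System.out.println(number("count(//processing-instruction())")); // 1.0
        System.out.println(string("name(/processing-instruction())"));   // xml-stylesheet
        System.out.println(string("string(/processing-instruction())")); // type="text/xsl" href="inventory.xsl"
    }
}
```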
2.7. Namespace Nodes
Namespace nodes are almost never used in XSLT style sheets; they exist primarily for the XSLT processor’s benefit.
Remember that the declaration of a namespace (such as xmlns:auth="http://www.authors.net"), even though it is technically an attribute in the XML source, becomes a namespace node, not an attribute node.
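Although you rarely work with namespace nodes directly, namespaces matter as soon as you query namespaced elements from Java: a prefix used in an expression must be mapped to a URI through a NamespaceContext. This sketch assumes a variant of the sample in which <author> is in the http://www.authors.net namespace; the class and helper names are hypothetical. Matching is by namespace URI, so the prefix used in the query need not match the one in the document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    static final String AUTH_NS = "http://www.authors.net";
    // Hypothetical variant of the sample: <author> lives in the auth namespace.
    static final String XML = "<inventory xmlns:auth=\"" + AUTH_NS + "\">"
            + "<book><title>Snow Crash</title><auth:author>Neal Stephenson</auth:author></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, prefixes are not resolved to URIs
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Maps exactly one prefix to one namespace URI; every other prefix is unbound.
    static NamespaceContext contextFor(String prefix, String uri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) {
                return uri.equals(namespaceURI) ? prefix : null;
            }
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return uri.equals(namespaceURI)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyIterator();
            }
        };
    }

    // Evaluates //<queryPrefix>:author with queryPrefix bound to the authors namespace.
    static String authorUsingPrefix(String queryPrefix) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(contextFor(queryPrefix, AUTH_NS));
        return xpath.evaluate("//" + queryPrefix + ":author", parse(XML));
    }

    // An unprefixed name test only matches elements in no namespace, so nothing is found.
    static String authorWithoutPrefix() throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate("//author", parse(XML));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(authorUsingPrefix("auth"));          // Neal Stephenson
        System.out.println(authorUsingPrefix("a"));             // Neal Stephenson
        System.out.println("[" + authorWithoutPrefix() + "]");  // []
    }
}
```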
3. XPath Data Types
In Java, an XPath expression may return one of the following data types:
- node-set – Represents a set of nodes. The set can be empty or contain any number of nodes.
- node – Represents a single node. This is a Java convenience (XPathConstants.NODE) rather than one of the XPath 1.0 types: the first selected node is returned, or null if nothing matches. The node itself may have any number of child nodes.
- boolean – Represents the value true or false. Be aware that the strings "true" and "false" have no special meaning in XPath: any non-empty string, including "false", converts to boolean true.
- number – Represents a floating-point number. All numbers in XPath and XSLT are implemented as floating-point numbers; the integer (or int) datatype does not exist in XPath and XSLT. Specifically, all numbers are IEEE 754 double-precision (64-bit) floating-point numbers, the same format used by the Java double primitive type, and Java returns them as java.lang.Double. In addition to ordinary numbers, there are five special values: positive and negative infinity, positive and negative zero, and NaN ("not a number").
- string – Represents zero or more characters, as defined in the XML specification.
Except for node-sets, these data types are simple, and converting between them is usually straightforward. We won't discuss them in more detail here; instead, we'll discuss data types and conversions as we need them for specific tasks.
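The Java mapping of these types can be seen by passing different XPathConstants values to XPath.evaluate(). This is a sketch over an abbreviated inline copy of the sample; the class and helper names are hypothetical. NODESET returns a NodeList, NODE a single Node (or null), NUMBER a Double, BOOLEAN a Boolean and STRING a String.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTypesDemo {
    // Abbreviated copy of the sample inventory (authors, publisher and isbn omitted).
    static final String XML = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title><price>14.95</price></book>"
            + "<book year=\"2005\"><title>Burning Tower</title><price>5.99</price></book>"
            + "<book year=\"1995\"><title>Zodiac</title><price>7.50</price></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the expression, asking the API for the given result type.
    static Object eval(String expression, QName returnType) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, parse(XML), returnType);
    }

    // node-set -> org.w3c.dom.NodeList
    static int nodeSetSize(String expression) throws Exception {
        return ((NodeList) eval(expression, XPathConstants.NODESET)).getLength();
    }

    // NODE -> first selected Node, or null when nothing matches
    static String nodeText(String expression) throws Exception {
        Node node = (Node) eval(expression, XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    // number -> java.lang.Double
    static double number(String expression) throws Exception {
        return (Double) eval(expression, XPathConstants.NUMBER);
    }

    // boolean -> java.lang.Boolean
    static boolean bool(String expression) throws Exception {
        return (Boolean) eval(expression, XPathConstants.BOOLEAN);
    }

    // string -> java.lang.String; a node-set converts to the string value of its first node
    static String string(String expression) throws Exception {
        return (String) eval(expression, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(nodeSetSize("//book"));     // 3
        System.out.println(nodeText("//magazine"));    // null
        System.out.println(number("number('abc')"));  // NaN
        System.out.println(number("1 div 0"));        // Infinity
        System.out.println(bool("boolean('false')")); // true: a non-empty string
        System.out.println(string("//book/title"));   // Snow Crash
    }
}
```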
4. XPath Syntax
XPath location paths use a UNIX-like syntax: steps are separated by slashes, much like directories in a file path.
4.1. Selecting Nodes with XPath
Expression | Description |
---|---|
nodename | Selects all child elements of the current node named nodename |
/ | Selects from the root node |
// | Selects nodes in the document from the current node that match the selection no matter where they are |
. | Selects the current node |
.. | Selects the parent of the current node |
@ | Selects attributes |
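The sketch below (hypothetical class, abbreviated inline XML) exercises these selectors. The relative expressions ., .. and ../@year are evaluated with the first <title> element as the context node, which is done by passing that node instead of the Document to XPath.evaluate().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PathSelectDemo {
    // Abbreviated copy of the sample inventory.
    static final String XML = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title></book>"
            + "<book year=\"2005\"><title>Burning Tower</title></book>"
            + "<book year=\"1995\"><title>Zodiac</title></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates a numeric expression against the whole document.
    static double number(String expression) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NUMBER);
    }

    // Evaluates an expression with the first <title> element as the context node.
    static String relativeToFirstTitle(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node title = (Node) xpath.evaluate("//title", parse(XML), XPathConstants.NODE);
        return xpath.evaluate(expression, title);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(number("count(/inventory/book)")); // 3.0: absolute path from the root
        System.out.println(number("count(/book)"));           // 0.0: book is not a child of the root
        System.out.println(number("count(//title)"));         // 3.0: titles anywhere
        System.out.println(relativeToFirstTitle("."));        // Snow Crash
        System.out.println(relativeToFirstTitle("name(..)")); // book
        System.out.println(relativeToFirstTitle("../@year")); // 2000
    }
}
```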
4.2. Using Predicates with XPath
Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
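For example, //book[1] selects the first book element, //book[last()] selects the last one, //book[@year>2001] selects books whose year attribute is greater than 2001, and //book[price<8] selects books whose price is less than 8.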
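Predicates can also be chained, or combined with functions such as position() and last(). A sketch over an abbreviated inline copy of the sample (hypothetical class and method names):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PredicateDemo {
    // Abbreviated copy of the sample inventory (publisher and isbn omitted).
    static final String XML = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author><price>14.95</price></book>"
            + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
            + "<author>Jerry Pournelle</author><price>5.99</price></book>"
            + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author><price>7.50</price></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Text content of every node selected by the expression.
    static List<String> texts(String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Two predicates in a row: both conditions must hold.
        System.out.println(texts("//book[author='Neal Stephenson'][price>10]/title")); // [Snow Crash]
        System.out.println(texts("//book[position()<3]/title")); // [Snow Crash, Burning Tower]
        System.out.println(texts("//book[last()]/title"));       // [Zodiac]
        System.out.println(texts("//book[author='Jerry Pournelle']/title")); // [Burning Tower]
    }
}
```

The last case shows that comparing a node-set (the two author elements of Burning Tower) with a string is true if any node in the set matches.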
4.3. Reaching Unknown Nodes with XPath
XPath wildcards can be used to select unknown XML elements.
Wildcard | Description |
---|---|
* | Matches any element node |
@* | Matches any attribute node |
node() | Matches any node of any kind |
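A sketch of these wildcards over a compact inline copy of the sample (two books plus the comment; class and method names are hypothetical). Because the inline XML has no whitespace between tags, node() under <inventory> matches exactly the comment and the two books.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WildcardDemo {
    // Compact copy of part of the sample inventory, without whitespace text nodes.
    static final String XML = "<inventory><!--Test is test comment-->"
            + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
            + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Node names of everything the expression selects.
    static List<String> names(String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeName());
        }
        return result;
    }

    static double number(String expression) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(names("//book[1]/*"));                // [title, author]
        System.out.println(names("//book[1]/@*"));               // [year]
        System.out.println(number("count(/inventory/*)"));      // 2.0: elements only
        System.out.println(number("count(/inventory/node())")); // 3.0: comment + 2 books
    }
}
```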
4.4. XPath Axes
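An axis defines a node-set relative to the current node. XPath 1.0 defines the following thirteen axes. In an expression, the axis name is followed by a double colon and a node test, as in child::book or ancestor::inventory.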
AxisName | Result |
---|---|
ancestor | Selects all ancestors (parent, grandparent, etc.) of the current node |
ancestor-or-self | Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself |
attribute | Selects all attributes of the current node |
child | Selects all children of the current node |
descendant | Selects all descendants (children, grandchildren, etc.) of the current node |
descendant-or-self | Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself |
following | Selects everything in the document after the closing tag of the current node |
following-sibling | Selects all siblings after the current node |
namespace | Selects all namespace nodes of the current node |
parent | Selects the parent of the current node |
preceding | Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes |
preceding-sibling | Selects all siblings before the current node |
self | Selects the current node |
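The sketch below (hypothetical class, abbreviated inline XML) walks a few axes starting from the book titled Burning Tower. The ancestor result is counted rather than listed, so the example does not depend on the order in which a reverse axis is returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AxesDemo {
    // Abbreviated copy of the sample inventory.
    static final String XML = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title></book>"
            + "<book year=\"2005\"><title>Burning Tower</title></book>"
            + "<book year=\"1995\"><title>Zodiac</title></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Text content of every node selected by the expression.
    static List<String> values(String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    static double number(String expression) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(values("//book[title='Burning Tower']/following-sibling::book/title")); // [Zodiac]
        System.out.println(values("//book[title='Burning Tower']/preceding-sibling::book/title")); // [Snow Crash]
        System.out.println(number("count(//title[.='Zodiac']/ancestor::*)")); // 2.0: book, inventory
        System.out.println(values("//book[1]/child::title"));      // [Snow Crash], same as //book[1]/title
        System.out.println(values("//book[2]/attribute::year"));   // [2005], same as //book[2]/@year
    }
}
```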
4.5. XPath Operators
The following operators can be used in XPath expressions:
Operator | Description | Example | Return value |
---|---|---|---|
\| | Computes the union of two node-sets | //title \| //price | A node-set with all title and price elements |
+ | Addition | 6 + 4 | 10 |
- | Subtraction | 6 - 4 | 2 |
* | Multiplication | 6 * 4 | 24 |
div | Division | 8 div 4 | 2 |
= | Equal | price=9.80 | true if price is 9.80; false if price is 9.90 |
!= | Not equal | price!=9.80 | true if price is 9.90; false if price is 9.80 |
< | Less than | price<9.80 | true if price is 9.00; false if price is 9.80 |
<= | Less than or equal to | price<=9.80 | true if price is 9.00; false if price is 9.90 |
> | Greater than | price>9.80 | true if price is 9.90; false if price is 9.80 |
>= | Greater than or equal to | price>=9.80 | true if price is 9.90; false if price is 9.70 |
or | Logical or | price=9.80 or price=9.70 | true if price is 9.80; false if price is 9.50 |
and | Logical and | price>9.00 and price<9.90 | true if price is 9.80; false if price is 8.50 |
mod | Modulus (division remainder) | 5 mod 2 | 1 |
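The arithmetic, union and logical operators can be checked directly with XPathConstants.NUMBER and XPathConstants.BOOLEAN. This sketch assumes an abbreviated inline copy of the sample with prices; the class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OperatorDemo {
    // Abbreviated copy of the sample inventory (titles and prices only).
    static final String XML = "<inventory>"
            + "<book><title>Snow Crash</title><price>14.95</price></book>"
            + "<book><title>Burning Tower</title><price>5.99</price></book>"
            + "<book><title>Zodiac</title><price>7.50</price></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static double number(String expression) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NUMBER);
    }

    static boolean bool(String expression) throws Exception {
        return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.BOOLEAN);
    }

    static List<String> texts(String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(XML), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(number("6 + 4"));                    // 10.0
        System.out.println(number("8 div 4"));                  // 2.0
        System.out.println(number("5 mod 2"));                  // 1.0
        System.out.println(number("count(//book | //title)"));  // 6.0: union of 3 books and 3 titles
        System.out.println(texts("//book[price>5 and price<8]/title")); // [Burning Tower, Zodiac]
        System.out.println(bool("//book[1]/price >= 14.95"));   // true
    }
}
```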
5. How to Evaluate XPath Expressions
We begin by creating a DOM model of the XML document:
```java
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // off by default; keep it on for namespace-aware XPath matching
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("inventory.xml");
```
Next, we create an XPath object and compile the expression into an XPathExpression:
```java
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
```
Finally, we call XPathExpression.evaluate() to evaluate the expression against the Document. Passing XPathConstants.NODESET returns the result as a NodeList:
```java
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
```
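Since every query repeats the compile, evaluate and loop steps, they can be wrapped in a small helper. This is a sketch with a hypothetical XPathHelper class; it parses an inline copy of part of the sample rather than inventory.xml. It creates a new XPath object per call because the javax.xml.xpath.XPath documentation states that XPath objects are neither thread-safe nor reentrant.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathHelper {
    // Abbreviated copy of the sample inventory.
    static final String SAMPLE = "<inventory>"
            + "<book year=\"2000\"><title>Snow Crash</title></book>"
            + "<book year=\"2005\"><title>Burning Tower</title></book>"
            + "<book year=\"1995\"><title>Zodiac</title></book>"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Compiles the expression, evaluates it as a node-set and returns each node's text.
    static List<String> evaluateToStrings(Document doc, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath(); // fresh object: XPath is not thread-safe
        XPathExpression expr = xpath.compile(expression);
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() works for text, attribute and element nodes alike.
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(evaluateToStrings(doc, "//book[@year>2001]/title/text()")); // [Burning Tower]
        System.out.println(evaluateToStrings(doc, "//book/@year")); // [2000, 2005, 1995]
    }
}
```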
6. Complete Example
The following XPathExample class evaluates the XPath expressions from the table at the start of the article, plus a few more, and prints their output.
```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathExample {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; keep it on for namespace-aware XPath matching
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("inventory.xml");

        // Create XPath
        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();

        System.out.println("1) Get book titles written after 2001");
        XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("2) Get book titles written before 2001");
        expr = xpath.compile("//book[@year<2001]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("3) Get book titles cheaper than 8 dollars");
        expr = xpath.compile("//book[price<8]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("4) Get book titles costlier than 8 dollars");
        expr = xpath.compile("//book[price>8]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("5) Get book titles added in first node");
        expr = xpath.compile("//book[1]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("6) Get book title added in last node");
        expr = xpath.compile("//book[last()]/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("7) Get all writers");
        expr = xpath.compile("//book/author/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("8) Count all books titles");
        expr = xpath.compile("count(//book/title)");
        result = expr.evaluate(doc, XPathConstants.NUMBER);
        Double count = (Double) result;
        System.out.println(count.intValue());

        System.out.println("9) Get book titles with writer name start with Neal");
        expr = xpath.compile("//book[starts-with(author,'Neal')]");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i)
                    .getChildNodes()
                    .item(1) // item(0) is the whitespace text node, so <title> is item(1)
                    .getTextContent());
        }

        System.out.println("10) Get book titles with writer name containing Niven");
        expr = xpath.compile("//book[contains(author,'Niven')]");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i)
                    .getChildNodes()
                    .item(1) // item(0) is the whitespace text node, so <title> is item(1)
                    .getTextContent());
        }

        System.out.println("11) Get book titles written by Neal Stephenson");
        expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
        result = expr.evaluate(doc, XPathConstants.NODESET);
        nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getNodeValue());
        }

        System.out.println("12) Get count of book titles written by Neal Stephenson");
        expr = xpath.compile("count(//book[author='Neal Stephenson'])");
        result = expr.evaluate(doc, XPathConstants.NUMBER);
        count = (Double) result;
        System.out.println(count.intValue());

        System.out.println("13) Reading comment node");
        expr = xpath.compile("//inventory/comment()");
        result = expr.evaluate(doc, XPathConstants.STRING);
        String comment = (String) result;
        System.out.println(comment);
    }
}
```
Program output:
```text
1) Get book titles written after 2001
Burning Tower
2) Get book titles written before 2001
Snow Crash
Zodiac
3) Get book titles cheaper than 8 dollars
Burning Tower
Zodiac
4) Get book titles costlier than 8 dollars
Snow Crash
5) Get book titles added in first node
Snow Crash
6) Get book title added in last node
Zodiac
7) Get all writers
Neal Stephenson
Larry Niven
Jerry Pournelle
Neal Stephenson
8) Count all books titles
3
9) Get book titles with writer name start with Neal
Snow Crash
Zodiac
10) Get book titles with writer name containing Niven
Burning Tower
11) Get book titles written by Neal Stephenson
Snow Crash
Zodiac
12) Get count of book titles written by Neal Stephenson
2
13) Reading comment node
Test is test comment
```
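Queries 9 and 10 reach the <title> element through getChildNodes().item(1), which works only because the whitespace between <book> and <title> in inventory.xml becomes a text node at index 0. A sturdier alternative, sketched below with hypothetical names and an indented inline copy of part of the sample, evaluates the relative expression title against each matched book node, so the result no longer depends on whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeTitleDemo {
    // Indented copy of part of the sample, so whitespace text nodes are present.
    static final String XML = "<inventory>\n"
            + "  <book year=\"2000\">\n    <title>Snow Crash</title>\n"
            + "    <author>Neal Stephenson</author>\n  </book>\n"
            + "  <book year=\"2005\">\n    <title>Burning Tower</title>\n"
            + "    <author>Larry Niven</author>\n    <author>Jerry Pournelle</author>\n  </book>\n"
            + "  <book year=\"1995\">\n    <title>Zodiac</title>\n"
            + "    <author>Neal Stephenson</author>\n  </book>\n"
            + "</inventory>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Selects books matching the predicate, then reads each title relative to its book.
    static List<String> titlesOfBooksWhere(String predicate) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList books = (NodeList) xpath.evaluate("//book[" + predicate + "]",
                parse(XML), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            // "title" is evaluated with the <book> as context node; child indexes do not matter.
            titles.add(xpath.evaluate("title", books.item(i)));
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesOfBooksWhere("starts-with(author,'Neal')")); // [Snow Crash, Zodiac]
        System.out.println(titlesOfBooksWhere("contains(author,'Niven')"));   // [Burning Tower]
    }
}
```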
We hope this XPath tutorial has been informative and helps you evaluate XPath expressions in Java. If you have any suggestions, please leave a comment.
Happy Learning !!