Java XPath Example – XPath Tutorial

In this Java XPath tutorial, we will learn what the XPath library is, what the XPath data types are, and how to write XPath expressions that retrieve information from an XML file or document. This information can be XML nodes, XML attributes, or even comments.

Table of Contents

1. What is XPath?
2. XPath Data Model
3. XPath Data Types
4. XPath Syntax
5. XPath Expressions
6. Recommended reading

We will use the following XML, saved as inventory.xml, to run the XPath examples in this tutorial.

<?xml version="1.0" encoding="utf-8" ?>
<inventory>
	<!--Test is test comment-->
	<book year="2000">
		<title>Snow Crash</title>
		<author>Neal Stephenson</author>
		<publisher>Spectra</publisher>
		<isbn>0553380958</isbn>
		<price>14.95</price>
	</book>
	<book year="2005">
		<title>Burning Tower</title>
		<author>Larry Niven</author>
		<author>Jerry Pournelle</author>
		<publisher>Pocket</publisher>
		<isbn>0743416910</isbn>
		<price>5.99</price>
	</book>
	<book year="1995">
		<title>Zodiac</title>
		<author>Neal Stephenson</author>
		<publisher>Spectra</publisher>
		<isbn>0553573862</isbn>
		<price>7.50</price>
	</book>
</inventory>

1. What is XPath?

XPath is a syntax used to describe parts of an XML document. With XPath, you can refer to the first element, any attribute of an element, all elements that contain some specific text, and many other variations. An XSLT stylesheet uses XPath expressions in the match and select attributes of various elements to indicate how a document should be transformed.

XPath can also be useful when testing web services that use XML for their requests and responses.

XPath syntax will look familiar. It is a mix of basic programming-language expressions (arithmetic such as $x*6) and Unix-like path expressions (such as /inventory/book/author).
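To make the $x*6 case concrete: in Java, XPath variables are supplied through the XPathVariableResolver interface. The sketch below is a minimal illustration, assuming Java 8 or later (for the lambda); the class name XPathVariableDemo and the method timesSix are hypothetical names chosen for this example.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathVariableDemo {

    // Evaluates the expression "$x * 6" with the variable x bound to the given value
    static double timesSix(double x) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // XPathVariableResolver has a single method, so a lambda can supply $x;
        // returning null for any other name makes evaluation fail for unknown variables
        xpath.setXPathVariableResolver(
                (QName name) -> "x".equals(name.getLocalPart()) ? Double.valueOf(x) : null);

        // The expression ignores the document, so any small one works as the context
        InputSource context = new InputSource(new StringReader("<inventory/>"));
        return (Double) xpath.evaluate("$x * 6", context, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timesSix(7)); // 42.0
    }
}
```

The resolver is consulted when the expression is evaluated, so the same pattern works for variables used inside predicates, for example //book[@year > $minYear].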

In addition to the basic syntax, XPath provides a set of useful functions (such as count() or contains(), which work much like utility function calls) that let you search for various data fragments inside the document.
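A small sketch of calling these functions from Java, assuming a trimmed, hypothetical copy of the inventory XML held in a string. The Java result type follows the function's XPath return type: count() gives a number and contains() gives a boolean.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathFunctionsDemo {

    // A trimmed-down, hypothetical copy of the tutorial's inventory XML
    static final String XML = "<inventory>"
            + "<book year='2000'><title>Snow Crash</title><author>Neal Stephenson</author></book>"
            + "<book year='2005'><title>Burning Tower</title><author>Larry Niven</author></book>"
            + "</inventory>";

    // Parses XML and evaluates one expression, requesting the given result type
    static Object eval(String expression, QName returnType) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)), returnType);
    }

    public static void main(String[] args) throws Exception {
        // count() produces an XPath number, returned as a java.lang.Double
        System.out.println(eval("count(//book)", XPathConstants.NUMBER)); // 2.0
        // contains() produces an XPath boolean, returned as a java.lang.Boolean
        System.out.println(eval("contains(//book[2]/author, 'Niven')", XPathConstants.BOOLEAN)); // true
    }
}
```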

2. XPath Data Model

XPath views an XML document as a tree of nodes. This tree is very similar to a Document Object Model (DOM) tree, so if you're familiar with the DOM, you will find it easy to build basic XPath expressions.

There are seven kinds of nodes in the XPath data model:

  1. The root node (Only one per document)
  2. Element nodes
  3. Attribute nodes
  4. Text nodes
  5. Comment nodes
  6. Processing instruction nodes
  7. Namespace nodes
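A quick way to see most of these node kinds is to count what each XPath node test selects. This sketch uses a small hypothetical document (which, unlike our sample, includes an xml-stylesheet processing instruction) and leaves namespace nodes out.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathNodeKindsDemo {

    // Hypothetical document with one node of most kinds
    static final String XML = "<?xml version='1.0'?>"
            + "<?xml-stylesheet type='text/xsl' href='style.xsl'?>"
            + "<inventory><!--a comment--><book year='2000'>Snow Crash</book></inventory>";

    // Counts the nodes selected by a location path
    static int count(String path) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(" + path + ")",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        // Root, element, attribute, text, comment and processing-instruction node tests
        String[] paths = {"/", "//*", "//@*", "//text()", "//comment()", "//processing-instruction()"};
        for (String path : paths) {
            System.out.println(path + " -> " + count(path));
        }
    }
}
```

Here there are two elements (inventory and book) and exactly one node of each other kind shown.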

2.1. Root Node

The root node is the XPath node that contains the entire document. In our example, the root node contains the <inventory> element. In an XPath expression, the root node is specified with a single slash ('/').
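A minimal sketch, assuming a one-element hypothetical document: with DOM input, selecting "/" in Java returns the Document object itself, while "/*" returns the top-level inventory element that the root node contains.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathRootDemo {

    static final String XML = "<inventory><book year='2000'/></inventory>";

    // Returns the first node selected by the expression, or null if none
    static Node select(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate(expression, new InputSource(new StringReader(XML)),
                XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // "/" is the root node; with DOM input it comes back as the Document
        System.out.println(select("/").getNodeType() == Node.DOCUMENT_NODE); // true
        // "/*" is the top-level element, a child of the root node
        System.out.println(select("/*").getNodeName()); // inventory
    }
}
```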

2.2. Element Nodes

Every element in the original XML document is represented by an XPath element node.

For example, the following are element nodes in our sample XML:

  • inventory
  • book
  • title
  • author
  • publisher
  • isbn
  • price

2.3. Attribute Nodes

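An element node is the parent of one attribute node for each attribute it has in the XML source document. Attribute nodes describe properties of a particular element. Note that although the element is their parent, attribute nodes are not counted among the element's children (the child axis does not select them).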

For example, in our XML the "year" attribute of each <book> element is an attribute node.
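A sketch that collects the year attribute values, assuming a reduced copy of the sample XML. Attribute nodes come back in document order, and getNodeValue() on an attribute node returns its value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAttributeDemo {

    // The three <book> elements of the sample, reduced to their year attributes
    static final String XML = "<inventory>"
            + "<book year='2000'/><book year='2005'/><book year='1995'/>"
            + "</inventory>";

    // Collects the value of every year attribute
    static List<String> years() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList attrs = (NodeList) xpath.evaluate("//book/@year",
                new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            values.add(attrs.item(i).getNodeValue()); // value of the Attr node
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(years()); // [2000, 2005, 1995]
    }
}
```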

2.4. Text Nodes

Text nodes are refreshingly simple. They contain text from an element. If the original text in the XML document contained entity or character references, they are resolved before the XPath text node is created.

The text node is text, pure and simple. A text node always holds as much contiguous text as possible, so the node immediately before or after a text node is never another text node.

For example, the values in our XML are text nodes, e.g. "Snow Crash" and "Neal Stephenson". The whitespace (newlines and tabs) between elements also forms text nodes.
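A sketch of the entity-resolution rule, using a hypothetical title that contains one entity reference (&amp;) and two character references (&#40; and &#41;). XPath sees a single text node whose value is already resolved.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathTextNodeDemo {

    // Hypothetical title mixing plain text with entity and character references
    static final String XML = "<title>Tom &amp; Jerry &#40;1940&#41;</title>";

    // Returns the string result of an expression evaluated against XML
    static String eval(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("count(/title/text())")); // 1
        System.out.println(eval("/title/text()"));        // Tom & Jerry (1940)
    }
}
```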

2.5. Comment Nodes

A comment node is also very simple—it contains some text. Every comment in the source document becomes a comment node. The text of the comment node contains everything inside the comment, except the opening <!-- and the closing -->.

For example:

<!--Test is test comment-->

2.6. Processing Instruction Nodes

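A processing instruction node has two parts: a name (the target, returned by the name() function) and a string value. The string value is everything after the name and the whitespace that follows it, not including the ?> that closes the processing instruction.

For example, in a typical stylesheet instruction such as the one below, the name is xml-stylesheet and the string value is type="text/xsl" href="style.xsl":

<?xml-stylesheet type="text/xsl" href="style.xsl"?>

Note that the XML declaration at the top of our sample (<?xml version="1.0" encoding="utf-8" ?>) looks like a processing instruction but is not one, so it does not appear in the XPath data model.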
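A sketch that checks these rules from Java, assuming a hypothetical document with an XML declaration and one xml-stylesheet instruction. The selected node comes back as a DOM ProcessingInstruction, whose getTarget() and getData() correspond to the name and string value.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class XPathProcessingInstructionDemo {

    // XML declaration followed by one (hypothetical) processing instruction
    static final String XML = "<?xml version='1.0' encoding='utf-8'?>"
            + "<?xml-stylesheet type='text/xsl' href='style.xsl'?>"
            + "<inventory/>";

    private static InputSource source() {
        return new InputSource(new StringReader(XML));
    }

    // Number of processing-instruction nodes; the XML declaration is not one
    static int piCount() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(//processing-instruction())",
                source(), XPathConstants.NUMBER);
        return n.intValue();
    }

    // The first processing-instruction node, as a DOM ProcessingInstruction
    static ProcessingInstruction firstPi() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (ProcessingInstruction) xpath.evaluate("//processing-instruction()",
                source(), XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(piCount());           // 1
        System.out.println(firstPi().getTarget()); // xml-stylesheet
        System.out.println(firstPi().getData());   // type='text/xsl' href='style.xsl'
    }
}
```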

2.7. Namespace Nodes

Namespace nodes are almost never used in XSLT style sheets; they exist primarily for the XSLT processor’s benefit.

Remember that the declaration of a namespace (such as xmlns:auth="http://www.authors.net"), even though it is technically an attribute in the XML source, becomes a namespace node, not an attribute node.
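In Java, the practical side of namespaces is that a namespace-aware query has to map each prefix it uses to a namespace URI through a NamespaceContext; this is also why the examples in this tutorial call setNamespaceAware(true). A minimal sketch, assuming a hypothetical one-element document that uses the auth prefix from the text above; the prefix a used in the queries is an arbitrary choice.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathNamespaceDemo {

    static final String AUTH_NS = "http://www.authors.net";

    // Hypothetical document using the auth prefix
    static final String XML =
            "<auth:author xmlns:auth='" + AUTH_NS + "' id='1'>Neal Stephenson</auth:author>";

    // Evaluates a query in which the prefix "a" stands for the authors namespace
    static String query(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, names carry no namespace URI
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                // Query prefixes need not match the prefixes used in the document
                return "a".equals(prefix) ? AUTH_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return AUTH_NS.equals(namespaceURI) ? "a" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyIterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(query("/a:author"));      // Neal Stephenson
        System.out.println(query("/a:author/@id"));  // 1
        System.out.println(query("count(/author)")); // 0: an unprefixed name means "no namespace"
    }
}
```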

3. XPath Data Types

In Java, an XPath expression may return one of the following data types:

  1. node-set – Represents a set of nodes. The set can be empty, or it can contain any number of nodes.
  2. node – Represents a single node. This is not one of the four XPath 1.0 types; Java adds it through XPathConstants.NODE, which returns the first node of the result in document order, or null if nothing matches. The node itself can have any number of child nodes.
  3. boolean – Represents the value true or false. Be aware that the strings "true" and "false" have no special meaning or value in XPath; for example, boolean("false") is true, because any non-empty string converts to true.
  4. number – Represents a floating-point number. The integer (or int) datatype does not exist in XPath and XSLT; all numbers are IEEE 754 double-precision floating-point numbers, the same representation used by Java's double primitive type. In addition to ordinary numbers, there are five special values for numbers: positive and negative infinity, positive and negative zero, and NaN, the special symbol for anything that is not a number.
  5. string – Represents zero or more characters, as defined in the XML specification.
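A sketch of how each type maps onto a Java class through the XPathConstants argument, assuming a trimmed two-book copy of the sample XML. Asking for STRING on a node-set returns the string value of its first node.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathReturnTypesDemo {

    // Two books from the sample, trimmed to title and price
    static final String XML = "<inventory>"
            + "<book year='2000'><title>Snow Crash</title><price>14.95</price></book>"
            + "<book year='2005'><title>Burning Tower</title><price>5.99</price></book>"
            + "</inventory>";

    // Evaluates an expression and asks JAXP for the given result type
    static Object eval(String expression, QName returnType) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)), returnType);
    }

    public static void main(String[] args) throws Exception {
        NodeList titles = (NodeList) eval("//title", XPathConstants.NODESET); // node-set -> NodeList
        Node first = (Node) eval("//title", XPathConstants.NODE);             // node -> first match
        Boolean cheap = (Boolean) eval("//price < 6", XPathConstants.BOOLEAN); // any price below 6?
        Double count = (Double) eval("count(//book)", XPathConstants.NUMBER);  // number -> Double
        String text = (String) eval("//title", XPathConstants.STRING);        // string value of first node

        System.out.println(titles.getLength());     // 2
        System.out.println(first.getTextContent()); // Snow Crash
        System.out.println(cheap);                  // true
        System.out.println(count);                  // 2.0
        System.out.println(text);                   // Snow Crash
    }
}
```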

These datatypes are usually simple, and with the exception of node-sets, converting between types is usually straightforward. We won’t discuss these datatypes in any more detail here; instead, we’ll discuss datatypes and conversions as we need them to do specific tasks.

4. XPath Syntax

XPath uses a path syntax similar to UNIX file paths, combined with wildcards, predicates and functions.

4.1. Select nodes with xpath

Expression  Description
nodename    Selects all child nodes named nodename
/           Selects from the root node
//          Selects matching nodes at any depth below the starting point (a leading // searches the whole document)
.           Selects the current node
..          Selects the parent of the current node
@           Selects attributes
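A sketch that evaluates several of these expressions with the first book element as the current node, assuming a trimmed copy of the sample XML. It also shows the difference between a leading // (whole document) and .// (below the current node only).

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathPathDemo {

    static final String XML = "<inventory>"
            + "<book year='2000'><title>Snow Crash</title></book>"
            + "<book year='2005'><title>Burning Tower</title></book>"
            + "</inventory>";

    // Evaluates an expression with the first <book> element as the current node
    static String fromFirstBook(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // A path starting with "/" is absolute: it is resolved from the root node
        Node firstBook = (Node) xpath.evaluate("/inventory/book",
                new InputSource(new StringReader(XML)), XPathConstants.NODE);
        return xpath.evaluate(expression, firstBook);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fromFirstBook("title"));           // Snow Crash (child named title)
        System.out.println(fromFirstBook("@year"));           // 2000 (attribute)
        System.out.println(fromFirstBook("name(.)"));         // book (current node)
        System.out.println(fromFirstBook("name(..)"));        // inventory (parent)
        System.out.println(fromFirstBook("count(//title)"));  // 2 (whole document)
        System.out.println(fromFirstBook("count(.//title)")); // 1 (below the current node)
    }
}
```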

4.2. Use predicates with xpath

Predicates are used to find a specific node or a node that contains a specific value. Predicates are always embedded in square brackets.
We will learn how to use them in the next section.
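A short sketch of the two common kinds of predicate, positional and value-based, assuming the three sample books trimmed to year, title and price. Note that //book[1] means the first book child of its parent; to get the first book in the whole document regardless of parent, you would write (//book)[1].

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicateDemo {

    // The three sample books, trimmed to year, title and price
    static final String XML = "<inventory>"
            + "<book year='2000'><title>Snow Crash</title><price>14.95</price></book>"
            + "<book year='2005'><title>Burning Tower</title><price>5.99</price></book>"
            + "<book year='1995'><title>Zodiac</title><price>7.50</price></book>"
            + "</inventory>";

    // Returns the text of every node selected by the expression
    static List<String> select(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression,
                new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Positional predicates
        System.out.println(select("//book[1]/title"));              // [Snow Crash]
        System.out.println(select("//book[last()]/title"));         // [Zodiac]
        System.out.println(select("//book[position() < 3]/title")); // [Snow Crash, Burning Tower]
        // Value predicates on an attribute and on a child element
        System.out.println(select("//book[@year='2005']/title"));   // [Burning Tower]
        System.out.println(select("//book[price > 7]/title"));      // [Snow Crash, Zodiac]
    }
}
```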

4.3. Reaching unknown nodes with xpath

XPath wildcards can be used to select unknown XML elements.

Wildcard  Description
*         Matches any element node
@*        Matches any attribute node
node()    Matches any node of any kind
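A sketch of the three wildcards, assuming a hypothetical one-book document with two attributes, two child elements and a comment. Note that * matches elements only, while node() also matches the comment.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathWildcardDemo {

    // Hypothetical inventory with a comment and one book
    static final String XML = "<inventory><!--stock list-->"
            + "<book year='2000' lang='en'><title>Snow Crash</title><price>14.95</price></book>"
            + "</inventory>";

    // Counts the nodes selected by a location path
    static int count(String path) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(" + path + ")",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("/inventory/book/*"));  // 2: title and price
        System.out.println(count("/inventory/book/@*")); // 2: year and lang
        System.out.println(count("/inventory/*"));       // 1: only <book> is an element
        System.out.println(count("/inventory/node()"));  // 2: the comment and <book>
    }
}
```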

4.4. XPath Axes

An axis defines a node-set relative to the current node. XPath 1.0 defines the following 13 axes:

Axis name           Result
ancestor            Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self    Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute           Selects all attributes of the current node
child               Selects all children of the current node
descendant          Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self  Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following           Selects everything in the document after the closing tag of the current node
following-sibling   Selects all siblings after the current node
namespace           Selects all namespace nodes of the current node
parent              Selects the parent of the current node
preceding           Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling   Selects all siblings before the current node
self                Selects the current node
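A sketch that walks several of these axes from known starting nodes, assuming the three sample books trimmed to their titles.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathAxesDemo {

    static final String XML = "<inventory>"
            + "<book year='2000'><title>Snow Crash</title></book>"
            + "<book year='2005'><title>Burning Tower</title></book>"
            + "<book year='1995'><title>Zodiac</title></book>"
            + "</inventory>";

    // Returns the string result of an expression evaluated against XML
    static String eval(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eval("//book[2]/preceding-sibling::book/title"));         // Snow Crash
        System.out.println(eval("count(//book[1]/following-sibling::book)"));        // 2
        System.out.println(eval("count(//title[.='Zodiac']/ancestor::*)"));         // 2 (book, inventory)
        System.out.println(eval("count(//title[.='Zodiac']/ancestor-or-self::*)")); // 3
        System.out.println(eval("//title[.='Zodiac']/parent::book/@year"));          // 1995
        System.out.println(eval("count(//book[2]/following::title)"));               // 1
        System.out.println(eval("count(//book[3]/preceding::title)"));               // 2
    }
}
```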

4.5. XPath Operators

Below is a list of xpath operators that can be used in XPath expressions:

Operator  Description                    Example                      Return value
|         Union of two node-sets         //book | //cd                A node-set with all book and cd elements
+         Addition                       6 + 4                        10
-         Subtraction                    6 - 4                        2
*         Multiplication                 6 * 4                        24
div       Division                       8 div 4                      2
=         Equal                          price=9.80                   true if price is 9.80, false if price is 9.90
!=        Not equal                      price!=9.80                  true if price is 9.90, false if price is 9.80
<         Less than                      price<9.80                   true if price is 9.00, false if price is 9.80
<=        Less than or equal to          price<=9.80                  true if price is 9.00, false if price is 9.90
>         Greater than                   price>9.80                   true if price is 9.90, false if price is 9.80
>=        Greater than or equal to       price>=9.80                  true if price is 9.90, false if price is 9.70
or        Logical or                     price=9.80 or price=9.70     true if price is 9.80, false if price is 9.50
and       Logical and                    price>9.00 and price<9.90    true if price is 9.80, false if price is 8.50
mod       Modulus (division remainder)   5 mod 2                      1
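A sketch that evaluates several of these operators from Java, assuming a hypothetical two-book inventory (prices 14.95 and 5.99). Arithmetic results come back as Double and comparisons as Boolean.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class XPathOperatorDemo {

    static final String XML = "<inventory>"
            + "<book><title>Snow Crash</title><price>14.95</price></book>"
            + "<book><title>Burning Tower</title><price>5.99</price></book>"
            + "</inventory>";

    static double number(String expression) throws XPathExpressionException {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    static boolean bool(String expression) throws XPathExpressionException {
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    private static Object evaluate(String expression, QName type) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)), type);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(number("6 + 4"));                    // 10.0
        System.out.println(number("8 div 4"));                  // 2.0
        System.out.println(number("5 mod 2"));                  // 1.0
        System.out.println(number("count(//title | //price)")); // 4.0 (union of two node-sets)
        System.out.println(bool("//book[1]/price = 14.95"));    // true
        System.out.println(bool("//book[2]/price > 9.80 or //book[2]/price < 6"));     // true
        System.out.println(bool("//book[1]/price > 9.00 and //book[1]/price < 9.90")); // false
    }
}
```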

5. XPath Expressions

Let's retrieve different parts of the XML using XPath expressions and the data types described above.

package xml;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathTest 
{
	public static void main(String[] args) throws Exception 
	{
		//Build DOM

		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		factory.setNamespaceAware(true); // never forget this!
		DocumentBuilder builder = factory.newDocumentBuilder();
		Document doc = builder.parse("inventory.xml");

		//Create XPath

		XPathFactory xpathfactory = XPathFactory.newInstance();
		XPath xpath = xpathfactory.newXPath();

		System.out.println("\n//1) Get book titles written after 2001");

		// 1) Get book titles written after 2001
		XPathExpression expr = xpath.compile("//book[@year>2001]/title/text()");
		Object result = expr.evaluate(doc, XPathConstants.NODESET);
		NodeList nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//2) Get book titles written before 2001");

		// 2) Get book titles written before 2001
		expr = xpath.compile("//book[@year<2001]/title/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//3) Get book titles cheaper than 8 dollars");

		// 3) Get book titles cheaper than 8 dollars
		expr = xpath.compile("//book[price<8]/title/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//4) Get book titles costlier than 8 dollars");

		// 4) Get book titles costlier than 8 dollars
		expr = xpath.compile("//book[price>8]/title/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//5) Get book titles added in first node");

		// 5) Get book titles added in first node
		expr = xpath.compile("//book[1]/title/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//6) Get book title added in last node");

		// 6) Get book title added in last node
		expr = xpath.compile("//book[last()]/title/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//7) Get all writers");

		// 7) Get all writers
		expr = xpath.compile("//book/author/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}

		System.out.println("\n//8) Count all books titles ");

		// 8) Count all books titles
		expr = xpath.compile("count(//book/title)");
		result = expr.evaluate(doc, XPathConstants.NUMBER);
		Double count = (Double) result;
		System.out.println(count.intValue());

		System.out.println("\n//9) Get book titles with writer name start with Neal");

		// 9) Get book titles with writer name start with Neal
		expr = xpath.compile("//book[starts-with(author,'Neal')]");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
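			// Evaluate "title" relative to each matched <book>; relying on child index 1
			// breaks easily because index 0 is the whitespace text node before <title>
			System.out.println(xpath.evaluate("title", nodes.item(i)));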
		}

		System.out.println("\n//10) Get book titles with writer name containing Niven");

		// 10) Get book titles with writer name containing Niven
		expr = xpath.compile("//book[contains(author,'Niven')]");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
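			// Evaluate "title" relative to each matched <book>; relying on child index 1
			// breaks easily because index 0 is the whitespace text node before <title>
			System.out.println(xpath.evaluate("title", nodes.item(i)));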
		}

		System.out.println("//11) Get book titles written by Neal Stephenson");

		// 11) Get book titles written by Neal Stephenson
		expr = xpath.compile("//book[author='Neal Stephenson']/title/text()");
		result = expr.evaluate(doc, XPathConstants.NODESET);
		nodes = (NodeList) result;
		for (int i = 0; i < nodes.getLength(); i++) {
			System.out.println(nodes.item(i).getNodeValue());
		}
		
		System.out.println("\n//12) Get count of book titles written by Neal Stephenson");

		// 12) Get count of book titles written by Neal Stephenson
		expr = xpath.compile("count(//book[author='Neal Stephenson'])");
		result = expr.evaluate(doc, XPathConstants.NUMBER);
		count = (Double) result;
		System.out.println(count.intValue());

		System.out.println("\n//13) Reading comment node ");

		// 13) Reading comment node
		expr = xpath.compile("//inventory/comment()");
		result = expr.evaluate(doc, XPathConstants.STRING);
		String comment = (String) result;
		System.out.println(comment);
	}
}

Program output:

//1) Get book titles written after 2001
Burning Tower

//2) Get book titles written before 2001
Snow Crash
Zodiac

//3) Get book titles cheaper than 8 dollars
Burning Tower
Zodiac

//4) Get book titles costlier than 8 dollars
Snow Crash

//5) Get book titles added in first node
Snow Crash

//6) Get book title added in last node
Zodiac

//7) Get all writers
Neal Stephenson
Larry Niven
Jerry Pournelle
Neal Stephenson

//8) Count all books titles
3

//9) Get book titles with writer name start with Neal
Snow Crash
Zodiac

//10) Get book titles with writer name containing Niven
Burning Tower
//11) Get book titles written by Neal Stephenson
Snow Crash
Zodiac

//12) Get count of book titles written by Neal Stephenson
2

//13) Reading comment node
Test is test comment

I hope that this XPath tutorial has been informative for you and helps you execute XPath expressions with Java. The Java XPath example above also runs on Java 8.

If you have some suggestions then please leave a comment.

Happy Learning !!


Recommended Reading:

http://www.w3.org/TR/xpath-full-text-10-use-cases
http://en.wikipedia.org/wiki/XPath
http://oreilly.com/catalog/xmlnut/chapter/ch09.html


36 thoughts on “Java XPath Example – XPath Tutorial”

  1. I have an XML file and I am reading some data from it, but I have to use the substring method on it.
    XML:

    Test1
    TEST2 Complete

    Following the above example, I have written Java code and successfully read the TEST1 node value.
    But I want to read TEST2 and get only the Complete value. How can I use substring while parsing an XML node in Java?
    Could you please help by providing the correct XPath using the XPath substring function?

    public static void main(String[] args){
            String xPath = "substring(./TEST2,7,8)";
            XPathFactory xpathf = XPathFactory.newInstance();
            XPath xpath = xpathf.newXPath();
            XPathExpression expr = xpath.compile(xPath);
            (NodeList) expr.evaluate(node, XPathConstants.NODESET)
    }
    

    Thanks!

  2. Hi Lokesh,
    This post has been really helpful! Thank you!
    I had a question. Let's say I want to query for a specific year and, within that year, for publisher=xyz. What would the expression look like?
    For example, here is my XML. I am interested in querying for id=0 and ci_id=415bf14c9322c8c08dbbbb0c6c4d4425.

    Query_3
    4

    0
    license
    415bf14c9322c8c08dbbbb0c6c4d4425

    1
    interface
    4cfcf4e61f350bdba27593ebe64ecaf2

    1
    interface
    4e70e7cbd34836e6ba3b0f0aa7124ac8

    Thank you!

  3. Hi, I'm really new to XPath. I would like to know if it's possible to write one javax.xml.xpath.XPathExpression to obtain the value of the identificacionComprador tag?

     
    <?xml version="1.0" encoding="UTF-8" ?><autorizacion>
      <estado>AUTORIZADO</estado>
      <numeroAutorizacion>0803201801110302914400110010010000000641234567811</numeroAutorizacion>
      <ambiente>PRUEBAS</ambiente>
      <comprobante><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
    <factura id="comprobante" version="1.0.0">
        <infoTributaria>
            <secuencial>000000064</secuencial>
        </infoTributaria>
        <infoFactura>
            <tipoIdentificacionComprador>07</tipoIdentificacionComprador>
            <identificacionComprador>9999999999999</identificacionComprador>
        </infoFactura>
    </factura>]]></comprobante>
      <mensajes/>
    </autorizacion>
    

    I have tried these expressions, with no results:

    String expression = "/autorizacion/comprobante/*/*/factura/infoFactura/identificacionComprador";
    String expression = "/*//identificacionComprador/text()";

    The java function for compiling is this:

     
        public Object leerArchivo(String expression, QName returnType) {
            try {
                XPathExpression xPathExpression =  xPath.compile(expression);
                return xPathExpression.evaluate(xmlDocument, returnType);
            } catch (XPathExpressionException ex) {
                Logger.getLogger(LectorXPath.class.getName()).log(Level.SEVERE, null, ex);
                return null;
            }
        }
    
    

    Thanks in advance for any help or guidance.

    • We cannot read data inside a CDATA section using a normal XPath expression. The content is plain text, not DOM nodes, so we must use string functions.

      This code works:

      // XPath string literals use single quotes so they can sit inside the Java string
      XPathExpression expr = xpath.compile("substring-before(substring-after(/autorizacion/comprobante,'<identificacionComprador>'), '</identificacionComprador>')");
      String result = (String) expr.evaluate(doc, XPathConstants.STRING);
      System.out.println(result);
      

      Read More : SO Thread
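      An alternative sketch, assuming the CDATA section always holds a well-formed XML document: read the CDATA text as a string, parse it as a second document, and then use ordinary location paths. The XML below is a shortened, hypothetical copy of the one in the question, and the class and method names are illustrative.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataXPathDemo {

    // Shortened copy of the question's XML; the CDATA section holds a second document
    static final String XML = "<autorizacion><estado>AUTORIZADO</estado>"
            + "<comprobante><![CDATA[<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<factura id=\"comprobante\" version=\"1.0.0\"><infoFactura>"
            + "<identificacionComprador>9999999999999</identificacionComprador>"
            + "</infoFactura></factura>]]></comprobante></autorizacion>";

    static String buyerId() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Step 1: the CDATA content is plain text, so read it as a string
        Document outer = builder.parse(new InputSource(new StringReader(XML)));
        String embedded = xpath.evaluate("/autorizacion/comprobante", outer).trim();

        // Step 2: parse that text as a document of its own
        Document inner = builder.parse(new InputSource(new StringReader(embedded)));

        // Step 3: ordinary location paths now work on the embedded document
        return xpath.evaluate("/factura/infoFactura/identificacionComprador", inner);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buyerId()); // 9999999999999
    }
}
```

      Compared with substring-before/substring-after, this keeps working if the embedded document gains attributes on identificacionComprador or other elements, at the cost of a second parse.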

  4. Hi. I want to traverse the XML file given below and get the IP address and the application names stored in an array. Please help me with how to traverse it.

    App1
    App2
    App3
    App4

    App12
    App22
    App32
    App42

    Desired output is:
    IP address:192.168.10.10
    Applications running on the 192.168.10.10 device:
    App1
    App2
    App3
    App4

    Similarly for the next device

  5. Thanks for the clear explanation. It's good.

    Can you let me know how to return a particular node as an XML response?

    From your example XML file, if I pass the year value as "2005", it should send me that particular node as an XML response.

  6. <?xml version="1.0"?>
    <ArrayOfPurchaseEntitites xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema">
    <PurchaseEntitites>
    <rInstalmentAmt>634.0</rInstalmentAmt>
    <rAnnualRate>12.0</rAnnualRate>
    <rInterestAmt>2670.0</rInterestAmt>
    <dFirstInstalment>3/31/2016 12:00:00 AM</dFirstInstalment>
    <dLastInstalment>8/31/2018 12:00:00 AM</dLastInstalment>
    <rInsurancePremium>1350.0</rInsurancePremium>
    <sResponseCode>00</sResponseCode>
    </PurchaseEntitites>
    </ArrayOfPurchaseEntitites> 
  7. Good Day Lokesh,

    Your assistance would be highly appreciated.

    This is my XML document:

    634.0
    12.0
    2670.0
    3/31/2016 12:00:00 AM
    8/31/2018 12:00:00 AM
    1350.0
    00

    I have tried many ways to extract the inner text of the XML without success. Could you please provide a solution for reading the above XML?

    Thank you very much.

  8. Hi Lokesh, I want to call a web service with an XPath included in the URL. Could you help me with how to pass only part of the XML in the URL?

    e.g. my URL is: ‘localhost.localdomain’]/deviceconfig/system&element=myneme20.20.20.20

  9. Hi, my question is whether we can write a program to find the paths in any given XML file. For example, take any XML file and find all possible paths in it.

  10. Thanks for the wonderful article. Can you tell me the XPath expression to select all nodes where the 'year' attribute is specified? I tried using //book[@year] but it doesn't work.

      • Thanks, I will try that. Crossing my fingers that the code shows. The gist: how do you also get inline tags to show in the results? Suppose one of the examples had inline tags, like this:

            <book year="2005">
                <title><b>My </b>Burning Tower</title>
                <author>Larry Niven</author>
                <author>Jerry Pournelle</author>
                <publisher>Pocket</publisher>
                <isbn>0743416910</isbn>
                <price>5.99</price>
            </book>
        

        —edited—

        And you want the results to look like this

        //1) Get book titles written after 2001
        <b>My </b>Burning Tower
        

        —edited—

        At any rate, I think the intent of my question is clear.

        Thanks in advance!

        • import java.io.StringWriter;

          import javax.xml.parsers.DocumentBuilder;
          import javax.xml.parsers.DocumentBuilderFactory;
          import javax.xml.transform.OutputKeys;
          import javax.xml.transform.Transformer;
          import javax.xml.transform.TransformerFactory;
          import javax.xml.transform.dom.DOMSource;
          import javax.xml.transform.stream.StreamResult;
          import javax.xml.xpath.XPath;
          import javax.xml.xpath.XPathConstants;
          import javax.xml.xpath.XPathExpression;
          import javax.xml.xpath.XPathFactory;

          import org.w3c.dom.Document;
          import org.w3c.dom.Node;
          import org.w3c.dom.NodeList;

          public class XPathTest {
              public static void main(String[] args) throws Exception {
                  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                  factory.setNamespaceAware(true); // never forget this!
                  DocumentBuilder builder = factory.newDocumentBuilder();
                  Document doc = builder.parse("sample.xml");

                  XPathFactory xpathfactory = XPathFactory.newInstance();
                  XPath xpath = xpathfactory.newXPath();

                  // One identity transformer can serialize every matched node
                  Transformer transformer = TransformerFactory.newInstance().newTransformer();
                  transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

                  System.out.println("\n//1) Get book titles written after 1999");
                  // 1) Get book titles written after 1999; node() keeps inline elements such as <b>
                  XPathExpression expr = xpath.compile("//book[@year>1999]/title/node()");
                  NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
                  for (int i = 0; i < nodes.getLength(); i++) {
                      Node node = nodes.item(i);
                      StreamResult xmlOutput = new StreamResult(new StringWriter());
                      transformer.transform(new DOMSource(node), xmlOutput);
                      String nodeAsAString = xmlOutput.getWriter().toString();
                      System.out.println(nodeAsAString);
                  }
              }
          }

          Output:

          //1) Get book titles written after 1999
          <b>Snow</b>
          Crash
          Burning Tower
          

          So far, I am able to do this much. I know it does not exactly solve your problem, but be sure that I will put more effort into this.
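          A sketch that gets closer to the requested single-line output, assuming the goal is the inner markup of <title>: serialize each child node with the same identity Transformer and append the results to one writer instead of printing them separately. The class name InlineMarkupDemo and the XML string are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InlineMarkupDemo {

    // The commenter's example, reduced to one book
    static final String XML = "<inventory><book year='2005'>"
            + "<title><b>My </b>Burning Tower</title>"
            + "</book></inventory>";

    // Serializes the children of the first node matched by the expression and joins
    // them, which keeps inline tags such as <b> in the result
    static String innerXml(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node element = (Node) xpath.evaluate(expression,
                new InputSource(new StringReader(XML)), XPathConstants.NODE);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // Each child (element or text) is appended to the same writer, in order
            transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("//1) Get book titles written after 2001");
        System.out.println(innerXml("//book[@year>2001]/title"));
    }
}
```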

  11. Very nice! But what about elements with nested elements? What if one of your examples had an inline element like this

    Burning MY Tower
    Larry Niven
    Jerry Pournelle
    Pocket
    0743416910
    5.99

    And you wanted output to include the inline like this

    //1) Get book titles written after 2001
    Burning MY Tower

    Can this be done without some complicated serialization? I've tried JAXB and gotten lost in complexity. Is there an easier way?

  12. Thank you very much, very good.

    I noticed that it is only a small step from here to manipulating XML using XPath expressions. I tried setTextContent after navigating to a node and it worked as expected!

    • Hi,

      if you like, here is an excerpt from my code

      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xmlInput)));
      XPathFactory xpathfactory = XPathFactory.newInstance();
      XPath xpath = xpathfactory.newXPath();
      XPathExpression xpathExpression = xpath.compile(xpathDefinition);
      Node node = (Node) xpathExpression.evaluate(doc, XPathConstants.NODE);
      node.setTextContent(value);
      DOMSource domSource = new DOMSource(doc);
      StringWriter writer = new StringWriter();
      StreamResult res = new StreamResult(writer);
      TransformerFactory tf = TransformerFactory.newInstance();
      Transformer transformer = tf.newTransformer();
      transformer.transform(domSource, res);
      String xmlOutput = writer.toString();

      I had the need to transform the document to an XML string …

      Cheers
      Thorsten



HowToDoInJava

A blog about Java and its related technologies, the best practices, algorithms, interview questions, scripting languages, and Python.