Java Read XML with StAX Parser – Cursor & Iterator APIs

Learn to parse and read XML file using Java StAX parser. StAX (Streaming API for XML) provides two ways to parse XML i.e. cursor based API and iterator based API.

1) StAX Parser

Just like SAX parser, StAX API is designed for parsing XML streams. The difference is:

  1. StAX is a “pull” API. SAX is a “push” API.
  2. StAX can do both XML reading and writing. SAX can only do XML reading.

StAX is a pull style API. This means that you have to move the StAX parser from item to item in the XML file yourself, just like you do with a standard Iterator or JDBC ResultSet. You can then access the XML information via the StAX parser for each such “item” encountered in the XML file.

Cursor vs Iterator

  1. While reading the XML document, the iterator reader returns an XML event object from it’s nextEvent() calls. This event provides information about what type of XML tag (element, text, comment etc) your have encountered. The event received is immutable so you can pass around your application to processs it safely.
    XMLEventReader reader = ...;
    
    while(reader.hasNext()){
        XMLEvent event = reader.nextEvent();
    
        if(event.getEventType() == XMLEvent.START_ELEMENT){
            //process data
        }	
        //... more event types handled here...
    }
    
  2. Unlike Iterator, cursor works like Resultset in JDBC. If moves the cursor to next element in XML document. You can then call methods directly on the cursor to obtain more information about the current event.
    XMLStreamReader streamReader = ...;
    
    while(streamReader.hasNext()){
        int eventType = streamReader.next();
    
        if(eventType == XMLStreamReader.START_ELEMENT){
            System.out.println(streamReader.getLocalName());
        }
    
        //... more event types handled here...
    }
    

2) StAX Iterator API Example

Given below demonstrate how to use StAX iterator based API to read the XML document to object.

XML file

<employees>
	<employee id="101">
		 <name>Lokesh Gupta</name>
	    <title>Author</title>
	</employee>
	<employee id="102">
		 <name>Brian Lara</name>
	    <title>Cricketer</title>
	</employee>
</employees>

Read XML with StAX Iterator

To read the file, I have written the program in these steps:

  1. Create iterator and start receiving events.
  2. As soon as you get open 'employee' tag – create new Employee object.
  3. Read id attribute from employee tag and set to current Employee object.
  4. Iterate to next start tag events. These are XML elements inside employee tag. Read data inside these tags. Set read data to current Employee object.
  5. Continue iterating the event. When you find end element event for 'employee' tag, you can say that you have read the data for current employee, so add the current employee object to employeeList collection.
  6. At last, verify the read data by printing the employeeList.
package com.howtodoinjava.demo.stax;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class ReadXMLExample 
{
	public static void main(String[] args) throws FileNotFoundException, XMLStreamException 
	{
		 File file = new File("employees.xml");
		 
		// Instance of the class which helps on reading tags
	    XMLInputFactory factory = XMLInputFactory.newInstance();
	 
        // Initializing the handler to access the tags in the XML file
        XMLEventReader eventReader = factory.createXMLEventReader(new FileReader(file));
        
        //All read employees objects will be added to this list
        List<Employee> employeeList = new ArrayList<>();
        
        //Create Employee object. It will get all the data using setter methods.
        //And at last, it will stored in above 'employeeList' 
        Employee employee = null;
        
        // Checking the availability of the next tag
        while (eventReader.hasNext())
        {
        	XMLEvent xmlEvent = eventReader.nextEvent();
        	
        	if (xmlEvent.isStartElement())
        	{
        		StartElement startElement = xmlEvent.asStartElement();
        		
        		//As soo as employee tag is opened, create new Employee object
        		if("employee".equalsIgnoreCase(startElement.getName().getLocalPart())) {
        			employee = new Employee();	
        		}
        		
        		//Read all attributes when start tag is being read
        		@SuppressWarnings("unchecked")
				Iterator<Attribute> iterator = startElement.getAttributes();
        		
                while (iterator.hasNext())
                {
                    Attribute attribute = iterator.next();
                    QName name = attribute.getName();
                    if("id".equalsIgnoreCase(name.getLocalPart())) {
                    	employee.setId(Integer.valueOf(attribute.getValue()));
                    }
                }
        		
                //Now everytime content tags are found; 
                //Move the iterator and read data
        		switch (startElement.getName().getLocalPart()) 
        		{
	        		case "name":
	        			Characters nameDataEvent = (Characters) eventReader.nextEvent();
	        			employee.setName(nameDataEvent.getData());
	        			break;
	        			
	        		case "title":
	        			Characters titleDataEvent = (Characters) eventReader.nextEvent();
	        			employee.setTitle(titleDataEvent.getData());
	        			break;
        		}
        	}
        	
        	if (xmlEvent.isEndElement())
        	{
        		EndElement endElement = xmlEvent.asEndElement();
        		
        		//If employee tag is closed then add the employee object to list; 
        		//and be ready to read next employee data
        		if("employee".equalsIgnoreCase(endElement.getName().getLocalPart())) {
        			employeeList.add(employee);
        		}
        	}
        }
        
        System.out.println(employeeList);	//Verify read data
        
	}
}

//Output:

[Employee [id=101, name=Lokesh Gupta, title=Author], 
 Employee [id=102, name=Brian Lara,   title=Cricketer]]

3) StAX Cursor API Example

I will read the same employees.xml file – now with cursor based API.

package com.howtodoinjava.demo.stax;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReadXMLExample 
{
	public static void main(String[] args) throws FileNotFoundException, XMLStreamException 
	{
		//All read employees objects will be added to this list
        List<Employee> employeeList = new ArrayList<>();
        
        //Create Employee object. It will get all the data using setter methods.
        //And at last, it will stored in above 'employeeList' 
        Employee employee = null;
		 
        File file = new File("employees.xml");
	    XMLInputFactory factory = XMLInputFactory.newInstance();
	    XMLStreamReader streamReader = factory.createXMLStreamReader(new FileReader(file));
	    
	    
	    while(streamReader.hasNext())
	    {
	    	//Move to next event
	        streamReader.next();
	        
	        //Check if its 'START_ELEMENT'
	        if(streamReader.getEventType() == XMLStreamReader.START_ELEMENT)
	        {
	        	//employee tag - opened
	            if(streamReader.getLocalName().equalsIgnoreCase("employee")) {
	            	
	            	//Create new employee object asap tag is open
	            	employee = new Employee();	
	            	
	            	//Read attributes within employee tag
	            	if(streamReader.getAttributeCount() > 0) {
	            		String id = streamReader.getAttributeValue(null,"id");
	            		employee.setId(Integer.valueOf(id));
	            	}
	            }
	            
	            //Read name data
	            if(streamReader.getLocalName().equalsIgnoreCase("name")) {
	            	employee.setName(streamReader.getElementText());
	            }
	            
	          //Read title data
	            if(streamReader.getLocalName().equalsIgnoreCase("title")) {
	            	employee.setTitle(streamReader.getElementText());
	            }
	        }
	        
	        //If employee tag is closed then add the employee object to list
	        if(streamReader.getEventType() == XMLStreamReader.END_ELEMENT)
	        {
	        	if(streamReader.getLocalName().equalsIgnoreCase("employee")) {
	        		employeeList.add(employee);
	        	}
	        }
	    }
        //Verify read data
        System.out.println(employeeList);
	}
}

//Output:

[Employee [id=101, name=Lokesh Gupta, title=Author], 
 Employee [id=102, name=Brian Lara,   title=Cricketer]]

4) Summary

So in this StAX parser tutorial, we learned following things:

  1. What is StAX parser based on XML streaming API.
  2. Difference between StAX vs SAX parsers.
  3. How to read XML with StAX iterator API with example.
  4. How to read XML with StAX cursor API with example.

Both API are capable of parsing any kind of XML document but the cursor API is more memory-efficient than the iterator API. So, if your application needs better performance, consider using the cursor based API.

Drop me your questions in comments section.

Happy Learning !!

Was this post helpful?

Join 7000+ Fellow Programmers

Subscribe to get new post notifications, industry updates, best practices, and much more. Directly into your inbox, for free.

Leave a Comment

HowToDoInJava

A blog about Java and its related technologies, the best practices, algorithms, interview questions, scripting languages, and Python.