How to read data from xml file on Google App Engine – Part 2

This post continues from part 1 of this series.

Now we shall look at the actual parser for parsing the Data.xml file.

I have used the basic SAXParser from javax.xml.parsers and DefaultHandler from org.xml.sax.helpers. The parse() method of SAXParser takes two parameters: first the XML file to be parsed, and second the DefaultHandler that handles events such as document start/end, element start/end, etc.

The DefaultHandler has different methods for the events listed above, such as:

public void startDocument () throws SAXException {
	// no op
}
public void endDocument() throws SAXException {
	// no op
}

and others.

We shall override these methods in our CustomXMLParser class, which extends the DefaultHandler class, so that we can do the specific jobs that we want.
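To make the event model concrete, here is a minimal sketch (not part of the original project) that overrides a few DefaultHandler methods and simply records the order in which the parser calls them. The class name SaxEventDemo and the method recordEvents are made up for illustration, and the XML is read from an in-memory string rather than from Data.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, used only to show the order of SAX callbacks.
final class SaxEventDemo {

    // Parses the given XML text and returns the names of the callbacks in firing order.
    static List<String> recordEvents(String xml) {
        final List<String> events = new ArrayList<String>();

        // Override only the callbacks we care about; the rest stay no-ops.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // parse() takes the XML source first and the handler second.
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the XML", e);
        }
        return events;
    }
}
```

Calling SaxEventDemo.recordEvents(...) on a small country/capital snippet shows that the start and end callbacks arrive in document order, with nested elements reported between the start and end of their parent.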

We define the following members in the CustomXMLParser class:

private String characters;
private Country tempCountry;
private String tempCountryName;
private String tempCapitalName;

//the persistence manager for saving/retrieving the data from the datastore
private PersistenceManager pm = null;

private Stack xmlObjectStack;

The characters string is used to store the value between the end of one tag and the start of another, i.e. between ">" and "<".

The tempCountry variable stores the country object temporarily until it gets persisted to the datastore. We shall read the different values that we require from the XML and then populate the tempCountry object with these values. After all the required values have been read (i.e. at the </country> end tag), we shall persist this tempCountry object to the datastore.
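One caveat worth knowing: a SAX parser is allowed to deliver a single run of text in more than one characters() call, so keeping only the latest call can lose part of the value. A common way around this, sketched below under my own naming (TextBufferDemo and readCapital are hypothetical, not from the original project), is to append to a StringBuilder and reset it whenever a new element starts.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing how to accumulate character data safely.
final class TextBufferDemo {

    // Returns the text of the last <capital> element found in the given XML, or null.
    static String readCapital(String xml) {
        final StringBuilder text = new StringBuilder();
        final String[] capital = new String[1];

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                // A new element begins, so discard any text collected so far.
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Append instead of overwrite, in case the text arrives in chunks.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("capital".equals(qName)) {
                    // The buffer now holds everything between <capital> and </capital>.
                    capital[0] = text.toString().trim();
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the XML", e);
        }
        return capital[0];
    }
}
```

The trim() call also drops the indentation and line breaks that the parser reports as character data between tags.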

NOTE: The class below has been written only to illustrate the basic concept and hence does not adhere to best practices.

CustomXMLParser class

package com.shank.xml;

import java.io.File;
import java.io.IOException;
import java.util.Stack;
import java.util.logging.Logger;

import javax.jdo.PersistenceManager;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.shank.xml.demo.jdo.Country;
import com.shank.xml.demo.jdo.DatastoreUtilities;
import com.shank.xml.demo.jdo.PMF;

/**
 *
 * @author shank
 *
 */
public class CustomXMLParser extends DefaultHandler {

	//the logger for this class
	private Logger logger = Logger.getLogger(this.getClass().getName());

	private String characters;
	private Country tempCountry;
	private String tempCountryName;
	private String tempCapitalName;

	//the persistence manager for saving/retrieving the data from the datastore
	private PersistenceManager pm = null;

	private Stack xmlObjectStack;

	//constructor
	public CustomXMLParser(){
		//just initialize the stack
		xmlObjectStack = new Stack();
	}

	//the method to parse the data from Data.xml
	public void doParsing() {

		logger.info("inside doParsing()");

		if(DatastoreUtilities.checkIfEntityExists("India", Country.class)){

			logger.info("data is already present in the datastore and hence skipping the xml parsing");

		}else{
			//if the control reaches here that means that the data has not been populated in a previous request
			try {

				//get a SAXParser instance which will actually read the data from the xml
				SAXParserFactory factory = SAXParserFactory.newInstance();
				SAXParser parser = factory.newSAXParser();

				logger.info("parser created");

				parser.parse(new File("Data.xml"), this);

				logger.info("parsing done");

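			} catch (ParserConfigurationException e) {

				//include the cause in the log so that the failure can be diagnosed
				logger.severe("!! ParserConfigurationException !! " + e.getMessage());

			} catch (SAXException e) {

				logger.severe("!! SAXException !! " + e.getMessage());

			} catch (IOException e) {

				logger.severe("!! IOException !! " + e.getMessage());
			}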
		}

	}

	public void startDocument() throws SAXException {
		logger.info("inside startDocument");
	}

	public void endDocument() throws SAXException {
		logger.info("inside endDocument");
	}

	public void startElement(String uri, String localName, String qName,
			Attributes attributes) throws SAXException {

		//log the name of the element which has just started
		logger.info("inside startElement : element is "+qName);

		//check which element has started
		//we are only bothered about <country> and <capital>
		if(qName.equals("country")){
			//if control reaches here then <country> tag must have started

			//get the country name from the attributes and create a country by that name
			tempCountryName = attributes.getValue("name");
			tempCountry = new Country(tempCountryName);

			//push this country object to the stack
			xmlObjectStack.push(tempCountry);

		}else if(qName.equals("capital")){
			//if control reaches here then <capital> tag must have started

			//we do not need to do anything here as the name of the capital is between <capital> and </capital> tags

			//after here, the capital name is read in the characters() method below

		}

	}

	public void endElement(String uri, String localName, String qName)
			throws SAXException {
		//log the name of the element which has just ended
		logger.info("inside endElement : element is "+qName);

		//check which element has ended
		//we are only bothered about <country> and <capital>
		if(qName.equals("country")){
			//if control reaches here then <country> tag must have ended

			//pop the country object from the stack
			tempCountry = (Country)xmlObjectStack.pop();

			//get the persistence manager only when there is something to persist
			pm = PMF.get().getPersistenceManager();
			try {
				//this country object has already been populated fully since the country tag ends only after the capital tag has ended
				pm.makePersistent(tempCountry);
			} finally {
				//always release the persistence manager, even if the save fails
				pm.close();
				pm = null;
			}

		}else if(qName.equals("capital")){
			//if control reaches here then <capital> tag must have ended

			//since the end of the capital tag has been reached, the name of the capital is already available in the characters variable
			tempCapitalName = characters;

			//since capital is nested within country, the country object is already present on the stack
			//get the country object from the stack and set its capital name
			tempCountry = (Country)xmlObjectStack.pop();
			tempCountry.setCapitalName(tempCapitalName);

			//push this country object back into the stack
			xmlObjectStack.push(tempCountry);
		}
	}

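	public void characters(char ch[], int start, int length)
			throws SAXException {
		logger.info("inside characters");

		//keep the most recent run of text
		//note : SAX may deliver one run of text in several calls, which this simple version does not handle
		characters = new String(ch, start, length);

	}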

}

Let us dissect the code of the CustomXMLParser and see how the above class is used to attain our ultimate aim.

For reference, I am listing below the Data.xml file that was already listed in part 1 of this post.

<?xml version="1.0" encoding="utf-8"?>
<data>
	<countries>
		<country name="India">
			<capital>New Delhi</capital>
		</country>
		<country name="Japan">
			<capital>Tokyo</capital>
		</country>
	</countries>
</data>

The methods in the CustomXMLParser class that matter to us are :
startElement(), endElement() and characters()

startElement()
This method, as the name suggests, is called when the start of a tag is encountered. For instance, when the tag <country name="India"> is encountered, this method is called: the name of the tag, i.e. "country", is passed in the qName variable, and the attribute called "name", whose value is "India", is stored in an Attributes object and passed in the attributes variable. By checking the values of qName and attributes we can find out which tag we are starting with. Since we are starting with a country tag, we create a Country object and push it to the xmlObjectStack.
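As a small illustration of reading attributes in startElement(), the sketch below collects the value of the "name" attribute of every <country> tag. The class AttributeDemo and the method countryNames are hypothetical names chosen for this example, and the XML comes from a string instead of the Data.xml file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing how qName and attributes are used together.
final class AttributeDemo {

    // Returns the "name" attribute of every <country> element, in document order.
    static List<String> countryNames(String xml) {
        final List<String> names = new ArrayList<String>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                // qName tells us which tag has started; attributes holds its attributes.
                if ("country".equals(qName)) {
                    names.add(attributes.getValue("name"));
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the XML", e);
        }
        return names;
    }
}
```

Writing "country".equals(qName) rather than qName.equals("country") is only a defensive habit; both behave the same when qName is not null.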

Using a stack helps us when we reach the </capital> tag. Since the <capital> tag is nested in the <country> tag, if we have reached a </capital> tag we can be sure that we have already read the <country> tag and that the country object is available on the xmlObjectStack. We pop this country object from the stack and set its capital property to the value that is available in the "characters" string. (The value of this string is set by the characters() method, which reads the text between > and <, i.e. between the end of one tag and the start of another.) Since we have reached the </capital> tag, the name of the capital is already available in the characters string. After setting the capital property of the country, we push it back onto the xmlObjectStack.

When we encounter the </country> tag, we just pop the country object on the top of the xmlObjectStack and persist it to the datastore.

We repeat the same for all the countries.
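The whole flow can be tried outside App Engine with the sketch below. It is a simplified variant under stated assumptions, not the class from this post: the nested Country class is a hypothetical stand-in for the JDO entity from part 1, finished countries go into a map instead of the datastore, the text is accumulated in a StringBuilder, and peek() is used in place of the pop-then-push pair.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: same stack idea as CustomXMLParser, without the datastore.
final class CountryStackDemo {

    // Hypothetical stand-in for the Country entity from part 1.
    static final class Country {
        final String name;
        String capitalName;

        Country(String name) {
            this.name = name;
        }
    }

    // Returns country name -> capital name, in document order.
    static Map<String, String> parseCapitals(String xml) {
        final Deque<Country> stack = new ArrayDeque<Country>();
        final StringBuilder text = new StringBuilder();
        final Map<String, String> saved = new LinkedHashMap<String, String>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                text.setLength(0);
                if ("country".equals(qName)) {
                    // A country has started: create it and keep it on the stack.
                    stack.push(new Country(attributes.getValue("name")));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("capital".equals(qName)) {
                    // The enclosing country is on top of the stack; fill in its capital.
                    stack.peek().capitalName = text.toString().trim();
                } else if ("country".equals(qName)) {
                    // The country is complete; this is where the post persists it.
                    Country done = stack.pop();
                    saved.put(done.name, done.capitalName);
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the XML", e);
        }
        return saved;
    }
}
```

Passing the contents of Data.xml to CountryStackDemo.parseCapitals(...) yields one entry per country, which makes it easy to check the parsing logic before wiring in the PersistenceManager.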

Through this post I have tried to explain the basic process that we can use to read information from an XML file and do something useful with it.

Hope this helps you. And if it does, please don't forget to share it.