XML and Java Tutorial, Part 1

Table of contents

About the tutorial
Before you begin
Short introduction to XML

What is XML
What is DOM

Using a Java XML Parser to parse XML

How to create a DOM object from an XML document

How to use DOM to extract information from XML documents

DOM is a tree of Nodes
Description of Node, NodeList and Element interfaces
Getting text values from a NodeList

Using the Swing API with XML

How to make Swing models wrap around DOM object models

Using the Servlet API with XML

How to make XML information available to the Web

Downloading the source code and running the programs

References

About the tutorial

In this tutorial, we will create an XML document whose contents can be accessed from a JFC/Swing application and from a web browser (via a Servlet). The XML document is a very simple representation of an address book that stores the name, email address and company name of each person. The XML document is written by hand, and its structure is known to both the Swing application and the Servlet.

This tutorial shows how Java can be used to display information in XML documents using a graphical Swing interface and an HTML based interface.

This tutorial demonstrates the simple power of XML: the information is the most important element of the equation. Information rendering engines (user interfaces) can be swapped out as appropriate for the display device, and all of these rendering engines work on the same XML document. Also, XML parsers from different vendors can be used without making any major changes to the source code, which is another benefit of using a standards-based information storage format.

About developerlife.com

The applications created in this tutorial are very simple, too simple for any use in the real world. Part 2 and Part 3 of the tutorial will introduce more real-world examples. developerlife hosts tutorials on lots of XML technologies – SAX, DOM, using Java to manipulate XML, database access using XML, parsing XML from web services, etc. You can find them all here. We also have tutorials on Swing and graphics programming, multithreading in Java, SOA related topics, JavaME/J2ME, and GWT.

Before you begin

While you are reading the tutorial and trying out the examples, please use the links in the references section to understand terms that you are unfamiliar with. I am assuming that you are familiar with XML (and all its related terminology) before you start this tutorial.

Short Introduction to XML

What is XML?

XML is described very well in the following tutorial – “What is XML? An introduction“. Like HTML, XML uses markup tags, but unlike HTML, XML tags describe the content rather than the presentation of that content. By avoiding formatting tags in the data, and instead marking the meaning of the data itself with custom, user-definable tags, we make it easier to search documents for a tag and to view documents tailored to the preferences of the user. Using XML tags to define what your data means, in the natural vocabulary of your data’s domain, is the key motivation for XML’s invention and the basis of its usefulness. XML data can be rendered differently depending on the destination device. The XML processor can exist on the server, the client, or both.

What is DOM?

This tutorial explains what the Document Object Model (DOM) is – “Introduction to DOM 1.0 API“. DOM is a set of platform-neutral and language-neutral interfaces that allow programs to access and modify the content and structure of XML documents. The specification defines a minimal set of interfaces for accessing and manipulating XML document objects. In a Java program, a DOM object (created by a Java XML parser) is used to extract information from an XML document. You can learn more about DOM in the Java and XML book.

Using a Java XML Parser to parse XML

You first need a well formed XML document and a validating XML parser in order to read information into your programs. JDOM is a good library to use with Java – you can download it here. You can use SAX to retrieve information from XML documents as well, but SAX is not used in this tutorial.

How to create a DOM object from an XML document

Java interfaces for DOM have been defined by the W3C and these are available in the org.w3c.dom package. The code that is required to instantiate a DOM object is different depending on which parser you use. Details on instantiating DOM objects using JDOM are provided here. You can learn more about JDOM in the Java and XML book.

You must have the URL of the XML document (as a String) before you can instantiate a DOM object from it. Assume that an XML file called AddressBook.xml is at the following URL: https://developerlife.com/xmljavatutorial1/AddressBook.xml. Here is the code to create a DOM object from this URL:

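import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import java.io.IOException;

try {
    SAXBuilder parser = new SAXBuilder();
    // Note: build() returns a JDOM org.jdom.Document; JDOM's
    // org.jdom.output.DOMOutputter can convert it to an org.w3c.dom.Document.
    Document doc = parser.build("http://developerlife.com/xmljavatutorial1/AddressBook.xml");
    // work with the document…
} catch (JDOMException e) {
    e.printStackTrace(); // the XML could not be parsed
} catch (IOException e) {
    e.printStackTrace(); // the URL could not be read
}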
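
The rest of this tutorial works with the W3C interfaces in org.w3c.dom, so it may help to see one way of getting an org.w3c.dom.Document directly. The sketch below is not the JDOM code above; it assumes the JAXP parser that ships with the JDK (javax.xml.parsers). The class name DomLoader and the inline sample XML are made up for illustration, and the XML is parsed from a String so the example runs without a network connection.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomLoader {

    /** Parses XML text into a W3C DOM Document using the JDK's built-in parser. */
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the real AddressBook.xml
        String xml = "<ADDRESSBOOK><PERSON><LASTNAME>Doe</LASTNAME></PERSON></ADDRESSBOOK>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

DocumentBuilder also has a parse(String uri) overload, so a URL String like the one above could be passed in directly instead of an InputSource.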

How to use DOM to extract information from XML documents

Now that the DOM (org.w3c.dom.Document) object has been created using a parser, it is time to extract information from the document. Let’s talk about the AddressBook.xml file. Here is the DTD for this XML file:

<?xml version="1.0"?>
<!DOCTYPE ADDRESSBOOK [
<!ELEMENT ADDRESSBOOK (PERSON)*>
<!ELEMENT PERSON (LASTNAME, FIRSTNAME, COMPANY, EMAIL)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT COMPANY (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
]>

Here is an illustration of this DTD:
DTD for AddressBook.xml
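
To make the DTD concrete, here is a rough sketch that checks a document against it. It assumes the JDK's JAXP parser with validation switched on; the class name AddressBookValidator and the sample person are hypothetical. The DTD is embedded as an internal subset, exactly as listed above, so nothing has to be downloaded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class AddressBookValidator {

    // The DTD from the tutorial, written as an internal subset
    static final String DOCTYPE =
        "<!DOCTYPE ADDRESSBOOK [\n"
        + "<!ELEMENT ADDRESSBOOK (PERSON)*>\n"
        + "<!ELEMENT PERSON (LASTNAME, FIRSTNAME, COMPANY, EMAIL)>\n"
        + "<!ELEMENT LASTNAME (#PCDATA)>\n"
        + "<!ELEMENT FIRSTNAME (#PCDATA)>\n"
        + "<!ELEMENT COMPANY (#PCDATA)>\n"
        + "<!ELEMENT EMAIL (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if the given root element is valid against the DTD. */
    public static boolean isValid(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validation problems are reported to the ErrorHandler; rethrow
            // them so that an invalid document aborts the parse.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignore warnings */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical person, used only for illustration
        String person = "<PERSON><LASTNAME>Doe</LASTNAME><FIRSTNAME>Jane</FIRSTNAME>"
            + "<COMPANY>Example Corp</COMPANY><EMAIL>jane@example.com</EMAIL></PERSON>";
        System.out.println(isValid("<ADDRESSBOOK>" + person + "</ADDRESSBOOK>"));
    }
}
```

A PERSON that leaves out one of the four child elements, or lists them in a different order, should fail this check, because the DTD requires LASTNAME, FIRSTNAME, COMPANY and EMAIL in exactly that sequence.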

DOM creates a tree-based (hierarchical) object model from the XML document. The Document (created from an XML file) contains a tree of Nodes. Methods in the Node interface let you find out whether a Node has children, what type the Node is, and what its value is (if any). There are many types of Nodes, but we are interested in only two: ELEMENT_NODE (=1) and TEXT_NODE (=3). These types are static short constants defined in the org.w3c.dom.Node interface created by the W3C.

So a Document object is a simple container of Nodes. But our DTD has Elements, not Nodes. It just so happens that there is an interface called Element (which extends Node), and a Node of type ELEMENT_NODE is also an Element. Nodes of type ELEMENT_NODE (or Elements) can also have children. How do you access these children? Through the NodeList interface, of course. NodeList defines two methods, getLength() and item(int index), that allow iteration over a list of Nodes. These NodeList objects are generated by Node objects of type ELEMENT_NODE (or Element objects). The Document interface also has a method called getElementsByTagName(String tagname) which returns a NodeList of all the Elements with that tag name.
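
The idea that a Document is a tree of Nodes can be illustrated with a small recursive walk. This is only a sketch: it assumes the JDK's JAXP parser, and the class name NodeTreeWalker and the sample XML are invented for the example. It visits every Node below a starting Node and counts those of a given type, using nothing but getNodeType(), getChildNodes() and the two NodeList methods.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NodeTreeWalker {

    /** Parses XML text into a DOM Document (JAXP, bundled with the JDK). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Counts the Nodes of the given type in the subtree rooted at node. */
    public static int countNodes(Node node, short nodeType) {
        int count = (node.getNodeType() == nodeType) ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countNodes(children.item(i), nodeType);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data with no whitespace between the tags
        Document doc = parse("<ADDRESSBOOK><PERSON><LASTNAME>Doe</LASTNAME>"
            + "<FIRSTNAME>Jane</FIRSTNAME></PERSON></ADDRESSBOOK>");
        System.out.println("elements: " + countNodes(doc, Node.ELEMENT_NODE));
        System.out.println("text nodes: " + countNodes(doc, Node.TEXT_NODE));
    }
}
```

If the sample XML were pretty-printed with line breaks and indentation, the text node count would typically go up, because the whitespace between tags is also stored as Nodes of type TEXT_NODE.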

So here is how we can extract information from our Document object. We first ask the document object for all the Element objects that have the tag name “PERSON”. This should return all the Element objects that are PERSONs; all the Element objects with this tag name are returned in a NodeList object. We can use the getLength() method on this NodeList to determine how many PERSON elements are in the NodeList. Here is some code to do this:

Document doc = … // create DOM from AddressBook.xml
NodeList listOfPersons = doc.getElementsByTagName("PERSON");
int numberOfPersons = listOfPersons.getLength();

Now that we have the NodeList object containing all the PERSON Elements (which are also Nodes), we can iterate over it to extract information from each PERSON Element (Node). The method item(int index) in NodeList returns a Node object. Remember that when the type of a Node is ELEMENT_NODE, it is actually an Element. So here is the code to get the first person from our NodeList (assuming there is at least one person in the AddressBook.xml file):

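// declared outside the if blocks so that it can be used afterwards
Element firstPersonElement = null;
if (numberOfPersons > 0) {
    Node firstPersonNode = listOfPersons.item(0);
    // a Node of type ELEMENT_NODE can safely be cast to Element
    if (firstPersonNode.getNodeType() == Node.ELEMENT_NODE) {
        firstPersonElement = (Element) firstPersonNode;
    }
}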
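
The same check-and-cast step extends naturally from the first person to every person. As a sketch (assuming the JDK's JAXP parser; PersonLister and the sample data are hypothetical names for this example), the loop below walks the whole NodeList with getLength() and item(int) and collects each PERSON as an Element:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PersonLister {

    /** Parses XML text into a DOM Document (JAXP, bundled with the JDK). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Collects every PERSON Element in document order. */
    public static List<Element> persons(Document doc) {
        NodeList list = doc.getElementsByTagName("PERSON");
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            // only Nodes of type ELEMENT_NODE can be cast to Element
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        Document doc = parse("<ADDRESSBOOK><PERSON><LASTNAME>Doe</LASTNAME></PERSON>"
            + "<PERSON><LASTNAME>Roe</LASTNAME></PERSON></ADDRESSBOOK>");
        System.out.println(persons(doc).size() + " persons found");
    }
}
```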

Now we have a reference to the firstPersonElement, which we can use to find out the FIRSTNAME, LASTNAME, COMPANY and EMAIL information of this PERSON element. Since the firstPersonElement is an Element, we can use getElementsByTagName(String) again to get the FIRSTNAME, LASTNAME, COMPANY and EMAIL elements in it. Here is the code to get the FIRSTNAME element of the firstPersonElement:

NodeList firstNameList =
    firstPersonElement.getElementsByTagName("FIRSTNAME");
// item() returns a Node, so it must be cast to Element
Element firstNameElement =
    (Element) firstNameList.item(0);

The firstNameElement contains a list of TEXT_NODEs (one of which is the value of the first name of this person). So the firstNameElement must be asked to return a list of TEXT_NODEs that it contains (in order to get the text which is the first name of this person). Here is the code to get a list of TEXT_NODEs contained in this firstNameElement:

NodeList list = firstNameElement.getChildNodes();

Along with the text we want (the first name of the person), this NodeList (list) may contain other Nodes of type TEXT_NODE. That extra text is useless to us, because it consists only of whitespace, carriage returns and line feeds (CR/LF). This is NOT intuitive: we expect only the name of the person to be in the NodeList, but instead it holds a bunch of nodes containing whitespace, CR/LFs and the String that we really want. So how do we extract the first name from this mess? We have to iterate over the NodeList and ask each Node in it for its value by using the getNodeValue() method. Then we have to trim() the String value and make sure that it is not “” or “\r”. When we have found a value that is not whitespace or CR/LF, we can assume that it is the first name of the person. Here is the code to do this parsing:

String firstName = null;
for (int i = 0; i < list.getLength(); i++) {
    String value = list.item(i).getNodeValue().trim();
    if (value.equals("") || value.equals("\r")) {
        continue; // whitespace only, keep iterating
    } else {
        firstName = value;
        break; // found the firstName!
    }
}

Now, this procedure must be repeated on firstPersonElement for the LASTNAME, COMPANY and EMAIL elements. You might consider putting this parsing of the NodeList to get a text value in a utility method (in an XML utility class that you can write).
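
One possible shape for such a utility method is sketched below. It assumes the JDK's JAXP parser; the names TextValues and textOf, and the sample data, are hypothetical. It returns the first non-blank text child of the first element with the given tag name, or null if there is none.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TextValues {

    /** Parses XML text into a DOM Document (JAXP, bundled with the JDK). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Returns the first non-blank text value of the first element named
     * tagName inside parent, or null if no such element or text exists.
     */
    public static String textOf(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            return null;
        }
        NodeList children = matches.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.TEXT_NODE) {
                continue; // skip comments, nested elements, etc.
            }
            String value = child.getNodeValue().trim();
            if (!value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical person, with whitespace around the name on purpose
        Document doc = parse("<PERSON>\n  <FIRSTNAME>\n    Jane\n  </FIRSTNAME>\n</PERSON>");
        System.out.println(textOf(doc.getDocumentElement(), "FIRSTNAME"));
    }
}
```

On parsers that support DOM Level 3 (Java 5 and later), Node.getTextContent() followed by trim() is a shorter alternative for simple elements like these, since it concatenates all the text inside the element.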

The diagram below presents a visual representation of all the code that we have gone over so far to get information out of a Document:

This was the hard part. Integrating this information with a TableModel and making it available to a Servlet is relatively easy!

Using the Swing API with XML

In order to render information in an XML document to a Swing JComponent, it is necessary to build a custom model which accesses the underlying information in a Document. It is important to remember that in Swing, the models are interfaces which allow access to the underlying data; they don’t have to contain the data themselves. This is why all the models are Java interfaces, like TableModel, TreeModel and ListModel.

In this tutorial, we will create a custom TableModel around the AddressBook.xml and then display the data in a JTable. This information will not be editable right now (that’s in the next part of the tutorial). You can learn more about Swing tables in the Java Swing book.

How to make Swing models wrap around DOM objects

The first step is to create the XML file, then write some code to create a Document object and parse it to get information out of it. Then, a custom TableModel must be created around this code, to allow a JTable to access this information. Below is a partial listing of the AddressBookModel.java class, which implements the TableModel interface and allows access to data in the AddressBook.xml file:

public class AddressBookModel implements TableModel {

    //CONSTANTS
    public static final String URL =
        "http://beanfactory.com/xml/AddressBook.xml";

    //TABLE META DATA
    public static final String ROOT_ELEMENT_TAG = "PERSON";
    public static final String[] colNames = {
        "LASTNAME",
        "FIRSTNAME",
        "COMPANY",
        "EMAIL"
    };

    //DATA
    protected Document doc;

    //TableModel Implementation
    /**
     Return the number of persons in an XML document
    */
    public int getRowCount() {
        return XmlUtils.getSize( doc , ROOT_ELEMENT_TAG );
    }

    /**
     Return the value of text at the specified r (row) and
     c (col) location in the table. The row and col information
     is translated into the Document, to get the rth person from
     the Document, and then get the element value of the tag
     that goes by the name of colNames[ c ]. This is the main "trick"
     in this entire class.
    */
    public Object getValueAt(int r, int c) {
        //must get row first
        Element row =
            XmlUtils.getElement( doc , ROOT_ELEMENT_TAG , r );
        //must get value for column in this row
        return XmlUtils.getValue( row , colNames[c] );
    }

}//end class

The two interesting methods in this TableModel are shown above. Note that a static String array is used to hold all the column names for this TableModel, which just happen to be the names of the Elements in a PERSON Element. This is by design, since the structure of the AddressBook.xml document is basically a 2 dimensional array, with rows and columns. The column names are known already since we made the DTD. This is why all the column names are stored in a static array called colNames.

In order to access each cell in the TableModel, a row and column identifier is needed. But information in the Document is not stored by rows and columns, so we have to write some simple translation code. Every row in the TableModel is actually a PERSON Element in the Document. Every PERSON element has FIRSTNAME, LASTNAME, COMPANY and EMAIL information, which is selected by the column identifier for the given row (or PERSON). In the getValueAt( int r, int c ) method above, this trick is used to get information from the Document. Notice that before accessing Elements inside a PERSON, the column integer is converted to a column name (by using colNames[]).

The XmlUtils.java class is very simple and is written to work only for this example. You can write a generic one for different types of data: 1D array, 2D array, etc.
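
For readers who want something they can compile on its own, here is a compact sketch of the same row-and-column translation. It is not the tutorial's AddressBookModel: it assumes the JDK's JAXP parser, extends javax.swing.table.AbstractTableModel (so that only three methods are mandatory) rather than implementing TableModel directly, and reads cell text with the DOM Level 3 method getTextContent(). The class name DomTableModel and the sample data are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.table.AbstractTableModel;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTableModel extends AbstractTableModel {

    private static final String ROW_TAG = "PERSON";
    private static final String[] COLUMNS = {"LASTNAME", "FIRSTNAME", "COMPANY", "EMAIL"};

    private final Document doc; // the model wraps the Document, it does not copy it

    public DomTableModel(Document doc) {
        this.doc = doc;
    }

    /** Parses XML text into a DOM Document (JAXP, bundled with the JDK). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    @Override
    public int getRowCount() {
        return doc.getElementsByTagName(ROW_TAG).getLength(); // one row per PERSON
    }

    @Override
    public int getColumnCount() {
        return COLUMNS.length;
    }

    @Override
    public String getColumnName(int c) {
        return COLUMNS[c];
    }

    @Override
    public Object getValueAt(int r, int c) {
        // row index -> r-th PERSON, column index -> tag name -> text value
        Element row = (Element) doc.getElementsByTagName(ROW_TAG).item(r);
        NodeList cells = row.getElementsByTagName(COLUMNS[c]);
        return cells.getLength() == 0 ? null : cells.item(0).getTextContent().trim();
    }
}
```

A model like this could then be handed to a table with new JTable(new DomTableModel(doc)). Because the model only holds a reference to the Document, it reads the current contents of the DOM every time the table asks for a cell.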

Here is a partial listing of the XmlUtils.java class:

public class XmlUtils {

    /**
     Return an Element given a Document, tag name, and index
    */
    public static Element getElement( Document doc , String tagName , int index ){
        //given an XML document and a tag
        //return an Element at a given index
        NodeList rows =
            doc.getDocumentElement().getElementsByTagName( tagName );
        return (Element) rows.item( index );
    }

    /**
     Return the number of persons in the Document
    */
    public static int getSize( Document doc , String tagName ){
        NodeList rows =
            doc.getDocumentElement().getElementsByTagName( tagName );
        return rows.getLength();
    }

    /**
     Given a person element, must get the element specified
     by the tagName, then must traverse that Node to get the
     value.
     Step1) get Element of name tagName from e
     Step2) treat that Element as a Node and traverse it for its
     non-whitespace, non-cr/lf value.
     Step3) return it!
     NOTE: Element is a subinterface of Node

     @param e an Element
     @param tagName a tag name
     @return the value of a Node, or null if none was found
    */
    public static String getValue( Element e , String tagName ){
        try {
            //get the list of Nodes with this tag name from the Element
            NodeList elements = e.getElementsByTagName( tagName );
            Node node = elements.item( 0 );
            NodeList nodes = node.getChildNodes();

            //find a Node whose value is non-whitespace
            String s;
            for( int i = 0; i < nodes.getLength(); i++ ){
                s = nodes.item( i ).getNodeValue().trim();
                if( s.equals("") || s.equals("\r") ){
                    continue;
                }
                else return s;
            }
        }
        catch( Exception ex ){
            //any failure (e.g. a missing tag) falls through to return null
        }
        return null;
    }

}//end class

The getElement() method is very straightforward: it is used to get the PERSON Element at the row index of the TableModel. Once you have a PERSON Element, you need to extract information from its column indexes. This is where the getValue() method comes in. Given a PERSON Element, it translates column names into values for that column.

That’s it! You now know how to extract information from XML Documents and present it in Swing components. In the source code provided, you have to run the AddressBookFrame class in order to see the JTable with the XML information in it. You must also be connected to the Internet at the time, since the AddressBook.xml file is downloaded from the beanfactory.com webserver. If you are not connected to the Internet at the time you run this program, it will appear to hang (it waits for an Internet connection to be made).

Here are two screen shots of the AddressBookFrame class in action, one for the Sun parser and one for the IBM parser:
Two versions of AddressBookFrame running

Using the Servlet API with XML

Displaying this XML Document using a Servlet is very similar to creating a TableModel around it. The init() method of the Servlet can create a Document object from an XML file. Then the doGet() method simply has to extract all the values for every column of every row in the Document (using the XmlUtils class) and generate an HTML table from them. You can learn more about Servlets in the Java Servlet Programming book.

How to make XML information available to the Web

The following is a listing of the AddressBookServlet.java (for the IBM Parser) which displays this information to a browser. Most of the code here generates the HTML. The actual Document parsing is the same as in the TableModel.

public class AddressBookServlet extends HttpServlet {
    //CONSTANTS
    public static final String
        URL = "http://beanfactory.com/xml/AddressBook.xml";
    public static final String ROOT_ELEMENT_TAG = "PERSON";

    public static final String[] colNames = {
        "LASTNAME",
        "FIRSTNAME",
        "COMPANY",
        "EMAIL"
    };

    //DATA
    protected Document doc;

    /**
     When this method receives a request from a browser, it
     returns the Document in table format.
     @param req http servlet request
     @param res http servlet response
     @exception ServletException
     @exception IOException
    */
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException {
        res.setContentType("text/html");
        PrintWriter out = new PrintWriter(res.getOutputStream());

        out.print( "<html>" );
        out.print( "<head><title>" );
        out.print( "XML and Java2 Tutorial Part 1: IBM Parser" );
        out.print( "</title></head>" );
        out.print( "<body><center>" );
        out.print( "<pre>" );
        out.print( "http://beanfactory.com/xml/AddressBook.xml" );
        out.print( "</pre><hr>" );
        //format the table
        out.print( "<table BORDER=0 CELLSPACING=2 " );
        //the quotes around the color must be escaped inside the String
        out.print( "CELLPADDING=10 BGCOLOR=\"#CFCFFF\" >" );
        out.print( "<tr>" );

        //display table column names
        for(int i=0; i<colNames.length; i++){
            out.print( "<td><b><center>" +
                       colNames[i] +
                       "</center></b></td>" );
        }

        out.print( "</tr>" );

        //need to iterate the doc to get the fields in it
        int rowCount = XmlUtils.getSize( doc , ROOT_ELEMENT_TAG );
        for(int r=0; r<rowCount; r++) {
            out.print( "<tr>" );
            Element row = XmlUtils.getElement(
                doc , ROOT_ELEMENT_TAG , r );
            int colCount = colNames.length;

            for(int c=0; c < colCount; c++) {
                out.print( "<td>" );
                out.print( XmlUtils.getValue( row , colNames[c] ));
                out.print( "</td>" );
            }//end for c=0…

            out.print( "</tr>" );

        }//end for r=0…

        out.print( "</table>" );
        out.print( "<hr>Copyright The Bean Factory, LLC." );
        out.print( " 1998-1999. All Rights Reserved." );
        out.print( "</center>" );
        out.println( "</body>" );
        out.println( "</html>" );
        out.flush();
        out.close();
    }//end method

    /**
     Create a DOM from an XML document when the servlet starts up.

     @param config servlet configuration
     @exception ServletException
    */
    public void init(ServletConfig config) throws ServletException {
        super.init( config );

        //load the Document
        try {
            //create xml document
            URL u = new URL(URL);
            InputStream i = u.openStream();
            Parser p = new Parser("myParser");
            doc = p.readStream(i);
        } catch(Exception e) {
            System.out.println( e );
        }
    }

    /**
     Return servlet information

     @return message about this servlet
    */
    public String getServletInfo(){
        return "Copyright The Bean Factory, LLC. 1998. " +
               "All Rights Reserved.";
    }
}//end class
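
Since most of the servlet is HTML generation, that part can be tried out without a servlet engine. The sketch below is not the tutorial's servlet: it assumes the JDK's JAXP parser instead of the IBM parser, builds the table as a String, and adds escaping of &, < and >, which the listing above does not do. The class name AddressBookHtml and the sample data are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AddressBookHtml {

    static final String[] COLUMNS = {"LASTNAME", "FIRSTNAME", "COMPANY", "EMAIL"};

    /** Parses XML text into a DOM Document (JAXP, bundled with the JDK). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Escapes the characters that are special in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders every PERSON in the Document as one row of an HTML table. */
    public static String toHtmlTable(Document doc) {
        StringBuilder html = new StringBuilder("<table><tr>");
        for (String col : COLUMNS) {
            html.append("<th>").append(col).append("</th>");
        }
        html.append("</tr>");
        NodeList rows = doc.getElementsByTagName("PERSON");
        for (int r = 0; r < rows.getLength(); r++) {
            Element row = (Element) rows.item(r);
            html.append("<tr>");
            for (String col : COLUMNS) {
                NodeList cells = row.getElementsByTagName(col);
                String value = cells.getLength() == 0
                    ? "" : cells.item(0).getTextContent().trim();
                html.append("<td>").append(escape(value)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        Document doc = parse("<ADDRESSBOOK><PERSON><LASTNAME>Doe</LASTNAME>"
            + "<FIRSTNAME>Jane</FIRSTNAME><COMPANY>A&amp;B</COMPANY>"
            + "<EMAIL>jane@example.com</EMAIL></PERSON></ADDRESSBOOK>");
        System.out.println(toHtmlTable(doc));
    }
}
```

Inside a servlet, doGet() would only need to write the returned String to the response. Keeping the rendering in a plain static method makes it possible to check the output from a main() method or a unit test.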

To run this servlet, place the AddressBookServlet.class and XmlUtils.class in the servlet class folder of your servlet engine and access AddressBookServlet using a web browser.

Downloading the source code and running the programs

The same sets of Java classes are provided for the Sun Parser and IBM Parser. Here is a description of these source code files:

AddressBookModel.java: Contains the TableModel
XmlUtils.java: Needed by AddressBookModel.java and AddressBookServlet.java
AddressBookServlet.java: Contains the servlet
AddressBookFrame.java: Displays a JTable in a JFrame
AddressBookPanel.java: Needed by AddressBookFrame.java

To run the Swing JTable program, you can type “java AddressBookFrame” at the command prompt. Make sure that you have Java2 (or JDK1.2) installed on your machine because it will not work with JDK1.1. Also, please make sure that you are connected to the Internet at the time because the programs need to access AddressBook.xml from the website.

To run the Servlet, place the AddressBookServlet.class and XmlUtils.class in your servlet folder, and start your servlet engine. You can access the Servlet from your web browser by using a URL such as http://localhost/servlet/AddressBookServlet.

Here is older code using an old Sun parser – sun.zip (it should be pretty easy to swap the Sun parser instantiation with the JDOM parser).

I hope you enjoyed this tutorial. I have lots more XML and Java related tutorials on this site. Need more help? developerlife.com offers training courses to empower you, and consulting services to enable you.

References

JDOM

You can download JDOM here.

Details on instantiating DOM objects using JDOM are provided here. You can learn more about JDOM in the Java and XML book.

Sun’s XML Site

This is a good site to learn about Sun’s parser and also about how to do elementary things like use SAX and DOM. It also has a nice glossary of terms. After visiting this site and learning everything on it, the content on developerlife.com (which involves real-world application of this technology) will make more sense :).

Extensible Markup Language (XML) 1.0 W3C Recommendation

This is not useful for parsing or generating XML documents with Java. This specification lists all the rules that apply to a well formed XML document. This document is only good for specific rules about the syntax of XML documents.

Document Object Model IDL Documentation

This is very useful for parsing and generating XML documents using Java. All these interfaces are available in Java (for both the Sun and IBM parsers). The documentation for this IDL was used to create the documentation for the IBM and Sun parser’s source code. This documentation is more detailed than the javadoc generated documentation for the Java interfaces in the org.w3c.dom package.

Document Object Model IDL Definitions

This is not useful for parsing or generating XML documents with Java. This specification lists all the rules that apply to a well formed XML document. This document is only good for specific rules about the syntax of XML documents.

XML, Java and the future of the web

Jon Bosak, who is the chair of the W3C XML Working Group, does a great job of describing what kinds of products and services can be made possible from the union of Java and XML.

SAX 1.0: The Simple API for XML

This is David Megginson’s website documenting the SAX 1.0 API. It is short and to the point. If you want to learn about SAX 1.0 I recommend reading my SAX Tutorial though :).