DOM 1.0 implementation anomalies
Author: Nazmul Idris
Date: March 27 1999.
Copyright Nazmul Idris. 1998-1999. All Rights Reserved. 
Index
Introduction
XML document used for testing
How IBM and Sun parsers deal with common XML document elements
List of anomalies
Downloading the parsers


Introduction
When we wrote DomView 1.0 using the three Java XML parsers that we tested, we noticed that each parser implemented DOM differently for certain features. In fact, some parsers didn't implement all the interfaces in the DOM 1.0 API. We compiled a list of the differences between the Sun Project X Technology Release 1 parser and the IBM XML Parser for Java v2.0.4 (not in backward compatibility mode); they are listed below. Information about the OpenXML parser will be included here shortly.

These results were obtained with the new IBM parser when it is not in backward-compatible mode (that is, when using the new way of instantiating DOM document objects).

An older article compares the Sun Project X Technology Release 1 parser with the IBM XML Parser for Java v2.0.4 running in backward compatibility mode with version 1.x.

XML document used for testing
We found these anomalies using a sample XML document, tested once with an internal DTD and once with an external DTD. The version of the input file with the internal DTD is listed below:

<?xml version='1.0'?> 
<!DOCTYPE addressbook [ 
<!ELEMENT addressbook (person)*> 
<!ELEMENT person (name, email)> 
<!ELEMENT name (#PCDATA)> 
<!ELEMENT email (#PCDATA)> 
<!ATTLIST person gender CDATA #IMPLIED> 
<!ATTLIST person location CDATA #IMPLIED> 
]> 
<addressbook> 

<!--beginning of document--> 

<person gender="Male" location="South"> 
  <name> 
    <![CDATA[ <<<John Doe>>>&lt; ]]> 
  </name> 
  <email>john@doe.com</email> 
</person> 

<person gender="Female"> 
  <name>Jane Doe</name> 
  <email>jane@doe.com</email> 
</person> 

<person> 
  <name>Mary Doe</name> 
  <email>mary@doe.com</email> 
</person> 
</addressbook> 

How IBM and Sun parsers deal with common XML document elements
The table below lists how the IBM and Sun parsers deal with certain XML document elements.

| XML document element | Sun Parser | IBM Parser (not in backward compatibility mode) |
| --- | --- | --- |
| Internal DTD | It is not displayed by the Document object tree | It is not displayed by the Document object tree |
| External DTD | It is not displayed by the Document object tree | It is not displayed by the Document object tree |
| Comment | It is not displayed by the Document object tree | The content of the comment is shown as a leaf in the Document object tree |
| CDATA | It is not implemented. This node is displayed as a Text node | Properly displayed as a CDATASection node |
| Attribute | The String representation of getAttributes() displays name and its value | The String representation of getAttributes() is not implemented (NamedNodeMap implementation class does not have a toString() method) |
| Carriage Return/Line Feed | Carriage return between elements is ignored | Carriage return between elements is preserved as a Text node (with cr/lf in it) |
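
To check how a given parser treats each of the constructs above, it can help to print the node tree together with each node's type. The following is only a minimal sketch: it assumes the JAXP API (javax.xml.parsers) that ships with later JDKs, not the Sun Project X or IBM parser classes this article tested, and the class and method names (NodeTypeDump, typeName, dump, parse) are made up for illustration. The tree it prints depends on the parser and its settings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeTypeDump {
    // Maps the DOM node type constant to a readable interface name.
    static String typeName(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:      return "Document";
            case Node.DOCUMENT_TYPE_NODE: return "DocumentType";
            case Node.ELEMENT_NODE:       return "Element";
            case Node.TEXT_NODE:          return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.COMMENT_NODE:       return "Comment";
            default:                      return "Other(" + node.getNodeType() + ")";
        }
    }

    // Appends one line per node, indented by depth, then recurses into children.
    static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(typeName(node)).append(' ').append(node.getNodeName()).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), depth + 1, out);
        }
    }

    // Assumption: JAXP is used here only as a convenient stand-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE addressbook [<!ELEMENT addressbook (name)*>"
                + "<!ELEMENT name (#PCDATA)>]>"
                + "<addressbook>\n<!--beginning of document-->\n"
                + "<name><![CDATA[ <<<John Doe>>> ]]></name>\n</addressbook>";
        StringBuilder out = new StringBuilder();
        dump(parse(xml), 0, out);
        System.out.print(out);
    }
}
```

Running the same dump against two parsers and comparing the output is a quick way to spot differences of the kind listed in the table.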

List of anomalies

The sections below outline the differences between the way the IBM and Sun parsers deal with certain XML document elements.

Attributes

Description
Both the Sun and IBM parsers work the same way when it comes to extracting attributes from an Element node. However, the String representation of the NamedNodeMap differs. A NamedNodeMap object is returned by calling getAttributes() on an Element node.

For example, for the following XML document fragment:
    <person gender="Male" 
     location="South">

The Sun implementation returns the following for elementNode.getAttributes().toString():
    gender="Male" location="South"

The IBM implementation does not override toString(), so the call falls through to the toString() method defined in the Object class, which does not return anything meaningful (by default, just the class name and a hash code).

Analysis/Recommendation

Since there is no difference in extracting attributes from an Element node, your code does not have to be different for the Sun and IBM parsers. However, don't rely on the toString() method of NamedNodeMap to return a String in the same format on both parsers.
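
One way to avoid depending on toString() is to walk the NamedNodeMap with getLength() and item(), which are part of the DOM 1.0 interfaces. The sketch below is illustrative only: it assumes the JAXP API from later JDKs as a stand-in parser (not the Sun or IBM classes tested here), and the names AttributeDump, attributesOf and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeDump {
    // Copies the attributes into a sorted map using only DOM interface calls,
    // so the result does not depend on any parser-specific toString().
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    // Assumption: JAXP is used here only as a convenient stand-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<person gender=\"Male\" location=\"South\"/>");
        // Prints the map in a format that this code controls.
        System.out.println(attributesOf(doc.getDocumentElement()));
    }
}
```

Because the map is built by this code, its printed form is the same regardless of which parser produced the Element.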

Comments

Description

The IBM parser displays comments as leaves in the Document object tree, but the Sun parser ignores comments.

Analysis/Recommendation

Don't write programs that depend on information extracted from comments using the IBM parser, because moving to another parser may break the program.
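
A defensive option is to skip Comment nodes explicitly while walking the tree, so the code sees the same children whether or not the parser reports comments. This is a minimal sketch under the assumption that the JAXP API from later JDKs stands in for the parsers discussed here; CommentFilter, childrenWithoutComments and parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CommentFilter {
    // Returns the children of a node, leaving out any Comment nodes.
    static List<Node> childrenWithoutComments(Node parent) {
        List<Node> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.COMMENT_NODE) {
                result.add(child);
            }
        }
        return result;
    }

    // Assumption: JAXP is used here only as a convenient stand-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<addressbook><!--beginning of document--><person/></addressbook>");
        for (Node child : childrenWithoutComments(doc.getDocumentElement())) {
            System.out.println(child.getNodeName());
        }
    }
}
```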

CDATA sections

Description

The Sun parser does not implement the CDATASection interface and treats a CDATA section as a Text node. The IBM parser, however, implements the CDATASection interface properly.

Analysis/Recommendation

Although the Sun parser does not implement the CDATASection interface and treats CDATA nodes as Text nodes, there is no difference in getting the value from the node compared to the IBM parser, since the CDATASection interface declares no methods of its own and extends the Text interface. In the Sun parser, however, casting a CDATA node to CDATASection throws a ClassCastException.
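
In practice this means reading character data through the Node interface and never casting to CDATASection. The sketch below accepts both node types by checking getNodeType(); it assumes the JAXP API from later JDKs as a stand-in parser, and TextValue, textOf and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextValue {
    // Concatenates the character data of all direct Text and CDATASection
    // children. No cast to CDATASection is needed, so this also works with
    // a parser that reports CDATA sections as plain Text nodes.
    static String textOf(Element element) {
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        return text.toString();
    }

    // Assumption: JAXP is used here only as a convenient stand-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<name><![CDATA[ <<<John Doe>>>&lt; ]]></name>");
        // Inside a CDATA section, "&lt;" is literal text, not an entity reference.
        System.out.println(textOf(doc.getDocumentElement()));
    }
}
```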

Carriage return/Line feed (cr/lf) handling

Description

The IBM parser displays a cr/lf between elements as a Text node in the Document object tree. The Sun parser, however, ignores it.

Analysis/Recommendation

Don't write programs that depend on a specific child node index, since the number of child nodes returned by IBM and Sun parsers is different depending on whether there is a cr/lf between elements.
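
Instead of indexing into getChildNodes(), code can collect only the Element children and ignore everything else, including whitespace-only Text nodes. The following is a minimal sketch under the assumption that the JAXP API from later JDKs stands in for the parsers discussed here; ChildElements, childElements and parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildElements {
    // Returns only the Element children, so the positions in the returned
    // list do not shift when a parser adds Text nodes for cr/lf.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Assumption: JAXP is used here only as a convenient stand-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<addressbook>\r\n<person gender=\"Male\"/>\r\n<person/>\r\n</addressbook>");
        List<Element> persons = childElements(doc.getDocumentElement());
        System.out.println(persons.size() + " person elements");
    }
}
```

The element count and order are the same with or without line breaks between the elements, which is the property index-based code lacks.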

Downloading the parsers
Please go to the tools section to download the parsers themselves.

We hope that this list helps you in your DOM programming tasks and in keeping your projects DOM 1.0 compliant (instead of depending on the specific features of one parser). As we review more parsers, we will update this list as necessary, so stay tuned and keep coming back. Click here to send me any feedback or comments.

Copyright © Nazmul Idris 1998-2006. All Rights Reserved.
          Last Updated: Mar. 18 1999.