DOM 1.0 implementation anomalies
Author: Nazmul Idris
Date: March 18 1999.
Copyright The Bean Factory, LLC. 1998-1999. All Rights Reserved. 
Index
Introduction
XML document used for testing
How IBM and Sun parsers deal with common XML document elements
List of anomalies
Downloading the parsers


Introduction
When we wrote DomView 1.0 using the three Java XML parsers that we tested, we noticed that each parser implements DOM differently for certain features. In fact, some parsers do not implement all the interfaces in the DOM 1.0 API. We compiled a list of the differences between the Sun Project X Technology Release 1 and IBM XML Parser for Java v2.0.4 parsers; they are listed below. The OpenXML parser has not been analyzed in this way because of its slow performance.
XML document used for testing
We found these anomalies by using a sample XML document in two versions, one with an internal DTD and one with an external DTD. The version of the input file with the internal DTD is listed below:

<?xml version='1.0'?> 
<!DOCTYPE addressbook [ 
<!ELEMENT addressbook (person)*> 
<!ELEMENT person (name, email)> 
<!ELEMENT name (#PCDATA)> 
<!ELEMENT email (#PCDATA)> 
<!ATTLIST person gender CDATA #IMPLIED> 
<!ATTLIST person location CDATA #IMPLIED> 
]> 
<addressbook> 

<!--beginning of document--> 

<person gender="Male" location="South"> 
  <name> 
    <![CDATA[ <<<John Doe>>>&lt; ]]> 
  </name> 
  <email>john@doe.com</email> 
</person> 

<person gender="Female"> 
  <name>Jane Doe</name> 
  <email>jane@doe.com</email> 
</person> 

<person> 
  <name>Mary Doe</name> 
  <email>mary@doe.com</email> 
</person> 
</addressbook> 

How IBM and Sun parsers deal with common XML document elements
The table below lists the ways the IBM and Sun parsers deal with certain XML document elements.
XML document element | Sun parser | IBM parser
Internal DTD | Not displayed in the Document object tree | The name of the DTD is displayed right after the #document node
External DTD | Not displayed in the Document object tree | Not displayed in the Document object tree
Comment | Not displayed in the Document object tree | The content of the comment is shown as a leaf in the Document object tree
CDATA | Not implemented; the node is displayed as a Text node | Properly displayed as a CDATASection node
Attribute | The String representation of getAttributes() displays each name and its value | The String representation of getAttributes() displays the values in square brackets
Carriage return/line feed | A carriage return between elements is ignored | A carriage return between elements is preserved as a Text node (with cr/lf in it)

List of anomalies

The sections below outline the differences between the way the IBM and Sun parsers deal with certain XML document elements.

DTDs

Description

If an internal DTD is used, the IBM parser displays the DTD as a branch on the root of the Document object tree. The Sun parser does not display it at all. Further analysis shows that the nodes returned by the IBM parser do not conform to the W3C's DOM Level 1 Recommendation. For instance, for the following part of the DTD (from above):
<!ELEMENT email (#PCDATA)>
<!ATTLIST person gender CDATA #IMPLIED>
the IBM parser returns:
Element declaration:  nodeName(#element-declaration), nodetype(20);

Attribute declaration:  nodeName(#attribute-definition-list), nodetype(21);

These node names (#element-declaration and #attribute-definition-list) and node types (20 and 21) are not defined in the W3C's DOM Level 1 Recommendation, which only defines node types 1 through 12.

When external DTDs are used, neither the IBM nor the Sun parser displays the DTD under the Document object tree root.

Analysis/Recommendation

Don't write programs that depend on DTD information extracted using the IBM parser, since moving to another parser will break your code.
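
One way to guard against parser-specific nodes such as the ones above is to check each node type against the range that DOM Level 1 defines before acting on it. The following is only a minimal sketch: the class and method names (NodeTypeCheck, isDomLevel1Type, classify) are hypothetical, and it relies on the org.w3c.dom.Node constants bundled with current JDKs rather than on either of the parsers discussed here.

```java
import org.w3c.dom.Node;

// Hypothetical helper: separates node types defined by DOM Level 1
// (1 = ELEMENT_NODE through 12 = NOTATION_NODE) from parser extensions.
class NodeTypeCheck {

    /** Returns true if the type code is one of the twelve DOM Level 1 node types. */
    static boolean isDomLevel1Type(short type) {
        return type >= Node.ELEMENT_NODE && type <= Node.NOTATION_NODE;
    }

    /** Labels a type code so that calling code can skip anything non-standard. */
    static String classify(short type) {
        return isDomLevel1Type(type) ? "standard" : "parser-specific";
    }

    public static void main(String[] args) {
        System.out.println("type 1:  " + classify(Node.ELEMENT_NODE));
        // 20 and 21 are the codes the IBM parser reported for DTD declarations.
        System.out.println("type 20: " + classify((short) 20));
        System.out.println("type 21: " + classify((short) 21));
    }
}
```

Code that walks the tree can call isDomLevel1Type(node.getNodeType()) and ignore any node for which it returns false, so the same traversal behaves alike whether or not the parser exposes DTD declarations.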

Attributes

Description
Both the Sun and IBM parsers work the same when it comes to extracting attributes from an Element node. However, the String representation of the NamedNodeMap is slightly different. A NamedNodeMap object is returned by calling getAttributes() on an Element node.

For example, for the following XML document fragment:
    <person gender="Male" 
     location="South">

The Sun parser returns the following for elementNode.getAttributes().toString():
    gender="Male" location="South"

The IBM parser outputs the following for elementNode.getAttributes().toString():
    [Male South]

If an Element node does not contain any attributes, elementNode.getAttributes().toString() returns nothing with the Sun parser and empty square brackets with the IBM parser.

Analysis/Recommendation

Since there is no difference in extracting attributes from an Element node, your code does not have to be different for the Sun and IBM parsers. However, don't rely on the toString() method of NamedNodeMap to return a String in the same format on both parsers.
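
If you need a String form of the attributes, it is safer to build it yourself from the NamedNodeMap than to call toString(). The sketch below is one possible approach, not the way DomView does it: the PortableAttributes class and its methods are hypothetical, and the parsing helper assumes the JAXP API (DocumentBuilderFactory) found in current JDKs, which is not part of either parser reviewed here. Names are sorted so that the output does not depend on the order in which a parser stores attributes.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: formats attributes using only DOM Level 1 calls
// (getLength, item, getNodeName, getNodeValue) instead of toString().
class PortableAttributes {

    /** Returns name="value" pairs, sorted by name and separated by spaces. */
    static String describe(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        Map<String, String> sorted = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            sorted.put(attr.getNodeName(), attr.getNodeValue());
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(entry.getKey()).append("=\"").append(entry.getValue()).append('"');
        }
        return out.toString();
    }

    /** Convenience for the example: parses a string and describes its root element. */
    static String describeRoot(String xml) {
        try {
            return describe(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // prints: gender="Male" location="South"
        System.out.println(describeRoot("<person gender=\"Male\" location=\"South\"/>"));
    }
}
```

An element without attributes yields an empty string here, so the "nothing versus empty brackets" difference between the two parsers never reaches your code.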

Comments

Description

The IBM parser displays comments as leaves in the Document object tree, but the Sun parser ignores comments.

Analysis/Recommendation

Don't write programs that depend on information extracted from comments using the IBM parser, because moving to another parser may break the program.
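
If comments are not needed, one option is to remove them from the tree right after parsing, so that later code sees the same structure whether or not the parser reports them. This is a minimal sketch under the same assumptions as before: CommentStripper and its methods are hypothetical names, and the parse helper uses the JAXP API from current JDKs rather than the Sun or IBM parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: deletes Comment nodes so the tree looks the same
// whether or not the parser chose to expose them.
class CommentStripper {

    /** Removes every Comment node below the given node and returns how many were removed. */
    static int stripComments(Node node) {
        int removed = 0;
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before the child may be detached
            if (child.getNodeType() == Node.COMMENT_NODE) {
                node.removeChild(child);
                removed++;
            } else {
                removed += stripComments(child);
            }
            child = next;
        }
        return removed;
    }

    /** Parses a string into a Document; checked exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<addressbook><!--beginning of document--><person/></addressbook>");
        System.out.println("comments removed: " + stripComments(doc));
    }
}
```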

CDATA sections

Description

The Sun parser does not implement the CDATASection interface.  The Sun parser treats a CDATA section as a Text node. However, the IBM parser implements the CDATASection interface properly.

Analysis/Recommendation

Although the Sun parser does not implement the CDATASection interface and treats CDATA sections as Text nodes, getting the value from the node works the same way as with the IBM parser, since the CDATASection interface is empty and extends the Text interface. With the Sun parser, however, casting a CDATA node to CDATASection throws a ClassCastException.
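
To avoid the cast altogether, character data can be read through the node type and getNodeValue(), which behaves the same for Text and CDATASection nodes. The sketch below illustrates the idea; TextReader and its methods are hypothetical names, and the parsing helper again assumes the JAXP API from current JDKs rather than either 1999 parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reads character data without casting to CDATASection,
// so it does not matter whether the parser reports CDATA as Text or CDATASection.
class TextReader {

    /** Concatenates the values of all direct Text and CDATASection children. */
    static String textOf(Node element) {
        StringBuilder out = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            }
        }
        return out.toString();
    }

    /** Parses a string and returns its root element; checked exceptions are wrapped for brevity. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Element name = parseRoot("<name><![CDATA[ <<<John Doe>>>&lt; ]]></name>");
        System.out.println("[" + textOf(name) + "]");
    }
}
```

Because the text inside a CDATA section is not interpreted as markup, the &lt; in the sample document comes back as the four literal characters rather than as a less-than sign.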

Carriage return/Line feed (cr/lf) handling

Description

The IBM parser displays a cr/lf between elements as a Text node in the Document object tree. However, the Sun parser ignores it.

Analysis/Recommendation

Don't write programs that depend on a specific child node index, since the number of child nodes returned by the IBM and Sun parsers differs when there is a cr/lf between elements.
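
Instead of indexing into getChildNodes(), you can select children by node type, which gives the same result whether or not whitespace shows up as Text nodes. The following is a rough sketch of that approach: ChildElements and its methods are hypothetical names, and the parse helper assumes the JAXP API in current JDKs, not the Sun or IBM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: returns only the Element children of a node, so that
// whitespace Text nodes (and comments) cannot shift the positions.
class ChildElements {

    /** Collects the direct children of parent that are Element nodes, in document order. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    /** Parses a string and returns its root element; checked exceptions are wrapped for brevity. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<addressbook>\n<person/>\n<person/>\n</addressbook>");
        System.out.println("raw child nodes: " + root.getChildNodes().getLength());
        System.out.println("child elements:  " + childElements(root).size());
    }
}
```

With this helper, childElements(addressbook).get(0) refers to the first person element on any parser, while getChildNodes().item(0) may be either that element or a whitespace Text node.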

Downloading the parsers
Please go to the tools section to download the parsers.

We hope that this list helps you in your DOM programming tasks, in keeping your projects DOM 1.0 compliant (instead of depending on the specific features of one parser). As we review more parsers, we will update this list as necessary. So stay tuned and keep coming back. Click here to send us any feedback or comments.

           Last Updated: Mar. 18 1999.