DOM 1.0 implementation anomalies
Author: Nazmul Idris
Date: March 18, 1999
Copyright The Bean Factory, LLC. 1998-1999. All Rights Reserved.
Index
Introduction
XML document used for testing
How IBM and Sun parsers deal with common XML document elements
List of anomalies
Downloading the parsers
Introduction
When we wrote DomView 1.0 against the three Java XML parsers that we
tested, we noticed that each parser implemented the DOM differently for
certain features. In fact, some parsers did not even implement all the
interfaces in the DOM 1.0 API. We compiled a list of the differences
between the Sun Project X Technology Release 1 and IBM XML Parser for
Java v2.0.4 parsers; they are listed below. The OpenXML parser has not
been analyzed in this way because of its slow performance.
XML document used for testing
We found these anomalies by using a sample XML document, tested once with
an internal DTD and once with an external DTD. The version of the input
file with the internal DTD is listed below:
<?xml version='1.0'?>
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person)*>
<!ELEMENT person (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ATTLIST person gender CDATA #IMPLIED>
<!ATTLIST person location CDATA #IMPLIED>
]>
<addressbook>
<!--beginning of document-->
<person gender="Male" location="South">
<name>
<![CDATA[ <<<John
Doe>>>< ]]>
</name>
<email>john@doe.com</email>
</person>
<person gender="Female">
<name>Jane Doe</name>
<email>jane@doe.com</email>
</person>
<person>
<name>Mary Doe</name>
<email>mary@doe.com</email>
</person>
</addressbook>
How IBM and Sun parsers deal with common XML document elements
The table below lists the ways the IBM and Sun parsers deal with certain
XML document elements.
| XML document elements | Sun Parser | IBM Parser |
| --- | --- | --- |
| Internal DTD | It is not displayed by the Document object tree | The name of DTD is displayed right after #document node |
| External DTD | It is not displayed by the Document object tree | It is not displayed by the Document object tree |
| Comment | It is not displayed by the Document object tree | The content of the comment is shown as a leaf in the Document object tree |
| CDATA | It is not implemented. This node is displayed as a Text node | Properly displayed as a CDATASection node |
| Attribute | The String representation of getAttributes() displays name and its value | The String representation of getAttributes() displays value in square brackets |
| Carriage Return/Line Feed | Carriage return between elements is ignored | Carriage return between elements is preserved as a Text node (with cr/lf in it) |
List of anomalies
The sections below outline the differences between the way
the IBM and Sun parsers deal with certain XML document elements.
DTDs
Description
If an internal DTD is used, the IBM parser displays the DTD as a branch
on the root of the Document object tree. The Sun parser does not display
it at all. Further analysis shows that the nodes returned by the IBM
parser do not conform to the W3C's recommendation. For instance, for the
following part of the DTD (from above):
<!ELEMENT email (#PCDATA)>
<!ATTLIST person gender CDATA #IMPLIED>
the IBM parser returns:
Element declaration: nodeName(#element-declaration), nodetype(20);
Attribute declaration: nodeName(#attribute-definition-list), nodetype(21);
These node names (#element-declaration and #attribute-definition-list)
and node types (20 and 21) are not defined in the W3C's DOM Level 1
Recommendation, which defines node types 1 through 12 only.
When external DTDs are used, neither the IBM nor the Sun parser displays
the DTD under the Document object tree root.
Analysis/Recommendation
Don't write programs that depend on DTD information extracted with the
IBM parser, since moving to another parser will break your code.
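One way to follow this advice is to skip any node whose type is not one of the twelve defined by DOM Level 1. The sketch below is only an illustration: the class and method names are hypothetical, and it parses with the JDK's JAXP DocumentBuilderFactory, which is an assumption and not the Sun or IBM parser API discussed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class StandardNodes {

    // Parses an XML string with the JDK's JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // DOM Level 1 defines node types ELEMENT_NODE (1) through NOTATION_NODE (12).
    static boolean isDomLevel1Type(short nodeType) {
        return nodeType >= Node.ELEMENT_NODE && nodeType <= Node.NOTATION_NODE;
    }

    // Returns the children of parent whose node type is defined by DOM Level 1,
    // dropping parser-specific nodes such as the types 20 and 21 described above.
    static List<Node> standardChildren(Node parent) {
        List<Node> result = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (isDomLevel1Type(child.getNodeType())) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<!DOCTYPE addressbook [<!ELEMENT addressbook EMPTY>]><addressbook/>");
        for (Node node : standardChildren(doc)) {
            System.out.println(node.getNodeName() + " (type " + node.getNodeType() + ")");
        }
    }
}
```

Code that walks the tree through a filter like standardChildren() never sees parser-specific DTD nodes, so it should behave the same whether or not a parser exposes them.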
Attributes
Description
Both the Sun and IBM parsers work the same way when it comes to extracting
attributes from an Element node. However, the String representation of
the NamedNodeMap is slightly different. A NamedNodeMap object is returned
by calling getAttributes() on an Element node.
For example, for the following XML document fragment:
<person gender="Male" location="South">
The Sun parser returns the following for elementNode.getAttributes().toString():
gender="Male" location="South"
The IBM parser returns the following for elementNode.getAttributes().toString():
[Male South]
If an Element node does not contain any attributes, the Sun parser returns
nothing and the IBM parser returns empty square brackets as the value
of elementNode.getAttributes().toString().
Analysis/Recommendation
Since there is no difference in extracting attributes from a Node,
your code does not have to be different for the Sun and IBM parsers. However,
don't rely on the toString() method of NamedNodeMap to return a String in
the same format on both parsers.
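A parser-independent alternative to toString() is to walk the NamedNodeMap with getLength() and item(), which are part of DOM 1.0. The following is a minimal sketch: the AttributeDump class and describe() method are hypothetical names, and the parse() helper uses the JDK's JAXP parser as an assumption. DOM does not guarantee the order in which item() returns attributes, so the output order may vary between parsers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of DomView or either parser.
class AttributeDump {

    // Parses an XML string with the JDK's JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Builds name="value" pairs by walking the NamedNodeMap instead of
    // calling toString() on it.
    static String describe(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (i > 0) {
                out.append(' ');
            }
            out.append(attr.getNodeName())
               .append("=\"")
               .append(attr.getNodeValue())
               .append('"');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<person gender=\"Male\" location=\"South\"/>");
        System.out.println(describe(doc.getDocumentElement()));
    }
}
```

When only one attribute is needed, calling getAttribute("gender") on the Element avoids the NamedNodeMap altogether.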
Comments
Description
The IBM parser displays comments as leaves in the Document object tree,
but the Sun parser ignores comments.
Analysis/Recommendation
Don't write programs that depend on information extracted from
comments using the IBM parser, because moving to another parser may
break the program.
CDATA sections
Description
The Sun parser does not implement the CDATASection interface.
The Sun parser treats a CDATA section as a Text node. However, the IBM
parser implements the CDATASection interface properly.
Analysis/Recommendation
Although the Sun parser does not implement the CDATASection
interface and treats CDATA nodes as Text nodes, there is no difference
in getting the value from the node compared to the IBM parser, since the
CDATASection interface is empty and extends the Text interface. In the Sun
parser, however, casting a CDATA node to CDATASection throws
a ClassCastException.
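To read character data without ever casting to CDATASection, code can test getNodeType() and call getNodeValue(), both of which are defined on the Node interface. The sketch below assumes the JDK's JAXP parser for the parse() helper, and TextReader and textOf() are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class TextReader {

    // Parses an XML string with the JDK's JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Concatenates the character data of an element's direct children.
    // Text and CDATASection nodes are handled the same way, so no cast
    // to CDATASection is needed.
    static String textOf(Element element) {
        StringBuilder out = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<name><![CDATA[ <<<John Doe>>> ]]></name>");
        System.out.println(textOf(doc.getDocumentElement()));
    }
}
```

A parser that reports a CDATA section as a Text node takes the TEXT_NODE branch, and one that reports a CDATASection takes the CDATA_SECTION_NODE branch, so the returned string should be the same either way.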
Carriage return/Line feed (cr/lf) handling
Description
The IBM parser displays a cr/lf between elements as a Text node on the
Document object tree. However, the Sun parser ignores it.
Analysis/Recommendation
Don't write programs that depend on a specific child node index, since
the number of child nodes returned by the IBM and Sun parsers differs
depending on whether there is a cr/lf between elements.
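Instead of indexing into getChildNodes(), code can select children by node type. The sketch below collects only Element children, so whitespace Text nodes (and comments) between elements do not shift the positions. ChildElements and childElements() are hypothetical names, and the parse() helper assumes the JDK's JAXP parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class ChildElements {

    // Parses an XML string with the JDK's JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns only the Element children of parent, in document order.
    // Text nodes holding cr/lf and Comment nodes are skipped.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<addressbook>\r\n<!--beginning of document-->\r\n"
                + "<person gender=\"Male\"/>\r\n<person gender=\"Female\"/>\r\n</addressbook>");
        List<Element> persons = childElements(doc.getDocumentElement());
        System.out.println(persons.size() + " person elements");
    }
}
```

With this approach, the second person element is always at index 1 of the returned list, regardless of how a parser treats the line breaks around it.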
Downloading the parsers
Please go to the tools section to download the parsers themselves.
We hope that this list helps you in your DOM programming
tasks, in keeping your projects DOM 1.0 compliant (instead of depending
on the specific features of one parser). As we review more parsers, we
will update this list as necessary. So stay tuned and keep coming back.
Click here
to send us any feedback or comments.