DOM 1.0 implementation anomalies
Author: Nazmul Idris
Date: March 27, 1999.
Copyright Nazmul Idris 1998-1999. All Rights Reserved.
Index
Introduction
XML document used for testing
How IBM and Sun parsers deal with common XML document elements
List of anomalies
Downloading the parsers
Introduction
When we wrote DomView 1.0 using the three Java XML parsers that we tested, we noticed that each parser implemented the DOM differently for certain features. In fact, some parsers did not even implement all the interfaces in the DOM 1.0 API. We compiled a list of these differences between the Sun Project X Technology Release 1 and IBM XML Parser for Java v2.0.4 (not in backward compatibility mode) parsers, and they are listed below. Information about the OpenXML parser will be included here shortly.
These results were obtained using the new IBM parser when not in backward compatibility mode (that is, by using the new way of instantiating DOM Document objects).
To read the older article comparing the Sun Project X Technology Release 1 and IBM XML Parser for Java v2.0.4 (in backward compatibility mode with version 1.x) parsers, click here.
XML document used for testing
We found these anomalies by using a sample XML document with an internal DTD and with an external DTD. The version of the input file with the internal DTD is listed below:
<?xml version='1.0'?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person)*>
  <!ELEMENT person (name, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person gender CDATA #IMPLIED>
  <!ATTLIST person location CDATA #IMPLIED>
]>
<addressbook>
  <!--beginning of document-->
  <person gender="Male" location="South">
    <name>
      <![CDATA[ <<<John Doe>>>< ]]>
    </name>
    <email>john@doe.com</email>
  </person>
  <person gender="Female">
    <name>Jane Doe</name>
    <email>jane@doe.com</email>
  </person>
  <person>
    <name>Mary Doe</name>
    <email>mary@doe.com</email>
  </person>
</addressbook>
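You can observe the differences described below by walking the Document object tree and printing each node's type. The following is a minimal sketch of such a walk. It assumes the JAXP API (javax.xml.parsers) bundled with current JDKs, not the Sun or IBM parser APIs of 1999. The class and method names (TreeDump, typeName, dumpDocument) are hypothetical, and what it prints depends on the parser implementation JAXP loads.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TreeDump {
    // Short labels for the DOM node types discussed in this article.
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE:      return "Document";
            case Node.DOCUMENT_TYPE_NODE: return "DocumentType";
            case Node.ELEMENT_NODE:       return "Element";
            case Node.TEXT_NODE:          return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.COMMENT_NODE:       return "Comment";
            default:                      return "Other(" + n.getNodeType() + ")";
        }
    }

    // Depth-first walk: one line per node, indented two spaces per level.
    static void dump(Node n, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(typeName(n)).append(' ').append(n.getNodeName());
        out.add(line.toString());
        NodeList kids = n.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            dump(kids.item(i), depth + 1, out);
        }
    }

    // Parses an XML string with the default JAXP parser and dumps the whole tree.
    static List<String> dumpDocument(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        dump(doc, 0, out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<addressbook>\n<!--beginning of document-->\n"
                + "<person gender=\"Male\"><name><![CDATA[ <<<John Doe>>>< ]]></name></person>\n"
                + "</addressbook>";
        for (String line : dumpDocument(xml)) {
            System.out.println(line);
        }
    }
}
```

Running the same dump against each parser and comparing the Comment, CDATASection and whitespace-only Text lines reproduces the kind of comparison shown in the table below.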
How IBM and Sun parsers deal with common XML document elements
The table below lists how the IBM and Sun parsers deal with certain XML document elements.
| XML document element | Sun parser | IBM parser (not in backward compatibility mode) |
|---|---|---|
| Internal DTD | Not shown in the Document object tree | Not shown in the Document object tree |
| External DTD | Not shown in the Document object tree | Not shown in the Document object tree |
| Comment | Not shown in the Document object tree | The content of the comment is shown as a leaf in the Document object tree |
| CDATA | CDATASection is not implemented; the section is shown as a Text node | Shown properly as a CDATASection node |
| Attribute | The String representation of getAttributes() displays each name and its value | The String representation of getAttributes() is not implemented (the NamedNodeMap implementation class does not override toString()) |
| Carriage return/line feed | Carriage returns between elements are ignored | Carriage returns between elements are preserved as Text nodes (containing the cr/lf) |
List of anomalies
The sections below outline the differences between the ways the IBM and Sun parsers deal with certain XML document elements.
Attributes
Description
Both the Sun and IBM parsers behave the same way when extracting attributes from an Element node. However, the String representation of the NamedNodeMap differs. A NamedNodeMap object is returned by calling getAttributes() on an Element node.
For example, consider the following XML document fragment:
<person gender="Male" location="South">
The Sun implementation returns the following for elementNode.getAttributes().toString():
gender="Male" location="South"
The IBM implementation does not override the toString() method. The call therefore falls through to the toString() method defined in the Object class, which returns only the class name and a hash code rather than anything meaningful.
Analysis/Recommendation
Since there is no difference in extracting attributes from a Node, your code does not have to differ between the Sun and IBM parsers. However, don't rely on the toString() method of NamedNodeMap to return a String in the same format on both parsers.
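A portable alternative is to build the string yourself from DOM 1.0 methods that both parsers implement (getLength(), item(), getNodeName(), getNodeValue()). The following is a minimal sketch. It assumes JAXP from a current JDK only to obtain a parsed Element, and the class and method names (AttributeFormat, formatAttributes, parseRoot) are hypothetical. DOM does not guarantee any particular order of attributes in a NamedNodeMap, so compare sorted output if order matters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeFormat {
    // Produces name="value" pairs separated by spaces, using only DOM 1.0 calls
    // instead of NamedNodeMap.toString(). Values are not escaped.
    static String formatAttributes(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(attr.getNodeName()).append("=\"").append(attr.getNodeValue()).append('"');
        }
        return sb.toString();
    }

    // Parses an XML string with the default JAXP parser and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element person = parseRoot("<person gender=\"Male\" location=\"South\"/>");
        System.out.println(formatAttributes(person));
    }
}
```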
Comments
Description
The IBM parser shows comments as leaves in the Document object tree, but the Sun parser ignores comments.
Analysis/Recommendation
Don't write programs that depend on information extracted from comments using the IBM parser, because moving to another parser may break the program.
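One way to get the same tree view from both parsers is to skip Comment nodes whenever you iterate over children. The following is a minimal sketch. It assumes JAXP from a current JDK to produce the tree, and the class and method names (SkipComments, childrenWithoutComments, parseRoot) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SkipComments {
    // Copies the children of parent, leaving out Comment nodes, so the result is
    // the same whether or not the parser kept comments in the tree.
    static List<Node> childrenWithoutComments(Node parent) {
        List<Node> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() != Node.COMMENT_NODE) {
                result.add(kid);
            }
        }
        return result;
    }

    // Parses an XML string with the default JAXP parser, which keeps comments.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<a><!--c--><b/></a>");
        System.out.println(root.getChildNodes().getLength() + " children, "
                + childrenWithoutComments(root).size() + " without comments");
    }
}
```

When parsing through JAXP, DocumentBuilderFactory.setIgnoringComments(true) removes comments at parse time instead. Filtering during traversal also works with parsers that offer no such switch.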
CDATA sections
Description
The Sun parser does not implement the CDATASection interface; it treats a CDATA section as a Text node. The IBM parser, however, implements the CDATASection interface properly.
Analysis/Recommendation
Although the Sun parser does not implement the CDATASection interface and treats CDATA sections as Text nodes, getting the value from the node works the same way as with the IBM parser. This is because the CDATASection interface declares no methods of its own and extends the Text interface. In the Sun parser, however, casting a CDATA node to CDATASection throws a ClassCastException.
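Because CDATASection extends Text, testing for Text (rather than casting to CDATASection) reads the content on both parsers. The following is a minimal sketch. It assumes JAXP from a current JDK, which by default keeps CDATA sections as CDATASection nodes. The class and method names (PortableText, textOf, parseRoot) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class PortableText {
    // Accepts both Text and CDATASection nodes (a CDATASection is a Text),
    // so it never casts to CDATASection on a parser that only builds Text nodes.
    static String textOf(Node node) {
        if (node instanceof Text) {
            return ((Text) node).getData();
        }
        return null; // not character data
    }

    // Parses an XML string with the default JAXP parser and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Node first = parseRoot("<name><![CDATA[ <<<John Doe>>>< ]]></name>").getFirstChild();
        System.out.println("[" + textOf(first) + "]");
    }
}
```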
Carriage return/Line feed (cr/lf) handling
Description
The IBM parser shows a cr/lf between elements as a Text node in the Document object tree. The Sun parser, however, ignores it.
Analysis/Recommendation
Don't write programs that depend on a specific child node index. The number of child nodes returned by the IBM and Sun parsers differs depending on whether there is a cr/lf between elements.
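A way to avoid index dependence is to collect only the Element children and index into that list. With that list, the position of each element stays the same whether or not the parser kept the whitespace between elements. The following is a minimal sketch. It assumes JAXP from a current JDK, which keeps that whitespace as Text nodes when no DTD is present, and the class and method names (ElementChildren, elementChildren, parseRoot) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementChildren {
    // Collects only the Element children of parent, skipping the whitespace
    // Text nodes that some parsers create for cr/lf between elements.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) kid);
            }
        }
        return result;
    }

    // Parses an XML string with the default JAXP parser and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<addressbook>\n  <person/>\n  <person/>\n</addressbook>");
        System.out.println("all children: " + root.getChildNodes().getLength()
                + ", element children: " + elementChildren(root).size());
    }
}
```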
Downloading the parsers
Please go to the tools section to download the parsers themselves.
We hope that this list helps you in your DOM programming tasks and in keeping your projects DOM 1.0 compliant (instead of depending on the specific features of one parser). As we review more parsers, we will update this list as necessary, so stay tuned and keep coming back.
Click here to send me any feedback or comments.
Copyright © Nazmul Idris 1998-2006. All Rights Reserved.
Last Updated: Mar. 18 1999.