Comprehensive parser read-performance testing
Article Author: Nazmul Idris
Date: March 27 1999.
Copyright Nazmul Idris 1998-2006. All Rights Reserved.
Index
- Before you begin
- Introduction
- Testing methodology
- New test input data
- Test limitations
- Testing environment
- Test results
- Graphs
- Tables
- Analysis
- Dual processor vs Single processor
- Summary
- Downloading the source code and running the programs
- Where are the HotSpot results?
Before you begin
While you are reading the tutorial and trying out the examples, please use the links in the references section to understand terms that you are unfamiliar with.
Introduction
After we did the first performance review of Java XML parsers on Feb. 12 1999, new versions of the IBM and Sun parsers became available. We also decided to include an open source parser called OpenXML in the test, just to see how this newcomer would perform. You can download all these parsers from the tools section. The parser versions that we tested are:
- Sun Project X Technology Release 1
- IBM XML Parser for Java v2.0.4
- OpenXML v1.0.4
In order to reproduce our results you must use the code that is shown in the new parsers article to instantiate these parsers. If you do not use the code shown there, the performance results for the IBM and OpenXML parsers will be quite different.
Testing methodology
We tested these parsers by generating XML documents of different sizes and measuring how long it took each parser to read a document and convert it into a DOM document object. The generated input XML documents all have an internal DTD. Each parser was also told to validate the XML input files, so they were not just read in as well-formed documents.
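The measurement described above can be sketched as follows. This is not the article's ParserTest program. It assumes the standard JAXP API (DocumentBuilderFactory), which postdates the parser releases tested here, and the class name, method names and the tiny sample document are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch: time a validating DOM read of an XML document with an internal DTD.
final class DomReadTimer {

    // Small stand-in document; the real tests used generated files of 100KB and up.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE list [\n"
        + "  <!ELEMENT list (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<list><item>one</item><item>two</item></list>\n";

    // Parses the bytes into a DOM Document with DTD validation switched on.
    static Document read(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // validate against the DTD, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validation errors into exceptions instead of console messages.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new ByteArrayInputStream(xml));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not read document", e);
        }
    }

    // Returns the wall-clock time of one read, in seconds.
    static double timeRead(byte[] xml) {
        long start = System.nanoTime();
        read(xml);
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        byte[] xml = SAMPLE.getBytes(StandardCharsets.UTF_8);
        System.out.println("read took " + timeRead(xml) + " seconds");
    }
}
```

A single timed read like this includes JVM warm-up and class loading, so the numbers for small inputs are dominated by start-up cost rather than parsing.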
New test input data
We also decided to change the method of testing (compared to what we did last time) and follow a more rigorous and complete testing procedure. We also changed the data set (XML document) against which we ran the tests. We previously used an XML document which had a DTD, but only contained elements. This time we have a test document that contains an internal DTD, elements, attributes, entities, #PCDATA sections and CDATA sections. Compared to the last test, this document took almost twice as long to read because of the richness of the content.
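To make the shape of such a document concrete, here is a minimal generator sketch. It is not the generator shipped in src.zip. The element, attribute and entity names are invented for illustration, and it only assumes that the input is built by repeating a record until a target size is reached.

```java
// Hypothetical sketch of a test-document generator; all names in the markup are invented.
final class TestDocGenerator {

    // Builds a document of at least targetBytes characters (ASCII only, so one byte each).
    static String generate(int targetBytes) {
        StringBuilder sb = new StringBuilder();
        // Internal DTD declaring elements, an attribute and a general entity.
        sb.append("<?xml version=\"1.0\"?>\n")
          .append("<!DOCTYPE records [\n")
          .append("  <!ELEMENT records (record*)>\n")
          .append("  <!ELEMENT record (name, note)>\n")
          .append("  <!ATTLIST record id CDATA #REQUIRED>\n")
          .append("  <!ELEMENT name (#PCDATA)>\n")
          .append("  <!ELEMENT note (#PCDATA)>\n")
          .append("  <!ENTITY org \"Example Org\">\n")
          .append("]>\n")
          .append("<records>\n");
        int id = 0;
        // Repeat one record until the document reaches the requested size.
        while (sb.length() < targetBytes) {
            sb.append("<record id=\"").append(id++).append("\">")
              .append("<name>&org; item</name>")                     // entity reference in #PCDATA
              .append("<note><![CDATA[raw <text> & symbols]]></note>") // CDATA section
              .append("</record>\n");
        }
        sb.append("</records>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = generate(100 * 1024);
        System.out.println("generated " + xml.length() + " characters");
    }
}
```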
Test limitations
The test results and testing methods outlined in this article only compare the read performance of these XML parsers. That is, the testing programs measure the time it takes to read an XML document and convert it into a (DOM 1.0) document object. They do not say anything about the performance of these parsers for modifying the (DOM) document object using the DOM API. These tests show the read-only performance of the parsers. Also, these tests have nothing to do with the SAX performance of these parsers, because we are only testing DOM performance.
We are in the process of coming up with more comprehensive parser testing methods and code to test DOM traversal and modification performance, along with multithreaded test programs (to test dual processor advantages). We are continually improving our testing procedures to report results to you more accurately.
Testing environment
We tested a variety of machine and software configurations with the same test input data and testing programs. We used the following hardware configurations:
- Single processor machine, PentiumII/233, 256MB PC100 SDRAM, ASUS P2B motherboard, IDE hard drive, WinNT4.0 SP3
- Dual processor machine, PentiumII/400, 256MB PC100 SDRAM, ASUS P2B-DS motherboard, Ultra Wide SCSI II hard drive, WinNT4.0 SP4
We used the following software:
- JDK1.2 Classic VM with the new JIT update
We collected a large amount of data by running our test program on all combinations of hardware and software platforms with a variety of input file sizes. The input files range from 100KB to 40MB of XML data. Not all the parsers were able to handle the 40MB XML input file.
Test results
We generated 2 graphs from the data that we collected. Each graph shows the time it took each parser to create a DOM document object from XML files of different sizes.
Graphs
The graph below shows the performance on
- a Dual processor machine
- with the JDK1.2 Classic VM (with the new JIT update).
The graph below shows the performance on
- a Single processor machine
- with the JDK1.2 Classic VM (with the new JIT update).
Tables
The data in the graphs is listed here in 2 tables.
The following table contains test data for the Classic VM on a Dual processor machine.
| Input XML file size | Sun (in seconds) | IBM (in seconds) | OpenXML (in seconds) |
|---|---|---|---|
| 100KB | 1.15 | 1.45 | 1.65 |
| 500KB | 2.06 | 1.85 | 4.29 |
| 1MB | 2.75 | 2.45 | 6.59 |
| 2MB | 4.11 | 3.53 | 11.21 |
| 3MB | 5.54 | 4.60 | 15.54 |
| 4MB | 6.93 | 5.82 | 20.21 |
| 5MB | 8.31 | 6.82 | 24.93 |
| 7MB | 11.11 | 9.51 | 34.00 |
| 10MB | 15.21 | 12.78 | 48.12 |
| 15MB | 21.93 | 18.65 | 84.18 |
| 20MB | 29.26 | 25.04 | not tested |
| 25MB | 35.93 | OutOfMemoryException | not tested |
| 30MB | 49.67 | OutOfMemoryException | not tested |
| 35MB | 66.35 | OutOfMemoryException | not tested |
| 40MB | 74.12 | OutOfMemoryException | not tested |
The following table contains test data for the Classic VM on a Single processor machine.
| Input XML file size | Sun (in seconds) | IBM (in seconds) | OpenXML (in seconds) |
|---|---|---|---|
| 100KB | 1.78 | 1.74 | 1.82 |
| 500KB | 5.66 | 2.35 | 5.42 |
| 1MB | 10.03 | 3.11 | 9.11 |
| 2MB | 13.98 | 4.82 | 14.03 |
| 3MB | 16.43 | 6.30 | 18.67 |
| 4MB | 18.94 | 8.06 | 23.11 |
| 5MB | 20.96 | 9.39 | 27.69 |
| 7MB | 25.62 | 12.42 | 36.72 |
| 10MB | 32.30 | 17.55 | 49.59 |
| 15MB | 45.39 | 24.73 | 92.99 |
| 20MB | 54.31 | 32.47 | not tested |
| 25MB | 66.87 | OutOfMemoryException | not tested |
| 30MB | 76.33 | OutOfMemoryException | not tested |
| 35MB | 156.02 | OutOfMemoryException | not tested |
| 40MB | 171.74 | OutOfMemoryException | not tested |
Analysis
In each table, the IBM parser is the fastest at nearly every file size it could read (the Sun parser is faster only for the 100KB file on the dual processor machine). The Sun parser, however, can deal with the largest files in similar memory configurations: it read every file up to 40MB, while the IBM parser ran out of memory from 25MB onwards. The OpenXML parser comes in third for speed at most file sizes, and third for file size handling.
We wanted to show you what a difference the choice of machine makes to the final performance of each XML parser, so we compared dual processor performance with single processor performance. The results are outlined in the table below.
Dual Processor vs. Single Processor
The table below shows the average performance boost of using a dual processor machine compared to a single processor one. A positive percentage means that the dual processor machine was faster, on average, by the percentage shown. A negative percentage means that the dual processor machine was slower, on average, by the percentage shown.
| Parser | Classic VM |
|---|---|
| Sun | +140% |
| IBM | +32% |
| OpenXML | +17% |
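The article does not say how these averages were computed. One reading that reproduces all three published figures when rounded is to take, for every file size that both machines completed, the ratio of single processor time to dual processor time minus one, and then average those values. The sketch below assumes that formula and uses the timings from the two tables above; the class and method names are hypothetical.

```java
// Sketch: recompute the average dual processor boost from the published timings.
// The averaging formula is an assumption that happens to match the published figures.
final class DualVsSingle {

    // Seconds per file size, in table order, for sizes that both machines completed.
    static final double[] SUN_DUAL = {1.15, 2.06, 2.75, 4.11, 5.54, 6.93, 8.31, 11.11,
        15.21, 21.93, 29.26, 35.93, 49.67, 66.35, 74.12};
    static final double[] SUN_SINGLE = {1.78, 5.66, 10.03, 13.98, 16.43, 18.94, 20.96, 25.62,
        32.30, 45.39, 54.31, 66.87, 76.33, 156.02, 171.74};
    static final double[] IBM_DUAL = {1.45, 1.85, 2.45, 3.53, 4.60, 5.82, 6.82, 9.51,
        12.78, 18.65, 25.04};
    static final double[] IBM_SINGLE = {1.74, 2.35, 3.11, 4.82, 6.30, 8.06, 9.39, 12.42,
        17.55, 24.73, 32.47};
    static final double[] OPENXML_DUAL = {1.65, 4.29, 6.59, 11.21, 15.54, 20.21, 24.93,
        34.00, 48.12, 84.18};
    static final double[] OPENXML_SINGLE = {1.82, 5.42, 9.11, 14.03, 18.67, 23.11, 27.69,
        36.72, 49.59, 92.99};

    // Mean of (single / dual - 1) over all file sizes, expressed as a percentage.
    static double averageBoostPercent(double[] dual, double[] single) {
        double sum = 0.0;
        for (int i = 0; i < dual.length; i++) {
            sum += single[i] / dual[i] - 1.0;
        }
        return 100.0 * sum / dual.length;
    }

    public static void main(String[] args) {
        System.out.println("Sun +" + Math.round(averageBoostPercent(SUN_DUAL, SUN_SINGLE)) + "%");
        System.out.println("IBM +" + Math.round(averageBoostPercent(IBM_DUAL, IBM_SINGLE)) + "%");
        System.out.println("OpenXML +"
            + Math.round(averageBoostPercent(OPENXML_DUAL, OPENXML_SINGLE)) + "%");
        // prints: Sun +140%, IBM +32%, OpenXML +17% (one per line)
    }
}
```

Note that the two machines also differ in clock speed (PentiumII/400 vs. PentiumII/233) and disk type, so these percentages compare the two machines as a whole rather than isolating the effect of the second processor.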
Summary
Without a shadow of a doubt, the best performer for speed is the IBM XML parser. However, the Sun parser can deal with the largest file size for any given memory configuration. The Sun parser comes in second for speed. The OpenXML parser comes in at third place for speed and file size handling. All three parsers are very good.
A dual processor machine (or a higher SMP configuration) is great for those of you who need to serve large amounts of XML information while doing other server tasks. The processor usage on the dual processor machine was 50% when only the test program was running on it. On the other hand, the processor usage on the single processor machine was 100% when only the test program was running. The dual processor machine has a much greater load handling capacity than the single processor machine, an important factor for servers where many processes may be running. So for a server configuration it is better to get a multi-processor machine rather than a fast single processor machine.
Downloading the source code and running the programs
You need to download the IBM, Sun and OpenXML parsers before running the test program. You can download these parsers from the tools section. Before running the program, please refer to the How-to: Setup CLASSPATH article for information on how to set up your CLASSPATH on JDK1.1 and 1.2 so that you can use these parsers.
Here is the distribution zip file (containing source code) for the testing program: src.zip.
The table below contains a description of the files in src.zip. Please note that you MUST have all 3 parsers (Sun, IBM and OpenXML) set up on your system before you can run this program.
| Files in zip file | Description |
|---|---|
| ParserTest.java | Main program that performs the test. Run this class and provide it with the command line parameters shown below. |
| classic.bat | Batch file to run a set of tests on the Classic Java Virtual Machine. |
To run the test program, you must type:
"java -mxsize ParserTest ParserName InputFileSize"
at the command prompt.
- The ParserName parameter can be one of the following: sun, openxml or ibm.
- The InputFileSize parameter is an integer that specifies the size of the input file in multiples of 100KB.
- The size parameter is an integer that is the maximum amount of RAM that you want the JVM to use. It is given in bytes unless you append k or m for kilobytes or megabytes, as in -mx64m.
For example, to run a test with JDK1.2 (with access to a max of 64MB of RAM) and the Sun parser for a 1MB XML input file, type: "java -mx64m ParserTest sun 10". After you run the program, there will be a "temp.xml" file (in the current directory) that was generated by the program and used to perform the test. You can delete this file; we left it in so that you can see the XML document (input file) that is used to perform the test.
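As an illustration of how the two program arguments fit together, here is a minimal sketch of the argument handling. It is not taken from ParserTest.java. The class name, field names and error messages are hypothetical, and it assumes that 100KB means 100 * 1024 bytes.

```java
// Hypothetical sketch of the command line handling; the real code is in src.zip.
final class TestArgs {
    final String parserName; // "sun", "ibm" or "openxml"
    final long inputBytes;   // requested input file size in bytes

    TestArgs(String parserName, long inputBytes) {
        this.parserName = parserName;
        this.inputBytes = inputBytes;
    }

    // Expects exactly two arguments: ParserName and InputFileSize (in multiples of 100KB).
    static TestArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "usage: java -mx<size> ParserTest ParserName InputFileSize");
        }
        String name = args[0].toLowerCase();
        if (!name.equals("sun") && !name.equals("ibm") && !name.equals("openxml")) {
            throw new IllegalArgumentException("unknown parser: " + args[0]);
        }
        int units = Integer.parseInt(args[1]); // throws NumberFormatException if not an integer
        if (units <= 0) {
            throw new IllegalArgumentException("InputFileSize must be positive");
        }
        // Assumption: one unit is 100 * 1024 bytes.
        return new TestArgs(name, units * 100L * 1024L);
    }

    public static void main(String[] args) {
        TestArgs parsed = parse(new String[] {"sun", "10"});
        System.out.println(parsed.parserName + " " + parsed.inputBytes + " bytes");
        // prints: sun 1024000 bytes
    }
}
```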
Make sure that you have Java2 (or JDK1.2) installed on your machine because the program will not work with JDK1.1.
Please go to the tools section to actually download the parsers themselves.
Where are the HotSpot results?
I apologize for jumping the gun and releasing the beta HotSpot VM performance data. For legal reasons I have retracted these results and will not publish them until April 28 (after HotSpot is released at JESS). We were able to come to some conclusions about the performance of these parsers using the HotSpot VM data, which we can't publish right now.
We hope you enjoyed this article. We will have more parser performance reviews, so stay tuned and keep coming back :).
Last Updated: Mar. 18 1999.