Sun, IBM and OpenXML
Comprehensive parser read-performance testing
Article Author: Nazmul Idris
Date: March 27 1999.
Copyright Nazmul Idris 1998-2006. All Rights Reserved. 
Index
Before you begin
Introduction
Testing methodology
New test input data
Test limitations
Testing environment
Test results
Graphs
Tables
Analysis
Dual processor vs Single processor
Summary
Downloading the source code and running the programs
Where are the HotSpot results?


Before you begin
While you are reading the tutorial and trying out the examples, please use the links in the references section to understand terms that you are unfamiliar with. 
Introduction
After we did the first performance review of Java XML parsers on Feb. 12 1999, new versions of the IBM and Sun parsers became available. We also decided to include an open source parser called OpenXML in the test, just to see how this newcomer would perform. You can download all these parsers from the tools section. The parser versions that we tested are: 
  • Sun Project X Technology Release 1
  • IBM XML Parser for Java v2.0.4
  • OpenXML v1.0.4.
In order to reproduce our results you must use the code that is shown in the new parsers article to instantiate these parsers. If you do not use the code shown there, the performance results for the IBM and OpenXML parsers will be quite different.
Testing methodology
We tested these parsers by generating XML documents of different sizes and measuring how long it took each parser to read in the document and convert it into a (DOM) document object. The generated input XML documents all have an internal DTD. Each parser was also told to validate the XML input files (so they are not just read in as well-formed documents).
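The measurement described above can be sketched as follows. This is only an illustration, not the ParserTest program from this article: it assumes the standard JAXP API (DocumentBuilderFactory) rather than the parser-specific instantiation code from the new parsers article, and the class name ReadTimer and the tiny sample document are hypothetical stand-ins for the generated test files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: times one validating DOM read using JAXP (an assumption,
// not the parser-specific code used for the published results).
class ReadTimer {
    // Tiny stand-in document with an internal DTD; the real tests used generated files.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE list [\n"
      + "  <!ELEMENT list (item*)>\n"
      + "  <!ELEMENT item (#PCDATA)>\n"
      + "]>\n"
      + "<list><item>one</item><item>two</item></list>";

    // Parses the XML text into a DOM Document with DTD validation switched on.
    static Document parseValidating(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Turn validation errors into exceptions so an invalid file fails the run.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the elapsed wall-clock time, in seconds, for one read of the document.
    static double timeReadSeconds(String xml) throws Exception {
        long start = System.nanoTime();
        parseValidating(xml);
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read took " + timeReadSeconds(SAMPLE) + " s");
    }
}
```

A single timed read like this includes class loading and JIT warm-up, which is one reason small files show a fixed cost of a second or more in measurements of this kind.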

New test input data

We have also changed our testing method (compared to what we did the last time) to follow a more rigorous and complete procedure, and we changed the data set (XML document) against which we ran the tests. We previously used an XML document that had a DTD but contained only elements. This time the test document contains an internal DTD, elements, attributes, entities, #PCDATA content and CDATA sections. Because of this richer content, the document took almost twice as long to read as the one used in the last test.
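To make the description of the test document concrete, here is a minimal sketch of a generator that produces a document with the same kinds of content: an internal DTD, elements, attributes, an entity, #PCDATA and a CDATA section. The element names (addressbook, person, name, note), the entity name (co) and the class name TestDocGenerator are all made up for illustration; the actual generated document is the temp.xml file left behind by the test program.

```java
// Hypothetical generator: builds a valid XML document of 'records' entries.
// All element, attribute and entity names are illustrative assumptions.
class TestDocGenerator {
    static String generate(int records) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        // Internal DTD: element declarations, one required attribute, one entity.
        sb.append("<!DOCTYPE addressbook [\n");
        sb.append("  <!ELEMENT addressbook (person*)>\n");
        sb.append("  <!ELEMENT person (name, note)>\n");
        sb.append("  <!ATTLIST person id CDATA #REQUIRED>\n");
        sb.append("  <!ELEMENT name (#PCDATA)>\n");
        sb.append("  <!ELEMENT note (#PCDATA)>\n");
        sb.append("  <!ENTITY co \"Example Company\">\n");
        sb.append("]>\n");
        sb.append("<addressbook>\n");
        for (int i = 0; i < records; i++) {
            // Attribute on the element.
            sb.append("  <person id=\"p").append(i).append("\">\n");
            // #PCDATA containing an entity reference.
            sb.append("    <name>Person ").append(i).append(" of &co;</name>\n");
            // CDATA section holding characters that would otherwise need escaping.
            sb.append("    <note><![CDATA[raw <text> & markup #").append(i).append("]]></note>\n");
            sb.append("  </person>\n");
        }
        sb.append("</addressbook>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = generate(1000);
        System.out.println("generated " + xml.length() + " characters");
    }
}
```

The document size grows roughly linearly with the record count, so a generator of this shape can be driven by a target size such as the 100KB multiples used in the tests.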
Test limitations
The test results and testing methods outlined in this article compare only the read performance of these XML parsers. That is, the testing programs measure the time it takes to read an XML document and convert it into a (DOM 1.0) document object. The results say nothing about how these parsers perform when modifying the (DOM) document object through the DOM API, and nothing about their SAX performance, because we are only testing DOM reads.

We are developing more comprehensive parser testing methods and code to measure DOM traversal and modification performance, along with multithreaded test programs (to test dual processor advantages). We are continually improving our testing procedures so that we can report results more accurately.

Testing environment
We tested a variety of machine and software configurations with the same test input data and testing programs. We used the following hardware configurations:
  1. Single processor machine, PentiumII/233, 256MB PC100 SDRAM, ASUS P2B motherboard, IDE harddrive, WinNT4.0 SP3
  2. Dual processor machine, PentiumII/400, 256MB PC100 SDRAM, ASUS P2B-DS motherboard, Ultra Wide SCSI II harddrive, WinNT4.0 SP4.
We used the following software:
  1. JDK1.2 Classic VM with the new JIT update
We collected a large amount of data by running our test program on all combinations of hardware and software platforms with a variety of input file sizes. The input files range from 100KB to 40MB of XML data. Not all the parsers were able to handle the 40MB XML input file.
Test results
We generated 2 graphs from the data that we collected. Each graph shows the time it took each parser to create a DOM document object from XML files of different sizes. 

Graphs

The graph below shows the performance on 
  • a Dual processor machine 
  • with the JDK1.2 Classic VM (with new JIT update).
[Graph: Dual Classic]
The graph below shows the performance on 
  • a Single processor machine 
  • with the JDK1.2 Classic VM (with the JIT update).
[Graph: Single Classic]
Tables
The data in the graphs is listed here in 2 tables.

The following table contains test data for the Classic VM on a Dual processor machine.
 

Input XML file size   Sun (in seconds)   IBM (in seconds)       OpenXML (in seconds)
100KB                 1.15               1.45                   1.65
500KB                 2.06               1.85                   4.29
1MB                   2.75               2.45                   6.59
2MB                   4.11               3.53                   11.21
3MB                   5.54               4.60                   15.54
4MB                   6.93               5.82                   20.21
5MB                   8.31               6.82                   24.93
7MB                   11.11              9.51                   34.00
10MB                  15.21              12.78                  48.12
15MB                  21.93              18.65                  84.18
20MB                  29.26              25.04                  not tested
25MB                  35.93              OutOfMemoryException   not tested
30MB                  49.67              OutOfMemoryException   not tested
35MB                  66.35              OutOfMemoryException   not tested
40MB                  74.12              OutOfMemoryException   not tested

The following table contains test data for the Classic VM on a Single processor machine.
 

Input XML file size   Sun (in seconds)   IBM (in seconds)       OpenXML (in seconds)
100KB                 1.78               1.74                   1.82
500KB                 5.66               2.35                   5.42
1MB                   10.03              3.11                   9.11
2MB                   13.98              4.82                   14.03
3MB                   16.43              6.30                   18.67
4MB                   18.94              8.06                   23.11
5MB                   20.96              9.39                   27.69
7MB                   25.62              12.42                  36.72
10MB                  32.30              17.55                  49.59
15MB                  45.39              24.73                  92.99
20MB                  54.31              32.47                  not tested
25MB                  66.87              OutOfMemoryException   not tested
30MB                  76.33              OutOfMemoryException   not tested
35MB                  156.02             OutOfMemoryException   not tested
40MB                  171.74             OutOfMemoryException   not tested
Analysis
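Within each table, the IBM parser is the fastest at every file size except the 100KB file on the dual processor machine, where the Sun parser was faster (1.15 vs. 1.45 seconds). The Sun parser, however, can deal with the largest files in similar memory configurations: it read every file up to 40MB, while the IBM parser ran out of memory from 25MB onwards. The OpenXML parser generally comes in at third place for speed and filesize handling, although on the single processor machine it was slightly faster than the Sun parser for the 500KB and 1MB files.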

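We wanted to show you what a difference the choice of machine makes to the final performance (for each XML parser). We looked at dual processor performance vs. single processor performance (for each parser), and we have outlined our results in the table below. Keep in mind that the two machines also differ in clock speed (PentiumII/400 vs. PentiumII/233) and in hard drive type (SCSI vs. IDE), so the difference is not due to the second processor alone.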

Dual Processor vs. Single Processor

The table below shows the average performance boost from using the dual processor machine compared to the single processor one. A positive percentage means that the dual processor machine was faster, on average, by the percentage shown. A negative percentage means that the dual processor machine was slower, on average, by the percentage shown.
 

Parser    Classic VM
Sun       +140%
IBM       +32%
OpenXML   +17%
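The article does not state exactly how these averages were calculated. One plausible reading, sketched below, is the mean over file sizes of (single time / dual time - 1) x 100, using only the sizes for which both machines produced a time. Under that assumption the numbers from the two result tables round to the percentages shown here. The class name DualVsSingle and the method name are hypothetical.

```java
// Hypothetical recomputation of the average boost from the two result tables.
// Assumed formula: mean over file sizes of (single / dual - 1) * 100.
class DualVsSingle {
    // Times in seconds, in table order, for sizes where both machines have a result.
    static final double[] SUN_DUAL = {1.15, 2.06, 2.75, 4.11, 5.54, 6.93, 8.31, 11.11,
                                      15.21, 21.93, 29.26, 35.93, 49.67, 66.35, 74.12};
    static final double[] SUN_SINGLE = {1.78, 5.66, 10.03, 13.98, 16.43, 18.94, 20.96, 25.62,
                                        32.30, 45.39, 54.31, 66.87, 76.33, 156.02, 171.74};
    static final double[] IBM_DUAL = {1.45, 1.85, 2.45, 3.53, 4.60, 5.82, 6.82, 9.51,
                                      12.78, 18.65, 25.04};
    static final double[] IBM_SINGLE = {1.74, 2.35, 3.11, 4.82, 6.30, 8.06, 9.39, 12.42,
                                        17.55, 24.73, 32.47};
    static final double[] OPENXML_DUAL = {1.65, 4.29, 6.59, 11.21, 15.54, 20.21, 24.93,
                                          34.00, 48.12, 84.18};
    static final double[] OPENXML_SINGLE = {1.82, 5.42, 9.11, 14.03, 18.67, 23.11, 27.69,
                                            36.72, 49.59, 92.99};

    // Average percentage by which the dual machine was faster than the single machine.
    static double averageBoostPercent(double[] single, double[] dual) {
        double sum = 0.0;
        for (int i = 0; i < dual.length; i++) {
            sum += (single[i] / dual[i] - 1.0) * 100.0;
        }
        return sum / dual.length;
    }

    public static void main(String[] args) {
        System.out.printf("Sun     %+.0f%%%n", averageBoostPercent(SUN_SINGLE, SUN_DUAL));
        System.out.printf("IBM     %+.0f%%%n", averageBoostPercent(IBM_SINGLE, IBM_DUAL));
        System.out.printf("OpenXML %+.0f%%%n", averageBoostPercent(OPENXML_SINGLE, OPENXML_DUAL));
    }
}
```

Averaging per-size ratios weights every file size equally; averaging total times instead would let the largest files dominate, which is a different (and equally defensible) summary.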
Summary
The fastest parser overall is the IBM XML parser. However, the Sun parser can deal with the largest file sizes in the memory configuration we tested, and it comes in second for speed. The OpenXML parser comes in at third place for speed and filesize handling. All three parsers are very good.

A dual processor machine (or a higher SMP configuration) is great for those of you who need to serve large amounts of XML information while doing other server tasks. Processor usage on the dual processor machine was 50% when only the test programs were running on it, whereas usage on the single processor machine was 100% when only the testing program was running. The dual processor machine therefore has a much greater load handling capacity than the single processor machine, an important factor for servers where many processes may be running. So for a server configuration it is better to get a multi-processor machine rather than a fast single processor machine.

Downloading the source code and running the programs
You need to download the IBM, Sun and OpenXML parsers before running the test program. You can download these parsers from the tools section. Before running the program, please refer to the How-to: Setup CLASSPATH article for information on how to set up your CLASSPATH on JDK1.1 and 1.2 so that you can use these parsers.

Here is the distribution zip file (containing source code) for the testing program: src.zip.

The table below contains a description of the files in src.zip. Please note that you MUST have all 3 parsers (Sun, IBM and OpenXML) set up on your system before you can run this program.

Files in zip file   Description
ParserTest.java     Main program that performs the test. Run this class and provide it with command line parameters shown below.
classic.bat         Batch file to run a set of tests on the Classic Java Virtual Machine.
To run the test program, you must type: "java -mxsize ParserTest ParserName InputFileSize" at the command prompt. 
  • The ParserName parameter can be one of the following: sun, openxml or ibm
  • The InputFileSize parameter is an integer that specifies the size of the input file in multiples of 100KB. 
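  • The size value (written directly after -mx) is the maximum amount of RAM that you want the JVM to use. It is given in bytes, or with a k or m suffix for kilobytes or megabytes (for example, 64m for 64MB).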
For example, to run a test using JDK1.2 (with access to a max of 64MB of RAM) and the Sun parser for a 1MB XML input file, type: "java -mx64m ParserTest sun 10". After you run the program, there will be a "temp.xml" file that is generated by the program (in the current directory) and used to perform the test. You should delete this file; we left it in so that you can see the XML document (input file) that is used to perform the test.
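As a small illustration of how the two command line parameters map to a test run, the sketch below converts an InputFileSize value into a byte count and checks the parser name. It is not taken from ParserTest.java: the class name TestArgs and its methods are hypothetical, and it assumes that 100KB means 100 x 1024 bytes.

```java
// Hypothetical helper showing how the ParserName and InputFileSize arguments
// could be interpreted. Not the actual ParserTest code.
class TestArgs {
    // Assumption: one unit of InputFileSize is 100KB, taken here as 100 * 1024 bytes.
    static final int UNIT_BYTES = 100 * 1024;

    // Converts the InputFileSize argument (multiples of 100KB) into bytes.
    static long inputBytes(int inputFileSize) {
        return (long) inputFileSize * UNIT_BYTES;
    }

    // Returns true for the three parser names listed above.
    static boolean isKnownParser(String name) {
        return "sun".equals(name) || "ibm".equals(name) || "openxml".equals(name);
    }

    public static void main(String[] args) {
        String parser = args.length > 0 ? args[0] : "sun";
        int size = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        if (!isKnownParser(parser)) {
            System.err.println("ParserName must be sun, openxml or ibm");
            return;
        }
        System.out.println(parser + ": input of about " + inputBytes(size) + " bytes");
        // Reports the heap limit that the JVM was actually started with.
        System.out.println("max heap: " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

With these assumptions, an InputFileSize of 10 corresponds to the 1MB row in the result tables and 400 to the 40MB row.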

Make sure that you have Java2 (or JDK1.2) installed on your machine, because the test program will not work with JDK1.1.

Please go to the tools section to actually download the Parsers themselves.

Where are the HotSpot results?
I apologize for jumping the gun and releasing the beta HotSpot VM performance data. For legal reasons I have retracted these results and will not publish them until April 28 (after HotSpot is released at JESS). We were able to come to some conclusions about the performance of these parsers using the HotSpot VM data, which we can't publish right now.

We hope you enjoyed this article. We will have more parser performance reviews, so stay tuned and keep coming back :).

     Last Updated: Mar. 18 1999.