Added XMLFLAG_USE_SIMPLEPULL for progressive parsing, and xmlreader.c, which implements a pull parser. Included contributions made by other people: libparsifal-config and pns.h for global namespace handling. Thanks to all who contributed! Lots of portability improvements; the xmlhash improvements also improve performance. Added better URI handling for xmlplint - see uriresolver.h for details. Lots of minor bug fixes. See also the ChangeLog.
Enjoy!
The DocBook document pyGtk-GettingStarted.xml (mapped to DocBook 4.5 via catalogs) gets parsed and validated in 140 ms on a Pentium M 740 CPU (I think the number would be even lower without catalogs). That's pretty fast considering that Parsifal validation is now also very conformant to the XML specification - the latest addition being validation of ID/IDREF(S) and other TokenizedType attributes. Here is the SAX event output of it. Some statistics (DocBook 4.5): elementDecls 406, attributeDecls 7567, entityDecls 3238
C:\xmlplint>xmlplint pyGtk-GettingStarted.xml -W -c -f1 -M -t1 -o docbook.txt
Validation warning: Value for attribute 'format' in element 'imagedata' doesn't match its declared value. Try: BMP, CGM-CHAR, CGM-BINARY, CGM-CLEAR
At line 66 col 71 of resource pyGtk-GettingStarted.xml
<imagedata fileref="figures/base.png" format="png" align="center"/>
----------------------------------------------------------------------^
Validation warning: Value for attribute 'format' in element 'imagedata' doesn't match its declared value. Try: BMP, CGM-CHAR, CGM-BINARY, CGM-CLEAR
At line 232 col 79 of resource pyGtk-GettingStarted.xml
<imagedata fileref="figures/helloworld.png" format="png" align="center"/>
------------------------------------------------------------------------------^
Validation warning: ID attribute with value 'sec-SignalsOnDestinationWidget' not found. Cannot resolve reference (IDREF)
At line 696 col -1 of resource pyGtk-GettingStarted.xml
Validation warning: ID attribute with value 'ch-AdvancedEventAndSignalHandling' not found. Cannot resolve reference (IDREF)
At line 696 col -1 of resource pyGtk-GettingStarted.xml
Validation warning: ID attribute with value 'ch-PackingWidgets' not found. Cannot resolve reference (IDREF)
At line 696 col -1 of resource pyGtk-GettingStarted.xml
Validation warning: ID attribute with value 'ch-SettingWidgetAttributes' not found. Cannot resolve reference (IDREF)
At line 696 col -1 of resource pyGtk-GettingStarted.xml
Validation warning: ID attribute with value 'sec-SignalsOnSourceWidget' not found. Cannot resolve reference (IDREF)
At line 696 col -1 of resource pyGtk-GettingStarted.xml
Document(s) parsed in 140 ms
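The IDREF warnings above are all reported at the same position (line 696, col -1), presumably because an IDREF may point forward to an ID that appears later, so references can only be checked once the whole document has been read. Here is a minimal sketch of that kind of deferred check. It is not Parsifal's actual code: the IdTable type and its functions are hypothetical names made up for this example, and the linear search stands in for a real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 64  /* arbitrary limit, just for this sketch */

/* Hypothetical bookkeeping, not Parsifal's real data structure.
   Strings are stored by pointer, so the caller must keep them alive. */
typedef struct {
    const char *ids[MAX_NAMES];     /* values of ID attributes seen so far */
    size_t idCount;
    const char *idrefs[MAX_NAMES];  /* values of IDREF attributes seen so far */
    size_t idrefCount;
} IdTable;

static void IdTable_Init(IdTable *t)
{
    t->idCount = 0;
    t->idrefCount = 0;
}

/* Records an ID value. Returns 0 if the value was already declared
   (IDs must be unique within a document), 1 otherwise. */
static int IdTable_AddId(IdTable *t, const char *value)
{
    size_t i;
    assert(t->idCount < MAX_NAMES);
    for (i = 0; i < t->idCount; i++)
        if (strcmp(t->ids[i], value) == 0) return 0;
    t->ids[t->idCount++] = value;
    return 1;
}

/* Records an IDREF value; it is not checked until the document ends. */
static void IdTable_AddIdref(IdTable *t, const char *value)
{
    assert(t->idrefCount < MAX_NAMES);
    t->idrefs[t->idrefCount++] = value;
}

/* To be called at end of document. Stores up to max unresolved IDREF
   values in out and returns the total number of unresolved references. */
static size_t IdTable_Unresolved(const IdTable *t, const char **out, size_t max)
{
    size_t i, j, missing = 0;
    for (i = 0; i < t->idrefCount; i++) {
        int found = 0;
        for (j = 0; j < t->idCount && !found; j++)
            found = (strcmp(t->ids[j], t->idrefs[i]) == 0);
        if (!found) {
            if (missing < max) out[missing] = t->idrefs[i];
            missing++;
        }
    }
    return missing;
}
```

A validating parser would call something like IdTable_AddId and IdTable_AddIdref from its attribute processing and report each unresolved value as a warning after the last element has been closed.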
A sort of milestone. Still needs some tweaking and hard testing. Next stop is RelaxNG module?!
The new release introduces several optimizations (see ChangeLog). Portability has also been improved, mostly through changes in xmlcfg.h. In fact, a PalmOS port should be on its way - I've been in touch with a guy who's working on this. More about this soon.
Note that although Parsifal doesn't yet pass the OASIS XML test suite with 100% conformance, most of the roughly 50 tests (out of 1811) that still fail aren't critical for using XML 1.0 features - you can use them all! Also note that a few tests fail because of namespace support, or because the test parser used libiconv and thus reports PASS with Japanese encodings etc.
The new release 0.8.0 adds support for DTD processing: declaration handlers, default attribute processing, parameter entities etc. The parser stays quite lightweight despite all these new features!
I ran Parsifal 0.7.4 under the MSVC++ profiler with test.rdf (an 11 MB attribute-oriented document) some time ago. Function timing results are here. Ignore the modules profiler.obj and bench.obj when checking the results.
Inlining ReadCh (with __inline) improves performance the most, but as there are 24 ReadCh calls in total, this makes the executable about 4 KB bigger! Making a separate inlined version of ReadCh for the critical points could be an option in this case. Besides inlining ReadCh, making a new version of XMLStringbuf_Append inside parsifal.c and _forceinline'ing it improves total performance from about 0.88 sec to 0.72 sec, about 18% less time (DLL executable size grows from 80 KB to 88 KB). I haven't yet made any calling convention tests or changes - I don't know if they'd be worth all the #ifdefs and all that hassle.
Some optimizations via conditional compilation will be available in the next release.
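To give an idea of the size/speed trade-off, here is a rough sketch of how a forced-inline reader for the hot spots and a shared out-of-line one for everything else could be selected with conditional compilation. This is not the code in parsifal.c: FAST_INLINE_READCH, HOT_INLINE, Reader, ReadChFast, ReadChShared and SkipUntil are all made-up names for this example, and the real ReadCh does considerably more than reading one byte from a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch: compile with -DFAST_INLINE_READCH=0
   to favour executable size over speed. */
#ifndef FAST_INLINE_READCH
#define FAST_INLINE_READCH 1
#endif

#if FAST_INLINE_READCH && defined(_MSC_VER)
#define HOT_INLINE static __forceinline
#elif FAST_INLINE_READCH && defined(__GNUC__)
#define HOT_INLINE static __inline__ __attribute__((always_inline))
#else
#define HOT_INLINE static
#endif

/* Stand-in for the parser's input buffer (not Parsifal's own type). */
typedef struct {
    const unsigned char *buf;
    size_t pos, len;
} Reader;

static void Reader_Init(Reader *r, const char *s, size_t len)
{
    assert(r != NULL && s != NULL);
    r->buf = (const unsigned char *)s;
    r->pos = 0;
    r->len = len;
}

/* Inlined variant, meant only for the few critical loops.
   Returns the next byte, or -1 at end of input. */
HOT_INLINE int ReadChFast(Reader *r)
{
    return (r->pos < r->len) ? (int)r->buf[r->pos++] : -1;
}

/* Out-of-line variant: all other call sites share this single copy,
   so the executable does not grow with every call. */
static int ReadChShared(Reader *r)
{
    return ReadChFast(r);
}

/* Example of a hot loop: consumes input up to and including delim and
   returns the number of bytes that came before it. */
static size_t SkipUntil(Reader *r, int delim)
{
    size_t n = 0;
    int c;
    while ((c = ReadChFast(r)) != -1 && c != delim) n++;
    return n;
}
```

The idea is that only loops like SkipUntil pay the code size cost of inlining, while the remaining call sites go through ReadChShared.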
Copyright © 2002-2008 Toni Uusitalo.
Send mail, suggestions and bug reports to
Last modified: 04.10.2008 00:00