Parsifal
XML Parser C library

Abstract


Parsifal is a validating XML 1.0 parser written in ANSI C. The Parsifal API is based on SAX2.

Parsifal can be used for parsing XML-based messages (such as REST and RSS) and for application-specific data processing, e.g. config files and data files. Because it is a conforming XML 1.0 parser that supports features like internal and external general entities, DTD parameter entities and default attributes, it can also be used for document-oriented processing (e.g. XHTML with xhtml1-transitional.dtd) and for parsing modular documents. Parsifal is well suited to processing large data files and streams: it is SAX based, so it consumes very little memory, and because it is written in C it is fast enough for most purposes.

Using Parsifal in place of large XML processing libraries (e.g. libxml, Xerces), or even in place of the small Expat (which doesn't support DTD validation), can be justified in limited-memory environments and in applications that require a bundled parser. Because of its modular design, Parsifal can easily be compiled either with DTD validation support or for non-validating parsing only. If you need higher-level tools, for example DOM/XPath processing, you should look at other libraries.

NEW: starting from version 1.1, Parsifal offers a progressive parsing feature which can be used to implement a pull parser on top of Parsifal. See samples/pull.

You can download Parsifal including source, documentation and samples from here



Features


Supported SAX events  
startDocument/endDocument  
startElement/endElement Thorough namespace support in startElement/endElement callbacks. Also supports getting attributes by name or by index using methods similar to SAX attributes handling.
characters  
ignorableWhitespace  
comment  
startCDATA/endCDATA  
processingInstruction  
errorHandler for errors/warnings
startDTD/endDTD  
encodingAliasHandler for aliasing/overriding encodings
xmlDecl reports the XML declaration (the <?xml version="1.0" ... tag)
skippedEntity  
resolveEntity/externalEntityParsed for parsing external entities/external DTDs
startEntity/endEntity  
elementDecl, attributeDecl, entityDecl... Used for reporting DTD declarations


XML 1.0 features that are not currently supported by Parsifal:



Supported SAX properties/features
http://xml.org/sax/features/namespaces
http://xml.org/sax/features/namespace-prefixes
http://xml.org/sax/features/external-general-entities
XMLFLAG_VALIDATION_WARNINGS for handling validation errors as warnings
XMLFLAG_USE_SIMPLEPULL for progressive/pull parsing
See XMLFlags for more info on Parsifal-specific properties.


Supported XML encodings



When compiled with GNU libiconv support:


see also Notes about encodings


Licence


Parsifal is released to the public domain and is provided "AS IS", without warranty of any kind. Use at your own risk. See COPYING. Note that although Parsifal is public domain software, GNU libiconv is licensed under the LGPL, and that will affect your software too if you use libiconv.

Conformance


Parsifal is a highly conformant XML 1.0 parser. See the OASIS XML test suite results.

How to use


Read the manual page. Examine the samples that come with the download.

Note that the first priority of the samples is to demonstrate XML parsing in the easiest and most portable way, NOT to show best practices for writing safe C.

Sample  Description (see README in each sample dir for more info)
elements.c  Simple example that outputs elements from stdin to stdout with some indentation.
zenvalid.c  Demonstrates parsing in validating mode. Uses the same sample document as zenstory.c, so it is a good example of how validation simplifies SAX event handling compared to zenstory.c.
datatype.c  Demonstrates one possible approach to handling datatypes in C SAX parsing.
zenstory.c, zenstory.h  Despite its name, demonstrates some real-world SAX parsing techniques.
pull  Pull parser implementation. Example usage in pull1.c. See also the XMLParser_HasMoreEvents feature.
nsvalid.c  Demonstrates validation filtering: validating DTDs with namespace prefixes. Also shows some techniques for base URI/directory handling and the setExternalSubset function.
xmlplint  Command-line tool for XML parsing/validation. Contains useful modules like curlread.c, catalogs.c and uriresolver.c. See the xmlplint page.
canonxml.c  Turns an input XML file into canonical XML (linefeeds turned into character references, attributes sorted etc.). Used by the xmltest OASIS XML test suite parser.
winurl.c  Uses Windows urlmon.dll for simple parsing of URLs. Only the input source handling is Windows specific; the rest is OS independent.
xmltest.c  OASIS XML test suite parser.
test_pool.c  Demonstrates XMLVector, XMLStringbuf and XMLPool usage. (These are ADTs that are used internally by Parsifal but can be used in your application too. This example has nothing to do with XML parsing.)
 


Performance


I've done some Parsifal benchmarking on my Dell Inspiron 8200 laptop:


An in-memory 11 MB UTF-8 encoded file (test.rdf), with just dummy startElement, endElement and characters handlers set, gets parsed in about 0.66 sec (namespaces on). A validating parse with a simple internal DTD subset takes about 0.86 sec. Here "simple DTD" means 10 element declarations (2 of them complex sequences and choices, the others simple #PCDATA) and 4 attribute declarations. Validation is recommended whenever possible, since it simplifies SAX event handling code and makes your application more fail-safe.

There are still many areas in Parsifal that can be optimized, so various optimizations are expected in the future.

NOTE: An 11 MB document is a relatively large XML document, and if it is parsed in less than a second on my test system, parsing should be fast enough for most uses. For example, the 654 KB 1998statistics.xml (http://www.ibiblio.org/xml/examples/1998statistics.xml) gets parsed in about 0.038 sec. I've also parsed very large documents (such as a 256 MB file) with Parsifal without problems. The Freshmeat project dump fm-projects.rdf (about 83 MB, from http://download.freshmeat.net/backend/; get the compressed .bz2 file if you're interested) gets parsed in 4.5 sec.


Planning to speed up XML processing by skipping validation? It may not be worth it:

  C:\>xmlplint -v
  xmlplint 1.0.0
  - Parsifal XML Parser 0.9.3
  - libcurl/7.14.0  
    
  C:\>xmlplint -t 10 -f 3 -M vtest.rdf
    Document(s) parsed in 5109 ms
    10 iterations. Average: 510 ms

  C:\>xmlplint -t 10 -f 3 -M -V vtest.rdf
    Document(s) parsed in 5656 ms
    10 iterations. Average: 565 ms
  
  -------------------------------------------------------
  
  C:\>xmllint --version
  xmllint: using libxml version 20619CVS2426
  compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Ic
  onv Unicode Regexps Automata Schemas Modules
  
  C:\>xmllint --noout --timing --noent vtest.rdf
  Parsing took 890 ms
  Freeing took 156 ms
    
  C:\>xmllint --valid --noout --timing --noent vtest.rdf
  Parsing took 1000 ms
  Freeing took 156 ms
  

Tested on a Fujitsu-Siemens Amilo M1437G M740, Windows XP.


ChangeLog


ChangeLog is here. You might want to read API changes too.


Copyright © 2002-2008 Toni Uusitalo.
Send mail, suggestions and bug reports to

Last modified: 04.10.2008 00:00