Parsifal is a validating XML 1.0 parser written in ANSI C. Its API is based on SAX2.
Parsifal can be used for parsing XML-based messages (such as REST and RSS) and for application-specific data processing, e.g. config files and data files. Parsifal can also be used for document-oriented processing (e.g. XHTML with xhtml1-transitional.dtd) and for parsing modular documents, because it is a conforming XML 1.0 parser and supports features such as internal and external general entities, DTD parameter entities and default attributes. Parsifal is ideal for processing large data files and streams since it is SAX-based and consumes very little memory, and because it is written in C it is also fast enough for most purposes.
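To illustrate what a SAX-style (push) API means in C, here is a toy scanner. It is not Parsifal and not its API: the names `toy_handlers`, `toy_parse` and the counting handlers are hypothetical, and the scanner only understands start tags, end tags and text (no attributes, empty-element tags, comments, entities or DTDs). It shows the property mentioned above: no tree is built; each construct is reported to a callback as soon as it is scanned, together with a user-data pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table in the spirit of SAX: every callback gets the
   user_data pointer back, so handlers can keep state without globals. */
typedef struct {
    void *user_data;
    void (*start_element)(void *user_data, const char *name, size_t len);
    void (*end_element)(void *user_data, const char *name, size_t len);
    void (*characters)(void *user_data, const char *text, size_t len);
} toy_handlers;

/* Toy push scanner: reports <name>, </name> and text runs to the callbacks.
   No tree is built; names and text are passed as (pointer, length) slices
   into doc. Returns 0 on success, -1 on an unterminated tag. */
int toy_parse(const char *doc, const toy_handlers *h)
{
    const char *p = doc;
    assert(doc != NULL && h != NULL);
    while (*p) {
        if (*p == '<') {
            const char *gt = strchr(p, '>');
            if (gt == NULL)
                return -1;                        /* unterminated tag */
            if (p[1] == '/') {                    /* end tag */
                if (h->end_element)
                    h->end_element(h->user_data, p + 2, (size_t)(gt - p - 2));
            } else if (h->start_element) {        /* start tag */
                h->start_element(h->user_data, p + 1, (size_t)(gt - p - 1));
            }
            p = gt + 1;
        } else {                                  /* text up to the next '<' */
            const char *lt = strchr(p, '<');
            size_t len = lt ? (size_t)(lt - p) : strlen(p);
            if (h->characters)
                h->characters(h->user_data, p, len);
            p += len;
        }
    }
    return 0;
}

/* Example handlers: count events, using user_data as the counter struct. */
typedef struct { int starts, ends, texts; } toy_counts;

static void count_start(void *ud, const char *name, size_t len)
{ (void)name; (void)len; ((toy_counts *)ud)->starts++; }

static void count_end(void *ud, const char *name, size_t len)
{ (void)name; (void)len; ((toy_counts *)ud)->ends++; }

static void count_text(void *ud, const char *text, size_t len)
{ (void)text; (void)len; ((toy_counts *)ud)->texts++; }
```

A caller fills a `toy_handlers` struct (for example `{&counts, count_start, count_end, count_text}`) and calls `toy_parse(doc, &h)`. Parsifal works on the same callback principle but reports many more event types, as listed in the table of supported SAX events.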
Using Parsifal in place of large XML processing libraries (e.g. libxml, Xerces), or even in place of the small Expat (which doesn't support DTD validation), can be justified in memory-constrained environments and in applications that need a bundled parser. Thanks to its modular design, Parsifal can easily be compiled either with DTD validation support or for non-validating parsing only. If you need higher-level tools, for example DOM/XPath processing, you should of course look at other libraries.
NEW: Starting from version 1.1, Parsifal offers a progressive parsing feature,
which can be used to implement a pull parser on top of Parsifal. See samples/pull.
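The text above does not describe how samples/pull is built, so the following is only a generic sketch of the usual idea behind layering a pull interface over push callbacks: the push callbacks append events to a small queue, and the consumer pulls them out one at a time, resuming the parser whenever the queue runs dry. `pull_queue`, `pull_queue_push` and `pull_queue_next` are hypothetical names; Parsifal's actual mechanism (XMLFLAG_USE_SIMPLEPULL, XMLParser_HasMoreEvents) is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { EV_START, EV_END, EV_TEXT } ev_type;

typedef struct {
    ev_type type;
    char data[32];           /* element name or text, truncated to fit */
} pull_event;

#define PULL_QUEUE_CAP 8

/* Fixed-size ring buffer of events (hypothetical). */
typedef struct {
    pull_event buf[PULL_QUEUE_CAP];
    int head;                /* index of the oldest queued event */
    int count;               /* number of queued events */
} pull_queue;

/* Producer side: a push callback (e.g. startElement) would call this.
   Returns 0 on success, -1 if the queue is full; a progressive parser
   would then suspend and let the consumer drain the queue. */
int pull_queue_push(pull_queue *q, ev_type type, const char *s, size_t len)
{
    pull_event *ev;
    assert(q != NULL && s != NULL);
    if (q->count == PULL_QUEUE_CAP)
        return -1;
    ev = &q->buf[(q->head + q->count) % PULL_QUEUE_CAP];
    ev->type = type;
    if (len >= sizeof ev->data)
        len = sizeof ev->data - 1;
    memcpy(ev->data, s, len);
    ev->data[len] = '\0';
    q->count++;
    return 0;
}

/* Consumer side: returns 1 and fills *out if an event is available,
   0 if the queue is empty (time to resume the parser for more events). */
int pull_queue_next(pull_queue *q, pull_event *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->buf[q->head];
    q->head = (q->head + 1) % PULL_QUEUE_CAP;
    q->count--;
    return 1;
}
```

The bounded queue is what keeps memory use constant: the parser never runs ahead of the consumer by more than `PULL_QUEUE_CAP` events.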
You can download Parsifal, including source, documentation and samples, from here.
Supported SAX events | Notes |
---|---|
startDocument/endDocument | |
startElement/endElement | Thorough namespace support in startElement/endElement callbacks. Also supports getting attributes by name or by index using methods similar to SAX attributes handling. |
characters | |
ignorableWhitespace | |
comment | |
startCDATA/endCDATA | |
processingInstruction | |
errorHandler | for errors/warnings |
startDTD/endDTD | |
encodingAliasHandler | for aliasing/overriding encodings |
xmlDecl | reports the XML declaration tag <?xml version="1.0" ... ?> |
skippedEntity | |
resolveEntity/externalEntityParsed | for parsing external entities/external DTDs |
startEntity/endEntity | |
elementDecl, attributeDecl, entityDecl... | Used for reporting DTD declarations |
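The startElement row above mentions getting attributes by name or by index. The sketch below shows that access pattern over a hypothetical `attr_list` type with hypothetical functions `attr_at` and `attr_value`; Parsifal's real attribute structures and lookup functions are described in its manual page and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute record: qualified name plus value. */
typedef struct {
    const char *qname;
    const char *value;
} attr;

/* Hypothetical attribute list, as a startElement callback might receive it. */
typedef struct {
    const attr *items;
    size_t length;
} attr_list;

/* Access by index, like SAX Attributes.getQName(i)/getValue(i);
   returns NULL when the index is out of range. */
const attr *attr_at(const attr_list *atts, size_t i)
{
    assert(atts != NULL);
    return i < atts->length ? &atts->items[i] : NULL;
}

/* Access by qualified name, like SAX Attributes.getValue(qName);
   returns NULL when no attribute has that name. */
const char *attr_value(const attr_list *atts, const char *qname)
{
    size_t i;
    assert(atts != NULL && qname != NULL);
    for (i = 0; i < atts->length; i++)
        if (strcmp(atts->items[i].qname, qname) == 0)
            return atts->items[i].value;
    return NULL;
}
```

Name lookup is a linear scan, which is usually adequate because elements rarely carry many attributes; index access is what a handler uses to iterate over all of them.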
Supported SAX properties/features |
---|
http://xml.org/sax/features/namespaces |
http://xml.org/sax/features/namespace-prefixes |
http://xml.org/sax/features/external-general-entities |
XMLFLAG_VALIDATION_WARNINGS (Parsifal-specific: handle validation errors as warnings) |
XMLFLAG_USE_SIMPLEPULL (Parsifal-specific: progressive/pull parsing) |
See XMLFlags for more information on Parsifal-specific properties. |
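SAX identifies features by URI, while a C parser typically stores them as bits in a flags word. The sketch below maps the three standard URIs listed above to bits. The `SK_*` constants, `sk_set_feature` and `sk_has` are hypothetical and do not correspond to Parsifal's actual XMLFLAG_* values or setter (see XMLFlags for those).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; Parsifal's XMLFLAG_* values are not reproduced. */
enum {
    SK_NAMESPACES          = 1 << 0,
    SK_NAMESPACE_PREFIXES  = 1 << 1,
    SK_EXTERNAL_GENERAL    = 1 << 2,
    SK_VALIDATION_WARNINGS = 1 << 3   /* implementation-specific, no SAX URI */
};

/* Standard SAX2 feature URIs mapped to the bits above. */
static const struct {
    const char *uri;
    unsigned bit;
} sk_features[] = {
    { "http://xml.org/sax/features/namespaces",                SK_NAMESPACES },
    { "http://xml.org/sax/features/namespace-prefixes",        SK_NAMESPACE_PREFIXES },
    { "http://xml.org/sax/features/external-general-entities", SK_EXTERNAL_GENERAL }
};

/* Turns the feature named by uri on or off in *flags.
   Returns 0 on success, -1 if the URI is not recognized. */
int sk_set_feature(unsigned *flags, const char *uri, int on)
{
    size_t i;
    assert(flags != NULL && uri != NULL);
    for (i = 0; i < sizeof sk_features / sizeof sk_features[0]; i++) {
        if (strcmp(sk_features[i].uri, uri) == 0) {
            if (on)
                *flags |= sk_features[i].bit;
            else
                *flags &= ~sk_features[i].bit;
            return 0;
        }
    }
    return -1;
}

/* Returns nonzero if bit is set in flags. */
int sk_has(unsigned flags, unsigned bit)
{
    return (flags & bit) != 0;
}
```

Implementation-specific options without a SAX URI, like the XMLFLAG_* extensions above, fit the same flags word and are simply set directly.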
Read the manual page. Examine the samples that come with the download.
Note that the first priority of the samples is to demonstrate XML
parsing in the simplest and most portable way, NOT to show best practices
for safe C programming etc.
Sample | Description (see the README in each sample directory for more info) | More info |
---|---|---|
elements.c | Simple example that outputs elements read from stdin to stdout with some indentation. | README |
zenvalid.c | Demonstrates parsing in validating mode. Uses the same sample document as zenstory.c, so it is a good example of how validation simplifies SAX event handling compared to zenstory.c. | README |
datatype.c | Demonstrates one possible approach to handling datatypes in C SAX parsing. | README |
zenstory.c, zenstory.h | Despite its name, demonstrates some real-world SAX parsing techniques. | README |
pull | Pull parser implementation; see pull1.c for example usage. | README; see also the XMLParser_HasMoreEvents feature |
nsvalid.c | Demonstrates validation filtering: validating with DTDs that use namespace prefixes. Also shows some techniques for base URI/directory handling and the setExternalSubset function. | README |
xmlplint | Command-line tool for XML parsing/validation. Contains useful modules such as curlread.c, catalogs.c and uriresolver.c. | see xmlplint page |
canonxml.c | Turns an input XML file into canonical XML (linefeeds turned into character references, attributes sorted, etc.). Used by the xmltest OASIS XML test suite parser. | README |
winurl.c | Uses Windows urlmon.dll for simple parsing of documents fetched from URLs. Only the input source handling is Windows-specific; the rest is OS-independent. | README |
xmltest.c | OASIS XML test suite parser | README |
test_pool.c | Demonstrates XMLVector, XMLStringbuf and XMLPool usage. These are ADTs used internally by Parsifal, but they can be used in your application too (this example has nothing to do with XML parsing). | |
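canonxml.c is described as turning linefeeds into character references and sorting attributes. Assuming it follows the canonical form used by the James Clark/OASIS XML test cases (where, to my knowledge, `&`, `<`, `>`, `"`, tab, LF and CR in text and attribute values are written as `&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#9;`, `&#10;` and `&#13;`), the two core steps can be sketched as follows. `canon_escape`, `canon_attr` and `canon_sort_attrs` are hypothetical names; this is not the sample's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Escapes src into dst (capacity cap) using the escapes listed above.
   Like snprintf, it always NUL-terminates when cap > 0 and returns the
   full escaped length, so a return value >= cap means truncation. */
size_t canon_escape(char *dst, size_t cap, const char *src)
{
    size_t n = 0;
    assert(src != NULL && (dst != NULL || cap == 0));
    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };
        const char *rep = one;            /* default: copy the byte as is */
        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\t': rep = "&#9;";   break;
        case '\n': rep = "&#10;";  break;
        case '\r': rep = "&#13;";  break;
        default:   break;
        }
        for (; *rep != '\0'; rep++, n++)
            if (n + 1 < cap)
                dst[n] = *rep;
    }
    if (cap > 0)
        dst[n < cap ? n : cap - 1] = '\0';
    return n;
}

/* Hypothetical attribute pair for the canonical writer. */
typedef struct { const char *name; const char *value; } canon_attr;

/* strcmp compares bytes as unsigned char, which for UTF-8 names matches
   Unicode code point order. */
static int canon_attr_cmp(const void *a, const void *b)
{
    return strcmp(((const canon_attr *)a)->name,
                  ((const canon_attr *)b)->name);
}

/* Sorts attributes by name before they are written out. */
void canon_sort_attrs(canon_attr *attrs, size_t n)
{
    qsort(attrs, n, sizeof *attrs, canon_attr_cmp);
}
```

Escaping every linefeed and tab makes the output independent of line-ending conventions, and sorting removes attribute order as a source of spurious differences, so two parsers' outputs can be compared byte for byte.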
I've done some Parsifal benchmarking on my Dell Inspiron 8200 laptop:
An in-memory, UTF-8 encoded 11 MB test.rdf file, with only dummy startElement, endElement and characters handlers set, is parsed in about 0.66 s (namespaces on). A validating parse with a simple internal DTD subset takes about 0.86 s! Here "simple DTD" means 10 element declarations (2 of them with complex sequences and choices, the rest simple #PCDATA) and 4 attribute declarations. Validation is of course recommended whenever possible, since it simplifies SAX event handling code and makes your application more fail-safe.
There are still many areas in Parsifal that could be optimized, so further optimizations are expected in the future.
NOTE: An 11 MB document is a relatively large XML document, so if it is parsed in less than a second on my test system, parsing should be fast enough for most purposes. For example, the 654 KB 1998statistics.xml (http://www.ibiblio.org/xml/examples/1998statistics.xml) is parsed in about 0.038 s! I've also parsed very large documents (such as a 256 MB file) with Parsifal without problems. The Freshmeat project dump fm-projects.rdf (about 83 MB, from http://download.freshmeat.net/backend/; get the compressed .bz2 file if you're interested in it) is parsed in 4.5 s!
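As a rough sanity check of the figures above (assuming they were all measured on the same laptop and taking 1 MB = 1000 KB), the implied throughput is similar across document sizes, and the validating run costs roughly 30% more time than the non-validating one:

$$
\frac{11\ \text{MB}}{0.66\ \text{s}} \approx 16.7\ \text{MB/s}, \qquad
\frac{654\ \text{KB}}{0.038\ \text{s}} \approx 17.2\ \text{MB/s}, \qquad
\frac{83\ \text{MB}}{4.5\ \text{s}} \approx 18.4\ \text{MB/s}
$$

$$
\frac{0.86\ \text{s}}{0.66\ \text{s}} \approx 1.30
$$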
C:\>xmlplint -v
xmlplint 1.0.0
- Parsifal XML Parser 0.9.3
- libcurl/7.14.0
C:\>xmlplint -t 10 -f 3 -M vtest.rdf
Document(s) parsed in 5109 ms
10 iterations. Average: 510 ms
C:\>xmlplint -t 10 -f 3 -M -V vtest.rdf
Document(s) parsed in 5656 ms
10 iterations. Average: 565 ms
-------------------------------------------------------
C:\>xmllint --version
xmllint: using libxml version 20619CVS2426
compiled with: DTDValid FTP HTTP HTML C14N Catalog XPath XPointer XInclude Iconv Unicode Regexps Automata Schemas Modules
C:\>xmllint --noout --timing --noent vtest.rdf
Parsing took 890 ms
Freeing took 156 ms
C:\>xmllint --valid --noout --timing --noent vtest.rdf
Parsing took 1000 ms
Freeing took 156 ms
Tested on a Fujitsu-Siemens Amilo M1437G M740, Windows XP.
Copyright © 2002-2008 Toni Uusitalo.
Send mail, suggestions and bug reports to
Last modified: 04.10.2008 00:00