Parsifal XML Parser FAQ


1. I get errors in windef.h when trying to compile my application on the Windows platform:

   \windef.h(143) : warning C4114: same type qualifier used more than once
   \windef.h(143) : error C2632: 'char' followed by 'char' is illegal
   \windef.h(143) : warning C4091: 'typedef ' : ignored on left of 'unsigned
                    char ' when no variable is declared
   

One solution is to always include windows.h before the Parsifal headers and not to use the precompiled headers option. For example:

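     /* windows.h must come first so that its BYTE typedef is seen
        before Parsifal's BYTE define */
     #if defined(_WIN32) || defined(WIN32)
     #include <windows.h>
     #endif
     #include "libparsifal/parsifal.h"
     #include "libparsifal/dtdvalid.h"
     /* ... other Parsifal headers ... */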
   
This way windows.h is always included before Parsifal's BYTE define, which is the cause of the error.

2. I get encoding errors when trying to parse my document

Remember to save your documents in the intended encoding. If you don't save in the default UTF-8 encoding, YOU MUST SPECIFY the encoding attribute in the XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>. How you choose the encoding when saving depends on your XML/text editor; Windows Notepad, for example, lets you choose the encoding in its Save dialog.

3. I get errors when compiling the samples and/or compiling the Parsifal library as part of my project.

You've probably compiled with a C++ compiler instead of a C compiler. The samples aren't C++ compliant because some simple casts that C++ requires are missing here and there. Apart from that, Parsifal should be usable from any language that can interface with C libraries.
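One common instance of this C/C++ difference (not necessarily the exact spots in the samples): C converts a void * (such as the value returned by malloc) to any object pointer type implicitly, while C++ requires an explicit cast. The function below is an illustrative sketch; dup_string is a hypothetical name, not part of Parsifal or its samples.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s (caller frees), or NULL on failure.
 * dup_string is a hypothetical helper used only for illustration. */
char *dup_string(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;          /* include the terminating NUL */

    /* Valid C, but rejected by a C++ compiler, which will not convert
     * void * to char * implicitly:
     *     copy = malloc(len);
     * The explicit cast below is accepted by both C and C++. */
    copy = (char *)malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Alternatively, compile the Parsifal sources with a C compiler and only link the resulting objects into your C++ project, wrapping the #include lines in extern "C" { } if the headers don't already declare C linkage.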

4. The Parsifal library is too large/complex for my needs. I only need a simple XML parser without DTD support and all these bloated features!

Parsifal can be compiled without DTD_SUPPORT, which results in a VERY SMALL and FAST library that is still capable of handling, for example, XML namespaces and Unicode. This is the recommended approach when you need something smaller/simpler, and it gives you a scalable solution if/when you later need advanced XML processing capabilities. Also note that even when using the stripped-down version of Parsifal (without DTD_SUPPORT and DTDVALID_SUPPORT), you're still using a parser that has passed the majority of the XML conformance test suite tests; most "simple parsers" cannot run the test suite because of their inadequate DTD/entity support.

5. SAX parsing is hard. Why doesn't Parsifal provide DOM or data binding features?

While DOM suits some tasks very well, it isn't the optimal model for many parsing needs: an abstract tree with only a string datatype is seldom a good representation of data, and it is also not a very efficient form to work with in terms of memory consumption. Document-oriented processing, as opposed to data-oriented processing, is of course another matter. Consider, for example, the following simple data:

   <person>
     <firstname>Toni</firstname>
     <surname>Uusitalo</surname>
   </person>
   
Trust me, the best thing you can do is to get your own "person object" with the properties "firstname" and "surname" out of this, not some abstract tree. Of course there are ways to query abstract trees, such as XPath, but implementing all of these (DOM+XPath+...) in Parsifal isn't practical; libraries like libxml exist if you need these features.

Since I mentioned a "person object", we come to the second option: data binding. While useful in some cases, XML data binding cannot represent some XML constructs, such as mixed content, very well.

SAX has a learning curve, but in my opinion it's worth the time spent: just study the samples and examine the xmlplint output with the -f 1 flag, for example. Also, learning to write DTDs and using validation simplifies SAX processing considerably. Writing some sort of schema is usually needed for XML data binding tools anyway, so SAX isn't that much harder in that respect.
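To make the "person object" idea concrete, here is a minimal sketch in C. The callbacks on_start_element, on_characters and on_end_element, their simplified signatures, and person_ctx are all hypothetical, not Parsifal's handler API; a real program would adapt them to the handler signatures declared in parsifal.h. The point is that the callbacks fill a plain struct directly, so no tree is ever built.

```c
#include <assert.h>
#include <string.h>

/* The application's own "person object". */
struct person {
    char firstname[64];
    char surname[64];
};

/* State shared by the callbacks; a SAX parser would hand this back to
 * each handler through its user-data pointer. */
struct person_ctx {
    struct person person;
    char *target;        /* field receiving character data, or NULL */
    size_t target_size;  /* capacity of target, including the NUL */
    size_t used;         /* characters already stored in target */
};

void person_ctx_init(struct person_ctx *ctx)
{
    memset(&ctx->person, 0, sizeof ctx->person);
    ctx->target = NULL;
    ctx->target_size = 0;
    ctx->used = 0;
}

/* Hypothetical start-element callback: decide where the text goes. */
void on_start_element(struct person_ctx *ctx, const char *name)
{
    assert(ctx != NULL && name != NULL);
    ctx->target = NULL;
    ctx->used = 0;
    if (strcmp(name, "firstname") == 0) {
        ctx->target = ctx->person.firstname;
        ctx->target_size = sizeof ctx->person.firstname;
    } else if (strcmp(name, "surname") == 0) {
        ctx->target = ctx->person.surname;
        ctx->target_size = sizeof ctx->person.surname;
    }
    if (ctx->target != NULL)
        ctx->target[0] = '\0';
}

/* Hypothetical character-data callback: chars is not NUL-terminated and
 * one text node may arrive in several calls, so append (truncating if the
 * field is full). Text outside firstname/surname is ignored. */
void on_characters(struct person_ctx *ctx, const char *chars, int len)
{
    size_t room, n;

    assert(ctx != NULL);
    if (ctx->target == NULL || len <= 0)
        return;
    room = ctx->target_size - 1 - ctx->used;
    n = (size_t)len < room ? (size_t)len : room;
    memcpy(ctx->target + ctx->used, chars, n);
    ctx->used += n;
    ctx->target[ctx->used] = '\0';
}

/* Hypothetical end-element callback: the current field is complete. */
void on_end_element(struct person_ctx *ctx, const char *name)
{
    assert(ctx != NULL);
    (void)name;
    ctx->target = NULL;
}
```

Because this sketch only checks element names, it would also pick up a firstname element outside person; a fuller version would track the element path, or rely on DTD validation to rule such documents out.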

6. Why doesn't Parsifal provide XML writing/serializing capabilities?

The author believes the printf family of functions is sufficient for outputting XML. All you have to do is make sure you escape the predefined entity characters (such as '&' and '<') in element content and attribute values.
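A minimal sketch of this approach using only standard stdio calls; xml_escape and write_element are illustrative names, not part of Parsifal. Strictly speaking, '>' only needs escaping when it would complete "]]>" in content, and a quote character only inside an attribute value delimited by that same quote; escaping them unconditionally is harmless and keeps one function usable everywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes text to fp, replacing XML's special characters with the
 * predefined entity references. */
void xml_escape(FILE *fp, const char *text)
{
    assert(fp != NULL && text != NULL);
    for (;;) {
        /* Copy the longest run of characters that need no escaping. */
        size_t run = strcspn(text, "&<>\"'");
        fwrite(text, 1, run, fp);
        text += run;
        if (*text == '\0')
            return;
        switch (*text) {
        case '&': fputs("&amp;", fp); break;
        case '<': fputs("&lt;", fp); break;
        case '>': fputs("&gt;", fp); break;
        case '"': fputs("&quot;", fp); break;
        default:  fputs("&apos;", fp); break; /* only '\'' is left */
        }
        text++;
    }
}

/* Writes <name attr="value">content</name> and a newline. The names are
 * written as-is, so they must already be valid XML names. */
void write_element(FILE *fp, const char *name, const char *attr,
                   const char *value, const char *content)
{
    fprintf(fp, "<%s %s=\"", name, attr);
    xml_escape(fp, value);
    fputs("\">", fp);
    xml_escape(fp, content);
    fprintf(fp, "</%s>\n", name);
}
```

For example, write_element(stdout, "note", "title", "Tom & \"Jerry\"", "1 < 2") prints <note title="Tom &amp; &quot;Jerry&quot;">1 &lt; 2</note>. All other bytes are copied unchanged, so the text must already be in the encoding named in your XML declaration.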

7. Can I parse HTML/tag soup with Parsifal? I need at least a more relaxed parsing mode; I don't want parsing to stop when some "minor" well-formedness error occurs!

The XML specification defines strict rules for well-formedness. This is actually a good thing and leads to more efficient and safer parsing. If you must parse HTML/tag soup, there are very good specialized libraries available for that, for example tidylib. It's usually best to use specialized libraries: tidy for tag soup and Parsifal for XML parsing. Tip: you can feed tidied XHTML to Parsifal if you want to process your documents with SAX; you can even invoke tidy only when a well-formedness error occurs.

8. Parsing takes a very long time/seems to hang when processing documents with massive content/base64-encoded binary data. Why?

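By default, Parsifal tries to keep characterHandler content in a single chunk. Use the XMLFLAG_SPLIT_LARGE_CONTENT flag when parsing this sort of document; large text content is then delivered to characterHandler in several smaller chunks, so your handler must be prepared to receive it in pieces.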
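When content arrives in several chunks, the split points need not line up with anything meaningful in your data. A minimal sketch of coping with that, assuming a hypothetical handler that receives (chars, len) pairs rather than Parsifal's exact characterHandler signature: a streaming base64 decoder (b64_init, b64_feed and b64_finish are illustrative names) that keeps an incomplete 4-character group in its state between calls and writes decoded bytes straight to a FILE, so no large buffer is needed. How to set the flag itself is not shown; see parsifal.h.

```c
#include <assert.h>
#include <stdio.h>

/* Decoder state kept between character-data callbacks. */
struct b64_stream {
    unsigned char quad[4];  /* 6-bit values of the current group */
    int nquad;              /* how many of them are filled (0..3) */
    FILE *out;              /* decoded bytes are written here */
};

void b64_init(struct b64_stream *st, FILE *out)
{
    st->nquad = 0;
    st->out = out;
}

/* Maps a base64 digit to its 6-bit value; -1 for anything else
 * (line breaks, spaces, '=' padding). */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Writes the first nbytes bytes of the current group; missing digits
 * count as zero. */
static void b64_emit(struct b64_stream *st, int nbytes)
{
    unsigned long group = 0;
    int k;

    for (k = 0; k < 4; k++)
        group = (group << 6) | (k < st->nquad ? st->quad[k] : 0u);
    for (k = 0; k < nbytes; k++)
        fputc((int)((group >> (16 - 8 * k)) & 0xFFu), st->out);
}

/* Feeds one chunk of character data. A chunk may end in the middle of a
 * 4-digit group, so the incomplete group stays in st for the next call. */
void b64_feed(struct b64_stream *st, const char *chars, int len)
{
    int i;

    assert(st != NULL && st->nquad >= 0 && st->nquad < 4);
    for (i = 0; i < len; i++) {
        int v = b64_value(chars[i]);
        if (v < 0)
            continue;
        st->quad[st->nquad++] = (unsigned char)v;
        if (st->nquad == 4) {
            b64_emit(st, 3);
            st->nquad = 0;
        }
    }
}

/* Flushes a trailing partial group at the end tag:
 * 2 digits -> 1 byte, 3 digits -> 2 bytes. */
void b64_finish(struct b64_stream *st)
{
    if (st->nquad >= 2)
        b64_emit(st, st->nquad - 1);
    st->nquad = 0;
}
```

Call b64_init when the element starts, b64_feed for each chunk of its character data, and b64_finish at its end tag. The sketch silently skips '=' padding and invalid characters instead of reporting errors.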


Copyright © 2002-2008 Toni Uusitalo.
Send mail, suggestions and bug reports to

Last modified: 04.10.2008 00:00