iText Tutorial: iText, a Free Java-PDF library
by
Bruno Lowagie

Part II: Other document formats

Chapter 7: XML and (X)HTML


The XML to PDF part in iText is for the moment 'discontinued', but if you are looking for an XML Front End for iText, you may want to take a look at UJAC (Useful Java Application Components).

My first iText XML
In chapter 1 of this tutorial, we already talked about generating documents in different formats. Although iText is mainly a library to generate PDF documents, we have occasionally generated HTML too. In this chapter, we are going to take a look at XML, the eXtensible Markup Language.
Before we go to the first example, we must draw your attention to the fact that the XML generation (and parsing) functionality isn't contained in the default iText release. The code is in separate tar.gz, zip and jar files. You also need some external libraries: an XML parser library (you can choose any parser you want) and the SAX library (with the org.xml.sax.* packages). iText uses JAXP. JAXP is included in the Java 2 Enterprise Edition, so on a JDK that doesn't ship these packages you will need the j2ee.jar to run these examples. (JAXP has also been part of the standard JDK since J2SE 1.4, in which case no extra jar is needed for the parser.)

In the first example, Chap0701.java, we generate a document containing almost all the objects we discussed in the previous chapters. The result is an XML file (Chap0701.xml) that begins like this:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ITEXT SYSTEM "http://itext.sourceforge.net/itext.dtd">
<itext creationdate="Fri Aug 17 13:15:42 CEST 2001" producer="iTextXML by lowagie.com">

The rest of the file contains the contents of the document. We recognize a lot of objects and their attributes from the previous chapters. All those objects are contained in the iText DTD. You can find this DTD here: http://itext.sourceforge.net/itext.dtd.

XML to PDF
We are not that interested in generating XML. What we really want is to parse an XML file and generate the corresponding PDF (or HTML). That is what we are going to do in example 2. In Chapter 1, we described 5 steps to generate a PDF file; to convert an XML file into a PDF file we have to replace steps 3 and 4. Step 5 is performed automatically by the parser:

// step 3: we create a parser and set the document handler
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

// step 4: we parse the document
parser.parse("Chap0701.xml", new SAXiTextHandler(document));

These are the results: PDF/HTML.
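
To make the parsing step less of a black box, here is a minimal sketch of what a SAX handler sees while the parser walks through a document. It uses only the JDK's JAXP classes. RecordingHandler is a hypothetical stand-in that merely records start tags; it is not the SAXiTextHandler, which reacts to the same callbacks by adding objects to the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventSketch {

    /** Hypothetical handler: records each start tag instead of building a PDF. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Record the tag name followed by its attributes as name=value pairs.
            StringBuilder sb = new StringBuilder(qName);
            for (int i = 0; i < attrs.getLength(); i++) {
                sb.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
            }
            events.add(sb.toString());
        }
    }

    /** Parses the XML string and returns the recorded start-tag events. */
    static List<String> parse(String xml) throws Exception {
        // Same factory call as in step 3 of the example above.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordingHandler handler = new RecordingHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative input only; not the real Chap0701.xml.
        String xml = "<itext><paragraph leading=\"14\">Hello</paragraph></itext>";
        for (String event : parse(xml)) {
            System.out.println(event);
        }
    }
}
```

The parser drives everything: the handler never asks for the next tag, it is simply called back for each one. That is why swapping the handler is enough to change what gets generated.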


Custom Tags
An XML to PDF converter that only works with one specific DTD isn't very useful. The iText DTD isn't an official standard; it evolved from the iText library. What we really need is a means to convert any XML file to PDF. To achieve this we will have to map custom tags (e.g. from a standard DTD) to iText tags. There are several ways to do this. In example 3 we are going to use a tagmap to convert a work by Shakespeare in XML format into a PDF file.
Chap0703.xml is the XML file (containing 'The Tragedy of Romeo and Juliet') we are going to parse. When we look at this XML file, we recognize some custom tags such as SPEAKER, SPEECH and STAGEDIR. In iText, these tags all correspond to the 'paragraph' object, so we have to map them all to the paragraph tag. But the different tags need to be rendered differently. We will have the name of the SPEAKER preceded by the word 'Speaker:' and print it in bold. Stage directions will be printed in italic, and so on.
In the tagmap, we will define all these attributes like this:

<tag name="paragraph" alias="SPEAKER" content="Speaker:">
    <attribute name="leading" value="16" />
    <attribute name="size" value="10" />
    <attribute name="style" value="bold" />
</tag>
<tag name="paragraph" alias="SPEECH">
    <attribute name="leading" value="14" />
    <attribute name="align" value="Left" />
</tag>
<tag name="paragraph" alias="STAGEDIR">
    <attribute name="leading" value="14" />
    <attribute name="size" value="10" />
    <attribute name="style" value="italic" />
    <attribute name="align" value="Right" />
</tag>

To generate the PDF file, we only have to make a small change in step 4, where the handler is passed to the parser:

// step 3: we create a parser and set the document handler
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

// step 4: we parse the document
parser.parse("Chap0703.xml", new SAXmyHandler(document, new TagMap("tagmap0703.xml")));

Instead of the SAXiTextHandler, we use a SAXmyHandler and a TagMap (this is a HashMap that is automatically filled with XmlPeer objects based on your own custom tagmap.xml). Check out the resulting PDF!
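
The idea behind such a tagmap can be sketched with plain JDK classes: read the tagmap file with SAX and store one entry per alias. This is only an illustration of the principle, not the code of iText's TagMap. The Peer class is a hypothetical stand-in for XmlPeer, and the enclosing <tagmap> root element is an assumption, since the fragment above doesn't show the root of the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TagMapSketch {

    /** Hypothetical stand-in for one tagmap entry (not iText's XmlPeer). */
    static class Peer {
        final String tagName;   // the iText tag, e.g. "paragraph"
        final String content;   // optional fixed content, may be null
        final Map<String, String> attributes = new LinkedHashMap<>();

        Peer(String tagName, String content) {
            this.tagName = tagName;
            this.content = content;
        }
    }

    /** Reads a tagmap document and returns its entries keyed by alias. */
    static Map<String, Peer> load(String tagmapXml) throws Exception {
        final Map<String, Peer> peers = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private Peer current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("tag".equals(qName)) {
                    // A new mapping: custom alias -> iText tag name.
                    current = new Peer(attrs.getValue("name"), attrs.getValue("content"));
                    peers.put(attrs.getValue("alias"), current);
                } else if ("attribute".equals(qName) && current != null) {
                    // Default attribute values for the tag being defined.
                    current.attributes.put(attrs.getValue("name"), attrs.getValue("value"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("tag".equals(qName)) {
                    current = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(tagmapXml.getBytes(StandardCharsets.UTF_8)), handler);
        return peers;
    }

    public static void main(String[] args) throws Exception {
        // The <tagmap> root element is assumed for this sketch.
        String xml = "<tagmap>"
                + "<tag name=\"paragraph\" alias=\"SPEAKER\" content=\"Speaker:\">"
                + "<attribute name=\"style\" value=\"bold\" />"
                + "</tag>"
                + "</tagmap>";
        Peer speaker = load(xml).get("SPEAKER");
        System.out.println(speaker.tagName + " " + speaker.content + " " + speaker.attributes);
    }
}
```

When the handler later meets a SPEAKER tag in the play, it only has to look up the alias in this map to know which iText object to create and which attributes to apply.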


Making life easy
Forget the previous sections. I only had you read them for pedagogical reasons ;-)
In fact, you don't need separate steps 3 and 4; you can do it all in one step, as shown in example 4. We reuse the XML files we converted in examples 2 and 3 and convert them like this:

// step 2:
// we create a writer that listens to the document
// and directs a PDF stream to a file
PdfWriter.getInstance(documentA, new FileOutputStream("Chap0704a.pdf"));
PdfWriter.getInstance(documentB, new FileOutputStream("Chap0704b.pdf"));

// step 3: we parse the document
XmlParser.parse(documentA, "Chap0701.xml");
XmlParser.parse(documentB, "Chap0703.xml", "tagmap0703.xml");

It's easy, isn't it?


Data Merging
Until now, we have converted complete documents to PDF. But what if the document is only a template with some tags that have to be replaced by values from a database? In this case, using a tagmap.xml isn't sufficient.
Suppose we have a database containing some names, email addresses and URLs (see simpleDB0705.txt) and an XML template of a letter: simpleLetter0705.xml. In this XML template, we can use iText tags (such as <newline />), tags we can map to iText tags (<letter> is mapped to <itext>), but also some tags we are going to use for data merging: <mail />, <givenname />, <name /> and <website />.
There are 5 records in our simple database. Example 5 shows how we can use this database to generate 5 different documents based on the XML template. For each letter, we are going to make a new HashMap that will operate as a tagmap:

HashMap tagmap = new HashMap();
StringTokenizer tokenizer = new StringTokenizer(line, "|");

XmlPeer peer = new XmlPeer(ElementTags.ITEXT, "letter");
tagmap.put(peer.getAlias(), peer);
if (tokenizer.hasMoreTokens()) {
    peer = new XmlPeer(ElementTags.CHUNK, "givenname");
    peer.setContent(tokenizer.nextToken());
    tagmap.put(peer.getAlias(), peer);
}
if (tokenizer.hasMoreTokens()) {
    peer = new XmlPeer(ElementTags.CHUNK, "name");
    peer.setContent(tokenizer.nextToken());
    tagmap.put(peer.getAlias(), peer);
}
if (tokenizer.hasMoreTokens()) {
    peer = new XmlPeer(ElementTags.CHUNK, "mail");
    peer.setContent(tokenizer.nextToken());
    tagmap.put(peer.getAlias(), peer);
}
if (tokenizer.hasMoreTokens()) {
    peer = new XmlPeer(ElementTags.ANCHOR, "website");
    String reference = tokenizer.nextToken();
    peer.setContent(reference);
    peer.addValue(ElementTags.REFERENCE, reference);
    peer.addValue(ElementTags.COLOR, "#0000FF");
    tagmap.put(peer.getAlias(), peer);
}

We see that the first three fields are mapped to a <chunk> and that the last field (a URL) is used as an anchor!
We parse the XML template 5 times (once per tagmap) and this results in letter 1, letter 2, letter 3, letter 4 and letter 5.
In a 'real life' application, you will probably have to use your own tagmap class (derived from class HashMap or TagMap) in combination with your own handler class (derived from SAXmyHandler). You can do some really neat stuff with this!
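
The record handling in example 5 can be tried out without iText. The sketch below splits one record into named fields with the same StringTokenizer approach and then fills a template. Note that it merges by plain string replacement, which is a deliberate simplification: iText does the merging while parsing, through the XmlPeer objects. The field order is taken from the code above; the sample record, the class name and the method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

class MergeSketch {

    /** Field order as used in the example above. */
    static final String[] FIELDS = {"givenname", "name", "mail", "website"};

    /** Splits one '|'-separated record into a tag-name -> value map. */
    static Map<String, String> toFields(String line) {
        Map<String, String> values = new LinkedHashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(line, "|");
        for (String field : FIELDS) {
            if (!tokenizer.hasMoreTokens()) {
                break; // record has fewer fields than expected
            }
            values.put(field, tokenizer.nextToken());
        }
        return values;
    }

    /** Simplified merge: replaces each empty tag, e.g. <name />, by its value. */
    static String merge(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("<" + entry.getKey() + " />", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical record; the real data is in simpleDB0705.txt.
        String line = "Jane|Doe|jane@example.org|http://example.org";
        String template = "Dear <givenname /> <name />, we will write to <mail />.";
        System.out.println(merge(template, toFields(line)));
    }
}
```

One thing to keep in mind with this approach: StringTokenizer skips empty fields, so a record with two consecutive '|' characters shifts the remaining values to the left. If fields in your data can be empty, String.split with a limit argument is a safer choice.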


Parsing (X)HTML
Once we can parse XML, why couldn't we parse (X)HTML? One could create an Html2iText tagmap, or even a special HtmlHandler. Example 6 shows how we can convert the HTML page generated in example 2 to PDF, using the class SAXmyHtmlHandler. This results in a PDF file that is similar to the result of example 2: Chap0706.pdf. In example 7, we use this HTML handler behind the scenes (just like in the section 'Making life easy'). Unfortunately, this part of iText is not yet complete: many tags aren't supported yet, so there is a lot of room for improvement!

Generating an HTML file with a CSS (special thanks to Matt Benson)
In example 8, we generate an HTML file that uses a CSS file. We add the CSS file like this:

document.add(new Header(HtmlTags.STYLESHEET, "myStyles.css"));

If you look at the file myStyles.css, you can see we defined not only some tags (like body, a,...), but also some classes (small, red, blue, gray). We can now use these classes, by setting the markup-attributes of the objects we are using:

document.add(new Header(HtmlTags.STYLESHEET, "myStyles.css"));

listItem = new ListItem("Isaac Asimov");
listItem.setMarkupAttribute(ElementTags.CLASS, "small");
Paragraph cTitle = new Paragraph("This is chapter " + i);
cTitle.setMarkupAttribute(ElementTags.CLASS, "red");

Take a look at the result to see the effect of the style sheet: Chap0708.html.
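
To see what a class attribute means on the HTML side, here is a small JDK-only sketch that writes a stylesheet link and an element carrying a class. It only illustrates the kind of markup involved; the exact HTML that iText writes may differ, and the class and method names below are hypothetical.

```java
class CssClassSketch {

    /** Escapes the characters that have a special meaning in HTML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the link element that refers to an external style sheet. */
    static String stylesheetLink(String href) {
        return "<link rel=\"stylesheet\" type=\"text/css\" href=\"" + href + "\" />";
    }

    /** Builds an element whose class attribute selects a rule from the style sheet. */
    static String element(String tag, String cssClass, String text) {
        return "<" + tag + " class=\"" + cssClass + "\">" + escape(text) + "</" + tag + ">";
    }

    public static void main(String[] args) {
        System.out.println(stylesheetLink("myStyles.css"));
        System.out.println(element("li", "small", "Isaac Asimov"));
        System.out.println(element("p", "red", "This is chapter 1"));
    }
}
```

The content itself stays free of layout information: changing the rule for the class 'red' in myStyles.css changes every chapter title at once, without regenerating the HTML.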


Page Updated: $Date: 2004/02/07 10:48:07 $
Copyright © 2000, 2001 by Bruno Lowagie
Adolf Baeyensstraat 121, 9040 Gent, BELGIUM,
tel +00 32 92 28 10 97 mailto:itext-questions@lists.sourceforge.net