Tutorial: iText by Example

Manipulating existing PDF documents

UNDER CONSTRUCTION
Introduction:
One of the most frequently asked questions on the mailing list goes like this: I have an existing PDF file and I want to replace every occurrence of the word 'competitor' with the word 'myself'. Or: I have an existing PDF file and I want to underline every occurrence of the word 'FREE'. If this is what you need, you can stop reading: this just isn't possible with iText. iText can't convert the text in a PDF to some other 'readable' document format such as RTF, Word or even plain text. You can't ask iText for the occurrences of a certain word in a document or for its position on a certain page.
This being said, let's see what iText can do. Depending on what you want to achieve, you are going to use one of the following objects to manipulate existing PDF documents:
  • PdfWriter: generates a document from scratch, but also supports importing pages from other PDF documents. There's one big downside: if you use PdfWriter, all interactive features of the imported page (annotations, bookmarks, fields,...) are lost in the process.
  • PdfStamper: can manipulate the content of one (1!) existing PDF document. For instance: add page numbers and a watermark, fill and/or flatten form fields, sign or encrypt an existing PDF document,...
  • PdfEncryptor: uses PdfStamper to offer some user-friendly methods for encrypting an existing document.
  • PdfCopy: can concatenate (a selection of) pages from one or more existing PDF documents. It can adjust bookmarks, but form fields may be lost. PdfCopy doesn't allow new content!
  • PdfCopyFields: concatenates multiple PDF documents, keeping the fields (it puts the fields of the different source files in one AcroForm).
Before we start with the examples, always remember that the output file must have a different name than the input file! If you open a file with PdfReader, you can't write to that same file with PdfWriter/PdfStamper/PdfCopy. If you want to 'edit' an existing file, you need to write to a temporary file and rename it afterwards.
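The temporary-file-and-rename approach can be sketched with the Java standard library alone. This is only a sketch and not part of iText: the class SafeRewrite, the Transform interface and the rewrite method are hypothetical names chosen for illustration. In real use, the callback would wrap the PdfReader/PdfStamper calls, reading from the input path and writing to the temporary path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SafeRewrite {

    /** Hypothetical callback: reads from 'in' and writes the result to 'out'. */
    interface Transform {
        void apply(Path in, Path out) throws IOException;
    }

    /**
     * Runs the transform into a temporary file next to the target,
     * then renames the temporary file over the original.
     */
    static void rewrite(Path target, Transform transform) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "edit-", ".tmp");
        try {
            transform.apply(target, tmp);
            // Only replace the original once the new file is complete.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Cleans up after a failure; does nothing if the move succeeded.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo-", ".txt");
        Files.write(file, "original".getBytes(StandardCharsets.UTF_8));
        // Stand-in for the PDF manipulation: copy the text and append a marker.
        rewrite(file, (in, out) -> {
            String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
            Files.write(out, (text + " + stamped").getBytes(StandardCharsets.UTF_8));
        });
        // prints "original + stamped"
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The original file is only replaced after the transform has finished, so a failure halfway through leaves the input untouched.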
PdfWriter:
If you want to create a document from scratch, you have to follow the 5 steps described in the Hello World chapter. In step 2, you create a PdfWriter object to which new content will be written. This new content can consist of high-level objects added to a Document object, or it can be direct content added to a PdfContentByte object. A PdfImportedPage is an object that can be added as direct content. You get PdfImportedPage objects by asking the writer to import a page from a PdfReader:
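// 'writer' is the PdfWriter created in step 2
PdfReader reader = new PdfReader("existing.pdf");
PdfImportedPage page1 = writer.getImportedPage(reader, 1);
// 'directcontent' is a PdfContentByte obtained from the writer
directcontent.addTemplate(page1, 1, 0, 0, 1, 0, 0);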
Note that you can't change the content of a PdfImportedPage object. If you want to write extra content over or under the page, you need to ask the PdfWriter object for a PdfContentByte object. This is explained in the tutorial part on direct content, as is the meaning of the parameters of the addTemplate method. In the example below, imported pages are scaled so that two portrait pages fit on one landscape page.
Example: java com.lowagie.examples.general.copystamp.TwoOnOne
Combines 2 pages on 1: see 2on1.pdf
External resources for this example: ChapterSection.pdf
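The scaling for a two-on-one layout comes down to a little arithmetic. The sketch below is not the code of the TwoOnOne example; it shows one possible way to compute the six numbers passed to addTemplate, assuming they form the usual PDF transformation matrix (a, b, c, d, e, f), where a and d scale and e and f translate. The class TwoOnOneLayout and the method matrixForSlot are hypothetical names.

```java
class TwoOnOneLayout {

    /**
     * Returns {a, b, c, d, e, f} for placing a source page of size
     * (srcW x srcH) into slot 0 (left half) or slot 1 (right half)
     * of a target sheet of size (sheetW x sheetH). All sizes in points.
     */
    static float[] matrixForSlot(float srcW, float srcH,
                                 float sheetW, float sheetH, int slot) {
        float slotW = sheetW / 2f;
        // Uniform scale so the page fits its half of the sheet in both directions.
        float scale = Math.min(slotW / srcW, sheetH / srcH);
        // Center the scaled page inside its slot.
        float tx = slot * slotW + (slotW - srcW * scale) / 2f;
        float ty = (sheetH - srcH * scale) / 2f;
        return new float[] { scale, 0f, 0f, scale, tx, ty };
    }

    public static void main(String[] args) {
        // A4 portrait (595 x 842 points) onto A4 landscape (842 x 595 points).
        for (int slot = 0; slot < 2; slot++) {
            float[] m = matrixForSlot(595f, 842f, 842f, 595f, slot);
            System.out.printf("slot %d: scale=%.4f tx=%.2f ty=%.2f%n",
                    slot, m[0], m[4], m[5]);
        }
    }
}
```

With these values, a call of the form directcontent.addTemplate(page, m[0], m[1], m[2], m[3], m[4], m[5]) would place the imported page in the chosen half of the sheet.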
PdfStamper:
If you want to change the contents of one existing PDF file and add extra content such as watermarks, page numbers, extra headers,... PdfStamper is the object you need.
PdfReader reader = new PdfReader("existing.pdf");
PdfStamper stamp = new PdfStamper(reader,
  new FileOutputStream("stamped.pdf"));
// content added here ends up beneath the existing content of page 1
PdfContentByte under = stamp.getUnderContent(1);
// content added here ends up on top of the existing content of page 1
PdfContentByte over = stamp.getOverContent(1);
// closing the stamper writes the result to stamped.pdf
stamp.close();
Again you will need to read the chapters on direct content in order to know how to add text, images, etc. In the example below, we add a watermark under the existing text and the word 'DUPLICATE' on top of the text. We also add an extra title page.
Example: java com.lowagie.examples.general.copystamp.AddWatermarkPageNumbers
Adds page numbers and a watermark to an existing document: see watermark_pagenumbers.pdf
External resources for this example: ChapterSection.pdf SimpleAnnotations1.pdf watermark.jpg
PdfStamper is also the class you need if you want to fill in AcroForms. Below, you'll find an example that fills in a simple form, once with and once without form flattening. More examples will follow in the chapter on form filling.
Example: java com.lowagie.examples.general.copystamp.Register
Fills in a form: see registered.pdf registered_flat.pdf
External resources for this example: SimpleRegistrationForm.pdf
If you need to encrypt an existing document, you can do this with PdfStamper too, but there is a helper class called PdfEncryptor that offers some more user-friendly methods:
PdfEncryptor.encrypt(reader,
  new FileOutputStream("encrypted.pdf"),
  "Hello".getBytes(),   // user password
  "World".getBytes(),   // owner password
  PdfWriter.AllowPrinting | PdfWriter.AllowCopy,   // permissions
  false);   // false = 40-bit, true = 128-bit encryption
In the example, 'Hello' is set as the user password; you'll need it to open the document with Acrobat Reader. 'World' is the owner password; you'll need it to read the document with an application (for instance with PdfReader). The fifth parameter defines the permissions. This is done by or-ing the following integers:
  • PdfWriter.AllowPrinting
  • PdfWriter.AllowModifyContents
  • PdfWriter.AllowCopy
  • PdfWriter.AllowModifyAnnotations
  • PdfWriter.AllowFillIn (128 bit only)
  • PdfWriter.AllowScreenReaders (128 bit only)
  • PdfWriter.AllowAssembly (128 bit only)
  • PdfWriter.AllowDegradedPrinting (128 bit only)
The boolean lets you choose between 40-bit encryption (false) and 128-bit encryption (true).
Example: java com.lowagie.examples.general.copystamp.EncryptorExample
Encrypts an existing PDF file: see encrypted.pdf
External resources for this example: ChapterSection.pdf
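The or-ing of permissions in the PdfEncryptor call is plain bit-flag arithmetic. The sketch below illustrates the idea with stand-in constants: the class PermissionFlags, its constants and their values are made up for illustration and are not the values of the PdfWriter.Allow... constants.

```java
class PermissionFlags {

    // Stand-in bit values for illustration only; real code would use the
    // PdfWriter.Allow... constants, whose actual values are not shown here.
    static final int ALLOW_PRINTING = 1;
    static final int ALLOW_MODIFY_CONTENTS = 1 << 1;
    static final int ALLOW_COPY = 1 << 2;
    static final int ALLOW_MODIFY_ANNOTATIONS = 1 << 3;

    /** True if every bit of 'flag' is set in 'permissions'. */
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Same combination as in the PdfEncryptor call: printing and copying.
        int permissions = ALLOW_PRINTING | ALLOW_COPY;
        System.out.println(isAllowed(permissions, ALLOW_PRINTING));        // true
        System.out.println(isAllowed(permissions, ALLOW_COPY));            // true
        System.out.println(isAllowed(permissions, ALLOW_MODIFY_CONTENTS)); // false
    }
}
```

Each permission occupies its own bit, so a single int can carry any combination of them, and leaving a flag out of the expression withholds that permission.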
PdfCopy:
Although you could work around PdfStamper's one-document-only limitation by adding new pages, you'd better use PdfCopy if you only want to copy (a selection of) pages from one or more PDF files into one resulting PDF, without changing the contents of those pages (apart from maybe the bookmarks). The syntax of PdfCopy looks very similar to the syntax of PdfWriter:
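PdfReader reader = new PdfReader("existing.pdf");
Document document = new Document(reader.getPageSizeWithRotation(1));
// outFile is the name of the resulting PDF (must differ from the input file)
PdfCopy copy = new PdfCopy(document, new FileOutputStream(outFile));
document.open();
// copy every page of the existing document
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
  PdfImportedPage page = copy.getImportedPage(reader, i);
  copy.addPage(page);
}
document.close();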
The example can be used as a command-line tool to concatenate existing PDF files.
Example: java com.lowagie.examples.general.copystamp.Concatenate ChapterSection.pdf Destinations.pdf SimpleAnnotations1.pdf concatenated.pdf
Concatenates existing PDF files: see concatenated.pdf
External resources for this example: ChapterSection.pdf Destinations.pdf SimpleAnnotations1.pdf
There's one big problem with PdfCopy: it doesn't work well with files containing AcroForms. If you want to concatenate existing PDF files with different AcroForms, you must use the PdfCopyFields class. Unlike PdfCopy, PdfCopyFields keeps all the documents in memory. The rules for form field concatenation are the same as in Acrobat. Note that fields with the same name will be merged, so it's probably a good idea to rename them if that's the case.
Example: java com.lowagie.examples.general.copystamp.ConcatenateForms
Concatenates existing PDF files with forms: see concatenatedforms.pdf
External resources for this example: SimpleRegistrationForm.pdf TextFields.pdf ChapterSection.pdf
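Since fields with the same name are merged, it can help to check for name clashes before concatenating. The sketch below is one possible way to do that check, assuming the field names of each source file have already been collected into lists; how they are collected is left out. The class FieldNameClashes, the method clashes and the field names in main are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FieldNameClashes {

    /**
     * Returns the field names that occur in more than one source document.
     * Keys of the map are source file names, values are their field names.
     */
    static Set<String> clashes(Map<String, List<String>> fieldsPerFile) {
        Set<String> seen = new HashSet<>();
        Set<String> clashing = new TreeSet<>();
        for (List<String> names : fieldsPerFile.values()) {
            // De-duplicate within one file so only cross-file repeats count.
            for (String name : new HashSet<>(names)) {
                if (!seen.add(name)) {
                    clashing.add(name);
                }
            }
        }
        return clashing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        // Made-up field names; the real forms may use different ones.
        fields.put("SimpleRegistrationForm.pdf", Arrays.asList("name", "address", "email"));
        fields.put("TextFields.pdf", Arrays.asList("name", "email", "comment"));
        System.out.println(clashes(fields)); // prints [email, name]
    }
}
```

Any name reported here is a candidate for renaming in one of the source files before the forms are concatenated.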