iText Tutorial
iText, a Free Java-PDF library
by Bruno Lowagie
Part IV: Simple iText
Chapter 13: Reading PDF
PdfReader
You can't 'parse' an existing PDF file using iText; you can only 'read' it, page by page.
What does this mean?
The PDF format is just a canvas on which text and graphics are placed without
any structural information. As such, there aren't any 'iText objects' in a PDF file.
On each page there will probably be a number of 'Strings', but you can't reconstruct
a phrase or a paragraph from these strings. There are probably a number of lines drawn,
but you can't retrieve a Table object from these lines. In short: parsing the content
of a PDF file is NOT POSSIBLE with iText (at least not if you want good results, although
there are ways to retrieve text from an existing PDF). Post your question on the newsgroup
news://comp.text.pdf and maybe you will get some
answers from people who have built tools that can parse PDF and extract some of its contents,
but don't expect tools that perform a bullet-proof conversion to structured text.
What iText DOES provide is the possibility to READ a PDF document and copy an entire
page of it into the PDF file you are constructing from scratch. This can be useful
if you want to create a new document based on one or more existing documents: you can add
a watermark, page numbers, and so on.
Chap13_pdfreader takes a PDF
file from Chapter 7 and creates a new document in which 4 pages of the original document
are painted on 1 page of the new document. We also added a watermark and page numbers
(see Chap13_pdfreader.pdf).
To fully understand the code (and how to adapt it to your needs), you will
have to read Chapter 10 first.
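The page arithmetic behind such a 4-up layout can be sketched in plain Java. This is a minimal sketch, not the code of Chap13_pdfreader: it assumes every source page has the same size as the target page and fills the quadrants left to right, top to bottom (one plausible ordering). It only computes the transformation matrix {a, b, c, d, e, f} for each source page; the class and method names are hypothetical. In iText, a matrix like this would typically be passed, together with a page obtained from PdfWriter.getImportedPage, to PdfContentByte.addTemplate, but treat that wiring as an assumption to check against Chapter 10 and the API documentation of your iText version.

```java
/**
 * Hypothetical helper (not part of iText, and not necessarily how
 * Chap13_pdfreader does it): computes the transformation matrix
 * {a, b, c, d, e, f} that scales a source page to half its width and
 * height and moves it into one quadrant of a target page of the same size.
 * PDF user space has its origin in the lower-left corner, so the top row
 * of quadrants gets the larger y offset.
 */
final class FourUpLayout {

    static float[] placement(int pageNumber, float width, float height) {
        int slot = (pageNumber - 1) % 4;   // position 0..3 within a group of four
        int column = slot % 2;             // 0 = left, 1 = right
        int row = slot / 2;                // 0 = top, 1 = bottom
        float scale = 0.5f;                // half width, half height = a quarter of the area
        float e = column * width / 2;      // horizontal offset
        float f = (1 - row) * height / 2;  // vertical offset
        return new float[] {scale, 0f, 0f, scale, e, f};
    }

    /** 1-based number of the target page that receives a given source page. */
    static int targetPage(int pageNumber) {
        return (pageNumber - 1) / 4 + 1;
    }

    public static void main(String[] args) {
        float width = 595f, height = 842f; // roughly A4, in points
        for (int p = 1; p <= 8; p++) {
            float[] m = placement(p, width, height);
            System.out.printf("source page %d -> target page %d, offset (%.1f, %.1f)%n",
                    p, targetPage(p), m[4], m[5]);
        }
    }
}
```

Scaling both axes by the same factor (0.5) keeps the aspect ratio of the original pages; a different grid (for instance 2 x 3) would only change the divisors and the scale.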
If you have an existing PDF file that represents a form, you can copy the pages
of this form and paint text at precise locations on them. What you can't do is edit an
existing PDF document, for instance by replacing the word Louagie with Lowagie.
To achieve this, you would have to know the exact location of the word Louagie,
paint a white rectangle over it and paint the word Lowagie on top of this white rectangle.
Please avoid this kind of 'patch' work. Do your PDF editing with an Adobe product.
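To paint text at a precise location on a copied form page, you need that location in PDF user-space coordinates. The sketch below assumes the default user space (1 unit = 1/72 inch, origin in the lower-left corner, y increasing upwards) and converts a position measured with a ruler from the top-left corner of a printed form. The class and its methods are hypothetical helpers, not part of iText.

```java
/**
 * Hypothetical helper (not part of iText): converts a position measured
 * with a ruler on a printed form into PDF user-space coordinates.
 * Assumes the default user space: 1 unit = 1/72 inch, origin at the
 * lower-left corner of the page, y increasing upwards.
 */
final class FormPosition {

    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    /** Converts millimetres to PDF points. */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /**
     * Converts (x, y) in millimetres, measured from the TOP-left corner
     * of the page, into (x, y) in points measured from the BOTTOM-left corner.
     */
    static float[] toUserSpace(float xMm, float yFromTopMm, float pageHeightPt) {
        float x = mmToPoints(xMm);
        float y = pageHeightPt - mmToPoints(yFromTopMm);
        return new float[] {x, y};
    }

    public static void main(String[] args) {
        // A field 30 mm from the left and 50 mm from the top of an 842 pt high page
        float[] p = toUserSpace(30f, 50f, 842f);
        System.out.printf("x = %.1f pt, y = %.1f pt%n", p[0], p[1]);
    }
}
```

The resulting x and y are the kind of values you would use when positioning text on the copied page; if the form page has a non-default user space (for example a transformed or cropped page), the numbers will need adjusting.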
com.lowagie.tools.*
In package com.lowagie.tools, there are 4 little tools that can be called from the command line:
- com.lowagie.tools.concat_pdf
This class can be used from the command line to concatenate existing PDF files.
arguments: the filenames of the PDF documents you want to concatenate, followed by the filename of the destination file.
Command line example:
java -cp itext.jar com.lowagie.tools.concat_pdf Chap0101.pdf Chap0102.pdf Chap0103.pdf result.pdf
result.pdf contains the first three examples from Chapter 1.
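The argument convention of concat_pdf (all arguments but the last are sources, the last is the destination) and the resulting page order can be sketched as follows. This is a hypothetical illustration, not the source of concat_pdf: it assumes pages are appended in the order the files are given and that at least one source and a destination must be present. The page counts used in the usage lines are assumptions for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the source of concat_pdf): interprets the
 * argument convention described above, where every argument except the
 * last is a source file and the last one is the destination, and works
 * out on which page of the result each source starts.
 */
final class ConcatPlan {

    /** Returns all arguments except the last: the files to concatenate. */
    static String[] sources(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: concat_pdf src1 [src2 ...] dest");
        }
        return Arrays.copyOf(args, args.length - 1);
    }

    /** Returns the last argument: the destination file. */
    static String destination(String[] args) {
        sources(args); // reuse the argument-count check
        return args[args.length - 1];
    }

    /**
     * Given the page count of each source (in argument order), returns the
     * 1-based page number in the result where each source begins.
     */
    static int[] startPages(int[] pageCounts) {
        int[] starts = new int[pageCounts.length];
        int next = 1;
        for (int i = 0; i < pageCounts.length; i++) {
            starts[i] = next;
            next += pageCounts[i];
        }
        return starts;
    }

    public static void main(String[] args) {
        String[] cli = {"Chap0101.pdf", "Chap0102.pdf", "Chap0103.pdf", "result.pdf"};
        System.out.println(Arrays.toString(sources(cli)) + " -> " + destination(cli));
        // Assuming each example is one page long, the sources start on pages 1, 2 and 3
        System.out.println(Arrays.toString(startPages(new int[] {1, 1, 1})));
    }
}
```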
- com.lowagie.tools.split_pdf
This class can be used from the command line to split an existing PDF file into two new files.
Remark: some information from the original file (for instance annotations) will get lost in the process!
arguments: srcfile destfile1 destfile2 pagenumber
Command line example:
java -cp itext.jar com.lowagie.tools.split_pdf result.pdf result1.pdf result2.pdf 2
result.pdf will be split into a one-page document result1.pdf and a two-page document result2.pdf
(result2.pdf starts with the second page of result.pdf).
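The page ranges produced by split_pdf follow directly from the pagenumber argument: destfile1 gets the pages before it and destfile2 starts with it. The sketch below is a hypothetical illustration, not the tool's source; rejecting a pagenumber that would leave one output file empty is an assumption of this sketch, not documented behaviour of split_pdf.

```java
/**
 * Hypothetical sketch (not the source of split_pdf): computes which pages
 * of the source go to destfile1 and destfile2 for a given pagenumber.
 * destfile1 gets pages 1 .. pagenumber-1, destfile2 gets pagenumber .. last.
 */
final class SplitPlan {

    /** Returns {{first1, last1}, {first2, last2}}, all 1-based and inclusive. */
    static int[][] ranges(int totalPages, int pageNumber) {
        // Assumption: both output files must receive at least one page.
        if (pageNumber < 2 || pageNumber > totalPages) {
            throw new IllegalArgumentException(
                    "pagenumber must be between 2 and " + totalPages);
        }
        return new int[][] {{1, pageNumber - 1}, {pageNumber, totalPages}};
    }

    public static void main(String[] args) {
        // The example above: a 3-page result.pdf split at page 2
        int[][] r = ranges(3, 2);
        System.out.println("result1.pdf: pages " + r[0][0] + "-" + r[0][1]);
        System.out.println("result2.pdf: pages " + r[1][0] + "-" + r[1][1]);
    }
}
```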
- com.lowagie.tools.handout_pdf
This class can be used from the command line to make handouts from an existing PDF file. You can choose the number of slides per page.
arguments: srcfile destfile pages
Command line example:
java -cp itext.jar com.lowagie.tools.handout_pdf result.pdf handout.pdf 2
handout.pdf is a two-page overview (2 slides per page) of the three-page document result.pdf.
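The number of handout pages is the number of slides divided by the slides per page, rounded up, which is why three slides at two per page give a two-page handout. The sketch below is a hypothetical illustration of that arithmetic, not the source of handout_pdf; the filling order (slides placed in sequence, one handout page after another) is an assumption.

```java
/**
 * Hypothetical sketch (not the source of handout_pdf): with a number of
 * slides and a number of slides per handout page, works out how many
 * handout pages are needed and where each slide ends up.
 */
final class HandoutPlan {

    /** Number of handout pages: slides / perPage, rounded up. */
    static int handoutPages(int slides, int perPage) {
        if (slides < 0 || perPage < 1) {
            throw new IllegalArgumentException("need slides >= 0 and perPage >= 1");
        }
        return (slides + perPage - 1) / perPage;
    }

    /** Returns {handoutPage, position}, both 1-based, for a 1-based slide number. */
    static int[] locate(int slide, int perPage) {
        return new int[] {(slide - 1) / perPage + 1, (slide - 1) % perPage + 1};
    }

    public static void main(String[] args) {
        // The example above: 3 slides, 2 per page
        System.out.println(handoutPages(3, 2) + " handout pages");
        for (int s = 1; s <= 3; s++) {
            int[] where = locate(s, 2);
            System.out.println("slide " + s + " -> page " + where[0] + ", position " + where[1]);
        }
    }
}
```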
- com.lowagie.tools.encrypt_pdf
This class can be used from the command line to encrypt a PDF file.
arguments: input_file output_file user_password owner_password permissions 128|40
(the last argument is the encryption key length in bits: 128 or 40)
permissions is a string of 8 digits, each 0 or 1. Each digit switches one security function on (1) or off (0), in this order:
- AllowPrinting
- AllowModifyContents
- AllowCopy
- AllowModifyAnnotations
- AllowFillIn (128 bit only)
- AllowScreenReaders (128 bit only)
- AllowAssembly (128 bit only)
- AllowDegradedPrinting (128 bit only)
For example, the permissions to print and copy would be: 10100000
Command line example:
java -cp itext.jar com.lowagie.tools.encrypt_pdf Chap0101.pdf encrypted.pdf user master 00000000 128
You will only be able to open the file encrypted.pdf
if you know the user password (here: user). You won't be able to print the file, modify its contents, copy parts of it, and so on.
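Decoding the permissions string is a digit-by-digit lookup in the list above. The sketch below is a hypothetical illustration, not the source of encrypt_pdf: it only maps digits to the permission names listed in this section. Ignoring the last four digits for 40-bit encryption is an assumption based on the "(128 bit only)" remarks, not documented behaviour of the tool.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the source of encrypt_pdf): decodes the 8-digit
 * permissions argument into the names listed above, digit by digit.
 */
final class PermissionString {

    static final String[] NAMES = {
        "AllowPrinting", "AllowModifyContents", "AllowCopy", "AllowModifyAnnotations",
        "AllowFillIn", "AllowScreenReaders", "AllowAssembly", "AllowDegradedPrinting"
    };

    /**
     * Returns the names of the permissions whose digit is 1. With 40-bit
     * encryption, the last four digits are ignored (assumption: the text only
     * says they apply to 128-bit encryption).
     */
    static List<String> decode(String digits, int keyLength) {
        if (!digits.matches("[01]{8}")) {
            throw new IllegalArgumentException("permissions must be 8 digits, each 0 or 1");
        }
        if (keyLength != 40 && keyLength != 128) {
            throw new IllegalArgumentException("key length must be 40 or 128");
        }
        int usable = keyLength == 128 ? 8 : 4;
        List<String> granted = new ArrayList<>();
        for (int i = 0; i < usable; i++) {
            if (digits.charAt(i) == '1') {
                granted.add(NAMES[i]);
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(decode("10100000", 128)); // [AllowPrinting, AllowCopy]
        System.out.println(decode("00000000", 128)); // []
    }
}
```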
Page updated: 2003/06/25 07:36:35
Copyright © 2000, 2001 by Bruno Lowagie
Adolf Baeyensstraat 121, 9040 Gent, BELGIUM,
tel +00 32 92 28 10 97, e-mail: itext-questions@lists.sourceforge.net