gnu.xml.util
public class XMLWriter extends Object implements ContentHandler, LexicalHandler, DTDHandler, DeclHandler
By default, text is generated "as-is", but some optional modes are supported. Pretty-printing is supported, to make life easier for people reading the output. XHTML (1.0) output can be made particularly pretty; all the built-in character entities are known. Canonical XML can also be generated, assuming the input is properly formed.
Some of the methods on this class are intended for applications to use directly, rather than as pure SAX2 event callbacks. Some of those methods access the JavaBeans properties (used to tweak output formats, for example canonicalization and pretty printing). Subclasses are expected to add new behaviors, not to modify current behavior, so many such methods are final.
The write*() methods may be slightly simpler for some applications to use than direct callbacks. For example, they support a simple policy for encoding data items as the content of a single element.
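As a rough illustration of the "single element" policy mentioned above, the sketch below emits one element whose content is a single data item using nothing but plain SAX callbacks. It is an assumption-laden stand-in, not this class's implementation: the JDK's identity TransformerHandler plays the role of the serializer so the example runs without the GNU JAXP library, and the names SingleElementSketch, emitElement, and render are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SingleElementSketch {
    // Hypothetical helper: the three callbacks a writeElement-style
    // convenience method would otherwise save the caller from issuing.
    static void emitElement(ContentHandler out, String qName, String content)
            throws SAXException {
        out.startElement("", qName, qName, new AttributesImpl());
        char[] chars = content.toCharArray();
        out.characters(chars, 0, chars.length);
        out.endElement("", qName, qName);
    }

    // Serializes one element whose content is an integer in decimal form.
    // The JDK TransformerHandler is only a stand-in for an XML-writing handler.
    static String render(String qName, int value) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        serializer.setResult(new StreamResult(buffer));

        serializer.startDocument();
        emitElement(serializer, qName, Integer.toString(value));
        serializer.endDocument();
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("count", 42));
    }
}
```

With an XMLWriter instance in place of the stand-in, the writeElement() overloads listed in the method summary would presumably collapse the three callbacks in emitElement into one call.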
To reuse an XMLWriter you must provide it with a new Writer, since this handler closes the writer it was given as part of its endDocument() handling. (XML documents have an end of input, and the way to encode that on a stream is to close it.)
Note that any relative URIs in the source document, as found in entity and notation declarations, ought to have been fully resolved by the parser providing events to this handler. This means that the output text should only have fully resolved URIs, which may not be the desired behavior in cases where later binding is desired.
Note that due to SAX2 defaults, you may need to manually ensure that the input events are XML-conformant with respect to namespace prefixes and declarations. NSFilter is one solution to this problem, in the context of processing pipelines. Something as simple as connecting this handler to a parser might not generate the correct output. Another workaround is to ensure that the namespace-prefixes feature is always set to true, if you're hooking this directly up to some XMLReader implementation.
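The effect of the namespace-prefixes feature can be seen with a small experiment. The sketch below uses only the JDK's SAX parser and a recording DefaultHandler as a stand-in for this handler; the class and method names (NamespacePrefixesSketch, attributeNames) are hypothetical. It collects the attribute names the parser reports, so that one can check whether xmlns declarations reach the handler at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespacePrefixesSketch {
    static final String NS_PREFIXES =
            "http://xml.org/sax/features/namespace-prefixes";

    // Parses the document and returns the qualified names of every
    // attribute reported through startElement().
    static List<String> attributeNames(String xml, boolean namespacePrefixes)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // When false (the SAX2 default), xmlns attributes are not reported.
        reader.setFeature(NS_PREFIXES, namespacePrefixes);

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add(atts.getQName(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p:doc xmlns:p=\"urn:example\"><p:item/></p:doc>";
        System.out.println(attributeNames(xml, false));
        System.out.println(attributeNames(xml, true));
    }
}
```

A handler that copies attributes straight to its output can only reproduce the declarations in the second case, which is why the feature matters when a parser is connected directly.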
Version: $Date: 2001/11/20 01:15:45 $
Constructor Summary | |
---|---|
XMLWriter() | Constructs this handler with System.out used to write SAX events using the UTF-8 encoding. |
XMLWriter(OutputStream out) | Constructs a handler which writes all input to the output stream in the UTF-8 encoding, and closes it when endDocument is called. |
XMLWriter(Writer writer) | Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. |
XMLWriter(Writer writer, String encoding) | Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. |
Method Summary | |
---|---|
void | attributeDecl(String eName, String aName, String type, String mode, String value) SAX2: called on attribute declarations |
void | characters(char[] ch, int start, int length) SAX1: reports content characters |
void | comment(char[] ch, int start, int length) SAX2: called when comments are parsed. |
void | elementDecl(String name, String model) SAX2: called on element declarations |
void | endCDATA() SAX2: called after parsing CDATA characters |
void | endDocument() SAX1: indicates the completion of a parse. |
void | endDTD() SAX2: called after the doctype is parsed |
void | endElement(String uri, String localName, String qName) SAX2: indicates the end of an element |
void | endEntity(String name) SAX2: called after parsing a general entity in content |
void | endPrefixMapping(String prefix) SAX2: ignored. |
void | externalEntityDecl(String name, String publicId, String systemId) SAX2: called on external entity declarations |
protected void | fatal(String message, Exception e) Used internally and by subclasses, this encapsulates the logic involved in reporting fatal errors. |
void | flush() Flushes the output stream. |
void | ignorableWhitespace(char[] ch, int start, int length) SAX1: reports ignorable whitespace |
void | internalEntityDecl(String name, String value) SAX2: called on internal entity declarations |
boolean | isCanonical() Returns value of flag controlling canonical output. |
boolean | isExpandingEntities() Returns true if the output will have no entity references; returns false (the default) otherwise. |
boolean | isPrettyPrinting() Returns value of flag controlling pretty printing. |
boolean | isXhtml() Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise. |
void | notationDecl(String name, String publicId, String systemId) SAX1: called on notation declarations |
void | processingInstruction(String target, String data) SAX1: reports a PI. |
void | setCanonical(boolean value) Sets the output style to be canonicalized. |
void | setDocumentLocator(Locator l) SAX1: provides parser status information |
void | setEOL(String eolString) Assigns the line ending style to be used on output. |
void | setErrorHandler(ErrorHandler handler) Assigns the error handler to be used to present most fatal errors. |
void | setExpandingEntities(boolean value) Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities. |
void | setPrettyPrinting(boolean value) Controls pretty-printing, which by default is not enabled (and currently is most useful for XHTML output). |
void | setWriter(Writer writer, String encoding) Resets the handler to write a new text document. |
void | setXhtml(boolean value) Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification. |
void | skippedEntity(String name) SAX1: indicates a non-expanded entity reference |
void | startCDATA() SAX2: called before parsing CDATA characters |
void | startDocument() SAX1: indicates the beginning of a document parse. |
void | startDTD(String name, String publicId, String systemId) SAX2: called when the doctype is partially parsed. Note that this, like other doctype-related calls, is ignored when XHTML is in use. |
void | startElement(String uri, String localName, String qName, Attributes atts) SAX2: indicates the start of an element. |
void | startEntity(String name) SAX2: called before parsing a general entity in content |
void | startPrefixMapping(String prefix, String uri) SAX2: ignored. |
void | unparsedEntityDecl(String name, String publicId, String systemId, String notationName) SAX1: called on unparsed entity declarations |
void | write(String data) Writes the string as if characters() had been called on the contents of the string. |
void | writeElement(String uri, String localName, String qName, Attributes atts, String content) Writes an element that has content consisting of a single string. |
void | writeElement(String uri, String localName, String qName, Attributes atts, int content) Writes an element that has content consisting of a single integer, encoded as a decimal string. |
void | writeEmptyElement(String uri, String localName, String qName, Attributes atts) Writes an empty element. |
See the description of the constructor which takes an encoding name for important information about selection of encodings.
Parameters: writer XML text is written to this writer.
At this time, only the UTF-8 ("UTF8") and UTF-16 ("Unicode") output encodings are fully lossless with respect to XML data. If you use any other encoding you risk having your data be silently mangled on output, as the standard Java character encoding subsystem silently maps non-encodable characters to a question mark ("?") and will not report such errors to applications.
For a few other encodings the risk can be reduced. If the writer is a java.io.OutputStreamWriter, and uses either the ISO-8859-1 ("8859_1", "ISO8859_1", etc) or US-ASCII ("ASCII") encodings, content which can't be encoded in those encodings will be written safely. Where relevant, the XHTML entity names will be used; otherwise, numeric character references will be emitted.
However, there remain a number of cases where substituting such entity or character references is not an option. Such references are not usable within a DTD, comment, PI, or CDATA section. Neither may they be used when element, attribute, entity, or notation names have the problematic characters.
Parameters: writer XML text is written to this writer. encoding if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.
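The encoding hazard described above is easy to reproduce with the standard library alone. The sketch below is illustrative only and does not show how this class is implemented: roundTrip demonstrates the silent "?" substitution performed by java.io.OutputStreamWriter, and escapeUnencodable is a hypothetical helper showing the numeric character reference fallback for characters a charset cannot represent. The class name EncodingSketch is likewise an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Writes text through an OutputStreamWriter, then decodes the bytes
    // again so any silent substitution becomes visible.
    static String roundTrip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        }
        return new String(bytes.toByteArray(), charset);
    }

    // Hypothetical helper: replaces each code point the charset cannot
    // encode with a decimal numeric character reference. Only valid for
    // character data and attribute values, not names, comments, PIs,
    // CDATA sections, or DTD content.
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String chunk = new String(Character.toChars(codePoint));
            if (encoder.canEncode(chunk)) {
                out.append(chunk);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String price = "\u20ac5"; // euro sign followed by the digit 5
        System.out.println(roundTrip(price, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrip(price, StandardCharsets.UTF_8).equals(price));
        System.out.println(escapeUnencodable(price, StandardCharsets.ISO_8859_1));
    }
}
```

The euro sign is absent from ISO-8859-1, so the first round trip loses it without any exception being raised, while the escaped form keeps the information in a form an XML parser can recover.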
Note that fragments of XML documents, as specified by an XPath node set, may be canonicalized. In such cases, elements may need some fixup (for xml:* attributes and application-specific context).
Throws: IllegalArgumentException if the output encoding is anything other than UTF-8.
Parameters: eolString null to use the system default; else "\n", "\r", or "\r\n".
At this writing, structural indentation and line wrapping are enabled when pretty printing is enabled and the xml:space attribute has the value default (its other legal value is preserve, as defined in the XML specification). The three XHTML element types which use another value are recognized by their names (namespaces are ignored).
Also, for the record, the "pretty" aspect of printing here is more to provide basic structure on outputs that would otherwise risk being a single long line of text. For now, expect the structure to be ragged ... unless you'd like to submit a patch to make this be more strictly formatted!
Throws: IllegalStateException thrown if this method is invoked after output has begun.
Parameters: writer XML text is written to this writer. encoding if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.
Throws: IllegalStateException if the current document hasn't yet ended (with endDocument())
When this option is enabled, it is the caller's responsibility to ensure that the input is otherwise valid as XHTML. Things to be careful of in all cases, as described in the appendix referenced above, include:
Additionally, some of the oldest browsers have additional quirks, to address with guidelines such as:
Also, some characteristics of the resulting output may be a function of whether the document is later given a MIME content type of text/html rather than one indicating XML (application/xml or text/xml). Worse, some browsers ignore MIME content types and prefer to rely on URI name suffixes -- so an "index.xml" could always be XML, never XHTML, no matter its MIME type.
See Also: XMLWriter
Source code is under GPL (with library exception) in the JAXP project at http://www.gnu.org/software/classpathx/jaxp
This documentation was derived from that source code on 2013-01-12.