Class Serializer
- Object
-
- nu.xom.Serializer
-
public class Serializer extends Object
Outputs a Document object in a specific encoding using various options for controlling white space, normalization, indenting, line breaking, and base URIs. However, in general these options do affect the document's infoset. In particular, if you set either the maximum line length or the indent size to a positive value, then the serializer will not respect input white space. It may trim leading and trailing space, condense runs of white space to a single space, convert carriage returns and linefeeds to spaces, add extra space where none was present before, and otherwise muck with the document's white space. The defaults, however, preserve all significant white space including ignorable white space and boundary white space.
- Version:
1.2d1
- Author:
Elliotte Rusty Harold
-
-
Constructor Summary
Constructor - Description
Serializer(OutputStream out) - Create a new serializer that uses the UTF-8 encoding.
Serializer(OutputStream out, String encoding) - Create a new serializer that uses the specified encoding.
-
Method Summary
Modifier and Type, Method - Description
protected void breakLine() - Writes the current line break string onto the underlying output stream and indents as specified by the current level and the indent property.
void flush() - Flushes the data onto the output stream.
protected int getColumnNumber() - Returns the current column number of the output stream.
String getEncoding() - Returns the name of the character encoding used by this serializer.
int getIndent() - Returns the number of spaces this serializer indents.
String getLineSeparator() - Returns the string used as a line separator.
int getMaxLength() - Returns the preferred maximum line length.
boolean getPreserveBaseURI() - Returns true if this serializer preserves the original base URIs by inserting extra xml:base attributes.
boolean getUnicodeNormalizationFormC() - Indicates whether serialization will perform Unicode normalization on all data using normalization form C (NFC).
void setIndent(int indent) - Sets the number of additional spaces to add to each successive level in the hierarchy.
void setLineSeparator(String lineSeparator) - Sets the line separator.
void setMaxLength(int maxLength) - Sets the suggested maximum line length for this serializer.
void setOutputStream(OutputStream out) - Flushes the previous output stream and redirects further output to the new output stream.
void setPreserveBaseURI(boolean preserve) - Determines whether this serializer inserts extra xml:base attributes to attempt to preserve base URI information from the document.
void setUnicodeNormalizationFormC(boolean normalize) - If true, this property indicates serialization will perform Unicode normalization on all data using normalization form C (NFC).
protected void write(Attribute attribute) - Writes an attribute in the form name="value".
protected void write(Comment comment) - Writes a comment onto the output stream using the current options.
protected void write(DocType doctype) - Writes a DocType object onto the output stream using the current options.
void write(Document doc) - Serializes a document onto the output stream using the current options.
protected void write(Element element) - Serializes an element onto the output stream using the current options.
protected void write(ProcessingInstruction instruction) - Writes a processing instruction onto the output stream using the current options.
protected void write(Text text) - Writes a Text object onto the output stream using the current options.
protected void writeAttributes(Element element) - Writes all the attributes of the specified element onto the output stream, one at a time, separated by white space.
protected void writeAttributeValue(String value) - Writes a string onto the underlying output stream.
protected void writeChild(Node node) - Writes a child node onto the output stream using the current options.
protected void writeEmptyElementTag(Element element) - Writes an empty-element tag for the element including all its namespace declarations and attributes.
protected void writeEndTag(Element element) - Writes the end-tag for an element in the form </name>.
protected void writeEscaped(String text) - Writes a string onto the underlying output stream.
protected void writeNamespaceDeclaration(String prefix, String uri) - Writes a namespace declaration in the form xmlns:prefix="uri" or xmlns="uri".
protected void writeNamespaceDeclarations(Element element) - Writes all the namespace declaration attributes of the specified element onto the output stream, one at a time, separated by white space.
protected void writeRaw(String text) - Writes a string onto the underlying output stream.
protected void writeStartTag(Element element) - Writes the start-tag for the element including all its namespace declarations and attributes.
protected void writeXMLDeclaration() - Writes the XML declaration onto the output stream, followed by a line break.
-
-
-
Constructor Detail
-
Serializer
public Serializer(OutputStream out)
Create a new serializer that uses the UTF-8 encoding.
- Parameters:
out - the output stream to write the document on
- Throws:
NullPointerException - if out is null
-
Serializer
public Serializer(OutputStream out, String encoding) throws UnsupportedEncodingException
Create a new serializer that uses the specified encoding. The encoding must be recognized by the Java virtual machine. If you attempt to use an encoding that the local Java virtual machine does not support, the constructor will throw an UnsupportedEncodingException. Currently the following encodings are recognized by XOM:
- UTF-8
- UTF-16
- UTF-16BE
- UTF-16LE
- ISO-10646-UCS-2
- ISO-8859-1
- ISO-8859-2
- ISO-8859-3
- ISO-8859-4
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- ISO-8859-10
- ISO-8859-11 (a.k.a. TIS-620)
- ISO-8859-13
- ISO-8859-14
- ISO-8859-15
- ISO-8859-16
- IBM037 (a.k.a. CP037, EBCDIC-CP-US, EBCDIC-CP-CA, EBCDIC-CP-WA, EBCDIC-CP-NL, and CSIBM037)
- GB18030
You can use encodings not in this list if the virtual machine supports them. However, they may be significantly slower than the encodings in this list.
I've noticed Java has significant bugs in its handling of some of these encodings. In some cases such as 0x80 in Big5, XOM will escape a character that should not need to be escaped because Java can't output that character in the specified encoding, even though the output character set does contain it. :-(
- Parameters:
out - the output stream to write the document on
encoding - the character encoding for the serialization
- Throws:
NullPointerException - if out or encoding is null
UnsupportedEncodingException - if the VM does not support the requested encoding
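Because the constructor rejects encodings the local virtual machine does not support, it can be useful to check a name up front. The following is a minimal sketch using only the standard java.nio.charset API; the class name EncodingCheck is hypothetical, and whether Serializer performs exactly this check internally is not stated here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class EncodingCheck {

    // Returns true if this JVM has a charset registered under the given name.
    // Hypothetical helper; not part of XOM.
    public static boolean isAvailable(String encoding) {
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException ex) {
            // The name itself is malformed, so no charset can match it.
            return false;
        }
    }
}
```

A caller could test a name with this helper and fall back to UTF-8, which every Java virtual machine is required to support.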
-
-
Method Detail
-
setOutputStream
public void setOutputStream(OutputStream out) throws IOException
Flushes the previous output stream and redirects further output to the new output stream.
- Parameters:
out - the output stream to write the document on
- Throws:
NullPointerException - if out is null
IOException - if the previous output stream encounters an I/O error when flushed
-
write
public void write(Document doc) throws IOException
Serializes a document onto the output stream using the current options.
- Parameters:
doc - the Document to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
NullPointerException - if doc is null
UnavailableCharacterException - if the document contains an unescapable character (e.g. in an element name) that is not available in the current encoding
-
writeXMLDeclaration
protected void writeXMLDeclaration() throws IOException
Writes the XML declaration onto the output stream, followed by a line break.
- Throws:
IOException - if the underlying output stream encounters an I/O error
-
write
protected void write(Element element) throws IOException
Serializes an element onto the output stream using the current options. The result is guaranteed to be well-formed.
If the element is empty, this method invokes writeEmptyElementTag. If the element is not empty, then:
- It calls writeStartTag.
- It passes each of the element's children to writeChild in order.
- It calls writeEndTag.
It may break lines or add white space if the serializer has been configured to indent or use a maximum line length.
- Parameters:
element - the Element to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the element name contains a character that is not available in the current encoding
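The dispatch described for write(Element) can be pictured with a small standalone sketch. The Node type and the ElementWalkSketch class are hypothetical stand-ins, not XOM classes, and the sketch ignores attributes, namespaces, text, indenting, and line breaking.

```java
import java.util.ArrayList;
import java.util.List;

public class ElementWalkSketch {

    // Hypothetical stand-in for an element: a name plus child elements only.
    public static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        public Node(String name) {
            this.name = name;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    public static String serialize(Node element) {
        StringBuilder out = new StringBuilder();
        write(element, out);
        return out.toString();
    }

    private static void write(Node element, StringBuilder out) {
        if (element.children.isEmpty()) {
            // Empty element: a single empty-element tag.
            out.append('<').append(element.name).append("/>");
            return;
        }
        // Non-empty element: start-tag, each child in order, then end-tag.
        out.append('<').append(element.name).append('>');
        for (Node child : element.children) {
            write(child, out);
        }
        out.append("</").append(element.name).append('>');
    }
}
```

In the real class each of these three steps is a separate protected method, which is what lets a subclass override one step without reimplementing the tree walk.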
-
writeEndTag
protected void writeEndTag(Element element) throws IOException
Writes the end-tag for an element in the form </name>.
- Parameters:
element - the element whose end-tag is written
- Throws:
IOException - if the underlying output stream encounters an I/O error
-
writeStartTag
protected void writeStartTag(Element element) throws IOException
Writes the start-tag for the element including all its namespace declarations and attributes.
The writeAttributes method is called to write all the non-namespace-declaration attributes. The writeNamespaceDeclarations method is called to write all the namespace declaration attributes.
- Parameters:
element - the element whose start-tag is written
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the name of the element or the name of any of its attributes contains a character that is not available in the current encoding
-
writeEmptyElementTag
protected void writeEmptyElementTag(Element element) throws IOException
Writes an empty-element tag for the element including all its namespace declarations and attributes.
The writeAttributes method is called to write all the non-namespace-declaration attributes. The writeNamespaceDeclarations method is called to write all the namespace declaration attributes.
If subclasses don't wish empty-element tags to be used, they can override this method to simply invoke writeStartTag followed by writeEndTag.
- Parameters:
element - the element whose empty-element tag is written
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the name of the element or the name of any of its attributes contains a character that is not available in the current encoding
-
writeAttributes
protected void writeAttributes(Element element) throws IOException
Writes all the attributes of the specified element onto the output stream, one at a time, separated by white space. If preserveBaseURI is true, and it is necessary to add an xml:base attribute to the element in order to preserve the base URI, then that attribute is also written here. Each individual attribute is written by invoking write(Attribute).
- Parameters:
element - the Element whose attributes are written
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the name of any of the element's attributes contains a character that is not available in the current encoding
-
writeNamespaceDeclarations
protected void writeNamespaceDeclarations(Element element) throws IOException
Writes all the namespace declaration attributes of the specified element onto the output stream, one at a time, separated by white space. Each individual declaration is written by invoking writeNamespaceDeclaration.
- Parameters:
element - the Element whose namespace declarations are written
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if any of the element's namespace prefixes contains a character that is not available in the current encoding
-
writeNamespaceDeclaration
protected void writeNamespaceDeclaration(String prefix, String uri) throws IOException
Writes a namespace declaration in the form xmlns:prefix="uri" or xmlns="uri". It does not write the spaces on either side of the namespace declaration. These are written by writeNamespaceDeclarations.
- Parameters:
prefix - the namespace prefix; the empty string for the default namespace
uri - the namespace URI
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the namespace prefix contains a character that is not available in the current encoding
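The two output forms can be illustrated with a short standalone sketch. NamespaceDeclarationSketch is a hypothetical helper, not XOM code, and it assumes the URI needs no escaping, which a real serializer cannot assume.

```java
public class NamespaceDeclarationSketch {

    // Formats a declaration without surrounding spaces.
    // An empty prefix selects the default-namespace form.
    // Assumption: the URI contains no characters that need escaping.
    public static String format(String prefix, String uri) {
        if (prefix.isEmpty()) {
            return "xmlns=\"" + uri + "\"";
        }
        return "xmlns:" + prefix + "=\"" + uri + "\"";
    }
}
```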
-
write
protected void write(Attribute attribute) throws IOException
Writes an attribute in the form name="value". Characters in the attribute value are escaped as necessary.
- Parameters:
attribute - the Attribute to write
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the attribute name contains a character that is not available in the current encoding
-
write
protected void write(Comment comment) throws IOException
Writes a comment onto the output stream using the current options. Since character and entity references are not resolved in comments, comments can only be serialized when all characters they contain are available in the current encoding.
- Parameters:
comment - the Comment to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the comment contains a character that is not available in the current encoding
-
write
protected void write(ProcessingInstruction instruction) throws IOException
Writes a processing instruction onto the output stream using the current options. Since character and entity references are not resolved in processing instructions, processing instructions can only be serialized when all characters they contain are available in the current encoding.
- Parameters:
instruction - the ProcessingInstruction to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the processing instruction contains a character that is not available in the current encoding
-
write
protected void write(Text text) throws IOException
Writes a Text object onto the output stream using the current options. Reserved characters such as <, > and " are escaped using the standard entity references such as &lt;, &gt;, and &quot;.
Characters which cannot be encoded in the current character set (for example, Ω in ISO-8859-1) are encoded using character references.
- Parameters:
text - the Text to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
-
write
protected void write(DocType doctype) throws IOException
Writes a DocType object onto the output stream using the current options.
- Parameters:
doctype - the document type declaration to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
UnavailableCharacterException - if the document type declaration contains a character that is not available in the current encoding
-
writeChild
protected void writeChild(Node node) throws IOException
Writes a child node onto the output stream using the current options. It is invoked when walking the tree to serialize the entire document. It is not called, and indeed should not be called, for either the Document node or for attributes.
- Parameters:
node - the Node to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
XMLException - if an Attribute, a Document, or Namespace is passed to this method
-
writeEscaped
protected final void writeEscaped(String text) throws IOException
Writes a string onto the underlying output stream. Non-ASCII characters that are not available in the current character set are encoded with numeric character references. The three reserved characters <, >, and & are escaped using the standard entity references &lt;, &gt;, and &amp;. Double and single quotes are not escaped.
- Parameters:
text - the parsed character data to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
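The escaping rules described for writeEscaped can be sketched with the standard CharsetEncoder API. This is a standalone illustration, not XOM's implementation: the class name TextEscapeSketch is hypothetical, and the choice of uppercase hexadecimal character references is an assumption, since the description only says "numeric character references".

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class TextEscapeSketch {

    // Escapes character data: <, >, and & become entity references,
    // quotes are left alone, and any character the target encoding
    // cannot represent becomes a character reference.
    // Assumption: hexadecimal form for the character references.
    public static String escape(String text, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else {
                String s = new String(Character.toChars(cp));
                if (encoder.canEncode(s)) {
                    out.append(s);
                } else {
                    out.append("&#x")
                       .append(Integer.toHexString(cp).toUpperCase())
                       .append(';');
                }
            }
        }
        return out.toString();
    }
}
```

Working on whole code points rather than Java chars keeps a surrogate pair together, so a character outside the Basic Multilingual Plane is emitted as one reference rather than two.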
-
writeAttributeValue
protected final void writeAttributeValue(String value) throws IOException
Writes a string onto the underlying output stream. Non-ASCII characters that are not available in the current character set are escaped using hexadecimal numeric character references. Carriage returns, line feeds, and tabs are also escaped using hexadecimal numeric character references in order to ensure their preservation on a round trip. The four reserved characters <, >, &, and " are escaped using the standard entity references &lt;, &gt;, &amp;, and &quot;. The single quote is not escaped.
- Parameters:
value - the attribute value to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error
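Attribute values add two rules to the character-data case: the double quote is escaped, and tabs, line feeds, and carriage returns are written as hexadecimal character references so that a parser's attribute-value normalization does not turn them into spaces. A standalone sketch follows; AttributeValueEscapeSketch is a hypothetical name, and the exact digit padding of the references (for example &#x0A; rather than &#xA;) is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class AttributeValueEscapeSketch {

    // Escapes an attribute value that will be delimited by double quotes.
    // Assumption: two-digit padding for the white space references.
    public static String escape(String value, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\t': out.append("&#x09;"); break;
                case '\n': out.append("&#x0A;"); break;
                case '\r': out.append("&#x0D;"); break;
                default: {
                    String s = new String(Character.toChars(cp));
                    if (encoder.canEncode(s)) {
                        out.append(s);
                    } else {
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase())
                           .append(';');
                    }
                }
            }
        }
        return out.toString();
    }
}
```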
-
writeRaw
protected final void writeRaw(String text) throws IOException
Writes a string onto the underlying output stream without escaping any characters. Non-ASCII characters that are not available in the current character set cause an IOException.
- Parameters:
text - the String to serialize
- Throws:
IOException - if the underlying output stream encounters an I/O error or text contains characters not available in the current character set
-
breakLine
protected final void breakLine() throws IOException
Writes the current line break string onto the underlying output stream and indents as specified by the current level and the indent property.
- Throws:
IOException - if the underlying output stream encounters an I/O error
-
flush
public void flush() throws IOException
Flushes the data onto the output stream. It is not enough to flush the output stream. You must flush the serializer object itself because it uses some internal buffering. The serializer will flush the underlying output stream.
- Throws:
IOException - if the underlying output stream encounters an I/O error
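Why flushing only the underlying stream is not enough can be seen with any buffering wrapper. The sketch below uses a standard BufferedWriter as an analogy; it does not use XOM, and the details of the serializer's own internal buffering are not described here. FlushSketch is a hypothetical name.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class FlushSketch {

    // Returns {bytes visible after flushing only the sink,
    //          bytes visible after flushing the wrapper}.
    public static int[] bytesBeforeAndAfterFlush(String data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            Writer buffered = new BufferedWriter(
                    new OutputStreamWriter(sink, StandardCharsets.UTF_8));
            buffered.write(data);
            // Flushing the sink does nothing for data still held by the wrapper.
            sink.flush();
            int before = sink.size();
            // Flushing the wrapper pushes its buffered data down to the sink.
            buffered.flush();
            int after = sink.size();
            return new int[] { before, after };
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

For a short string such as "<root/>", nothing reaches the sink until the wrapper itself is flushed, which mirrors the advice to call flush() on the serializer object.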
-
getIndent
public int getIndent()
Returns the number of spaces this serializer indents.
- Returns:
the number of spaces this serializer indents each successive level beyond the previous one
-
setIndent
public void setIndent(int indent)
Sets the number of additional spaces to add to each successive level in the hierarchy. Use 0 for no extra indenting. The maximum indentation is limited to approximately half the maximum line length. The serializer will not indent further than that no matter how many levels deep the hierarchy is.
When this variable is set to a value greater than 0, the serializer does not preserve white space. Spaces, tabs, carriage returns, and line feeds can all be interchanged at the serializer's discretion, and additional white space may be added before and after tags. Carriage returns, line feeds, and tabs will not be escaped with numeric character references.
Inside elements with an xml:space="preserve" attribute, white space is preserved and no indenting takes place, regardless of the setting of the indent property, unless, of course, an xml:space="default" attribute overrides the xml:space="preserve" attribute.
The default value for indent is 0; that is, the default is not to add or subtract any white space from the source document.
- Parameters:
indent - the number of spaces to indent each successive level of the hierarchy
- Throws:
IllegalArgumentException - if indent is less than zero
-
getLineSeparator
public String getLineSeparator()
Returns the string used as a line separator. This is always "\n", "\r", or "\r\n".
- Returns:
the line separator
-
setLineSeparator
public void setLineSeparator(String lineSeparator)
Sets the line separator. This can only be one of the three strings "\n", "\r", or "\r\n". All other values are forbidden. If this method is invoked, then line separators in the character data will be changed to this string. Line separators in attribute values will be changed to the hexadecimal numeric character references corresponding to this string.
The default line separator is "\r\n". However, line separators in character data and attribute values are not changed to this string, unless this method is called first.
- Parameters:
lineSeparator - the line separator to set
- Throws:
IllegalArgumentException - if you attempt to use any line separator other than "\n", "\r", or "\r\n".
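The effect on character data can be pictured with a standalone sketch that accepts only the three permitted separators and rewrites every line break to the chosen one. LineSeparatorSketch is a hypothetical helper, not XOM code, and it covers character data only, not the character-reference form used in attribute values.

```java
import java.util.regex.Matcher;

public class LineSeparatorSketch {

    // Rewrites every CR LF pair, lone CR, and lone LF to the chosen separator.
    public static String normalize(String text, String lineSeparator) {
        if (!lineSeparator.equals("\n")
                && !lineSeparator.equals("\r")
                && !lineSeparator.equals("\r\n")) {
            throw new IllegalArgumentException("Illegal line separator");
        }
        // "\r\n" is listed first so a CR LF pair is treated as one break.
        return text.replaceAll("\r\n|\r|\n",
                Matcher.quoteReplacement(lineSeparator));
    }
}
```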
-
getMaxLength
public int getMaxLength()
Returns the preferred maximum line length.
- Returns:
the preferred maximum line length
-
setMaxLength
public void setMaxLength(int maxLength)
Sets the suggested maximum line length for this serializer. Setting this to 0 indicates that no automatic wrapping is to be performed. When a line approaches this length, the serializer begins looking for opportunities to break the line. Generally it will break on any ASCII white space character (tab, carriage return, linefeed, and space). In some circumstances the serializer may not be able to break the line before the maximum length is reached. For instance, if an element name is longer than the maximum line length the only way to correctly serialize it is to exceed the maximum line length. In this case, the serializer will exceed the maximum line length.
The default value for maximum line length is 0, which is interpreted as no maximum line length. Setting this to a negative value just sets it to 0.
When this variable is set to a value greater than 0, the serializer does not preserve white space. Spaces, tabs, carriage returns, and line feeds can all be interchanged at the serializer's discretion. Carriage returns, line feeds, and tabs will not be escaped with numeric character references.
Inside elements with an xml:space="preserve" attribute, the maximum line length is not enforced, regardless of the setting of this property, unless, of course, an xml:space="default" attribute overrides the xml:space="preserve" attribute.
- Parameters:
maxLength - the preferred maximum line length
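The wrapping behavior can be approximated by a greedy word wrap. This is a deliberately simplified standalone sketch, not the serializer's actual algorithm: WrapSketch is a hypothetical name, it always breaks with "\n", and it collapses every run of ASCII white space to a single space.

```java
public class WrapSketch {

    // Greedy wrap on ASCII white space. A value of 0 or less disables
    // wrapping. A single token longer than maxLength is written whole,
    // so the limit can be exceeded.
    public static String wrap(String text, int maxLength) {
        if (maxLength <= 0) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (String word : text.trim().split("[ \t\r\n]+")) {
            if (column > 0 && column + 1 + word.length() > maxLength) {
                out.append('\n');   // word does not fit: start a new line
                column = 0;
            } else if (column > 0) {
                out.append(' ');    // word fits: separate with one space
                column++;
            }
            out.append(word);
            column += word.length();
        }
        return out.toString();
    }
}
```

The second case in the loop is what makes the maximum a preference rather than a guarantee: an unbreakable token is never split.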
-
getPreserveBaseURI
public boolean getPreserveBaseURI()
Returns true if this serializer preserves the original base URIs by inserting extra xml:base attributes.
- Returns:
true if this Serializer inserts extra xml:base attributes to attempt to preserve base URI information from the document.
-
setPreserveBaseURI
public void setPreserveBaseURI(boolean preserve)
Determines whether this serializer inserts extra xml:base attributes to attempt to preserve base URI information from the document. The default is false, do not preserve base URI information. xml:base attributes that have been explicitly added to an element are always output. This property only determines whether or not extra xml:base attributes are added.
- Parameters:
preserve - true if xml:base attributes should be added as necessary to preserve base URI information
-
getEncoding
public String getEncoding()
Returns the name of the character encoding used by this serializer.
- Returns:
the encoding used for the output document
-
setUnicodeNormalizationFormC
public void setUnicodeNormalizationFormC(boolean normalize)
If true, this property indicates serialization will perform Unicode normalization on all data using normalization form C (NFC). Performing Unicode normalization may change the document's infoset. The default is false; do not normalize. This version is based on Unicode 4.0.
This feature has not yet been benchmarked or optimized. It may result in substantially slower code.
If all your data is in the first 256 code points of Unicode (i.e. the ISO-8859-1, Latin-1 character set), then it's already in normalization form C and normalizing won't change anything.
- Parameters:
normalize - true if normalization is performed; false if it isn't
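What normalization form C does to data can be shown with the JDK's own java.text.Normalizer. This standalone sketch does not use XOM, and the JDK follows whatever Unicode version the running Java release supports, which may differ from the Unicode 4.0 tables mentioned above. NfcSketch is a hypothetical name.

```java
import java.text.Normalizer;

public class NfcSketch {

    // Composes decomposed sequences, e.g. "e" + U+0301 becomes U+00E9.
    public static String toNfc(String data) {
        return Normalizer.normalize(data, Normalizer.Form.NFC);
    }

    // Reports whether the data is already in normalization form C.
    public static boolean isNfc(String data) {
        return Normalizer.isNormalized(data, Normalizer.Form.NFC);
    }
}
```

The change from two code points to one is the kind of infoset difference the description warns about: the text looks the same but no longer compares equal character by character.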
-
getUnicodeNormalizationFormC
public boolean getUnicodeNormalizationFormC()
Indicates whether serialization will perform Unicode normalization on all data using normalization form C (NFC). The default is false; do not normalize.
- Returns:
true if this serializer performs Unicode normalization; false if it doesn't
-
getColumnNumber
protected final int getColumnNumber()
Returns the current column number of the output stream. This method is useful for subclasses that implement their own pretty printing strategies by inserting white space and line breaks at appropriate points.
Columns are counted based on Unicode characters, not Java chars. A surrogate pair counts as one character in this context, not two. However, a character followed by a combining character (e.g. e followed by combining accent acute) counts as two characters. This latter choice (treating combining characters like regular characters) is under review, and may change in the future if it's not too big a performance hit.
- Returns:
the current column number
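The counting rule described above (Unicode characters rather than Java chars, with combining characters counted separately) matches counting code points. A standalone sketch using only java.lang.String follows; ColumnCountSketch is a hypothetical name, and this is an illustration of the rule, not the serializer's code.

```java
public class ColumnCountSketch {

    // Counts Unicode code points: a surrogate pair counts once,
    // while a base letter plus a combining mark counts twice.
    public static int columns(String line) {
        return line.codePointCount(0, line.length());
    }
}
```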
-
-