Represents OASIS Open Catalog files.
This class implements the semantics of OASIS Open Catalog files
(defined by OASIS Technical Resolution 9401:1997, Amendment 2 to TR 9401).
The primary purpose of the Catalog is to associate resources in the
document with local system identifiers. Some entities
(document types, XML entities, and notations) have names, and all of them
can have public identifiers, system identifiers, or both. (In XML, only a
notation can have a public identifier without a system identifier, but
the methods implemented in this class obey the Catalog semantics
from the SGML days, when system identifiers were optional.)
The system identifiers returned by the resolution methods in this
class are valid, i.e., usable by, and in fact constructed by, the
java.net.URL class. Unfortunately, java.net.URL seems to behave in
somewhat non-standard ways, and the system identifiers returned may
not be directly usable in a browser or filesystem context.
This class recognizes all of the Catalog entries defined in
TR9401:1997:
- BASE
changes the base URI for resolving relative system identifiers. The
initial base URI is the URI of the location of the catalog (which is,
in turn, relative to the location of the current working directory
at startup, as returned by the user.dir system property).
- CATALOG
processes other catalog files. An included catalog occurs logically
at the end of the including catalog.
- DELEGATE_PUBLIC
specifies alternate catalogs for some public identifiers. The delegated
catalogs are not loaded until they are needed, but they are cached
once loaded.
- DELEGATE_SYSTEM
specifies alternate catalogs for some system identifiers. The delegated
catalogs are not loaded until they are needed, but they are cached
once loaded.
- DELEGATE_URI
specifies alternate catalogs for some URIs. The delegated
catalogs are not loaded until they are needed, but they are cached
once loaded.
- REWRITE_SYSTEM
specifies an alternate prefix for a system identifier.
- REWRITE_URI
specifies an alternate prefix for a URI.
- SYSTEM_SUFFIX
maps any system identifier that ends with a particular suffix to another
system identifier.
- URI_SUFFIX
maps any URI that ends with a particular suffix to another URI.
- DOCTYPE
associates the names of root elements with URIs. (In other words, an XML
processor might infer the doctype of an XML document that does not include
a doctype declaration by looking for the DOCTYPE entry in the
catalog which matches the name of the root element of the document.)
- DOCUMENT
provides a default document.
- DTDDECL
recognized and silently ignored. Not relevant for XML.
- ENTITY
associates entity names with URIs.
- LINKTYPE
recognized and silently ignored. Not relevant for XML.
- NOTATION
associates notation names with URIs.
- OVERRIDE
changes the override behavior. Initial behavior is set by the
system property xml.catalog.override. The default initial
behavior is 'YES', that is, entries in the catalog override
system identifiers specified in the document.
- PUBLIC
maps a public identifier to a system identifier.
- SGMLDECL
recognized and silently ignored. Not relevant for XML.
- SYSTEM
maps a system identifier to another system identifier.
- URI
maps a URI to another URI.
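Several of these entry types, including REWRITE_SYSTEM and SYSTEM_SUFFIX, must choose among several candidate entries. The sketch below assumes the longest-match rule from the OASIS XML Catalogs specification: the longest matching prefix wins for rewrites, and the longest matching suffix wins for suffix entries. The class and method names are hypothetical and are not part of this class's API.

```java
import java.util.Map;

// Hypothetical sketch: longest-match selection for REWRITE_SYSTEM and
// SYSTEM_SUFFIX entries. Not the actual Catalog implementation.
class RewriteSketch {
    /** Replaces the longest matching prefix, or returns null if none matches. */
    static String rewriteSystem(Map<String, String> rewrites, String systemId) {
        String bestPrefix = null;
        for (String prefix : rewrites.keySet()) {
            if (systemId.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        if (bestPrefix == null) {
            return null;
        }
        // Keep the unmatched remainder and swap in the replacement prefix.
        return rewrites.get(bestPrefix) + systemId.substring(bestPrefix.length());
    }

    /** Maps via the longest matching suffix, or returns null if none matches. */
    static String systemSuffix(Map<String, String> suffixes, String systemId) {
        String bestSuffix = null;
        for (String suffix : suffixes.keySet()) {
            if (systemId.endsWith(suffix)
                    && (bestSuffix == null || suffix.length() > bestSuffix.length())) {
                bestSuffix = suffix;
            }
        }
        return bestSuffix == null ? null : suffixes.get(bestSuffix);
    }
}
```

Because the loops compare lengths explicitly, the result does not depend on the iteration order of the map.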
Note that BASE entries are treated as described by RFC 2396. In
particular, this has the counter-intuitive property that after a BASE
entry identifying "http://example.com/a/b/c" as the base URI,
the relative URI "foo" is resolved to the absolute URI
"http://example.com/a/b/foo". If you do not want the final component
of the path to be discarded, as a filename would be in a URI for a
resource, you must provide the trailing slash: "http://example.com/a/b/c/".
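The effect is easy to reproduce with java.net.URI, which resolves relative references according to RFC 2396. This is a standalone illustration, not code taken from this class. The helper name is hypothetical.

```java
import java.net.URI;

// Demonstrates RFC 2396 relative resolution: the last path segment of the
// base is dropped unless the base ends with a slash.
class BaseResolution {
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }
}
```

For example, `BaseResolution.resolve("http://example.com/a/b/c", "foo")` yields `http://example.com/a/b/foo`. With the trailing slash, `"http://example.com/a/b/c/"` yields `http://example.com/a/b/c/foo`.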
Note that subordinate catalogs (all catalogs except the first,
including CATALOG and DELEGATE* catalogs) are only loaded if and when
they are required.
This class relies on classes which implement the CatalogReader
interface to actually load catalog files. This allows the catalog
semantics to be implemented for TR9401 text-based catalogs, XML
catalogs, or any number of other storage formats.
Additional catalogs may also be loaded with the parseCatalog method.
Change Log:
- xml.catalog.files may contain quoted components, using the form:
unquoted-path-with-no-sep-chars:"double-quoted path with or without sep chars":'single-quoted path with or without sep chars'
BASE
public static final int BASE
The BASE Catalog Entry type.
CATALOG
public static final int CATALOG
The CATALOG Catalog Entry type.
DELEGATE_PUBLIC
public static final int DELEGATE_PUBLIC
The DELEGATE_PUBLIC Catalog Entry type.
DELEGATE_SYSTEM
public static final int DELEGATE_SYSTEM
The DELEGATE_SYSTEM Catalog Entry type.
DELEGATE_URI
public static final int DELEGATE_URI
The DELEGATE_URI Catalog Entry type.
DOCTYPE
public static final int DOCTYPE
The DOCTYPE Catalog Entry type.
DOCUMENT
public static final int DOCUMENT
The DOCUMENT Catalog Entry type.
DTDDECL
public static final int DTDDECL
The DTDDECL Catalog Entry type.
ENTITY
public static final int ENTITY
The ENTITY Catalog Entry type.
LINKTYPE
public static final int LINKTYPE
The LINKTYPE Catalog Entry type.
NOTATION
public static final int NOTATION
The NOTATION Catalog Entry type.
OVERRIDE
public static final int OVERRIDE
The OVERRIDE Catalog Entry type.
PUBLIC
public static final int PUBLIC
The PUBLIC Catalog Entry type.
REWRITE_SYSTEM
public static final int REWRITE_SYSTEM
The REWRITE_SYSTEM Catalog Entry type.
REWRITE_URI
public static final int REWRITE_URI
The REWRITE_URI Catalog Entry type.
SGMLDECL
public static final int SGMLDECL
The SGMLDECL Catalog Entry type.
SYSTEM
public static final int SYSTEM
The SYSTEM Catalog Entry type.
SYSTEM_SUFFIX
public static final int SYSTEM_SUFFIX
The SYSTEM_SUFFIX Catalog Entry type.
URI
public static final int URI
The URI Catalog Entry type.
URI_SUFFIX
public static final int URI_SUFFIX
The URI_SUFFIX Catalog Entry type.
base
protected URL base
The base URI for relative system identifiers in the catalog.
This may be changed by BASE entries in the catalog.
catalogCwd
protected URL catalogCwd
The base URI of the Catalog file currently being parsed.
catalogEntries
protected Vector catalogEntries
The catalog entries currently known to the system.
catalogFiles
protected Vector catalogFiles
A vector of catalog files to be loaded.
This list is initially established by loadSystemCatalogs when it
parses the system catalog list, but CATALOG entries may
contribute to it during the course of parsing.
catalogManager
protected CatalogManager catalogManager
The catalog manager in use for this instance.
catalogs
protected Vector catalogs
A vector of Catalogs.
The semantics of Catalog resolution are such that each
catalog is effectively a list of Catalogs (in other words,
a recursive list of Catalog instances).
Catalogs that are processed as the result of CATALOG or
DELEGATE* entries are subordinate to the catalog that contained
them, but they may in turn have subordinate catalogs.
Catalogs are only loaded when they are needed, so this vector
initially contains a list of Catalog filenames (URLs). If, during
processing, one of these catalogs has to be loaded, the resulting
Catalog object is placed in the vector, effectively caching it
for the next query.
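The lazy-loading scheme described above can be sketched as a list whose slots start as catalog URLs and are replaced by loaded objects on first use. This is a minimal sketch under stated assumptions. `MiniCatalog`, `LazyCatalogList`, and the loader function are hypothetical stand-ins for Catalog and its parsing, and the ordered search only loosely mirrors how subordinate catalogs are consulted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a loaded subordinate catalog.
interface MiniCatalog {
    String resolve(String id);
}

// Hypothetical sketch of the lazily populated catalogs vector: each slot
// starts as a catalog URL (a String) and is replaced by the loaded
// MiniCatalog the first time a query needs it.
class LazyCatalogList {
    private final List<Object> slots = new ArrayList<>();
    private final Function<String, MiniCatalog> loader;
    private int loadCount = 0;

    LazyCatalogList(Function<String, MiniCatalog> loader) {
        this.loader = loader;
    }

    void add(String catalogUrl) {
        slots.add(catalogUrl);
    }

    int loadCount() {
        return loadCount;
    }

    private MiniCatalog get(int i) {
        Object slot = slots.get(i);
        if (slot instanceof String) {
            // First use: load the catalog and cache it in place of its URL.
            MiniCatalog loaded = loader.apply((String) slot);
            loadCount++;
            slots.set(i, loaded);
            return loaded;
        }
        return (MiniCatalog) slot;
    }

    // Query catalogs in order. Later ones are loaded only if the earlier
    // ones have no answer.
    String resolve(String id) {
        for (int i = 0; i < slots.size(); i++) {
            String result = get(i).resolve(id);
            if (result != null) {
                return result;
            }
        }
        return null;
    }
}
```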
default_override
protected boolean default_override
The default initial override setting.
localCatalogFiles
protected Vector localCatalogFiles
A vector of catalog files constructed during processing of
CATALOG entries in the current catalog.
This two-level system is actually necessary to correctly implement
the semantics of the CATALOG entry. If one catalog file includes
another with a CATALOG entry, the included catalog logically
occurs at the end of the including catalog, and after any
preceding CATALOG entries. In other words, the CATALOG entry
cannot insert anything into the middle of a catalog file.
When processing reaches the end of each catalog file, any
elements on this vector are added to the front of the
catalogFiles vector.
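The resulting processing order can be illustrated with a simplified model. In this sketch each catalog file is represented only by the list of files its CATALOG entries name, and every file is treated as parsed immediately, with no lazy loading. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of the two-level CATALOG bookkeeping.
class CatalogOrderSketch {
    static List<String> processingOrder(List<String> initialFiles,
                                        Map<String, List<String>> catalogRefs) {
        List<String> catalogFiles = new ArrayList<>(initialFiles);
        List<String> order = new ArrayList<>();
        while (!catalogFiles.isEmpty()) {
            String file = catalogFiles.remove(0);
            order.add(file);
            // While the file is parsed, CATALOG entries only collect here...
            List<String> localCatalogFiles =
                new ArrayList<>(catalogRefs.getOrDefault(file, List.of()));
            // ...and at end of file they move, in order, to the front of the queue.
            catalogFiles.addAll(0, localCatalogFiles);
        }
        return order;
    }
}
```

Suppose `main` names `a` and `b`, and `a` names `a1`. Starting from the files `main, other`, the order is `main, a, a1, b, other`. The included catalogs follow their including catalog and precede catalogs that were already pending.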
localDelegate
protected Vector localDelegate
A vector of DELEGATE* Catalog entries constructed during
processing of the Catalog.
This two-level system has two purposes: first, it allows
us to sort the DELEGATE* entries by the length of the partial
public identifier so that a linear search encounters them in
the correct order; second, it puts them all at the end of
the Catalog.
When processing reaches the end of each catalog file, any
elements on this vector are added to the end of the
catalogEntries vector. This ensures that matching
PUBLIC keywords are encountered before DELEGATE* entries.
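One way to keep the list ordered for a linear search is insertion by length, with the longest partial identifier first so that the most specific match is found first. This is a minimal sketch under that assumption. It tracks plain strings rather than CatalogEntry objects, and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: DELEGATE* partial identifiers kept longest-first.
class DelegateOrderSketch {
    private final List<String> localDelegate = new ArrayList<>();

    void addDelegate(String partialId) {
        int pos = 0;
        while (pos < localDelegate.size()) {
            String existing = localDelegate.get(pos);
            if (existing.equals(partialId)) {
                return; // already present
            }
            if (partialId.length() > existing.length()) {
                break; // insert before the first shorter entry
            }
            pos++;
        }
        localDelegate.add(pos, partialId);
    }

    /** Returns the first (longest) partial identifier that prefixes publicId. */
    String firstMatch(String publicId) {
        for (String partial : localDelegate) {
            if (publicId.startsWith(partial)) {
                return partial;
            }
        }
        return null;
    }

    List<String> entries() {
        return localDelegate;
    }
}
```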
readerArr
protected Vector readerArr
A vector of CatalogReaders.
This vector contains all of the readers in the order that they
were added. In the event that a catalog is read from a file, where
the MIME type is unknown, each reader is attempted in turn until
one succeeds.
readerMap
protected Hashtable readerMap
A hash of CatalogReaders.
This hash maps MIME types to elements in the readerArr
vector. This allows the Catalog to quickly locate the reader
for a particular MIME type.
addDelegate
protected void addDelegate(CatalogEntry entry)
Add to the current list of delegated catalogs.
This method always constructs the localDelegate vector so that it
is ordered by length of partial public identifier.
entry
- The DELEGATE catalog entry
addEntry
public void addEntry(CatalogEntry entry)
Clean up and process a Catalog entry.
This method processes each Catalog entry, changing mapped
relative system identifiers into absolute ones (based on the current
base URI), and maintaining other information about the current
catalog.
entry
- The CatalogEntry to process.
addReader
public void addReader(String mimeType,
CatalogReader reader)
Add a new CatalogReader to the Catalog.
This method allows you to add a new CatalogReader to the
catalog. The reader will be associated with the specified mimeType.
You can only have one reader per mimeType.
In the absence of a mimeType (e.g., when reading a catalog
directly from a file on the local system), the readers are attempted
in the order that you add them to the Catalog.
Note that subordinate catalogs (created by CATALOG or
DELEGATE* entries) get a copy of the set of readers present in
the primary catalog when they are created. Readers added subsequently
will not be available. For this reason, it is best to add all
of the readers before the first call to parse a catalog.
mimeType
- The MIME type associated with this reader.
reader
- The CatalogReader to use.
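The readerArr/readerMap pairing can be sketched as a list of readers plus a map from MIME type to list index. The map gives direct lookup when the MIME type is known. The list preserves the order of addition for trial-and-error parsing when it is not. `MiniReader` and `ReaderRegistry` are hypothetical. In particular, the single-method reader signature is an assumption and does not reproduce the real CatalogReader interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader: returns a parse result, or null if it cannot handle the input.
interface MiniReader {
    String read(String content);
}

class ReaderRegistry {
    private final List<MiniReader> readerArr = new ArrayList<>();
    private final Map<String, Integer> readerMap = new HashMap<>();

    void addReader(String mimeType, MiniReader reader) {
        Integer existing = readerMap.get(mimeType);
        if (existing != null) {
            readerArr.set(existing, reader); // one reader per MIME type
        } else {
            readerArr.add(reader);
            readerMap.put(mimeType, readerArr.size() - 1);
        }
    }

    String parse(String mimeType, String content) {
        if (mimeType != null) {
            Integer pos = readerMap.get(mimeType);
            return pos == null ? null : readerArr.get(pos).read(content);
        }
        // Unknown MIME type: try each reader in the order it was added.
        for (MiniReader reader : readerArr) {
            String result = reader.read(content);
            if (result != null) {
                return result;
            }
        }
        return null;
    }
}
```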
copyReaders
protected void copyReaders(Catalog newCatalog)
Copies the reader list from the current Catalog to a new Catalog.
This method is used internally when constructing a new catalog.
It copies the current reader associations over to the new catalog.
newCatalog
- The new Catalog.
encodedByte
protected String encodedByte(int b)
Perform %-encoding on a single byte.
b
- The 8-bit integer that represents the byte. (Bytes are signed,
but encoding needs to look at the bytes unsigned.)
- The %-encoded string for the byte in question.
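A compact way to get the unsigned view is to mask with 0xFF before formatting. This is a minimal sketch, not this class's code. The uppercase hex digits are an assumption, chosen because RFC 3986 recommends them. The class name is hypothetical.

```java
// Hypothetical sketch of %-encoding one byte. Masking with 0xFF turns a
// signed Java byte (-128..127) into its unsigned value (0..255) first.
class PercentEncoding {
    static String encodedByte(int b) {
        return String.format("%%%02X", b & 0xFF);
    }
}
```

For example, `PercentEncoding.encodedByte((byte) 0xE9)` gives `%E9` even though the byte value is -23.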
fixSlashes
protected String fixSlashes(String sysid)
Replace backslashes with forward slashes. (URLs always use
forward slashes.)
sysid
- The input system identifier.
- The same system identifier with backslashes turned into
forward slashes.
getCatalogManager
public CatalogManager getCatalogManager()
Return the CatalogManager used by this catalog.
getCurrentBase
public String getCurrentBase()
Returns the current base URI.
getDefaultOverride
public String getDefaultOverride()
Returns the default override setting associated with this
catalog.
All catalog files loaded by this catalog will have the
initial override setting specified by this default.
loadSystemCatalogs
public void loadSystemCatalogs()
throws MalformedURLException,
IOException
Load the system catalog files.
The method adds all of the catalogs specified in the
xml.catalog.files property to the Catalog list.
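Reading the property means splitting it into paths while honouring the quoted form shown in the change log. The sketch below makes several assumptions. The separator character is a parameter, because this documentation does not fix it. A quote is treated as special only at the start of a component. Empty components are dropped. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting an xml.catalog.files value such as
//   unquoted-path:"double-quoted path":'single-quoted path'
// where quoted components may contain the separator character.
class CatalogFilesSketch {
    static List<String> split(String value, char sep) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 when not inside a quoted component
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote
                } else {
                    current.append(c);
                }
            } else if ((c == '"' || c == '\'') && current.length() == 0) {
                quote = c; // opening quote at the start of a component
            } else if (c == sep) {
                if (current.length() > 0) {
                    paths.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            paths.add(current.toString());
        }
        return paths;
    }

    static List<String> fromSystemProperty(char sep) {
        String value = System.getProperty("xml.catalog.files");
        return value == null ? new ArrayList<>() : split(value, sep);
    }
}
```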
makeAbsolute
protected String makeAbsolute(String sysid)
Construct an absolute URI from a relative one, using the current
base URI.
sysid
- The (possibly relative) system identifier
- The system identifier made absolute with respect to the
current base.
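Taken together, fixSlashes and makeAbsolute amount to normalizing separators and then resolving against the base URL. This is a minimal sketch under stated assumptions. The base is passed as a string rather than read from the protected `base` field. If a URL cannot be formed, the slash-fixed identifier is returned unchanged. The class name is hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch of fixSlashes and makeAbsolute.
class AbsoluteSketch {
    static String fixSlashes(String sysid) {
        // URLs always use forward slashes.
        return sysid.replace('\\', '/');
    }

    static String makeAbsolute(String base, String sysid) {
        String fixed = fixSlashes(sysid);
        try {
            return new URL(new URL(base), fixed).toString();
        } catch (MalformedURLException e) {
            return fixed; // fall back to the unresolved identifier
        }
    }
}
```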
newCatalog
protected Catalog newCatalog()
Create a new Catalog object.
This method constructs a new instance of the running Catalog
class (which might be a subtype of org.apache.xml.resolver.Catalog).
All new catalogs are managed by the same CatalogManager.
N.B. All Catalog subtypes should call newCatalog() to construct
a new Catalog. Do not simply use "new Subclass()" since that will
confuse future subclasses.
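The reason for the rule is that creating the instance from `getClass()` preserves whatever subclass is actually running. A hard-coded `new Subclass()` would silently drop any further subclass. This is a minimal sketch with hypothetical class names. A real implementation would also share the CatalogManager and copy the readers (see copyReaders).

```java
// Hypothetical sketch: factory method that instantiates the runtime class.
class BaseCatalog {
    protected BaseCatalog newCatalog() {
        try {
            // Uses the no-argument constructor of the runtime class.
            return getClass().getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + getClass().getName(), e);
        }
    }
}

class MyCatalog extends BaseCatalog { }

class MyOtherCatalog extends MyCatalog { }
```

Calling `newCatalog()` on a `MyOtherCatalog` yields another `MyOtherCatalog`, even though the method is declared in `BaseCatalog`.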
normalizeURI
protected String normalizeURI(String uriref)
Perform character normalization on a URI reference.
uriref
- The URI reference
- The normalized URI reference.
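"Character normalization" is not spelled out here. One plausible reading is the following: encode the string as UTF-8, then %-encode every byte that may not appear literally in a URI reference (control characters, space, non-ASCII bytes, and a few unsafe ASCII characters). This is a minimal sketch under that assumption. The exact set of escaped characters is an assumption, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of URI character normalization.
class UriNormalizer {
    // Unsafe printable ASCII characters assumed to need escaping.
    private static final String UNSAFE = "\"<>\\^`{|}";

    static String normalizeURI(String uriref) {
        if (uriref == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (byte raw : uriref.getBytes(StandardCharsets.UTF_8)) {
            int ch = raw & 0xFF; // view the signed byte as unsigned
            if (ch <= 0x20 || ch >= 0x7F || UNSAFE.indexOf(ch) >= 0) {
                out.append(String.format("%%%02X", ch));
            } else {
                out.append((char) ch);
            }
        }
        return out.toString();
    }
}
```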
parseAllCatalogs
public void parseAllCatalogs()
throws MalformedURLException,
IOException
Parse all subordinate catalogs.
This method recursively parses all of the subordinate catalogs.
If this method does not throw an exception, you can be confident that
no subsequent call to any resolve*() method will either, with two
possible exceptions:
- Delegated catalogs are re-parsed each time they are needed
(because a variable list of them may be needed in each case,
depending on the length of the matching partial public identifier).
But they are parsed by this method, so as long as they don't
change or disappear while the program is running, they shouldn't
generate errors later if they don't generate errors now.
- If you add new catalogs with parseCatalog, they won't be loaded
until they are needed or until you call parseAllCatalogs again.
On the other hand, if you don't call this method, you may
successfully parse documents without having to load all possible
catalogs.
parseCatalog
public void parseCatalog(String fileName)
throws MalformedURLException,
IOException
Parse a catalog file, augmenting internal data structures.
fileName
- The filename of the catalog file to process
parseCatalog
public void parseCatalog(String mimeType,
InputStream is)
throws IOException,
CatalogException
Parse a catalog file, augmenting internal data structures.
Catalogs retrieved over the net may have an associated MIME type.
The MIME type can be used to select an appropriate reader.
mimeType
- The MIME type of the catalog file.
is
- The InputStream from which the catalog should be read.
parseCatalog
public void parseCatalog(URL aUrl)
throws IOException
Parse a catalog document, augmenting internal data structures.
This method supports catalog files stored in jar files, e.g.,
"jar:file:///path/to/filename.jar!/path/to/catalog.xml". That URI
doesn't survive transmogrification through the URI processing that
parseCatalog(String) performs, and passing it as an input stream
doesn't set the base URI appropriately.
Written by Stefan Wachter (2002-09-26)
aUrl
- The URL of the catalog document to process
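The difficulty with jar: URIs can be seen with standard library classes alone. This illustration does not reproduce what parseCatalog(String) does internally. It only shows that generic java.net.URI resolution treats a jar: URI as opaque, while java.net.URL's jar handler resolves relative references inside the archive. The class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Standalone illustration, not this class's code.
class JarBaseDemo {
    static final String CATALOG = "jar:file:///path/to/filename.jar!/path/to/catalog.xml";

    // URI.resolve returns the relative reference unchanged for an opaque base.
    static String resolveWithUri(String relative) {
        return URI.create(CATALOG).resolve(relative).toString();
    }

    // The jar URL handler resolves against the path after "!/".
    static String resolveWithUrl(String relative) {
        try {
            return new URL(new URL(CATALOG), relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Here `resolveWithUri("doc.dtd")` simply returns `doc.dtd`. By contrast, `resolveWithUrl("doc.dtd")` produces a jar: URL ending in `!/path/to/doc.dtd`.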
parseCatalogFile
protected void parseCatalogFile(String fileName)
throws MalformedURLException,
IOException,
CatalogException
Parse a single catalog file, augmenting internal data structures.
fileName
- The filename of the catalog file to process
parsePendingCatalogs
protected void parsePendingCatalogs()
throws MalformedURLException,
IOException
Parse all of the pending catalogs.
Catalogs may refer to other catalogs; this method parses
all of the currently pending catalog files.
resolveDoctype
public String resolveDoctype(String entityName,
String publicId,
String systemId)
throws MalformedURLException,
IOException
Return the applicable DOCTYPE system identifier.
entityName
- The name of the entity (element) for which
a doctype is required.
publicId
- The nominal public identifier for the doctype
(as provided in the source document).
systemId
- The nominal system identifier for the doctype
(as provided in the source document).
- The system identifier to use for the doctype.
resolveDocument
public String resolveDocument()
throws MalformedURLException,
IOException
Return the applicable DOCUMENT entry.
- The system identifier to use for the document.
resolveEntity
public String resolveEntity(String entityName,
String publicId,
String systemId)
throws MalformedURLException,
IOException
Return the applicable ENTITY system identifier.
entityName
- The name of the entity for which
a system identifier is required.
publicId
- The nominal public identifier for the entity
(as provided in the source document).
systemId
- The nominal system identifier for the entity
(as provided in the source document).
- The system identifier to use for the entity.
resolveLocalPublic
protected String resolveLocalPublic(int entityType,
String entityName,
String publicId,
String systemId)
throws MalformedURLException,
IOException
Return the applicable PUBLIC or SYSTEM identifier.
This method searches the Catalog and returns the system
identifier specified for the given system or public identifiers.
If no appropriate PUBLIC or SYSTEM entry is found in the Catalog,
delegated Catalogs are interrogated.
There are four possible cases:
- If the system identifier provided matches a SYSTEM entry
in the current catalog, the SYSTEM entry is returned.
- If the system identifier is not null, the PUBLIC entries
that were encountered when OVERRIDE YES was in effect are
interrogated and the first matching entry is returned.
- If the system identifier is null, then all of the PUBLIC
entries are interrogated and the first matching entry
is returned. This may not be the same as the preceding case, if
some PUBLIC entries are encountered when OVERRIDE NO is in effect. In
XML, the only place where a public identifier may occur without
a system identifier is in a notation declaration.
- Finally, if the public identifier matches one of the partial
public identifiers specified in a DELEGATE* entry in
the Catalog, the delegated catalog is interrogated. The first
time that the delegated catalog is required, it will be
retrieved and parsed. It is subsequently cached.
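The four cases can be sketched as a single lookup over simplified data. This is a hypothetical model, not the real method. PUBLIC entries carry a flag recording whether OVERRIDE YES was in effect when they were read. Each delegate is a callback keyed by partial public identifier, assumed to be already ordered longest-first. Every matching delegate is tried in order until one returns a result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of the four lookup cases.
class LocalPublicSketch {
    static final class PublicEntry {
        final String publicId;
        final String target;
        final boolean overrideYes; // OVERRIDE setting in effect when read

        PublicEntry(String publicId, String target, boolean overrideYes) {
            this.publicId = publicId;
            this.target = target;
            this.overrideYes = overrideYes;
        }
    }

    final Map<String, String> systemEntries = new LinkedHashMap<>();
    final List<PublicEntry> publicEntries = new ArrayList<>();
    final Map<String, Function<String, String>> delegates = new LinkedHashMap<>();

    String resolve(String publicId, String systemId) {
        // Case 1: a SYSTEM entry matching the system identifier wins.
        if (systemId != null && systemEntries.containsKey(systemId)) {
            return systemEntries.get(systemId);
        }
        // Cases 2 and 3: with a system identifier, only PUBLIC entries read
        // under OVERRIDE YES apply; without one, every PUBLIC entry applies.
        for (PublicEntry e : publicEntries) {
            if (e.publicId.equals(publicId) && (systemId == null || e.overrideYes)) {
                return e.target;
            }
        }
        // Case 4: interrogate delegates whose partial identifier matches.
        if (publicId != null) {
            for (Map.Entry<String, Function<String, String>> d : delegates.entrySet()) {
                if (publicId.startsWith(d.getKey())) {
                    String result = d.getValue().apply(publicId);
                    if (result != null) {
                        return result;
                    }
                }
            }
        }
        return null; // the nominal system identifier is not returned
    }
}
```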
entityType
- The CatalogEntry type for which this query is
being conducted. This is necessary in order to do the appropriate
query on a delegated catalog.
entityName
- The name of the entity being searched for, if
appropriate.
publicId
- The public identifier of the entity in question.
systemId
- The nominal system identifier for the entity
in question (as provided in the source document).
- The system identifier to use.
Note that the nominal system identifier is not returned if a
match is not found in the catalog; instead, null is returned
to indicate that no match was found.
resolveLocalSystem
protected String resolveLocalSystem(String systemId)
throws MalformedURLException,
IOException
Return the applicable SYSTEM system identifier in this
catalog.
If a SYSTEM entry exists in the catalog file
for the system ID specified, return the mapped value.
systemId
- The system ID to locate in the catalog
- The mapped system identifier or null
resolveLocalURI
protected String resolveLocalURI(String uri)
throws MalformedURLException,
IOException
Return the applicable URI in this catalog.
If a URI entry exists in the catalog file
for the URI specified, return the mapped value.
uri
- The URI to locate in the catalog
resolveNotation
public String resolveNotation(String notationName,
String publicId,
String systemId)
throws MalformedURLException,
IOException
Return the applicable NOTATION system identifier.
notationName
- The name of the notation for which
a system identifier is required.
publicId
- The nominal public identifier for the notation
(as provided in the source document).
systemId
- The nominal system identifier for the notation
(as provided in the source document).
- The system identifier to use for the notation.
resolvePublic
public String resolvePublic(String publicId,
String systemId)
throws MalformedURLException,
IOException
Return the applicable PUBLIC or SYSTEM identifier.
This method searches the Catalog and returns the system
identifier specified for the given system or public identifiers.
If no appropriate PUBLIC or SYSTEM entry is found in the Catalog,
null is returned.
publicId
- The public identifier to locate in the catalog.
Public identifiers are normalized before comparison.
systemId
- The nominal system identifier for the entity
in question (as provided in the source document).
- The system identifier to use.
Note that the nominal system identifier is not returned if a
match is not found in the catalog; instead, null is returned
to indicate that no match was found.
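Public identifier normalization is defined by XML 1.0 (section 4.2.2): each run of white space becomes a single space, and leading and trailing white space is removed. This is a minimal sketch of that rule. The class name is hypothetical, and the actual helper used by this class may differ.

```java
// Hypothetical sketch of public identifier normalization per XML 1.0.
class PublicIdSketch {
    static String normalize(String publicId) {
        // Collapse runs of XML white space to one space, then trim the ends.
        return publicId.replaceAll("[ \\t\\r\\n]+", " ").trim();
    }
}
```

After this normalization, identifiers that differ only in line breaks or indentation compare equal with a plain `equals`.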
resolveSubordinateCatalogs
protected String resolveSubordinateCatalogs(int entityType,
String entityName,
String publicId,
String systemId)
throws MalformedURLException,
IOException
Search the subordinate catalogs, in order, looking for a match.
This method searches the Catalog and returns the system
identifier specified for the given entity type with the given
name, public, and system identifiers. In some contexts, these
may be null.
entityType
- The CatalogEntry type for which this query is
being conducted. This is necessary in order to do the appropriate
query on a subordinate catalog.
entityName
- The name of the entity being searched for, if
appropriate.
publicId
- The public identifier of the entity in question
(as provided in the source document).
systemId
- The nominal system identifier for the entity
in question (as provided in the source document). This parameter is
overloaded for the URI entry type.
- The system identifier to use.
Note that the nominal system identifier is not returned if a
match is not found in the catalog; instead, null is returned
to indicate that no match was found.
resolveSystem
public String resolveSystem(String systemId)
throws MalformedURLException,
IOException
Return the applicable SYSTEM system identifier.
If a SYSTEM entry exists in the Catalog
for the system ID specified, return the mapped value.
On Windows-based operating systems, the comparison between
the system identifier provided and the SYSTEM entries in the
Catalog is case-insensitive.
systemId
- The system ID to locate in the catalog.
- The resolved system identifier.
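The platform-dependent comparison can be sketched with a flag that selects `equalsIgnoreCase` or `equals`. Detecting Windows through the `os.name` system property is an assumption about how this might be done, and the names are hypothetical. Passing the flag in explicitly keeps the lookup testable on any platform.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of an OS-dependent SYSTEM lookup.
class SystemLookupSketch {
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");
    }

    static String resolveSystem(Map<String, String> systemEntries, String systemId,
                                boolean caseInsensitive) {
        for (Map.Entry<String, String> e : systemEntries.entrySet()) {
            boolean match = caseInsensitive
                ? e.getKey().equalsIgnoreCase(systemId)
                : e.getKey().equals(systemId);
            if (match) {
                return e.getValue();
            }
        }
        return null; // no SYSTEM entry matched
    }
}
```

A caller would typically pass `SystemLookupSketch.isWindows()` as the last argument.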
resolveURI
public String resolveURI(String uri)
throws MalformedURLException,
IOException
Return the applicable URI.
If a URI entry exists in the Catalog
for the URI specified, return the mapped value.
URI comparison is case sensitive.
uri
- The URI to locate in the catalog.
setCatalogManager
public void setCatalogManager(CatalogManager manager)
Establish the CatalogManager used by this catalog.
setupReaders
public void setupReaders()
Set up readers.
unknownEntry
public void unknownEntry(Vector strings)
Handle unknown CatalogEntry types.
This method exists to allow subclasses to deal with unknown
entry types.