org.apache.xml.resolver
public class Catalog extends Object
Represents OASIS Open Catalog files.
This class implements the semantics of OASIS Open Catalog files (defined by OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401)).
The primary purpose of the Catalog is to associate resources in the document with local system identifiers. Some entities (document types, XML entities, and notations) have names and all of them can have either public or system identifiers or both. (In XML, only a notation can have a public identifier without a system identifier, but the methods implemented in this class obey the Catalog semantics from the SGML days when system identifiers were optional.)
The system identifiers returned by the resolution methods in this class are valid, i.e. usable by, and in fact constructed by, the java.net.URL class. Unfortunately, java.net.URL seems to behave in somewhat non-standard ways, so the system identifiers returned may not be directly usable in a browser or filesystem context.
This class recognizes all of the Catalog entries defined in TR9401:1997:
Note that BASE entries are treated as described by RFC2396. In particular, this has the counter-intuitive property that after a BASE entry identifying "http://example.com/a/b/c" as the base URI, the relative URI "foo" is resolved to the absolute URI "http://example.com/a/b/foo". If you do not want the final component of the path to be discarded (as a filename would be in the URI for a resource), you must provide the trailing slash: "http://example.com/a/b/c/".
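The effect of the trailing slash can be reproduced with the standard java.net.URI class, which follows the same reference-resolution rules. This is only an illustrative sketch: the class name BaseResolutionDemo and the helper resolve are hypothetical and are not part of the resolver API.

```java
import java.net.URI;

final class BaseResolutionDemo {
    // Resolve a relative reference against a base URI using the
    // reference-resolution rules implemented by java.net.URI.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // No trailing slash: the final path component "c" is discarded.
        System.out.println(resolve("http://example.com/a/b/c", "foo"));
        // prints http://example.com/a/b/foo

        // Trailing slash: "c/" is kept as part of the base path.
        System.out.println(resolve("http://example.com/a/b/c/", "foo"));
        // prints http://example.com/a/b/c/foo
    }
}
```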
Note that subordinate catalogs (all catalogs except the first, including CATALOG and DELEGATE* catalogs) are only loaded if and when they are required.
This class relies on classes which implement the CatalogReader interface to actually load catalog files. This allows the catalog semantics to be implemented for TR9401 text-based catalogs, XML catalogs, or any number of other storage formats.
Additional catalogs may also be loaded with the parseCatalog method.
Change Log:
Rewrite to use CatalogReaders.
Allow quoted components in xml.catalog.files so that URLs containing colons can be used on Unix. The string passed to xml.catalog.files can now have the form:
unquoted-path-with-no-sep-chars:"double-quoted path with or without sep chars":'single-quoted path with or without sep chars'
(Where ":" is the separator character in this example.)
If an unquoted path contains an embedded double or single quote character, no special processing is performed on that character. No path can contain separator characters, double quotes, and single quotes simultaneously.
Fix bug in calculation of BASE entries: if a catalog contains multiple BASE entries, each is relative to the preceding base, not the default base URI of the catalog.
Fixed a bug in the calculation of the list of subordinate catalogs. This bug caused an infinite loop where parsing would alternately process two catalogs indefinitely.
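The quoted-component form of xml.catalog.files described in the change log above can be illustrated with a small splitter. This is a minimal sketch, not the resolver's actual parsing code: the class name CatalogFilesSplitter and the method split are hypothetical, and the treatment of empty components and unterminated quotes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class CatalogFilesSplitter {
    // Split a property value into paths. A component that STARTS with a
    // double or single quote runs to the matching closing quote and may
    // contain separator characters. Quotes embedded inside an unquoted
    // component are left alone.
    static List<String> split(String value, char sep) {
        List<String> paths = new ArrayList<>();
        int i = 0;
        int n = value.length();
        while (i < n) {
            char c = value.charAt(i);
            if (c == sep) {
                i++; // assumption: empty components are skipped
                continue;
            }
            if (c == '"' || c == '\'') {
                int close = value.indexOf(c, i + 1);
                if (close < 0) {
                    close = n; // assumption: unterminated quote takes the rest
                }
                paths.add(value.substring(i + 1, close));
                i = close + 1;
            } else {
                int end = value.indexOf(sep, i);
                if (end < 0) {
                    end = n;
                }
                paths.add(value.substring(i, end));
                i = end;
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String value = "/etc/catalog:\"http://example.com/cat.xml\":'a:b'";
        System.out.println(split(value, ':'));
        // prints [/etc/catalog, http://example.com/cat.xml, a:b]
    }
}
```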
Version: 1.0
Derived from public domain code originally published by Arbortext, Inc.
See Also: CatalogReader
Field Summary | |
---|---|
protected URL | base The base URI for relative system identifiers in the catalog. |
static int | BASE The BASE Catalog Entry type |
protected URL | catalogCwd The base URI of the Catalog file currently being parsed. |
protected Vector | catalogEntries The catalog entries currently known to the system. |
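protected Vector | catalogFiles A vector of catalog files to be loaded. This list is initially established by loadSystemCatalogs when it parses the system catalog list, but CATALOG entries may contribute to it during the course of parsing. |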
protected Vector | catalogs A vector of Catalogs. The semantics of Catalog resolution are such that each catalog is effectively a list of Catalogs (in other words, a recursive list of Catalog instances). Catalogs that are processed as the result of CATALOG or DELEGATE* entries are subordinate to the catalog that contained them, but they may in turn have subordinate catalogs. Catalogs are only loaded when they are needed, so this vector initially contains a list of Catalog filenames (URLs). |
static int | CATALOG The CATALOG Catalog Entry type |
protected boolean | default_override The default initial override setting. |
static int | DELEGATE_PUBLIC The DELEGATE_PUBLIC Catalog Entry type |
static int | DELEGATE_SYSTEM The DELEGATE_SYSTEM Catalog Entry type |
static int | DELEGATE_URI The DELEGATE_URI Catalog Entry type |
static int | DOCTYPE The DOCTYPE Catalog Entry type |
static int | DOCUMENT The DOCUMENT Catalog Entry type |
static int | DTDDECL The DTDDECL Catalog Entry type |
static int | ENTITY The ENTITY Catalog Entry type |
protected Vector | localCatalogFiles A vector of catalog files constructed during processing of CATALOG entries in the current catalog. This two-level system is actually necessary to correctly implement the semantics of the CATALOG entry. |
protected Vector | localDelegate A vector of DELEGATE* Catalog entries constructed during processing of the Catalog. This two-level system has two purposes: first, it allows us to sort the DELEGATE* entries by the length of the partial public identifier so that a linear search encounters them in the correct order; second, it puts them all at the end of the Catalog. When processing reaches the end of each catalog file, any elements on this vector are added to the end of the catalogEntries vector. |
static int | LINKTYPE The LINKTYPE Catalog Entry type |
static int | NOTATION The NOTATION Catalog Entry type |
static int | OVERRIDE The OVERRIDE Catalog Entry type |
static int | PUBLIC The PUBLIC Catalog Entry type |
protected Vector | readerArr A vector of CatalogReaders. This vector contains all of the readers in the order that they were added. |
protected Hashtable | readerMap A hash of CatalogReaders. This hash maps MIME types to elements in the readerArr vector. |
static int | REWRITE_SYSTEM The REWRITE_SYSTEM Catalog Entry type |
static int | REWRITE_URI The REWRITE_URI Catalog Entry type |
static int | SGMLDECL The SGMLDECL Catalog Entry type |
static int | SYSTEM The SYSTEM Catalog Entry type |
static int | URI The URI Catalog Entry type |
Constructor Summary | |
---|---|
Catalog() Constructs an empty Catalog. The constructor interrogates the relevant system properties and initializes the catalog data structures. |
Method Summary | |
---|---|
protected void | addDelegate(CatalogEntry entry) Add to the current list of delegated catalogs. This method always constructs the localDelegate vector so that it is ordered by length of partial public identifier. |
void | addEntry(CatalogEntry entry) Cleanup and process a Catalog entry. This method processes each Catalog entry, changing mapped relative system identifiers into absolute ones (based on the current base URI), and maintaining other information about the current catalog. |
void | addReader(String mimeType, CatalogReader reader) Add a new CatalogReader to the Catalog. This method allows you to add a new CatalogReader to the catalog. |
protected void | copyReaders(Catalog newCatalog) Copies the reader list from the current Catalog to a new Catalog. This method is used internally when constructing a new catalog. |
protected String | encodedByte(int b) Perform %-encoding on a single byte. |
protected String | fixSlashes(String sysid) Replace backslashes with forward slashes. |
String | getCurrentBase() Returns the current base URI. |
String | getDefaultOverride() Returns the default override setting associated with this catalog. All catalog files loaded by this catalog will have the initial override setting specified by this default. |
void | loadSystemCatalogs() Load the system catalog files. The method adds all of the catalogs specified in the xml.catalog.files property to the Catalog list. |
protected String | makeAbsolute(String sysid) Construct an absolute URI from a relative one, using the current base URI. |
protected Catalog | newCatalog() Create a new Catalog object. This method constructs a new instance of the running Catalog class (which might be a subtype of org.apache.xml.resolver.Catalog). |
protected String | normalizeURI(String uriref) Perform character normalization on a URI reference. |
void | parseAllCatalogs() Parse all subordinate catalogs. This method recursively parses all of the subordinate catalogs. |
void | parseCatalog(String fileName) Parse a catalog file, augmenting internal data structures. |
void | parseCatalog(String mimeType, InputStream is) Parse a catalog file, augmenting internal data structures. Catalogs retrieved over the net may have an associated MIME type. |
protected void | parseCatalogFile(String fileName) Parse a single catalog file, augmenting internal data structures. |
protected void | parsePendingCatalogs() Parse all of the pending catalogs. Catalogs may refer to other catalogs; this method parses all of the currently pending catalog files. |
String | resolveDoctype(String entityName, String publicId, String systemId) Return the applicable DOCTYPE system identifier. |
String | resolveDocument() Return the applicable DOCUMENT entry. |
String | resolveEntity(String entityName, String publicId, String systemId) Return the applicable ENTITY system identifier. |
protected String | resolveLocalPublic(int entityType, String entityName, String publicId, String systemId) Return the applicable PUBLIC or SYSTEM identifier. This method searches the Catalog and returns the system identifier specified for the given system or public identifiers. |
protected String | resolveLocalSystem(String systemId) Return the applicable SYSTEM system identifier in this catalog. If a SYSTEM entry exists in the catalog file for the system ID specified, return the mapped value. |
protected String | resolveLocalURI(String uri) Return the applicable URI in this catalog. If a URI entry exists in the catalog file for the URI specified, return the mapped value. |
String | resolveNotation(String notationName, String publicId, String systemId) Return the applicable NOTATION system identifier. |
String | resolvePublic(String publicId, String systemId) Return the applicable PUBLIC or SYSTEM identifier. This method searches the Catalog and returns the system identifier specified for the given system or public identifiers. |
protected String | resolveSubordinateCatalogs(int entityType, String entityName, String publicId, String systemId) Search the subordinate catalogs, in order, looking for a match. This method searches the Catalog and returns the system identifier specified for the given entity type with the given name, public, and system identifiers. |
String | resolveSystem(String systemId) Return the applicable SYSTEM system identifier. If a SYSTEM entry exists in the Catalog for the system ID specified, return the mapped value. On Windows-based operating systems, the comparison between the system identifier provided and the SYSTEM entries in the Catalog is case-insensitive. |
String | resolveURI(String uri) Return the applicable URI. If a URI entry exists in the Catalog for the URI specified, return the mapped value. URI comparison is case sensitive. |
void | setupReaders() Setup readers. |
void | unknownEntry(Vector strings) Handle unknown CatalogEntry types. This method exists to allow subclasses to deal with unknown entry types. |
A vector of catalog files to be loaded.
This list is initially established by loadSystemCatalogs when it parses the system catalog list, but CATALOG entries may contribute to it during the course of parsing.
See Also: loadSystemCatalogs, localCatalogFiles
A vector of Catalogs.
The semantics of Catalog resolution are such that each catalog is effectively a list of Catalogs (in other words, a recursive list of Catalog instances).
Catalogs that are processed as the result of CATALOG or DELEGATE* entries are subordinate to the catalog that contained them, but they may in turn have subordinate catalogs.
Catalogs are only loaded when they are needed, so this vector initially contains a list of Catalog filenames (URLs). If, during processing, one of these catalogs has to be loaded, the resulting Catalog object is placed in the vector, effectively caching it for the next query.
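The load-on-demand caching described above can be sketched as a list whose slots hold either a not-yet-loaded URL string or a loaded catalog object. This is an illustrative model only: LazyCatalogList, LoadedCatalog, and loadCount are hypothetical names, and "loading" here merely wraps the URL rather than parsing a file.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyCatalogList {
    /** Hypothetical stand-in for a parsed catalog. */
    static final class LoadedCatalog {
        final String url;
        LoadedCatalog(String url) { this.url = url; }
    }

    // Each slot is either a String (URL, not loaded yet) or a LoadedCatalog.
    private final List<Object> catalogs = new ArrayList<>();
    private int loadCount = 0;

    LazyCatalogList(List<String> urls) {
        catalogs.addAll(urls);
    }

    // Return the catalog at the given position, loading it on first use
    // and storing the result back in the list so later queries reuse it.
    LoadedCatalog get(int index) {
        Object slot = catalogs.get(index);
        if (slot instanceof String) {
            slot = new LoadedCatalog((String) slot); // real code would parse here
            loadCount++;
            catalogs.set(index, slot);
        }
        return (LoadedCatalog) slot;
    }

    int loadCount() {
        return loadCount;
    }
}
```

Keeping both kinds of value in one list means the position of each subordinate catalog in the search order never changes when it is loaded.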
A vector of catalog files constructed during processing of CATALOG entries in the current catalog.
This two-level system is actually necessary to correctly implement the semantics of the CATALOG entry. If one catalog file includes another with a CATALOG entry, the included catalog logically occurs at the end of the including catalog, and after any preceding CATALOG entries. In other words, the CATALOG entry cannot insert anything into the middle of a catalog file.
When processing reaches the end of each catalog file, any elements on this vector are added to the front of the catalogFiles vector.
See Also: catalogFiles
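The two-level bookkeeping for CATALOG entries can be sketched as follows. This is a simplified model under stated assumptions: PendingCatalogQueue, sawCatalogEntry, and endOfCatalogFile are hypothetical names, and plain strings stand in for catalog URLs.

```java
import java.util.ArrayList;
import java.util.List;

final class PendingCatalogQueue {
    // Catalog files still waiting to be loaded, in search order.
    final List<String> catalogFiles = new ArrayList<>();
    // CATALOG entries seen so far in the file currently being parsed.
    final List<String> localCatalogFiles = new ArrayList<>();

    // Called for each CATALOG entry in the current file.
    void sawCatalogEntry(String url) {
        localCatalogFiles.add(url);
    }

    // Called at the end of the current file: the collected entries go to
    // the FRONT of the pending list, keeping their relative order, so they
    // are consulted before catalogs that were already pending.
    void endOfCatalogFile() {
        catalogFiles.addAll(0, localCatalogFiles);
        localCatalogFiles.clear();
    }

    public static void main(String[] args) {
        PendingCatalogQueue q = new PendingCatalogQueue();
        q.catalogFiles.add("second.cat"); // already pending
        q.sawCatalogEntry("included-a.cat");
        q.sawCatalogEntry("included-b.cat");
        q.endOfCatalogFile();
        System.out.println(q.catalogFiles);
        // prints [included-a.cat, included-b.cat, second.cat]
    }
}
```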
A vector of DELEGATE* Catalog entries constructed during processing of the Catalog.
This two-level system has two purposes: first, it allows us to sort the DELEGATE* entries by the length of the partial public identifier so that a linear search encounters them in the correct order; second, it puts them all at the end of the Catalog.
When processing reaches the end of each catalog file, any elements on this vector are added to the end of the catalogEntries vector. This ensures that matching PUBLIC keywords are encountered before DELEGATE* entries.
A vector of CatalogReaders.
This vector contains all of the readers in the order that they were added. In the event that a catalog is read from a file, where the MIME type is unknown, each reader is attempted in turn until one succeeds.
A hash of CatalogReaders.
This hash maps MIME types to elements in the readerArr vector. This allows the Catalog to quickly locate the reader for a particular MIME type.
Constructs an empty Catalog.
The constructor interrogates the relevant system properties and initializes the catalog data structures.
Add to the current list of delegated catalogs.
This method always constructs the localDelegate vector so that it is ordered by length of partial public identifier.
Parameters: entry The DELEGATE catalog entry
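An ordered insert of this kind might look like the sketch below. It assumes that "ordered by length" means longest partial public identifier first, and that duplicate partial identifiers are ignored; both are assumptions rather than documented behavior. The names DelegateList and addDelegate (operating on plain strings instead of CatalogEntry objects) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DelegateList {
    // Insert a partial public identifier so the list stays sorted with the
    // longest identifiers first; a linear search then meets the most
    // specific prefix before shorter ones.
    static void addDelegate(List<String> delegates, String partial) {
        int pos = 0;
        while (pos < delegates.size()) {
            String existing = delegates.get(pos);
            if (existing.equals(partial)) {
                return; // assumption: duplicates are ignored
            }
            if (existing.length() < partial.length()) {
                break; // first shorter entry: insert in front of it
            }
            pos++;
        }
        delegates.add(pos, partial);
    }

    public static void main(String[] args) {
        List<String> delegates = new ArrayList<>();
        addDelegate(delegates, "-//OASIS//");
        addDelegate(delegates, "-//OASIS//DTD DocBook");
        addDelegate(delegates, "-//W3C//");
        System.out.println(delegates);
        // prints [-//OASIS//DTD DocBook, -//OASIS//, -//W3C//]
    }
}
```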
Cleanup and process a Catalog entry.
This method processes each Catalog entry, changing mapped relative system identifiers into absolute ones (based on the current base URI), and maintaining other information about the current catalog.
Parameters: entry The CatalogEntry to process.
Add a new CatalogReader to the Catalog.
This method allows you to add a new CatalogReader to the catalog. The reader will be associated with the specified mimeType. You can only have one reader per mimeType.
In the absence of a mimeType (e.g., when reading a catalog directly from a file on the local system), the readers are attempted in the order that you add them to the Catalog.
Note that subordinate catalogs (created by CATALOG or DELEGATE* entries) get a copy of the set of readers present in the primary catalog when they are created. Readers added subsequently will not be available. For this reason, it is best to add all of the readers before the first call to parse a catalog.
Parameters: mimeType The MIME type associated with this reader. reader The CatalogReader to use.
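The reader bookkeeping described above (an ordered list plus a MIME-type lookup) can be modelled with a small generic registry. This is a sketch, not the class's real code: ReaderRegistry, readerFor, and readersInOrder are hypothetical names, and replacing an existing reader when the same MIME type is registered twice is an assumption about how "one reader per mimeType" is enforced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReaderRegistry<R> {
    // Readers in the order they were added (tried in turn when no MIME type is known).
    private final List<R> readerArr = new ArrayList<>();
    // MIME type -> position of the reader in readerArr.
    private final Map<String, Integer> readerMap = new HashMap<>();

    void addReader(String mimeType, R reader) {
        Integer pos = readerMap.get(mimeType);
        if (pos != null) {
            int index = pos;
            readerArr.set(index, reader); // assumption: a later reader replaces the earlier one
        } else {
            readerArr.add(reader);
            readerMap.put(mimeType, readerArr.size() - 1);
        }
    }

    // Direct lookup when the MIME type is known; null if none is registered.
    R readerFor(String mimeType) {
        Integer pos = readerMap.get(mimeType);
        if (pos == null) {
            return null;
        }
        int index = pos;
        return readerArr.get(index);
    }

    // Fallback order used when the MIME type is unknown.
    List<R> readersInOrder() {
        return new ArrayList<>(readerArr);
    }
}
```

A caller with a known MIME type would use readerFor; a caller reading a local file would walk readersInOrder until one reader succeeds.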
Copies the reader list from the current Catalog to a new Catalog.
This method is used internally when constructing a new catalog. It copies the current reader associations over to the new catalog.
Parameters: newCatalog The new Catalog.
Perform %-encoding on a single byte.
Parameters: b The 8-bit integer that represents the byte. (Bytes are signed, but encoding needs to treat them as unsigned.)
Returns: The %-encoded string for the byte in question.
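A minimal sketch of such an encoder is shown below. It is not taken from the class's source: the class name PercentEncoding is hypothetical, and the choice of uppercase hex digits is an assumption.

```java
import java.util.Locale;

final class PercentEncoding {
    // Encode one byte as "%XX". Masking with 0xFF makes a signed byte
    // such as (byte) 0xE9 (which is -23 as an int) encode as 233.
    static String encodedByte(int b) {
        String hex = Integer.toHexString(b & 0xFF).toUpperCase(Locale.ROOT);
        return hex.length() < 2 ? "%0" + hex : "%" + hex;
    }

    public static void main(String[] args) {
        System.out.println(encodedByte(' '));         // prints %20
        System.out.println(encodedByte((byte) 0xE9)); // prints %E9
        System.out.println(encodedByte('\n'));        // prints %0A
    }
}
```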
Replace backslashes with forward slashes. (URLs always use forward slashes.)
Parameters: sysid The input system identifier.
Returns: The same system identifier with backslashes turned into forward slashes.
Returns the current base URI.
Returns the default override setting associated with this catalog.
All catalog files loaded by this catalog will have the initial override setting specified by this default.
Load the system catalog files.
The method adds all of the catalogs specified in the xml.catalog.files property to the Catalog list.
Throws: MalformedURLException One of the system catalogs is identified with a filename that is not a valid URL. IOException One of the system catalogs cannot be read.
Construct an absolute URI from a relative one, using the current base URI.
Parameters: sysid The (possibly relative) system identifier
Returns: The system identifier made absolute with respect to the current base.
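The interaction between the current base and successive BASE entries (each one resolved against the preceding base) can be sketched with java.net.URI. The class BaseChain and the method applyBase are hypothetical names used only for illustration, and java.net.URI is used here for simplicity even though the class itself is documented as working with java.net.URL.

```java
import java.net.URI;

final class BaseChain {
    private URI base;

    BaseChain(String initialBase) {
        this.base = URI.create(initialBase);
    }

    // Model of a BASE entry: the new base is itself resolved against the
    // base that was in effect before it, not against the catalog's URI.
    void applyBase(String newBase) {
        base = base.resolve(newBase);
    }

    // Make a (possibly relative) system identifier absolute.
    String makeAbsolute(String sysid) {
        return base.resolve(sysid).toString();
    }

    public static void main(String[] args) {
        BaseChain chain = new BaseChain("http://example.com/catalogs/main.cat");
        chain.applyBase("dtds/");
        System.out.println(chain.makeAbsolute("book.dtd"));
        // prints http://example.com/catalogs/dtds/book.dtd
        chain.applyBase("v5/");
        System.out.println(chain.makeAbsolute("book.dtd"));
        // prints http://example.com/catalogs/dtds/v5/book.dtd
    }
}
```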
Create a new Catalog object.
This method constructs a new instance of the running Catalog class (which might be a subtype of org.apache.xml.resolver.Catalog).
N.B. All Catalog subtypes should call newCatalog() to construct a new Catalog. Do not simply use "new Subclass()" since that will confuse future subclasses.
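The reason for routing construction through newCatalog() can be seen in a small model. This sketch assumes a reflective implementation based on the runtime class; BaseCatalog and CustomCatalog are hypothetical stand-ins, and the fallback on failure is an assumption.

```java
class BaseCatalog {
    // Create a new instance of whatever class this object actually is,
    // so subordinate catalogs of a subclass are also that subclass.
    protected BaseCatalog newCatalog() {
        try {
            return getClass().getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Assumption: fall back to the base type if the subclass
            // cannot be instantiated (for example, no no-arg constructor).
            return new BaseCatalog();
        }
    }
}

class CustomCatalog extends BaseCatalog {
    // Inherits newCatalog(); no override needed to get CustomCatalog instances.
}
```

With this shape, `new CustomCatalog().newCatalog()` yields a CustomCatalog, whereas a hard-coded `new BaseCatalog()` inside the base class would silently drop the subclass behavior for subordinate catalogs.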
Perform character normalization on a URI reference.
Parameters: uriref The URI reference
Returns: The normalized URI reference.
Parse all subordinate catalogs.
This method recursively parses all of the subordinate catalogs. If this method does not throw an exception, you can be confident that no subsequent call to any resolve*() method will either, with two possible exceptions:
1. Delegated catalogs are re-parsed each time they are needed (because a variable list of them may be needed in each case, depending on the length of the matching partial public identifier). But they are parsed by this method, so as long as they don't change or disappear while the program is running, they shouldn't generate errors later if they don't generate errors now.
2. If you add new catalogs with parseCatalog, they won't be loaded until they are needed or until you call parseAllCatalogs again.
On the other hand, if you don't call this method, you may successfully parse documents without having to load all possible catalogs.
Throws: MalformedURLException The filename (URL) for a subordinate or delegated catalog is not a valid URL. IOException Error reading some subordinate or delegated catalog file.
Parse a catalog file, augmenting internal data structures.
Parameters: fileName The filename of the catalog file to process
Throws: MalformedURLException The fileName cannot be turned into a valid URL. IOException Error reading catalog file.
Parse a catalog file, augmenting internal data structures.
Catalogs retrieved over the net may have an associated MIME type. The MIME type can be used to select an appropriate reader.
Parameters: mimeType The MIME type of the catalog file. is The InputStream from which the catalog should be read
Throws: CatalogException Failed to load catalog mimeType. IOException Error reading catalog file.
Parse a single catalog file, augmenting internal data structures.
Parameters: fileName The filename of the catalog file to process
Throws: MalformedURLException The fileName cannot be turned into a valid URL. IOException Error reading catalog file.
Parse all of the pending catalogs.
Catalogs may refer to other catalogs; this method parses all of the currently pending catalog files.
Return the applicable DOCTYPE system identifier.
Parameters: entityName The name of the entity (element) for which a doctype is required. publicId The nominal public identifier for the doctype (as provided in the source document). systemId The nominal system identifier for the doctype (as provided in the source document).
Returns: The system identifier to use for the doctype.
Throws: MalformedURLException The formal system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
Return the applicable DOCUMENT entry.
Returns: The system identifier to use for the document.
Throws: MalformedURLException The formal system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
Return the applicable ENTITY system identifier.
Parameters: entityName The name of the entity for which a system identifier is required. publicId The nominal public identifier for the entity (as provided in the source document). systemId The nominal system identifier for the entity (as provided in the source document).
Returns: The system identifier to use for the entity.
Throws: MalformedURLException The formal system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
Return the applicable PUBLIC or SYSTEM identifier.
This method searches the Catalog and returns the system identifier specified for the given system or public identifiers. If no appropriate PUBLIC or SYSTEM entry is found in the Catalog, delegated Catalogs are interrogated.
There are four possible cases:
Parameters: entityType The CatalogEntry type for which this query is being conducted. This is necessary in order to do the appropriate query on a delegated catalog. entityName The name of the entity being searched for, if appropriate. publicId The public identifier of the entity in question. systemId The nominal system identifier for the entity in question (as provided in the source document).
Returns: The system identifier to use. Note that the nominal system identifier is not returned if a match is not found in the catalog; instead, null is returned to indicate that no match was found.
Throws: MalformedURLException The formal system identifier of a delegated catalog cannot be turned into a valid URL. IOException Error reading delegated catalog file.
Return the applicable SYSTEM system identifier in this catalog.
If a SYSTEM entry exists in the catalog file for the system ID specified, return the mapped value.
Parameters: systemId The system ID to locate in the catalog
Returns: The mapped system identifier or null
Return the applicable URI in this catalog.
If a URI entry exists in the catalog file for the URI specified, return the mapped value.
Parameters: uri The URI to locate in the catalog
Returns: The mapped URI or null
Return the applicable NOTATION system identifier.
Parameters: notationName The name of the notation for which a system identifier is required. publicId The nominal public identifier for the notation (as provided in the source document). systemId The nominal system identifier for the notation (as provided in the source document).
Returns: The system identifier to use for the notation.
Throws: MalformedURLException The formal system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
Return the applicable PUBLIC or SYSTEM identifier.
This method searches the Catalog and returns the system identifier specified for the given system or public identifiers. If no appropriate PUBLIC or SYSTEM entry is found in the Catalog, null is returned.
Parameters: publicId The public identifier to locate in the catalog. Public identifiers are normalized before comparison. systemId The nominal system identifier for the entity in question (as provided in the source document).
Returns: The system identifier to use. Note that the nominal system identifier is not returned if a match is not found in the catalog; instead, null is returned to indicate that no match was found.
Throws: MalformedURLException The formal system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
Search the subordinate catalogs, in order, looking for a match.
This method searches the Catalog and returns the system identifier specified for the given entity type with the given name, public, and system identifiers. In some contexts, these may be null.
Parameters: entityType The CatalogEntry type for which this query is being conducted. This is necessary in order to do the appropriate query on a subordinate catalog. entityName The name of the entity being searched for, if appropriate. publicId The public identifier of the entity in question (as provided in the source document). systemId The nominal system identifier for the entity in question (as provided in the source document). This parameter is overloaded for the URI entry type.
Returns: The system identifier to use. Note that the nominal system identifier is not returned if a match is not found in the catalog; instead, null is returned to indicate that no match was found.
Throws: MalformedURLException The formal system identifier of a delegated catalog cannot be turned into a valid URL. IOException Error reading delegated catalog file.
Return the applicable SYSTEM system identifier.
If a SYSTEM entry exists in the Catalog for the system ID specified, return the mapped value.
On Windows-based operating systems, the comparison between the system identifier provided and the SYSTEM entries in the Catalog is case-insensitive.
Parameters: systemId The system ID to locate in the catalog.
Returns: The resolved system identifier.
Throws: MalformedURLException The formal system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
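The platform-dependent comparison described above could be modelled as follows. This is a sketch under assumptions: the class SystemIdComparison and its methods are hypothetical, and detecting Windows by looking for "windows" in the os.name system property is an assumed approach, not necessarily what the class does.

```java
import java.util.Locale;

final class SystemIdComparison {
    // Assumption: a platform counts as Windows when its os.name value
    // contains "windows" (compared without regard to case).
    static boolean isWindows(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).contains("windows");
    }

    // Compare a requested system identifier with a SYSTEM entry:
    // case-insensitive on Windows, exact everywhere else.
    static boolean sameSystemId(String requested, String entry, String osName) {
        return isWindows(osName)
                ? requested.equalsIgnoreCase(entry)
                : requested.equals(entry);
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        // The result depends on the platform the example runs on.
        System.out.println(sameSystemId(
                "file:///C:/DTD/Book.dtd", "file:///c:/dtd/book.dtd", os));
    }
}
```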
Return the applicable URI.
If a URI entry exists in the Catalog for the URI specified, return the mapped value.
URI comparison is case sensitive.
Parameters: uri The URI to locate in the catalog.
Returns: The resolved URI.
Throws: MalformedURLException The system identifier of a subordinate catalog cannot be turned into a valid URL. IOException Error reading subordinate catalog file.
Setup readers.
Handle unknown CatalogEntry types.
This method exists to allow subclasses to deal with unknown entry types.