org.apache.commons.httpclient
public class URI extends Object implements Cloneable, Comparable, Serializable
A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.
Implementers should be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string might lead to misinterpreting a percent data character as another escaped character, or vice versa in the case of escaping an already escaped string.
To avoid these problems, the following data types are used:
- URI character sequence: char
- octet sequence: byte
- original character sequence: String
So a URI is a sequence of characters held in a char array; it is not always represented as a sequence of octets in a byte array.
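The double-escaping hazard described above can be illustrated with a minimal JDK-only sketch. The class `EscapeOnceSketch` and its `escape` helper are hypothetical and are not part of this library; the helper escapes every octet that is not an ASCII letter or digit, which is stricter than the real per-component rules.

```java
import java.nio.charset.StandardCharsets;

final class EscapeOnceSketch {
    // Hypothetical helper: original String -> octets (byte) -> URI characters (char).
    static String escape(String original) {
        StringBuilder out = new StringBuilder();
        for (byte b : original.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            boolean plain = (octet >= 'a' && octet <= 'z')
                    || (octet >= 'A' && octet <= 'Z')
                    || (octet >= '0' && octet <= '9');
            if (plain) {
                out.append((char) octet);
            } else {
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String once = escape("100% sure");  // escaped exactly once
        String twice = escape(once);        // escaped again by mistake
        System.out.println(once);           // 100%25%20sure
        System.out.println(twice);          // 100%2525%2520sure
    }
}
```

Escaping the already escaped string turns each "%" into "%25", so a single unescape no longer recovers the original text.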
URI Syntactic Components
- In general, written as follows:
  Absolute URI = <scheme>:<scheme-specific-part>
  Generic URI = <scheme>://<authority><path>?<query>
- Syntax:
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  hier_part = ( net_path | abs_path ) [ "?" query ]
  net_path = "//" authority [ abs_path ]
  abs_path = "/" path_segments
The following examples illustrate URIs that are in common use.
ftp://ftp.is.co.za/rfc/rfc1808.txt
  -- ftp scheme for File Transfer Protocol services
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
  -- gopher scheme for Gopher and Gopher+ Protocol services
http://www.math.uio.no/faq/compression-faq/part1.html
  -- http scheme for Hypertext Transfer Protocol services
mailto:mduerst@ifi.unizh.ch
  -- mailto scheme for electronic mail addresses
news:comp.infosystems.www.servers.unix
  -- news scheme for USENET news groups and articles
telnet://melvyl.ucop.edu/
  -- telnet scheme for interactive services via the TELNET Protocol
Please notice that there are many modifications from URL (RFC 1738) and relative URL (RFC 1808).
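As a rough illustration of the generic syntax, the following sketch splits a URI reference into scheme, authority, path, query and fragment using the regular expression given in RFC 2396, Appendix B. The class `UriSplitSketch` is hypothetical and uses only the JDK; it does not show how this class parses its input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UriSplitSketch {
    // Regular expression from RFC 2396, Appendix B.
    private static final Pattern URI_REFERENCE = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent parts are null. */
    static String[] split(String uriReference) {
        Matcher m = URI_REFERENCE.matcher(uriReference);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a URI reference: " + uriReference);
        }
        return new String[] {
            m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("ftp://ftp.is.co.za/rfc/rfc1808.txt");
        System.out.println("scheme    = " + parts[0]);
        System.out.println("authority = " + parts[1]);
        System.out.println("path      = " + parts[2]);
    }
}
```

For an opaque URI such as mailto:mduerst@ifi.unizh.ch the authority is absent and the whole scheme-specific part lands in the path slot.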
The expressions for a URI
For escaped URI forms:
- URI(char[]) // constructor
- char[] getRawXxx() // method
- String getEscapedXxx() // method
- String toString() // method
For unescaped URI forms:
- URI(String) // constructor
- String getXXX() // method
Version: $Revision: 372560 $ $Date: 2002/03/14 15:14:01
Nested Class Summary | |
---|---|
static class | URI.DefaultCharsetChanged
Signals, as part of normal operation, that the default charset has been
changed, so that the user can be alerted. |
static class | URI.LocaleToCharsetMap
A mapping to determine the (somewhat arbitrarily) preferred charset for a
given locale. |
Field Summary | |
---|---|
protected static BitSet | absoluteURI
BitSet for absoluteURI.
|
protected static BitSet | abs_path
URI absolute path.
|
static BitSet | allowed_abs_path
Those characters that are allowed for the abs_path. |
static BitSet | allowed_authority
Those characters that are allowed for the authority component. |
static BitSet | allowed_fragment
Those characters that are allowed for the fragment component. |
static BitSet | allowed_host
Those characters that are allowed for the host component.
|
static BitSet | allowed_IPv6reference
Those characters that are allowed for the IPv6reference component.
|
static BitSet | allowed_opaque_part
Those characters that are allowed for the opaque_part. |
static BitSet | allowed_query
Those characters that are allowed for the query component. |
static BitSet | allowed_reg_name
Those characters that are allowed for the reg_name. |
static BitSet | allowed_rel_path
Those characters that are allowed for the rel_path. |
static BitSet | allowed_userinfo
Those characters that are allowed for the userinfo component. |
static BitSet | allowed_within_authority
Those characters that are allowed for the authority component. |
static BitSet | allowed_within_path
Those characters that are allowed within the path. |
static BitSet | allowed_within_query
Those characters that are allowed within the query component. |
static BitSet | allowed_within_userinfo
Those characters that are allowed within the userinfo component. |
protected static BitSet | alpha
BitSet for alpha.
|
protected static BitSet | alphanum
BitSet for alphanum (join of alpha & digit).
|
protected static BitSet | authority
BitSet for authority.
|
static BitSet | control
BitSet for control. |
protected static String | defaultDocumentCharset
The default charset of the document. |
protected static String | defaultDocumentCharsetByLocale |
protected static String | defaultDocumentCharsetByPlatform |
protected static String | defaultProtocolCharset
The default charset of the protocol. |
static BitSet | delims
BitSet for delims. |
protected static BitSet | digit
BitSet for digit.
|
static BitSet | disallowed_opaque_part
Disallowed opaque_part before escaping. |
static BitSet | disallowed_rel_path
Disallowed rel_path before escaping. |
protected static BitSet | domainlabel
BitSet for domainlabel.
|
protected static BitSet | escaped
BitSet for escaped.
|
protected static BitSet | fragment
BitSet for fragment (alias for uric).
|
protected int | hash
Cache the hash code for this URI. |
protected static BitSet | hex
BitSet for hex.
|
protected static BitSet | hier_part
BitSet for hier_part.
|
protected static BitSet | host
BitSet for host.
|
protected static BitSet | hostname
BitSet for hostname.
|
protected static BitSet | hostport
BitSet for hostport.
|
protected static BitSet | IPv4address
BitSet that combines digit and dot for IPv4address.
|
protected static BitSet | IPv6address
RFC 2373.
|
protected static BitSet | IPv6reference
RFC 2732, 2373.
|
protected static BitSet | mark
BitSet for mark.
|
protected static BitSet | net_path
BitSet for net_path.
|
protected static BitSet | opaque_part
URI bitset that combines uric_no_slash and uric.
|
protected static BitSet | param
BitSet for param (alias for pchar).
|
protected static BitSet | path
URI bitset that combines absolute path and opaque part.
|
protected static BitSet | path_segments
BitSet for path segments.
|
protected static BitSet | pchar
BitSet for pchar.
|
protected static BitSet | percent
The percent "%" character always has the reserved purpose of being the
escape indicator, it must be escaped as "%25" in order to be used as
data within a URI. |
protected static BitSet | port
Port, a logical alias for digit. |
protected String | protocolCharset
The charset of the protocol used by this URI instance. |
protected static BitSet | query
BitSet for query (alias for uric).
|
protected static BitSet | reg_name
BitSet for reg_name.
|
protected static BitSet | relativeURI
BitSet for relativeURI.
|
protected static BitSet | rel_path
BitSet for rel_path.
|
protected static BitSet | rel_segment
BitSet for rel_segment.
|
protected static BitSet | reserved
BitSet for reserved.
|
protected static char[] | rootPath
The root path. |
protected static BitSet | scheme
BitSet for scheme.
|
protected static BitSet | segment
BitSet for segment.
|
protected static BitSet | server
Bitset for server.
|
static BitSet | space
BitSet for space. |
protected static BitSet | toplabel
BitSet for toplabel.
|
protected static BitSet | unreserved
Data characters that are allowed in a URI but do not have a reserved
purpose are called unreserved.
|
static BitSet | unwise
BitSet for unwise. |
protected static BitSet | uric
BitSet for uric.
|
protected static BitSet | uric_no_slash
URI bitset for encoding typical non-slash characters.
|
protected static BitSet | userinfo
Bitset for userinfo.
|
protected static BitSet | URI_reference
BitSet for URI-reference.
|
static BitSet | within_userinfo
BitSet for within the userinfo component like user and password. |
protected char[] | _authority
The authority. |
protected char[] | _fragment
The fragment. |
protected char[] | _host
The host. |
protected boolean | _is_abs_path |
protected boolean | _is_hier_part |
protected boolean | _is_hostname |
protected boolean | _is_IPv4address |
protected boolean | _is_IPv6reference |
protected boolean | _is_net_path |
protected boolean | _is_opaque_part |
protected boolean | _is_reg_name |
protected boolean | _is_rel_path |
protected boolean | _is_server |
protected char[] | _opaque
The opaque. |
protected char[] | _path
The path. |
protected int | _port
The port. |
protected char[] | _query
The query. |
protected char[] | _scheme
The scheme. |
protected char[] | _uri
This Uniform Resource Identifier (URI).
|
protected char[] | _userinfo
The userinfo. |
Constructor Summary | |
---|---|
protected | URI() Create an instance for internal use. |
URI(String s, boolean escaped, String charset)
Construct a URI from a string with the given charset. | |
URI(String s, boolean escaped)
Construct a URI from a string that is in either escaped or unescaped form. | |
URI(char[] escaped, String charset)
Construct a URI as an escaped form of a character array with the given
charset.
| |
URI(char[] escaped)
Construct a URI as an escaped form of a character array.
| |
URI(String original, String charset)
Construct a URI from the given string with the given charset.
| |
URI(String original)
Construct a URI from the given string.
| |
URI(String scheme, String schemeSpecificPart, String fragment)
Construct a general URI from the given components.
| |
URI(String scheme, String authority, String path, String query, String fragment)
Construct a general URI from the given components.
| |
URI(String scheme, String userinfo, String host, int port)
Construct a general URI from the given components.
| |
URI(String scheme, String userinfo, String host, int port, String path)
Construct a general URI from the given components.
| |
URI(String scheme, String userinfo, String host, int port, String path, String query)
Construct a general URI from the given components.
| |
URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment)
Construct a general URI from the given components.
| |
URI(String scheme, String host, String path, String fragment)
Construct a general URI from the given components.
| |
URI(URI base, String relative)
Construct a general URI with the given relative URI string.
| |
URI(URI base, String relative, boolean escaped)
Construct a general URI with the given relative URI string.
| |
URI(URI base, URI relative)
Construct a general URI with the given relative URI.
|
Method Summary | |
---|---|
Object | clone()
Create and return a copy of this object, the URI-reference containing
the userinfo component. |
int | compareTo(Object obj)
Compare this URI to another object.
|
protected static String | decode(char[] component, String charset)
Decodes URI encoded string.
|
protected static String | decode(String component, String charset)
Decodes URI encoded string.
|
protected static char[] | encode(String original, BitSet allowed, String charset)
Encodes URI string.
|
protected boolean | equals(char[] first, char[] second)
Test if the first array is equal to the second array.
|
boolean | equals(Object obj)
Test whether this URI is equal to another object.
|
String | getAboveHierPath()
Get the level above this hierarchy level.
|
String | getAuthority()
Get the authority.
|
String | getCurrentHierPath()
Get the current hierarchy level.
|
static String | getDefaultDocumentCharset()
Get the recommended default charset of the document.
|
static String | getDefaultDocumentCharsetByLocale()
Get the default charset of the document by locale.
|
static String | getDefaultDocumentCharsetByPlatform()
Get the default charset of the document by platform.
|
static String | getDefaultProtocolCharset()
Get the default charset of the protocol.
|
String | getEscapedAboveHierPath()
Get the escaped level above this hierarchy level.
|
String | getEscapedAuthority()
Get the escaped authority.
|
String | getEscapedCurrentHierPath()
Get the escaped current hierarchy level.
|
String | getEscapedFragment()
Get the escaped fragment.
|
String | getEscapedName()
Get the escaped basename of the path.
|
String | getEscapedPath()
Get the escaped path.
|
String | getEscapedPathQuery()
Get the escaped path and query.
|
String | getEscapedQuery()
Get the escaped query.
|
String | getEscapedURI()
Get the escaped URI character sequence. |
String | getEscapedURIReference()
Get the escaped URI reference string.
|
String | getEscapedUserinfo()
Get the escaped userinfo.
|
String | getFragment()
Get the fragment.
|
String | getHost()
Get the host.
|
String | getName()
Get the basename of the path.
|
String | getPath()
Get the path.
|
String | getPathQuery()
Get the path and query.
|
int | getPort()
Get the port. |
String | getProtocolCharset()
Get the protocol charset used by this current URI instance.
|
String | getQuery()
Get the query.
|
char[] | getRawAboveHierPath()
Get the raw-escaped level above this hierarchy level.
|
char[] | getRawAuthority()
Get the raw-escaped authority.
|
protected char[] | getRawCurrentHierPath(char[] path)
Get the raw-escaped current hierarchy level in the given path.
|
char[] | getRawCurrentHierPath()
Get the raw-escaped current hierarchy level.
|
char[] | getRawFragment()
Get the raw-escaped fragment.
|
char[] | getRawHost()
Get the host.
|
char[] | getRawName()
Get the raw-escaped basename of the path.
|
char[] | getRawPath()
Get the raw-escaped path.
|
char[] | getRawPathQuery()
Get the raw-escaped path and query.
|
char[] | getRawQuery()
Get the raw-escaped query.
|
char[] | getRawScheme()
Get the scheme.
|
char[] | getRawURI()
Get the raw-escaped URI character sequence. |
char[] | getRawURIReference()
Get the URI reference character sequence.
|
char[] | getRawUserinfo()
Get the raw-escaped userinfo.
|
String | getScheme()
Get the scheme.
|
String | getURI()
Get the original (unescaped) URI character sequence.
|
String | getURIReference()
Get the original URI reference string.
|
String | getUserinfo()
Get the userinfo.
|
boolean | hasAuthority()
Tell whether or not this URI has authority.
|
boolean | hasFragment()
Tell whether or not this URI has fragment.
|
int | hashCode()
Return a hash code for this URI.
|
boolean | hasQuery()
Tell whether or not this URI has query.
|
boolean | hasUserinfo()
Tell whether or not this URI has userinfo.
|
protected int | indexFirstOf(String s, String delims)
Get the earliest index at which any one of the given delimiters first
occurs in the given string.
|
protected int | indexFirstOf(String s, String delims, int offset)
Get the earliest index, at or after the given offset, at which any one of
the given delimiters first occurs in the given string.
|
protected int | indexFirstOf(char[] s, char delim)
Get the earliest index at which the given delimiter first occurs in the
given array.
|
protected int | indexFirstOf(char[] s, char delim, int offset)
Get the earliest index, at or after the given offset, at which the given
delimiter first occurs in the given array.
|
boolean | isAbsoluteURI()
Tell whether or not this URI is absolute.
|
boolean | isAbsPath()
Tell whether or not the relativeURI or hier_part of this URI is abs_path.
|
boolean | isHierPart()
Tell whether or not the absoluteURI of this URI is hier_part.
|
boolean | isHostname()
Tell whether or not the host part of this URI is hostname.
|
boolean | isIPv4address()
Tell whether or not the host part of this URI is IPv4address.
|
boolean | isIPv6reference()
Tell whether or not the host part of this URI is IPv6reference.
|
boolean | isNetPath()
Tell whether or not the relativeURI or hier_part of this URI is net_path.
|
boolean | isOpaquePart()
Tell whether or not the absoluteURI of this URI is opaque_part.
|
boolean | isRegName()
Tell whether or not the authority component of this URI is reg_name.
|
boolean | isRelativeURI()
Tell whether or not this URI is relative.
|
boolean | isRelPath()
Tell whether or not the relativeURI of this URI is rel_path.
|
boolean | isServer()
Tell whether or not the authority component of this URI is server.
|
protected char[] | normalize(char[] path)
Normalize the given hier path part.
|
void | normalize()
Normalizes the path part of this URI. |
protected void | parseAuthority(String original, boolean escaped)
Parse the authority component.
|
protected void | parseUriReference(String original, boolean escaped)
Parse a URI reference given as a String, using the character encoding of
the local system or the document, in order to avoid any possibility of
conflict with non-ASCII characters.
|
protected boolean | prevalidate(String component, BitSet disallowed)
Pre-validate the unescaped URI string within a specific component.
|
protected void | readObject(ObjectInputStream ois)
Read a URI.
|
protected char[] | removeFragmentIdentifier(char[] component)
Remove the fragment identifier of the given component.
|
protected char[] | resolvePath(char[] basePath, char[] relPath)
Resolve the base and relative path.
|
static void | setDefaultDocumentCharset(String charset)
Set the default charset of the document.
|
static void | setDefaultProtocolCharset(String charset)
Set the default charset of the protocol.
|
void | setEscapedAuthority(String escapedAuthority)
Set the authority. |
void | setEscapedFragment(String escapedFragment)
Set the escaped fragment string.
|
void | setEscapedPath(String escapedPath)
Set the escaped path.
|
void | setEscapedQuery(String escapedQuery)
Set the escaped query string.
|
void | setFragment(String fragment)
Set the fragment.
|
void | setPath(String path)
Set the path.
|
void | setQuery(String query)
Set the query.
|
void | setRawAuthority(char[] escapedAuthority)
Set the authority. |
void | setRawFragment(char[] escapedFragment)
Set the raw-escaped fragment.
|
void | setRawPath(char[] escapedPath)
Set the raw-escaped path.
|
void | setRawQuery(char[] escapedQuery)
Set the raw-escaped query.
|
protected void | setURI()
Once it's parsed successfully, set this URI.
|
String | toString()
Get the escaped URI string.
|
protected boolean | validate(char[] component, BitSet generous)
Validate the URI characters within a specific component.
|
protected boolean | validate(char[] component, int soffset, int eoffset, BitSet generous)
Validate the URI characters within a specific component.
|
protected void | writeObject(ObjectOutputStream oos)
Write the content of this URI.
|
absoluteURI = scheme ":" ( hier_part | opaque_part )
abs_path = "/" path_segments
alpha = lowalpha | upalpha
alphanum = alpha | digit
authority = server | reg_name
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
escaped = "%" hex hex
fragment = *uric
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
hier_part = ( net_path | abs_path ) [ "?" query ]
host = hostname | IPv4address | IPv6reference
hostname = *( domainlabel "." ) toplabel [ "." ]
hostport = host [ ":" port ]
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
IPv6address = hexpart [ ":" IPv4address ]
IPv6reference = "[" IPv6address "]"
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
net_path = "//" authority [ abs_path ]
opaque_part = uric_no_slash *uric
param = *pchar
path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
query = *uric
reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
rel_path = rel_segment [ abs_path ]
rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
scheme = alpha *( alpha | digit | "+" | "-" | "." )
segment = *pchar *( ";" param )
server = [ [ userinfo "@" ] hostport ]
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
unreserved = alphanum | mark
uric = reserved | unreserved | escaped
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
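Several of the character-class rules above are plain unions of characters, which is why they map naturally onto `java.util.BitSet` fields. The following is a minimal sketch of that idea, assuming one bit per character code; the class `CharClassSketch` and its helpers are hypothetical and are not the fields of this class. Rules that involve sequences, such as escaped = "%" hex hex, cannot be captured by a single character set and are left out.

```java
import java.util.BitSet;

final class CharClassSketch {
    // Set one bit for each character in the given string.
    static BitSet of(String chars) {
        BitSet set = new BitSet(256);
        for (int i = 0; i < chars.length(); i++) {
            set.set(chars.charAt(i));
        }
        return set;
    }

    // Set one bit for each character in the inclusive range.
    static BitSet range(char from, char to) {
        BitSet set = new BitSet(256);
        set.set(from, to + 1);
        return set;
    }

    // The "|" alternation of single-character rules becomes a bitwise OR.
    static BitSet union(BitSet... sets) {
        BitSet result = new BitSet(256);
        for (BitSet set : sets) {
            result.or(set);
        }
        return result;
    }

    static final BitSet DIGIT = range('0', '9');
    static final BitSet ALPHA = union(range('a', 'z'), range('A', 'Z'));
    static final BitSet ALPHANUM = union(ALPHA, DIGIT);
    static final BitSet MARK = of("-_.!~*'()");
    static final BitSet UNRESERVED = union(ALPHANUM, MARK);
    static final BitSet RESERVED = of(";/?:@&=+$,");

    // True if every character of s belongs to the allowed set.
    static boolean allIn(String s, BitSet allowed) {
        for (int i = 0; i < s.length(); i++) {
            if (!allowed.get(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

A membership test is then a single `get` call per character, for example `CharClassSketch.allIn("rfc1808.txt", CharClassSketch.UNRESERVED)`.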
Parameters: s URI character sequence escaped true if URI character sequence is in escaped form. false otherwise. charset the charset string to do escape encoding, if required
Throws: URIException If the URI cannot be created. NullPointerException if input string is null
Since: 3.0
See Also: URI
Parameters: s URI character sequence escaped true if URI character sequence is in escaped form. false otherwise.
Throws: URIException If the URI cannot be created. NullPointerException if input string is null
Since: 3.0
See Also: URI
Deprecated: Use #URI(String, boolean, String)
Construct a URI as an escaped form of a character array with the given charset.
Parameters: escaped the URI character sequence charset the charset string to do escape encoding
Throws: URIException If the URI cannot be created. NullPointerException if escaped
is null
See Also: URI
Deprecated: Use #URI(String, boolean)
Construct a URI as an escaped form of a character array. A URI can be placed within double-quotes or angle brackets like "http://test.com/" and <http://test.com/>.
Parameters: escaped the URI character sequence
Throws: URIException If the URI cannot be created. NullPointerException if escaped
is null
See Also: URI
Deprecated: Use #URI(String, boolean, String)
Construct a URI from the given string with the given charset.
Parameters: original the string to be represented as a URI character sequence; it is either an absoluteURI or a relativeURI. charset the charset string to do escape encoding
Throws: URIException If the URI cannot be created.
See Also: URI
Deprecated: Use #URI(String, boolean)
Construct a URI from the given string.
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
A URI can be placed within double-quotes or angle brackets like "http://test.com/" and <http://test.com/>.
Parameters: original the string to be represented as a URI character sequence; it is either an absoluteURI or a relativeURI.
Throws: URIException If the URI cannot be created.
See Also: URI
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ] absoluteURI = scheme ":" ( hier_part | opaque_part ) opaque_part = uric_no_slash *uric
It's for absolute URI = <scheme>:<scheme-specific-part>#<fragment>.
Parameters: scheme the scheme string schemeSpecificPart scheme_specific_part fragment the fragment string
Throws: URIException If the URI cannot be created.
See Also: URI
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ] absoluteURI = scheme ":" ( hier_part | opaque_part ) relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ] hier_part = ( net_path | abs_path ) [ "?" query ]
It's for absolute URI = <scheme>:<path>?<query>#<fragment> and relative URI = <path>?<query>#<fragment>.
Parameters: scheme the scheme string authority the authority string path the path string query the query string fragment the fragment string
Throws: URIException If the new URI cannot be created.
See Also: URI
Parameters: scheme the scheme string userinfo the userinfo string host the host string port the port number
Throws: URIException If the new URI cannot be created.
See Also: URI
Parameters: scheme the scheme string userinfo the userinfo string host the host string port the port number path the path string
Throws: URIException If the new URI cannot be created.
See Also: URI
Parameters: scheme the scheme string userinfo the userinfo string host the host string port the port number path the path string query the query string
Throws: URIException If the new URI cannot be created.
See Also: URI
Parameters: scheme the scheme string userinfo the userinfo string host the host string port the port number path the path string query the query string fragment the fragment string
Throws: URIException If the new URI cannot be created.
See Also: URI
Parameters: scheme the scheme string host the host string path the path string fragment the fragment string
Throws: URIException If the new URI cannot be created.
See Also: URI
Deprecated: Use #URI(URI, String, boolean)
Construct a general URI with the given relative URI string.
Parameters: base the base URI relative the relative URI string
Throws: URIException If the new URI cannot be created.
Parameters: base the base URI relative the relative URI string escaped true if URI character sequence is in escaped form. false otherwise.
Throws: URIException If the new URI cannot be created.
Since: 3.0
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ] relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
Resolving Relative References to Absolute Form.
Examples of Resolving Relative URI References
Within an object with a well-defined base URI of
http://a/b/c/d;p?q
the relative URI would be resolved as follows:
Normal Examples
g:h = g:h
g = http://a/b/c/g
./g = http://a/b/c/g
g/ = http://a/b/c/g/
/g = http://a/g
//g = http://g
?y = http://a/b/c/?y
g?y = http://a/b/c/g?y
#s = (current document)#s
g#s = http://a/b/c/g#s
g?y#s = http://a/b/c/g?y#s
;x = http://a/b/c/;x
g;x = http://a/b/c/g;x
g;x?y#s = http://a/b/c/g;x?y#s
. = http://a/b/c/
./ = http://a/b/c/
.. = http://a/b/
../ = http://a/b/
../g = http://a/b/g
../.. = http://a/
../../ = http://a/
../../g = http://a/g
Some URI schemes do not allow a hierarchical syntax matching the
Parameters: base the base URI relative the relative URI
Throws: URIException If the new URI cannot be created.
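A few of the resolution examples for the base URI http://a/b/c/d;p?q can be reproduced with the JDK's own `java.net.URI`, which is a different class from the one documented here. The sketch below is only an illustration of relative-reference resolution; the class `ResolveSketch` is hypothetical, and nothing here claims that the two classes behave identically in every case.

```java
import java.net.URI;

final class ResolveSketch {
    // Resolve a relative reference against a base using java.net.URI (JDK class).
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";
        String[] relatives = { "g", "./g", "/g", "g?y", "../g", "../../g" };
        for (String relative : relatives) {
            System.out.println(relative + " = " + resolve(base, relative));
        }
    }
}
```

Note that the last path segment of the base ("d;p") and its query are dropped before the relative path is appended, and that "." and ".." segments are removed afterwards.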
To obtain an identical copy of a URI object, including its userinfo
component, use this method.
Returns: a clone of this instance
Parameters: obj the object to be compared.
Returns: 0 if the two URIs are the same; -1 if the comparison fails. The authority component is compared first.
Throws: ClassCastException if the argument is not a URI
URI character sequence->octet sequence->original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded.
Notice that there is a chance that URI characters that are not UTF-8 may be parsed as valid UTF-8. A recent non-scientific analysis found that EUC encoded Japanese words had a 2.7% false reading; SJIS had a 0.0005% false reading; other encodings such as ASCII or KOI-8 have a 0% false reading.
The percent "%" character always has the reserved purpose of being the escape indicator, it must be escaped as "%25" in order to be used as data within a URI.
The unescape method is internally performed within this method.
Parameters: component the URI character sequence charset the protocol charset
Returns: original character sequence
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
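The pipeline "URI character sequence -> octet sequence -> original character sequence" can be sketched with the JDK alone. The class `PercentDecodeSketch` below is hypothetical; it throws `IllegalArgumentException` where this class documents `URIException`, and it assumes that the input contains only US-ASCII characters outside of escapes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

final class PercentDecodeSketch {
    // Value of one hexadecimal digit, or -1 if the character is not a hex digit.
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String decode(String component, String charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < component.length(); i++) {
            char c = component.charAt(i);
            if (c == '%') {
                if (i + 2 >= component.length()) {
                    throw new IllegalArgumentException("incomplete trailing escape pattern");
                }
                int hi = hex(component.charAt(i + 1));
                int lo = hex(component.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                octets.write((hi << 4) | lo);  // step 1: escape -> octet
                i += 2;
            } else if (c < 0x80) {
                octets.write(c);               // plain US-ASCII character
            } else {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
        }
        // step 2: octet sequence -> original character sequence
        return new String(octets.toByteArray(), Charset.forName(charset));
    }
}
```

Because "%25" decodes to a literal "%", running the result through `decode` a second time would misread data as another escape, which is the hazard the class description warns about.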
URI character sequence->octet sequence->original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded.
Notice that there is a chance that URI characters that are not UTF-8 may be parsed as valid UTF-8. A recent non-scientific analysis found that EUC encoded Japanese words had a 2.7% false reading; SJIS had a 0.0005% false reading; other encodings such as ASCII or KOI-8 have a 0% false reading.
The percent "%" character always has the reserved purpose of being the escape indicator, it must be escaped as "%25" in order to be used as data within a URI.
The unescape method is internally performed within this method.
Parameters: component the URI character sequence charset the protocol charset
Returns: original character sequence
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
Since: 3.0
original character sequence->octet sequence->URI character sequence
An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.
Conversion from the local filesystem character set to UTF-8 will normally involve a two step process. First convert the local character set to the UCS; then convert the UCS to UTF-8. The first step in the process can be performed by maintaining a mapping table that includes the local character set code and the corresponding UCS code. The next step is to convert the UCS character code to the UTF-8 encoding.
Mapping between vendor codepages can be done in a very similar manner as described above.
The only time escape encodings can safely be made is when a URI is being created from its component parts. The escape and validate methods are internally performed within this method.
Parameters: original the original character sequence allowed those characters that are allowed within a component charset the protocol charset
Returns: URI character sequence
Throws: URIException null component or unsupported character encoding
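The reverse pipeline, "original character sequence -> octet sequence -> URI character sequence", can be sketched as follows. The class `PercentEncodeSketch` is hypothetical and JDK-only; unlike the documented method it returns a `String` rather than a `char[]`, and it does no validation. The `allowed` set is assumed to contain only US-ASCII character codes.

```java
import java.nio.charset.Charset;
import java.util.BitSet;

final class PercentEncodeSketch {
    static String encode(String original, BitSet allowed, String charset) {
        StringBuilder out = new StringBuilder();
        // step 1: original character sequence -> octet sequence
        for (byte b : original.getBytes(Charset.forName(charset))) {
            int octet = b & 0xFF;
            if (allowed.get(octet)) {
                out.append((char) octet);  // allowed octets pass through unchanged
            } else {
                // step 2: octet -> "%" followed by two hexadecimal digits
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(octet >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(octet & 0xF, 16)));
            }
        }
        return out.toString();
    }

    // Illustrative allowed set: ASCII letters and digits only.
    static BitSet lettersAndDigits() {
        BitSet set = new BitSet(256);
        set.set('a', 'z' + 1);
        set.set('A', 'Z' + 1);
        set.set('0', '9' + 1);
        return set;
    }
}
```

With this allowed set, `encode("Los Angeles", lettersAndDigits(), "UTF-8")` yields `Los%20Angeles`, matching the "%20" example above.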
Parameters: first the first character array second the second character array
Returns: true if they're equal
Parameters: obj an object to compare
Returns: true if two URI objects are equal
Returns: the above hierarchy level
Throws: URIException If (char[])
fails.
See Also: URI
Returns: the authority
Throws: URIException If URI fails
Returns: the current hierarchy level
Throws: URIException If (char[])
fails.
See Also: URI
Returns: the default charset string
Returns: the default charset string by locale
Returns: the default charset string by platform
An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.
To work globally either requires support of a number of character sets and to be able to convert between them, or the use of a single preferred character set. For support of global compatibility it is STRONGLY RECOMMENDED that clients and servers use UTF-8 encoding when exchanging URIs.
Returns: the default charset string
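Why the protocol charset matters can be seen by escaping the same character under two charsets. The sketch below uses the JDK's `java.net.URLEncoder`, which implements HTML form encoding rather than the per-component rules of this class, so it is only an illustration of the charset effect; the class `CharsetEscapeSketch` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CharsetEscapeSketch {
    // Escape a string with the JDK form encoder under the named charset.
    static String escape(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";  // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(escape(eAcute, "UTF-8"));       // %C3%A9
        System.out.println(escape(eAcute, "ISO-8859-1"));  // %E9
    }
}
```

A receiver that assumes a different charset from the sender will decode these escapes to different characters, which is the motivation for recommending UTF-8 on both sides.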
Returns: the raw above hierarchy level
Throws: URIException If (char[])
fails.
Returns: the escaped authority
Returns: the escaped current hierarchy level
Throws: URIException If (char[])
fails.
Returns: the escaped fragment string
Returns: the escaped basename string
path = [ abs_path | opaque_part ] abs_path = "/" path_segments opaque_part = uric_no_slash *uric
Returns: the escaped path string
Returns: the escaped path and query string
Returns: the escaped query string
Returns: the escaped URI string
Returns: the escaped URI reference string
Returns: the escaped userinfo
See Also: URI
Returns: the fragment string
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
See Also: URI
host = hostname | IPv4address | IPv6reference
Returns: the host
Throws: URIException If URI fails
See Also: URI
Returns: the basename string
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
See Also: URI
path = [ abs_path | opaque_part ]
Returns: the path string
Throws: URIException If URI fails.
See Also: URI
Returns: the path and query string.
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
See Also: URI
Returns: the port; if it is -1, either the default port for the scheme applies or server-based naming authority is not supported in this URI.
Returns: the protocol charset string
See Also: URI
Returns: the query string.
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
See Also: URI
Returns: the raw above hierarchy level
Throws: URIException If (char[])
fails.
Returns: the raw-escaped authority
Parameters: path the path
Returns: the current hierarchy level
Throws: URIException no hierarchy level
Returns: the raw-escaped current hierarchy level
Throws: URIException If (char[])
fails.
The optional fragment identifier is not part of a URI, but is often used in conjunction with a URI.
The format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result.
A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.
Returns: the raw-escaped fragment
Returns: the raw-escaped basename
path = [ abs_path | opaque_part ]
Returns: the raw-escaped path
Returns: the raw-escaped path and query
Returns: the raw-escaped query
Returns: the scheme
It is clearly unwise to use a URL that contains a password which is intended to be secret. In particular, the use of a password within the 'userinfo' component of a URL is strongly disrecommended except in those rare cases where the 'password' parameter is intended to be public.
To get the individual parts of the userinfo, use the methods provided for the specific URL, because the layout of the userinfo depends on that URL.
Returns: the URI character sequence
Returns: the URI reference character sequence
Returns: the raw-escaped userinfo
See Also: URI
Returns: the scheme, or null if the scheme is undefined
Returns: the original URI string
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
See Also: URI
Returns: the original URI reference string
Throws: URIException If URI fails.
Returns: true if and only if this URI has authority
See Also: URI
Returns: true if and only if this URI has fragment
Returns: a hash code value for this URI
Returns: true if and only if this URI has query
Returns: true if and only if this URI has userinfo
Parameters: s the string to be indexed delims the delimiters used to index
Returns: the earliest index of any of the delimiters, if one is present
Parameters: s the string to be indexed delims the delimiters used to index offset the index to start from
Returns: the earliest index of any of the delimiters, if one is present
Parameters: s the character array to be indexed delim the delimiter used to index
Returns: the earliest index of the delimiter, if it is present
Parameters: s the character array to be indexed delim the delimiter used to index offset the index to start from
Returns: the earliest index of the delimiter, if it is present
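The delimiter search described above can be sketched as follows. This is an illustration only: the class name IndexFirstOfSketch is hypothetical, and returning -1 when no delimiter is found is an assumption, since the text does not state the result for that case.

```java
// Hypothetical helper, not part of the documented API.
final class IndexFirstOfSketch {

    // Returns the smallest index >= offset at which any character of delims occurs.
    // Assumption: -1 is returned when none of the delimiters is present.
    static int indexFirstOf(String s, String delims, int offset) {
        if (s == null || s.isEmpty() || delims == null || delims.isEmpty()) {
            return -1;
        }
        int from = Math.max(offset, 0);
        int earliest = -1;
        for (int i = 0; i < delims.length(); i++) {
            int at = s.indexOf(delims.charAt(i), from);
            if (at >= 0 && (earliest < 0 || at < earliest)) {
                earliest = at;
            }
        }
        return earliest;
    }

    public static void main(String[] args) {
        // The path ends at the first '?' or '#'.
        System.out.println(indexFirstOf("/path?q#f", "?#", 0)); // prints 5
    }
}
```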
Returns: true if and only if this URI is absoluteURI
Returns: true if and only if the relativeURI or hier_part is abs_path
Returns: true if and only if the absoluteURI is hier_part
Returns: true if and only if the host part is hostname
Returns: true if and only if the host part is IPv4address
Returns: true if and only if the host part is IPv6reference
Returns: true if and only if the relativeURI or hier_part is net_path
See Also: URI
Returns: true if and only if the absoluteURI is opaque_part
Returns: true if and only if the authority component is reg_name
Returns: true if and only if this URI is relativeURI
Returns: true if and only if the relativeURI is rel_path
Returns: true if and only if the authority component is server
Algorithm taken from URI reference parser at http://www.apache.org/~fielding/uri/rev-2002/issues.html.
Parameters: path the path to normalize
Returns: the normalized path
Throws: URIException no more higher path level to be normalized
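As a rough illustration of path normalization, the sketch below removes "." and ".." segments from an absolute path. It is not the referenced algorithm itself: the class name NormalizeSketch is hypothetical, only abs_path input is handled, and IllegalArgumentException stands in for URIException when ".." would climb above the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of the documented API.
final class NormalizeSketch {

    static String normalize(String absPath) {
        if (!absPath.startsWith("/")) {
            throw new IllegalArgumentException("this sketch handles abs_path only");
        }
        // Limit -1 keeps trailing empty segments, so a trailing "/" survives.
        String[] segments = absPath.substring(1).split("/", -1);
        Deque<String> kept = new ArrayDeque<>();
        for (int i = 0; i < segments.length; i++) {
            boolean last = (i == segments.length - 1);
            String segment = segments[i];
            if (segment.equals(".")) {
                if (last) {
                    kept.addLast(""); // "/a/." becomes "/a/"
                }
            } else if (segment.equals("..")) {
                if (kept.isEmpty()) {
                    throw new IllegalArgumentException(
                            "no more higher path level to be normalized");
                }
                kept.removeLast();
                if (last) {
                    kept.addLast(""); // "/a/b/.." becomes "/a/"
                }
            } else {
                kept.addLast(segment);
            }
        }
        return "/" + String.join("/", kept);
    }

    public static void main(String[] args) {
        System.out.println(normalize("/a/b/../c/./d")); // prints "/a/c/d"
    }
}
```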
Throws: URIException no more higher path level to be normalized
See Also: isAbsPath
Parameters: original the original character sequence of authority component escaped true if original is escaped
Throws: URIException If an error occurs.
String with the character encoding of the local system or the document.
The following line is the regular expression for breaking-down a URI reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
For example, matching the above expression to http://jakarta.apache.org/ietf/uri/#Related results in the following subexpression matches:
$1 = http:
scheme    = $2 = http
$3 = //jakarta.apache.org
authority = $4 = jakarta.apache.org
path      = $5 = /ietf/uri/
$6 = <undefined>
query     = $7 = <undefined>
$8 = #Related
fragment  = $9 = Related
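The breakdown above can be reproduced with java.util.regex, as a minimal sketch. The class name UriRegexSketch and the method split are hypothetical. In Java, a group that did not take part in the match is reported as null, which corresponds to an undefined component.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented API.
final class UriRegexSketch {

    // The regular expression quoted in the text, with '\' doubled for a Java literal.
    static final Pattern URI_REFERENCE = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns { scheme, authority, path, query, fragment }; undefined parts are null.
    static String[] split(String uriReference) {
        Matcher m = URI_REFERENCE.matcher(uriReference);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a URI reference: " + uriReference);
        }
        return new String[] {
            m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("http://jakarta.apache.org/ietf/uri/#Related");
        // prints "[http, jakarta.apache.org, /ietf/uri/, null, Related]"
        System.out.println(Arrays.toString(parts));
    }
}
```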
Parameters: original the original character sequence escaped true if original is escaped
Throws: URIException If an error occurs.
Parameters: component the component string within the component disallowed those characters disallowed within the component
Returns: true if the component contains none of the disallowed characters; false if the component is undefined or incorrect
Parameters: ois the object-input stream
Throws: ClassNotFoundException If one of the classes specified in the input stream cannot be found. IOException If an IO problem occurs.
Parameters: component the component that may include a fragment
Returns: the component with the fragment identifier removed
Parameters: basePath a character array of the basePath relPath a character array of the relPath
Returns: the resolved path
Throws: URIException no more higher path level to be resolved
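A minimal sketch of merging a relative path with a base path is shown below. It is not the library's implementation: the class name ResolvePathSketch is hypothetical, Strings are used instead of character arrays, and the removal of "." and ".." segments (where the documented exception would arise) is deliberately left out.

```java
// Hypothetical helper, not part of the documented API.
final class ResolvePathSketch {

    static String resolvePath(String basePath, String relPath) {
        if (relPath.isEmpty()) {
            return basePath; // nothing to resolve
        }
        if (relPath.startsWith("/")) {
            return relPath; // an abs_path replaces the base path entirely
        }
        // Keep the base path up to and including its last '/'.
        int lastSlash = basePath.lastIndexOf('/');
        // Assumption: a base path without any '/' is treated as the root.
        String directory = lastSlash >= 0 ? basePath.substring(0, lastSlash + 1) : "/";
        return directory + relPath;
    }

    public static void main(String[] args) {
        System.out.println(resolvePath("/a/b/c", "d")); // prints "/a/b/d"
    }
}
```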
Note that a URI may contain characters from mixed character sets (e.g. ftp://host/KoreanNamespace/ChineseResource). To handle the bidirectional display of these character sets, the protocol charset can simply be used again, because the insertion of BIDI control characters at different points during composition is not yet implemented.
The setter method always succeeds, and it always throws a DefaultCharsetChanged exception.
API programmers must therefore follow this pattern:
The API programmer is responsible for setting the correct charset.
Each application should keep track of the charset it supports.
import org.apache.commons.httpclient.URI;
import org.apache.commons.httpclient.URI.DefaultCharsetChanged;
.
.
.
try {
    URI.setDefaultDocumentCharset("EUC-KR");
} catch (DefaultCharsetChanged cc) {
    // CASE 1: the exception can be ignored when the charset was set by the user
    if (cc.getReasonCode() == DefaultCharsetChanged.DOCUMENT_CHARSET) {
        // CASE 2: let the user know the default document charset changed
    } else {
        // CASE 3: let the user know the default protocol charset changed
    }
}
Parameters: charset the default charset for the document
Throws: DefaultCharsetChanged default charset changed
The character set used to store files SHALL remain a local decision and MAY depend on the capability of local operating systems. Prior to the exchange of URIs they SHOULD be converted into an ISO/IEC 10646 format and UTF-8 encoded. This approach, while allowing international exchange of URIs, will still allow backward compatibility with older systems because the code set positions for ASCII characters are identical to the one byte sequence in UTF-8.
An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.
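The backward-compatibility claim about ASCII and UTF-8 can be checked with a small sketch. The class name Utf8CompatSketch and the method sameOctetsAsAscii are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the documented API.
final class Utf8CompatSketch {

    // True when the string yields the same octets in US-ASCII and in UTF-8,
    // which holds exactly when every character is an ASCII character.
    static boolean sameOctetsAsAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        System.out.println(sameOctetsAsAscii("ftp://ftp.is.co.za/rfc/rfc1808.txt")); // prints "true"
        // U+D55C is a Hangul syllable; it needs three octets in UTF-8.
        System.out.println(sameOctetsAsAscii("\uD55C")); // prints "false"
    }
}
```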
The setter method always succeeds, and it always throws a DefaultCharsetChanged exception.
API programmers must therefore follow this pattern:
The API programmer is responsible for setting the correct charset.
Each application should keep track of the charset it supports.
import org.apache.commons.httpclient.URI;
import org.apache.commons.httpclient.URI.DefaultCharsetChanged;
.
.
.
try {
    URI.setDefaultProtocolCharset("UTF-8");
} catch (DefaultCharsetChanged cc) {
    // CASE 1: the exception can be ignored when the charset was set by the user
    if (cc.getReasonCode() == DefaultCharsetChanged.PROTOCOL_CHARSET) {
        // CASE 2: let the user know the default protocol charset changed
    } else {
        // CASE 3: let the user know the default document charset changed
    }
}
Parameters: charset the default charset for each protocol
Throws: DefaultCharsetChanged default charset changed
Parameters: escapedAuthority the escaped authority string
Throws: URIException If URI fails
Parameters: escapedFragment the escaped fragment string
Throws: URIException escaped fragment not valid
Parameters: escapedPath the escaped path string
Throws: URIException encoding error or not proper for initial instance
See Also: URI
Parameters: escapedQuery the escaped query string
Throws: URIException escaped query not valid
Parameters: fragment the fragment string.
Throws: URIException If an error occurs.
Parameters: path the path string
Throws: URIException set incorrectly or fragment only
See Also: URI
When the reserved special characters ("&", "=", "+", ",", and "$") within a query component cannot be misinterpreted, it is recommended to encode the whole query with this method.
Additional APIs that treat the reserved special characters of each protocol specially are implemented in the protocol classes that inherit from URI. Refer to the same-named APIs implemented in each specific protocol class.
Parameters: query the query string.
Throws: URIException incomplete trailing escape pattern or unsupported character encoding
See Also: URI
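Encoding a whole query while leaving the reserved characters alone can be sketched as below. This is only an illustration under stated assumptions: the class name QueryEscapeSketch and the ALLOWED set are hypothetical, the set is built from the unreserved and reserved characters of RFC 2396, and UTF-8 is assumed as the charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical helper, not part of the documented API.
final class QueryEscapeSketch {

    private static final String HEX = "0123456789ABCDEF";

    // Assumption: alphanumerics, marks and reserved characters pass through unescaped.
    static final BitSet ALLOWED = new BitSet(128);
    static {
        ALLOWED.set('a', 'z' + 1);
        ALLOWED.set('A', 'Z' + 1);
        ALLOWED.set('0', '9' + 1);
        for (char c : "-_.!~*'();/?:@&=+$,".toCharArray()) {
            ALLOWED.set(c);
        }
    }

    // Escapes every octet that is not in ALLOWED as %XX.
    static String escapeQuery(String query) {
        StringBuilder sb = new StringBuilder();
        for (byte b : query.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (ALLOWED.get(octet)) {
                sb.append((char) octet);
            } else {
                sb.append('%').append(HEX.charAt(octet >> 4)).append(HEX.charAt(octet & 0xF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQuery("q=a b&lang=en")); // prints "q=a%20b&lang=en"
    }
}
```

Because "&" and "=" are left untouched, a literal "&" inside a parameter value would be read as a separator. That is the situation in which a protocol-specific API is the better fit.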
authority = server | reg_name
Parameters: escapedAuthority the raw escaped authority
Throws: URIException If URI fails NullPointerException null authority
Parameters: escapedFragment the raw-escaped fragment
Throws: URIException escaped fragment not valid
Parameters: escapedPath the path character sequence
Throws: URIException encoding error or not proper for initial instance
See Also: URI
Parameters: escapedQuery the raw-escaped query
Throws: URIException escaped query not valid
See Also: URI
For security reasons, documents use only the URI-reference form without the userinfo component, such as http://jakarta.apache.org/. A URI-reference with a userinfo component can still be parsed.
In other words, this URI class and its subclasses must not expose a URI-reference expression that contains the userinfo component, such as http://user:password@hostport/restricted_zone.
This means that the API client programmer should extract the user and the password manually in order to use them. Subclasses will probably support this, but not as part of a whole URI-reference expression.
Returns: the escaped URI string
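The rule that the userinfo component must not be exposed can be sketched with plain string handling. This is an assumption-laden illustration, not the library's code: the class name HideUserinfoSketch is hypothetical, and the authority is located with the regular expression for breaking down a URI reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented API.
final class HideUserinfoSketch {

    // Group 4 of this expression is the authority component.
    private static final Pattern URI_REFERENCE = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns the URI reference with any "userinfo@" prefix removed from the authority.
    static String withoutUserinfo(String uriReference) {
        Matcher m = URI_REFERENCE.matcher(uriReference);
        if (!m.matches() || m.group(4) == null) {
            return uriReference; // no authority, so no userinfo
        }
        String authority = m.group(4);
        int at = authority.lastIndexOf('@');
        if (at < 0) {
            return uriReference;
        }
        return uriReference.substring(0, m.start(4))
                + authority.substring(at + 1)
                + uriReference.substring(m.end(4));
    }

    public static void main(String[] args) {
        // prints "http://hostport/restricted_zone"
        System.out.println(withoutUserinfo("http://user:password@hostport/restricted_zone"));
    }
}
```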
See Also: clone
Parameters: component the character sequence within the component generous those characters that are allowed within a component
Returns: true if it is a correct URI character sequence
This validation is generous rather than strict. Strict validation may be performed before this method is called.
Parameters: component the character sequence within the component soffset the starting offset of the given component eoffset the ending offset of the given component; -1 means the length of the component generous those characters that are allowed within a component
Returns: true if it is a correct URI character sequence
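The generous validation described above can be sketched with a java.util.BitSet of allowed characters. The class name ValidateSketch and the helper digits() are hypothetical, and treating eoffset as an inclusive index is an assumption.

```java
import java.util.BitSet;

// Hypothetical helper, not part of the documented API.
final class ValidateSketch {

    // Checks that every character from soffset to eoffset is in the allowed set.
    // Assumption: eoffset is inclusive, and -1 means "up to the end of the component".
    static boolean validate(char[] component, int soffset, int eoffset, BitSet generous) {
        int end = (eoffset == -1) ? component.length - 1 : eoffset;
        for (int i = soffset; i <= end; i++) {
            if (!generous.get(component[i])) {
                return false;
            }
        }
        return true;
    }

    // Example allowed set: the decimal digits, as a port component would need.
    static BitSet digits() {
        BitSet set = new BitSet(128);
        set.set('0', '9' + 1);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(validate("8080".toCharArray(), 0, -1, digits())); // prints "true"
        System.out.println(validate("80a0".toCharArray(), 0, -1, digits())); // prints "false"
    }
}
```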
Parameters: oos the object-output stream
Throws: IOException If an IO problem occurs.