/* URI.java -- An URI class
   Copyright (C) 2002, 2004, 2005, 2006 Free Software Foundation, Inc.

This file is part of GNU Classpath.

GNU Classpath is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Classpath is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Classpath; see the file COPYING.  If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA.

Linking this library statically or dynamically with other modules is
making a combined work based on this library.  Thus, the terms and
conditions of the GNU General Public License cover the whole
combination.

As a special exception, the copyright holders of this library give you
permission to link this library with independent modules to produce an
executable, regardless of the license terms of these independent
modules, and to copy and distribute the resulting executable under
terms of your choice, provided that you also meet, for each linked
independent module, the terms and conditions of the license of that
module.  An independent module is a module which is not derived from
or based on this library.  If you modify this library, you may extend
this exception to your version of the library, but you are not
obligated to do so.  If you do not wish to do so, delete this
exception statement from your version.
*/


package java.net;

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * <p>
 * A URI instance represents that defined by
 * <a href="http://www.ietf.org/rfc/rfc3986.txt">RFC3986</a>,
 * with some deviations.
 * </p>
 * <p>
 * At its highest level, a URI consists of:
 * </p>
 * <p>
 * <code>[<em>scheme</em><strong>:</strong>]<em>scheme-specific-part</em>
 * [<strong>#</strong><em>fragment</em>]</code>
 * </p>
 * <p>
 * where <strong>#</strong> and <strong>:</strong> are literal characters,
 * and those parts enclosed in square brackets are optional.
 * </p>
 * <p>
 * There are two main types of URI.  An <em>opaque</em> URI is one
 * which just consists of the above three parts, and is not further
 * defined.  An example of such a URI would be a <em>mailto:</em> URI.
 * In contrast, <em>hierarchical</em> URIs give further definition
 * to the scheme-specific part, so as to represent some part of a
 * hierarchical structure:
 * </p>
 * <p>
 * <code>[<strong>//</strong><em>authority</em>][<em>path</em>]
 * [<strong>?</strong><em>query</em>]</code>
 * </p>
 * <p>
 * with <strong>/</strong> and <strong>?</strong> being literal characters.
 * When server-based, the authority section is further subdivided into:
 * </p>
 * <p>
 * <code>[<em>user-info</em><strong>@</strong>]<em>host</em>
 * [<strong>:</strong><em>port</em>]</code>
 * </p>
 * <p>
 * with <strong>@</strong> and <strong>:</strong> as literal characters.
 * Authority sections that are not server-based are said to be registry-based.
 * </p>
 * <p>
 * Hierarchical URIs can be either relative or absolute.  Absolute
 * hierarchical URIs specify a scheme and have a scheme-specific part
 * that starts with a `<strong>/</strong>', while relative URIs don't
 * specify a scheme.
 * Opaque URIs are always absolute.
 * </p>
 * <p>
 * Each part of the URI may have one of three states: undefined, empty
 * or containing some content.  The former two of these are represented
 * by <code>null</code> and the empty string in Java, respectively.
 * The scheme-specific part may never be undefined.  It also follows from
 * this that the path sub-part may also not be undefined, so as to ensure
 * the former.
 * </p>
 * <h2>Character Escaping and Quoting</h2>
 * <p>
 * The characters that can be used within a valid URI are restricted.
 * There are two main classes of characters which can't be used as is
 * within the URI:
 * </p>
 * <ol>
 * <li><strong>Characters outside the US-ASCII character set</strong>.
 * These have to be <strong>escaped</strong> in order to create
 * an RFC-compliant URI; this means replacing the character with the
 * appropriate hexadecimal value, preceded by a `%'.</li>
 * <li><strong>Illegal characters</strong> (e.g. space characters,
 * control characters) are quoted, which results in them being encoded
 * in the same way as non-US-ASCII characters.</li>
 * </ol>
 * <p>
 * The set of valid characters differs depending on the section of the URI:
 * </p>
 * <ul>
 * <li><strong>Scheme</strong>: Must be an alphanumeric, `-', `.' or '+'.</li>
 * <li><strong>Authority</strong>: Composed of the username, host, port, `@'
 * and `:'.</li>
 * <li><strong>Username</strong>: Allows unreserved or percent-encoded
 * characters, sub-delimiters and `:'.</li>
 * <li><strong>Host</strong>: Allows unreserved or percent-encoded
 * characters, sub-delimiters and square brackets (`[' and `]') for IPv6
 * addresses.</li>
 * <li><strong>Port</strong>: Digits only.</li>
 * <li><strong>Path</strong>: Allows the path characters and `/'.</li>
 * <li><strong>Query</strong>: Allows the path characters, `?' and '/'.</li>
 * <li><strong>Fragment</strong>: Allows the path characters, `?' and '/'.</li>
 * </ul>
 * <p>
 * These definitions reference the following sets of characters:
 * </p>
 * <ul>
 * <li><strong>Unreserved characters</strong>: The alphanumerics plus
 * `-', `.', `_', and `~'.</li>
 * <li><strong>Sub-delimiters</strong>: `!', `$', `&', `(', `)', `*',
 * `+', `,', `;', `=' and the single-quote itself.</li>
 * <li><strong>Path characters</strong>: Unreserved and percent-encoded
 * characters and the sub-delimiters along with `@' and `:'.</li>
 * </ul>
 * <p>
 * The constructors and accessor methods allow the use and retrieval of
 * URI components which contain non-US-ASCII characters directly.
 * They are only escaped when the <code>toASCIIString()</code> method
 * is used.  In contrast, illegal characters are always quoted, with the
 * exception of the return values of the non-raw accessors.
 * </p>
 *
 * @author Ito Kazumitsu (ito.kazumitsu@hitachi-cable.co.jp)
 * @author Dalibor Topic (robilad@kaffe.org)
 * @author Michael Koch (konqueror@gmx.de)
 * @author Andrew John Hughes (gnu_andrew@member.fsf.org)
 * @since 1.4
 */
public final class URI
  implements Comparable, Serializable
{
  /**
   * For serialization compatibility.
   */
  static final long serialVersionUID = -6052424284110960213L;

  /**
   * Regular expression for parsing URIs.
   *
   * Taken from RFC 2396, Appendix B.
   * This expression doesn't parse IPv6 addresses.
   */
  private static final String URI_REGEXP =
    "^(([^:/?#]+):)?((//([^/?#]*))?([^?#]*)(\\?([^#]*))?)?(#(.*))?";

  /**
   * Regular expression for parsing the authority segment.
   */
  private static final String AUTHORITY_REGEXP =
    "(([^?#]*)@)?([^?#:]*)(:([0-9]*))?";

  /**
   * Valid characters (taken from rfc2396/3986)
   */
  private static final String RFC2396_DIGIT = "0123456789";
  private static final String RFC2396_LOWALPHA = "abcdefghijklmnopqrstuvwxyz";
  private static final String RFC2396_UPALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  private static final String RFC2396_ALPHA =
    RFC2396_LOWALPHA + RFC2396_UPALPHA;
  private static final String RFC2396_ALPHANUM = RFC2396_DIGIT + RFC2396_ALPHA;
  private static final String RFC3986_UNRESERVED = RFC2396_ALPHANUM + "-._~";
  private static final String RFC3986_SUBDELIMS = "!$&'()*+,;=";
  private static final String RFC3986_REG_NAME =
    RFC3986_UNRESERVED + RFC3986_SUBDELIMS + "%";
  private static final String RFC3986_PCHAR = RFC3986_UNRESERVED +
    RFC3986_SUBDELIMS + ":@%";
  private static final String RFC3986_SEGMENT = RFC3986_PCHAR;
  private static final String RFC3986_PATH_SEGMENTS = RFC3986_SEGMENT + "/";
  private static final String RFC3986_SSP = RFC3986_PCHAR + "?/";
  private static final String RFC3986_HOST = RFC3986_REG_NAME + "[]";
  private static final String RFC3986_USERINFO = RFC3986_REG_NAME + ":";

  /**
   * Index of scheme component in parsed URI.
   */
  private static final int SCHEME_GROUP = 2;

  /**
   * Index of scheme-specific-part in parsed URI.
   */
  private static final int SCHEME_SPEC_PART_GROUP = 3;

  /**
   * Index of authority component in parsed URI.
   */
  private static final int AUTHORITY_GROUP = 5;

  /**
   * Index of path component in parsed URI.
   */
  private static final int PATH_GROUP = 6;

  /**
   * Index of query component in parsed URI.
   */
  private static final int QUERY_GROUP = 8;

  /**
   * Index of fragment component in parsed URI.
   */
  private static final int FRAGMENT_GROUP = 10;

  /**
   * Index of userinfo component in parsed authority section.
   */
  private static final int AUTHORITY_USERINFO_GROUP = 2;

  /**
   * Index of host component in parsed authority section.
   */
  private static final int AUTHORITY_HOST_GROUP = 3;

  /**
   * Index of port component in parsed authority section.
   */
  private static final int AUTHORITY_PORT_GROUP = 5;

  /**
   * The compiled version of the URI regular expression.
   */
  private static final Pattern URI_PATTERN;

  /**
   * The compiled version of the authority regular expression.
   */
  private static final Pattern AUTHORITY_PATTERN;

  /**
   * The set of valid hexadecimal characters.
   */
  private static final String HEX = "0123456789ABCDEF";

  private transient String scheme;
  private transient String rawSchemeSpecificPart;
  private transient String schemeSpecificPart;
  private transient String rawAuthority;
  private transient String authority;
  private transient String rawUserInfo;
  private transient String userInfo;
  private transient String rawHost;
  private transient String host;
  private transient int port = -1;
  private transient String rawPath;
  private transient String path;
  private transient String rawQuery;
  private transient String query;
  private transient String rawFragment;
  private transient String fragment;
  private String string;

  /**
   * Static initializer to pre-compile the regular expressions.
   */
  static
  {
    URI_PATTERN = Pattern.compile(URI_REGEXP);
    AUTHORITY_PATTERN = Pattern.compile(AUTHORITY_REGEXP);
  }

  private void readObject(ObjectInputStream is)
    throws ClassNotFoundException, IOException
  {
    this.string = (String) is.readObject();
    try
      {
        parseURI(this.string);
      }
    catch (URISyntaxException x)
      {
        // Should not happen.
        throw new RuntimeException(x);
      }
  }

  private void writeObject(ObjectOutputStream os) throws IOException
  {
    if (string == null)
      string = toString();
    os.writeObject(string);
  }

  /**
   * <p>
   * Returns the string content of the specified group of the supplied
   * matcher.  The returned value is modified according to the following:
   * </p>
   * <ul>
   * <li>If the resulting string has a length greater than 0, then
   * that string is returned.</li>
   * <li>If a string of zero length is matched, then the content
   * of the preceding group is considered.  If this is also an empty
   * string, then <code>null</code> is returned to indicate an undefined
   * value.  Otherwise, the value is truly the empty string and this is
   * the returned value.</li>
   * </ul>
   * <p>
   * This method is used for matching against all parts of the URI
   * that may be either undefined or empty (i.e. all those but the
   * scheme-specific part and the path).  In each case, the preceding
   * group is the content of the original group, along with some
   * additional distinguishing feature.  For example, the preceding
   * group for the query includes the preceding question mark,
   * while that of the fragment includes the hash symbol.  The presence
   * of these features enables disambiguation between the two cases
   * of a completely unspecified value and a simple non-existent value.
   * The scheme differs in that it will never return an empty string;
   * the delimiter follows the scheme rather than preceding it, so
   * it becomes part of the following section.  The same is true
   * of the user information.
   * </p>
   *
   * @param match the matcher, which contains the results of the URI
   *              matched against the URI regular expression.
   * @param group the index of the group whose content is required.
   * @return either the matched content, <code>null</code> for undefined
   *         values, or an empty string for a URI part with empty content.
   */
  private static String getURIGroup(Matcher match, int group)
  {
    String matched = match.group(group);
    if (matched == null || matched.length() == 0)
      {
        String prevMatched = match.group(group - 1);
        if (prevMatched == null || prevMatched.length() == 0)
          return null;
        else
          return "";
      }
    return matched;
  }

  /**
   * Sets fields of this URI by parsing the given string.
   *
   * @param str The string to parse
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   */
  private void parseURI(String str) throws URISyntaxException
  {
    Matcher matcher = URI_PATTERN.matcher(str);

    if (matcher.matches())
      {
        scheme = getURIGroup(matcher, SCHEME_GROUP);
        rawSchemeSpecificPart = matcher.group(SCHEME_SPEC_PART_GROUP);
        schemeSpecificPart = unquote(rawSchemeSpecificPart);
        if (!isOpaque())
          {
            rawAuthority = getURIGroup(matcher, AUTHORITY_GROUP);
            rawPath = matcher.group(PATH_GROUP);
            rawQuery = getURIGroup(matcher, QUERY_GROUP);
          }
        rawFragment = getURIGroup(matcher, FRAGMENT_GROUP);
      }
    else
      throw new URISyntaxException(str,
                                   "doesn't match URI regular expression");
    parseServerAuthority();

    // We must eagerly unquote the parts, because this is the only time
    // we may throw an exception.
    authority = unquote(rawAuthority);
    userInfo = unquote(rawUserInfo);
    host = unquote(rawHost);
    path = unquote(rawPath);
    query = unquote(rawQuery);
    fragment = unquote(rawFragment);
  }

  /**
   * Unquotes characters that are quoted as "%" + hex code.
   *
   * @param str The string to unquote or null.
   *
   * @return The unquoted string or null if str was null.
   *
   * @exception URISyntaxException If the given string contains invalid
   *            escape sequences.
   */
  private static String unquote(String str) throws URISyntaxException
  {
    if (str == null)
      return null;
    byte[] buf = new byte[str.length()];
    int pos = 0;
    for (int i = 0; i < str.length(); i++)
      {
        char c = str.charAt(i);
        if (c == '%')
          {
            if (i + 2 >= str.length())
              throw new URISyntaxException(str, "Invalid quoted character");
            int hi = Character.digit(str.charAt(++i), 16);
            int lo = Character.digit(str.charAt(++i), 16);
            if (lo < 0 || hi < 0)
              throw new URISyntaxException(str, "Invalid quoted character");
            buf[pos++] = (byte) (hi * 16 + lo);
          }
        else
          buf[pos++] = (byte) c;
      }
    try
      {
        return new String(buf, 0, pos, "utf-8");
      }
    catch (java.io.UnsupportedEncodingException x2)
      {
        throw (Error) new InternalError().initCause(x2);
      }
  }

  /**
   * Quote characters illegal in URIs in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 character.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quote(String str)
  {
    return quote(str, RFC3986_SSP);
  }

  /**
   * Quote characters illegal in URI authorities in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 character.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quoteAuthority(String str)
  {
    // Technically, we should be using RFC2396_AUTHORITY, but
    // it contains no additional characters.
    return quote(str, RFC3986_REG_NAME);
  }

  /**
   * Quotes the characters in the supplied string that are not part of
   * the specified set of legal characters.
   *
   * @param str the string to quote
   * @param legalCharacters the set of legal characters
   *
   * @return the quoted string.
   */
  private static String quote(String str, String legalCharacters)
  {
    StringBuffer sb = new StringBuffer(str.length());
    for (int i = 0; i < str.length(); i++)
      {
        char c = str.charAt(i);
        // Only illegal US-ASCII characters are quoted here.  Non-US-ASCII
        // characters are kept as is rather than being silently dropped.
        if (legalCharacters.indexOf(c) == -1 && c <= 127)
          {
            sb.append('%');
            sb.append(HEX.charAt(c / 16));
            sb.append(HEX.charAt(c % 16));
          }
        else
          sb.append(c);
      }
    return sb.toString();
  }

  /**
   * Quote characters illegal in URI hosts in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 character.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quoteHost(String str)
  {
    return quote(str, RFC3986_HOST);
  }

  /**
   * Quote characters illegal in URI paths in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 character.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quotePath(String str)
  {
    // Technically, we should be using RFC2396_PATH, but
    // it contains no additional characters.
    return quote(str, RFC3986_PATH_SEGMENTS);
  }

  /**
   * Quote characters illegal in URI user infos in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 character.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quoteUserInfo(String str)
  {
    return quote(str, RFC3986_USERINFO);
  }

  /**
   * Creates a URI from the given string
   *
   * @param str The string to create the URI from
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   * @exception NullPointerException If str is null
   */
  public URI(String str) throws URISyntaxException
  {
    this.string = str;
    parseURI(str);
  }

  /**
   * Create a URI from the given components
   *
   * @param scheme The scheme name
   * @param userInfo The username and authorization info
   * @param host The hostname
   * @param port The port number
   * @param path The path
   * @param query The query
   * @param fragment The fragment
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   */
  public URI(String scheme, String userInfo, String host, int port,
             String path, String query, String fragment)
    throws URISyntaxException
  {
    this((scheme == null ? "" : scheme + ":")
         + (userInfo == null && host == null && port == -1 ? "" : "//")
         + (userInfo == null ? "" : quoteUserInfo(userInfo) + "@")
         + (host == null ? "" : quoteHost(host))
         + (port == -1 ? "" : ":" + String.valueOf(port))
         + (path == null ? "" : quotePath(path))
         + (query == null ? "" : "?" + quote(query))
         + (fragment == null ? "" : "#" + quote(fragment)));
  }

  /**
   * Create a URI from the given components
   *
   * @param scheme The scheme name
   * @param authority The authority
   * @param path The path
   * @param query The query
   * @param fragment The fragment
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   */
  public URI(String scheme, String authority, String path, String query,
             String fragment) throws URISyntaxException
  {
    this((scheme == null ? "" : scheme + ":")
         + (authority == null ? "" : "//" + quoteAuthority(authority))
         + (path == null ? "" : quotePath(path))
         + (query == null ? "" : "?" + quote(query))
         + (fragment == null ? "" : "#" + quote(fragment)));
  }

  /**
   * Create a URI from the given components
   *
   * @param scheme The scheme name
   * @param host The hostname
   * @param path The path
   * @param fragment The fragment
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   */
  public URI(String scheme, String host, String path, String fragment)
    throws URISyntaxException
  {
    this(scheme, null, host, -1, path, null, fragment);
  }

  /**
   * Create a URI from the given components
   *
   * @param scheme The scheme name
   * @param ssp The scheme specific part
   * @param fragment The fragment
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   */
  public URI(String scheme, String ssp, String fragment)
    throws URISyntaxException
  {
    this((scheme == null ? "" : scheme + ":")
         + (ssp == null ? "" : quote(ssp))
         + (fragment == null ? "" : "#" + quote(fragment)));
  }

  /**
   * Create a URI from the given string
   *
   * @param str The string to create the URI from
   *
   * @exception IllegalArgumentException If the given string violates RFC 2396
   * @exception NullPointerException If str is null
   */
  public static URI create(String str)
  {
    try
      {
        return new URI(str);
      }
    catch (URISyntaxException e)
      {
        throw (IllegalArgumentException) new IllegalArgumentException()
              .initCause(e);
      }
  }

  /**
   * Attempts to parse this URI's authority component, if defined,
   * into user-information, host, and port components.  The purpose
   * of this method was to disambiguate between some authority sections,
   * which form invalid server-based authorities, but valid registry
   * based authorities.  In the updated RFC 3986, the authority section
   * is defined differently, with registry-based authorities part of
   * the host section.  Thus, this method is now simply an explicit
   * way of parsing any authority section.
   *
   * @return the URI, with the authority section parsed into user
   *         information, host and port components.
   * @throws URISyntaxException if the given string violates RFC 2396
   */
  public URI parseServerAuthority() throws URISyntaxException
  {
    if (rawAuthority != null)
      {
        Matcher matcher = AUTHORITY_PATTERN.matcher(rawAuthority);

        if (matcher.matches())
          {
            rawUserInfo = getURIGroup(matcher, AUTHORITY_USERINFO_GROUP);
            rawHost = getURIGroup(matcher, AUTHORITY_HOST_GROUP);

            String portStr = getURIGroup(matcher, AUTHORITY_PORT_GROUP);

            if (portStr != null)
              try
                {
                  port = Integer.parseInt(portStr);
                }
              catch (NumberFormatException e)
                {
                  URISyntaxException use =
                    new URISyntaxException
                      (string, "doesn't match URI regular expression");
                  use.initCause(e);
                  throw use;
                }
          }
        else
          throw new URISyntaxException(string,
                                       "doesn't match URI regular expression");
      }
    return this;
  }

  /**
   * <p>
   * Returns a normalized version of the URI.  If the URI is opaque,
   * or its path is already in normal form, then this URI is simply
   * returned.  Otherwise, the following transformation of the path
   * element takes place:
   * </p>
   * <ol>
   * <li>All `.' segments are removed.</li>
   * <li>Each `..' segment which can be paired with a prior non-`..' segment
   * is removed along with the preceding segment.</li>
   * <li>A `.' segment is added to the front if the first segment contains
   * a colon (`:').  This is a deviation from the RFC, which prevents
   * confusion between the path and the scheme.</li>
   * </ol>
   * <p>
   * The resulting URI will be free of `.' and `..' segments, barring those
   * that were prepended or which couldn't be paired, respectively.
   * </p>
   *
   * @return the normalized URI.
   */
  public URI normalize()
  {
    if (isOpaque() || path.indexOf("/./") == -1 && path.indexOf("/../") == -1)
      return this;
    try
      {
        return new URI(scheme, authority, normalizePath(path), query,
                       fragment);
      }
    catch (URISyntaxException e)
      {
        throw (Error) new InternalError("Normalized URI variant could not "+
                                        "be constructed").initCause(e);
      }
  }

  /**
   * <p>
   * Normalize the given path.  The following transformation takes place:
   * </p>
   * <ol>
   * <li>All `.' segments are removed.</li>
   * <li>Each `..' segment which can be paired with a prior non-`..' segment
   * is removed along with the preceding segment.</li>
   * <li>A `.' segment is added to the front if the first segment contains
   * a colon (`:').  This is a deviation from the RFC, which prevents
   * confusion between the path and the scheme.</li>
   * </ol>
   * <p>
   * The resulting URI will be free of `.' and `..' segments, barring those
   * that were prepended or which couldn't be paired, respectively.
   * </p>
   *
   * @param relativePath the relative path to be normalized.
   * @return the normalized path.
   */
  private String normalizePath(String relativePath)
  {
    /*
       This follows the algorithm in section 5.2.4. of RFC3986,
       but doesn't modify the input buffer.
    */
    StringBuffer input = new StringBuffer(relativePath);
    StringBuffer output = new StringBuffer();
    int start = 0;
    while (start < input.length())
      {
        /* A */
        if (input.indexOf("../",start) == start)
          {
            start += 3;
            continue;
          }
        if (input.indexOf("./",start) == start)
          {
            start += 2;
            continue;
          }
        /* B */
        if (input.indexOf("/./",start) == start)
          {
            start += 2;
            continue;
          }
        if (input.indexOf("/.",start) == start
            && input.charAt(start + 2) != '.')
          {
            start += 1;
            input.setCharAt(start,'/');
            continue;
          }
        /* C */
        if (input.indexOf("/../",start) == start)
          {
            start += 3;
            removeLastSegment(output);
            continue;
          }
        if (input.indexOf("/..",start) == start)
          {
            start += 2;
            input.setCharAt(start,'/');
            removeLastSegment(output);
            continue;
          }
        /* D */
        if (start == input.length() - 1 && input.indexOf(".",start) == start)
          {
            input.delete(0,1);
            continue;
          }
        if (start == input.length() - 2 && input.indexOf("..",start) == start)
          {
            input.delete(0,2);
            continue;
          }
        /* E */
        int indexOfSlash = input.indexOf("/",start);
        while (indexOfSlash == start)
          {
            output.append("/");
            ++start;
            indexOfSlash = input.indexOf("/",start);
          }
        if (indexOfSlash == -1)
          indexOfSlash = input.length();
        output.append(input.substring(start, indexOfSlash));
        start = indexOfSlash;
      }
    return output.toString();
  }

  /**
   * Removes the last segment of the path from the specified buffer.
   *
   * @param buffer the buffer containing the path.
   */
  private void removeLastSegment(StringBuffer buffer)
  {
    int lastSlash = buffer.lastIndexOf("/");
    if (lastSlash == -1)
      buffer.setLength(0);
    else
      buffer.setLength(lastSlash);
  }

  /**
   * Resolves the given URI against this URI
   *
   * @param uri The URI to resolve against this URI
   *
   * @return The resulting URI, or null when it couldn't be resolved
   *         for some reason.
   *
   * @throws NullPointerException if uri is null
   */
  public URI resolve(URI uri)
  {
    if (uri.isAbsolute())
      return uri;
    if (uri.isOpaque())
      return uri;

    String scheme = uri.getScheme();
    String schemeSpecificPart = uri.getSchemeSpecificPart();
    String authority = uri.getAuthority();
    String path = uri.getPath();
    String query = uri.getQuery();
    String fragment = uri.getFragment();

    try
      {
        if (fragment != null && path != null && path.equals("")
            && scheme == null && authority == null && query == null)
          return new URI(this.scheme, this.schemeSpecificPart, fragment);

        if (authority == null)
          {
            authority = this.authority;
            if (path == null)
              path = "";
            if (! (path.startsWith("/")))
              {
                StringBuffer basepath = new StringBuffer(this.path);
                int i = this.path.lastIndexOf('/');

                if (i >= 0)
                  basepath.delete(i + 1, basepath.length());

                basepath.append(path);
                path = normalizePath(basepath.toString());
              }
          }
        return new URI(this.scheme, authority, path, query, fragment);
      }
    catch (URISyntaxException e)
      {
        throw (Error) new InternalError("Resolved URI variant could not "+
                                        "be constructed").initCause(e);
      }
  }

  /**
   * Resolves the given URI string against this URI
   *
   * @param str The URI as string to resolve against this URI
   *
   * @return The resulting URI
   *
   * @throws IllegalArgumentException If the given URI string
   *         violates RFC 2396
   * @throws NullPointerException If str is null
   */
  public URI resolve(String str) throws IllegalArgumentException
  {
    return resolve(create(str));
  }

  /**
   * <p>
   * Relativizes the given URI against this URI.
The following 942: * algorithm is used: 943: * </p> 944: * <ul> 945: * <li>If either URI is opaque, the given URI is returned.</li> 946: * <li>If the schemes of the URIs differ, the given URI is returned.</li> 947: * <li>If the authority components of the URIs differ, then the given 948: * URI is returned.</li> 949: * <li>If the path of this URI is not a prefix of the supplied URI, 950: * then the given URI is returned.</li> 951: * <li>If all the above conditions hold, a new URI is created using the 952: * query and fragment components of the given URI, along with a path 953: * computed by removing the path of this URI from the start of the path 954: * of the supplied URI.</li> 955: * </ul> 956: * 957: * @param uri the URI to relativize agsint this URI 958: * @return the resulting URI 959: * @throws NullPointerException if the uri is null 960: */ 961: public URI relativize(URI uri) 962: { 963: if (isOpaque() || uri.isOpaque()) 964: return uri; 965: if (scheme == null && uri.getScheme() != null) 966: return uri; 967: if (scheme != null && !(scheme.equals(uri.getScheme()))) 968: return uri; 969: if (rawAuthority == null && uri.getRawAuthority() != null) 970: return uri; 971: if (rawAuthority != null && !(rawAuthority.equals(uri.getRawAuthority()))) 972: return uri; 973: if (!(uri.getRawPath().startsWith(rawPath))) 974: return uri; 975: try 976: { 977: return new URI(null, null, 978: uri.getRawPath().substring(rawPath.length()), 979: uri.getRawQuery(), uri.getRawFragment()); 980: } 981: catch (URISyntaxException e) 982: { 983: throw (Error) new InternalError("Relativized URI variant could not "+ 984: "be constructed").initCause(e); 985: } 986: } 987: 988: /** 989: * Creates an URL from an URI 990: * 991: * @throws MalformedURLException If a protocol handler for the URL could 992: * not be found, or if some other error occurred while constructing the URL 993: * @throws IllegalArgumentException If the URI is not absolute 994: */ 995: public URL toURL() throws 
    IllegalArgumentException, MalformedURLException
  {
    if (isAbsolute())
      return new URL(this.toString());

    throw new IllegalArgumentException("not absolute");
  }

  /**
   * Returns the scheme of the URI
   */
  public String getScheme()
  {
    return scheme;
  }

  /**
   * Tells whether this URI is absolute or not
   */
  public boolean isAbsolute()
  {
    return scheme != null;
  }

  /**
   * Tells whether this URI is opaque or not
   */
  public boolean isOpaque()
  {
    return ((scheme != null) && ! (schemeSpecificPart.startsWith("/")));
  }

  /**
   * Returns the raw scheme specific part of this URI.
   * The scheme-specific part is never undefined, though it may be empty
   */
  public String getRawSchemeSpecificPart()
  {
    return rawSchemeSpecificPart;
  }

  /**
   * Returns the decoded scheme specific part of this URI.
   */
  public String getSchemeSpecificPart()
  {
    return schemeSpecificPart;
  }

  /**
   * Returns the raw authority part of this URI
   */
  public String getRawAuthority()
  {
    return rawAuthority;
  }

  /**
   * Returns the decoded authority part of this URI
   */
  public String getAuthority()
  {
    return authority;
  }

  /**
   * Returns the raw user info part of this URI
   */
  public String getRawUserInfo()
  {
    return rawUserInfo;
  }

  /**
   * Returns the decoded user info part of this URI
   */
  public String getUserInfo()
  {
    return userInfo;
  }

  /**
   * Returns the hostname of the URI
   */
  public String getHost()
  {
    return host;
  }

  /**
   * Returns the port number of the URI
   */
  public int getPort()
  {
    return port;
  }

  /**
   * Returns the raw path part of this URI
   */
  public String getRawPath()
  {
    return rawPath;
  }

  /**
   * Returns the path of the URI
   */
  public String getPath()
  {
    return path;
  }

  /**
   * Returns the raw query part of this URI
   */
  public String getRawQuery()
  {
    return rawQuery;
  }

  /**
   * Returns the query of the URI
   */
  public String getQuery()
  {
    return query;
  }

  /**
   * Returns the raw fragment part of this URI
   */
  public String getRawFragment()
  {
    return rawFragment;
  }

  /**
   * Returns the fragment of the URI
   */
  public String getFragment()
  {
    return fragment;
  }

  /**
   * <p>
   * Compares the URI with the given object for equality.  If the
   * object is not a <code>URI</code>, then the method returns false.
   * Otherwise, the following criteria are observed:
   * </p>
   * <ul>
   * <li>The scheme of the URIs must either be null (undefined) in both cases,
   * or equal, ignoring case.</li>
   * <li>The raw fragment of the URIs must either be null (undefined) in both
   * cases, or equal, ignoring case.</li>
   * <li>Both URIs must be of the same type (opaque or hierarchical)</li>
   * <li><strong>For opaque URIs:</strong></li>
   * <ul>
   * <li>The raw scheme-specific parts must be equal.</li>
   * </ul>
   * <li>For hierarchical URIs:</li>
   * <ul>
   * <li>The raw paths must be equal, ignoring case.</li>
   * <li>The raw queries are either both undefined or both equal, ignoring
   * case.</li>
   * <li>The raw authority sections are either both undefined or:</li>
   * <li><strong>For registry-based authorities:</strong></li>
   * <ul><li>they are equal.</li></ul>
   * <li><strong>For server-based authorities:</strong></li>
   * <ul>
   * <li>the hosts are equal, ignoring case</li>
   * <li>the ports are equal</li>
   * <li>the user information components are equal</li>
   * </ul>
   * </ul>
   * </ul>
   *
   * @param obj the obj to compare the URI with.
   * @return <code>true</code> if the objects are equal, according to
   * the specification above.
   */
  public boolean equals(Object obj)
  {
    if (!(obj instanceof URI))
      return false;
    URI uriObj = (URI) obj;
    if (scheme == null)
      {
        if (uriObj.getScheme() != null)
          return false;
      }
    else
      if (!(scheme.equalsIgnoreCase(uriObj.getScheme())))
        return false;
    if (rawFragment == null)
      {
        if (uriObj.getRawFragment() != null)
          return false;
      }
    else
      if (!(rawFragment.equalsIgnoreCase(uriObj.getRawFragment())))
        return false;
    boolean opaqueThis = isOpaque();
    boolean opaqueObj = uriObj.isOpaque();
    if (opaqueThis && opaqueObj)
      return rawSchemeSpecificPart.equals(uriObj.getRawSchemeSpecificPart());
    else if (!opaqueThis && !opaqueObj)
      {
        // Guard against a null query on this URI only, which would
        // otherwise cause a NullPointerException.
        boolean common = rawPath.equalsIgnoreCase(uriObj.getRawPath())
          && (rawQuery == null
              ? uriObj.getRawQuery() == null
              : rawQuery.equalsIgnoreCase(uriObj.getRawQuery()));
        if (rawAuthority == null && uriObj.getRawAuthority() == null)
          return common;
        // Exactly one authority is undefined, so the URIs cannot be equal.
        if (rawAuthority == null || uriObj.getRawAuthority() == null)
          return false;
        if (host == null)
          return common
            && rawAuthority.equalsIgnoreCase(uriObj.getRawAuthority());
        return common
          && host.equalsIgnoreCase(uriObj.getHost())
          && port == uriObj.getPort()
          && (rawUserInfo == null ?
              uriObj.getRawUserInfo() == null :
              rawUserInfo.equalsIgnoreCase(uriObj.getRawUserInfo()));
      }
    else
      return false;
  }

  /**
   * Computes the hashcode of the URI
   */
  public int hashCode()
  {
    return (getScheme() == null ? 0 : 13 * getScheme().hashCode())
      + 17 * getRawSchemeSpecificPart().hashCode()
      + (getRawFragment() == null ? 0 : 21 + getRawFragment().hashCode());
  }

  /**
   * <p>
   * Compares the URI with another object that must also be a URI.
   * Undefined components are taken to be less than any other component.
   * The following criteria are observed:
   * </p>
   * <ul>
   * <li>Two URIs with different schemes are compared according to their
   * scheme, regardless of case.</li>
   * <li>A hierarchical URI is less than an opaque URI with the same
   * scheme.</li>
   * <li><strong>For opaque URIs:</strong></li>
   * <ul>
   * <li>URIs with differing scheme-specific parts are ordered according
   * to the ordering of the scheme-specific part.</li>
   * <li>URIs with the same scheme-specific part are ordered by the
   * raw fragment.</li>
   * </ul>
   * <li>For hierarchical URIs:</li>
   * <ul>
   * <li>URIs are ordered according to their raw authority sections,
   * if they are unequal.</li>
   * <li><strong>For registry-based authorities:</strong></li>
   * <ul><li>they are ordered according to the ordering of the authority
   * component.</li></ul>
   * <li><strong>For server-based authorities:</strong></li>
   * <ul>
   * <li>URIs are ordered according to the raw user information.</li>
   * <li>URIs with the same user information are ordered by the host,
   * ignoring case.</li>
   * <li>URIs with the same host are ordered by the port.</li>
   * </ul>
   * <li>URIs with the same authority section are ordered by the raw path.</li>
   * <li>URIs with the same path are ordered by their raw query.</li>
   * <li>URIs with the same query are ordered by their raw fragments.</li>
   * </ul>
   * </ul>
   *
   * @param obj the object to compare this URI with
   * @return a negative integer, zero or a positive integer depending
   *         on whether this URI is less than, equal to or greater
   *         than that supplied, respectively.
   * @throws ClassCastException if the given object is not a URI
   */
  public int compareTo(Object obj)
    throws ClassCastException
  {
    URI uri = (URI) obj;
    if (scheme == null && uri.getScheme() != null)
      return -1;
    if (scheme != null)
      {
        // A defined scheme is greater than an undefined one.
        if (uri.getScheme() == null)
          return 1;
        int sCompare = scheme.compareToIgnoreCase(uri.getScheme());
        if (sCompare != 0)
          return sCompare;
      }
    boolean opaqueThis = isOpaque();
    boolean opaqueObj = uri.isOpaque();
    if (opaqueThis && !opaqueObj)
      return 1;
    if (!opaqueThis && opaqueObj)
      return -1;
    if (opaqueThis)
      {
        int ssCompare =
          rawSchemeSpecificPart.compareTo(uri.getRawSchemeSpecificPart());
        if (ssCompare == 0)
          return compareFragments(uri);
        else
          return ssCompare;
      }
    if (rawAuthority == null && uri.getRawAuthority() != null)
      return -1;
    if (rawAuthority != null)
      {
        // A defined authority is greater than an undefined one.
        if (uri.getRawAuthority() == null)
          return 1;
        int aCompare = rawAuthority.compareTo(uri.getRawAuthority());
        if (aCompare != 0)
          {
            if (host == null)
              return aCompare;
            if (rawUserInfo == null && uri.getRawUserInfo() != null)
              return -1;
            if (rawUserInfo != null)
              {
                if (uri.getRawUserInfo() == null)
                  return 1;
                int uCompare = rawUserInfo.compareTo(uri.getRawUserInfo());
                if (uCompare != 0)
                  return uCompare;
              }
            // The host of this URI is known to be defined at this point.
            if (uri.getHost() == null)
              return 1;
            int hCompare = host.compareTo(uri.getHost());
            if (hCompare != 0)
              return hCompare;
            return new Integer(port).compareTo(new Integer(uri.getPort()));
          }
      }
    if (rawPath == null && uri.getRawPath() != null)
      return -1;
    if (rawPath != null)
      {
        int pCompare = rawPath.compareTo(uri.getRawPath());
        if (pCompare != 0)
          return pCompare;
      }
    if (rawQuery == null && uri.getRawQuery() != null)
      return -1;
    if (rawQuery != null)
      {
        // A defined query is greater than an undefined one.
        if (uri.getRawQuery() == null)
          return 1;
        int qCompare = rawQuery.compareTo(uri.getRawQuery());
        if (qCompare != 0)
          return qCompare;
      }
    return compareFragments(uri);
  }

  /**
   * Compares the fragment of this URI with that of the supplied URI.
   *
   * @param uri the URI to compare with this one.
   * @return a negative integer, zero or a positive integer depending
   *         on whether this uri's fragment is less than, equal to
   *         or greater than the fragment of the uri supplied, respectively.
   */
  private int compareFragments(URI uri)
  {
    if (rawFragment == null && uri.getRawFragment() != null)
      return -1;
    else if (rawFragment == null)
      return 0;
    else if (uri.getRawFragment() == null)
      // A defined fragment is greater than an undefined one.
      return 1;
    else
      return rawFragment.compareTo(uri.getRawFragment());
  }

  /**
   * Returns the URI as a String.  If the URI was created using a constructor,
   * then this will be the same as the original input string.
   *
   * @return a string representation of the URI.
   */
  public String toString()
  {
    return (scheme == null ? "" : scheme + ":")
      + rawSchemeSpecificPart
      + (rawFragment == null ? "" : "#" + rawFragment);
  }

  /**
   * Returns the URI as US-ASCII string.  This is the same as the result
   * from <code>toString()</code> for URIs that don't contain any non-US-ASCII
   * characters.  Otherwise, the non-US-ASCII characters are replaced
   * by their percent-encoded representations.
   *
   * @return a string representation of the URI, containing only US-ASCII
   *         characters.
   */
  public String toASCIIString()
  {
    String strRep = toString();
    boolean inNonAsciiBlock = false;
    StringBuffer buffer = new StringBuffer();
    StringBuffer encBuffer = null;
    for (int i = 0; i < strRep.length(); i++)
      {
        char c = strRep.charAt(i);
        if (c <= 127)
          {
            if (inNonAsciiBlock)
              {
                buffer.append(escapeCharacters(encBuffer.toString()));
                inNonAsciiBlock = false;
              }
            buffer.append(c);
          }
        else
          {
            if (!inNonAsciiBlock)
              {
                encBuffer = new StringBuffer();
                inNonAsciiBlock = true;
              }
            encBuffer.append(c);
          }
      }
    // Flush a trailing run of non-ASCII characters, which would otherwise
    // be dropped when the string does not end with an ASCII character.
    if (inNonAsciiBlock)
      buffer.append(escapeCharacters(encBuffer.toString()));
    return buffer.toString();
  }

  /**
   * Converts the supplied string, which is expected to consist of
   * non-ASCII characters, to its percent-encoded representation.
   * That is, each byte of its UTF-8 encoding is replaced by "%"
   * followed by its two-digit hexadecimal value.
   *
   * @param str a string of non-ASCII characters.
   * @return the string with its characters converted to their
   *         percent-encoded representations.
   */
  private static String escapeCharacters(String str)
  {
    try
      {
        StringBuffer sb = new StringBuffer();
        // this is far from optimal, but it works
        byte[] utf8 = str.getBytes("utf-8");
        for (int j = 0; j < utf8.length; j++)
          {
            sb.append('%');
            sb.append(HEX.charAt((utf8[j] & 0xff) / 16));
            sb.append(HEX.charAt((utf8[j] & 0xff) % 16));
          }
        return sb.toString();
      }
    catch (java.io.UnsupportedEncodingException x)
      {
        throw (Error) new InternalError("Escaping error").initCause(x);
      }
  }

}
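
The relativization, resolution and ASCII-conversion behaviour documented above can be illustrated with a small standalone sketch. It is not part of GNU Classpath: the class `UriSketch`, its methods and its `HEX` table (uppercase digits are an assumption) are hypothetical names introduced for illustration, and the `relativize`/`resolve` calls go through the `java.net.URI` class of whatever runtime executes it. `toAscii` mirrors the run-buffering approach of `toASCIIString()`: consecutive non-ASCII characters are collected and encoded together, so that surrogate pairs reach the UTF-8 encoder intact.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the library source above.
final class UriSketch {
    // Assumed hex digit table (uppercase); the original HEX constant is defined elsewhere.
    private static final String HEX = "0123456789ABCDEF";

    // Percent-encodes each run of non-ASCII characters as the bytes of its
    // UTF-8 encoding, leaving ASCII characters untouched.
    public static String toAscii(String s) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 127) {
                flush(run, out);   // end of a non-ASCII run, if any
                out.append(c);
            } else {
                run.append(c);     // keep surrogate pairs together
            }
        }
        flush(run, out);           // a run may extend to the end of the string
        return out.toString();
    }

    // Appends the percent-encoded UTF-8 bytes of the pending run and clears it.
    private static void flush(StringBuilder run, StringBuilder out) {
        if (run.length() == 0)
            return;
        for (byte b : run.toString().getBytes(StandardCharsets.UTF_8)) {
            out.append('%');
            out.append(HEX.charAt((b & 0xff) / 16));
            out.append(HEX.charAt((b & 0xff) % 16));
        }
        run.setLength(0);
    }

    // The base path is a prefix of the target path, so the prefix is removed
    // and the query of the target is kept.
    public static String relativizeExample() {
        URI base = URI.create("http://example.com/a/b/");
        URI target = URI.create("http://example.com/a/b/c/d?x=1");
        return base.relativize(target).toString();
    }

    // A relative reference is merged with the base path and then normalized.
    public static String resolveExample() {
        return URI.create("http://example.com/a/b/c").resolve("../d").toString();
    }

    public static void main(String[] args) {
        System.out.println(relativizeExample());    // c/d?x=1
        System.out.println(resolveExample());       // http://example.com/a/d
        System.out.println(toAscii("caf\u00e9/x")); // caf%C3%A9/x
    }
}
```

Buffering whole runs rather than encoding character by character costs a second buffer, but it avoids splitting a supplementary character across two separate encoding calls.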