API Docs for SAX and DOM
 


XMLString.hpp

/*
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 1999-2001 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Xerces" and "Apache Software Foundation" must
 *    not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact apache\@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache",
 *    nor may "Apache" appear in their name, without prior written
 *    permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation, and was
 * originally based on software copyright (c) 1999, International
 * Business Machines, Inc., http://www.ibm.com .  For more information
 * on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */

/*
 * $Log: XMLString.hpp,v $
 * Revision 1.26  2001/08/10 16:23:06  peiyongz
 * isHex(), isAlphaNum(), isAllWhiteSpace() and patternMatch() Added
 *
 * Revision 1.25  2001/07/06 20:27:57  peiyongz
 * isValidQName()
 *
 * Revision 1.24  2001/07/04 14:38:20  peiyongz
 * IDDatatypeValidator: created
 * DatatypeValidatorFactory: IDDTV enabled
 * XMLString:isValidName(): to validate Name (XML [4][5])
 *
 * Revision 1.23  2001/06/13 14:07:55  peiyongz
 * isValidEncName() to validate an encoding name (EncName)
 *
 * Revision 1.22  2001/05/23 15:44:51  tng
 * Schema: NormalizedString fix.  By Pei Yong Zhang.
 *
 * Revision 1.21  2001/05/11 13:26:31  tng
 * Copyright update.
 *
 * Revision 1.20  2001/05/09 18:43:30  tng
 * Add StringDatatypeValidator and BooleanDatatypeValidator.  By Pei Yong Zhang.
 *
 * Revision 1.19  2001/05/03 20:34:35  tng
 * Schema: SchemaValidator update
 *
 * Revision 1.18  2001/05/03 19:17:35  knoaman
 * TraverseSchema Part II.
 *
 * Revision 1.17  2001/03/21 21:56:13  tng
 * Schema: Add Schema Grammar, Schema Validator, and split the DTDValidator into DTDValidator, DTDScanner, and DTDGrammar.
 *
 * Revision 1.16  2001/03/02 20:52:46  knoaman
 * Schema: Regular expression - misc. updates for error messages,
 * and additions of new functions to XMLString class.
 *
 * Revision 1.15  2001/01/15 21:26:34  tng
 * Performance Patches by David Bertoni.
 *
 * Details: (see xerces-c-dev mailing Jan 14)
 * XMLRecognizer.cpp: the internal encoding string XMLUni::fgXMLChEncodingString
 * was going through this function numerous times.  As a result, the top hot-spot
 * for the parse was _wcsicmp().  The real problem is that the Microsoft wide string
 * functions are unbelievably slow.  For things like encodings, it might be
 * better to use a special comparison function that only considers a-z and
 * A-Z as characters with case.  This works since the character set for
 * encodings is limited to printable ASCII characters.
 *
 * XMLScanner2.cpp: This also has some case-sensitive vs. insensitive compares.
 * They are also much faster.  The other tweak is to only make a copy of an attribute
 * string if it needs to be split.  And then, the strategy is to try to use a
 * stack-based buffer, rather than a dynamically-allocated one.
 *
 * SAX2XMLReaderImpl.cpp: Again, more case-sensitive vs. insensitive comparisons.
 *
 * KVStringPair.cpp & hpp: By storing the size of the allocation, the storage can
 * likely be re-used many times, cutting down on dynamic memory allocations.
 *
 * XMLString.hpp: a more efficient implementation of stringLen().
 *
 * DTDValidator.cpp: another case of using a stack-based buffer when possible
 *
 * These patches made a big difference in parse time in some of our test
 * files, especially the ones that are very attribute-heavy.
 *
 * Revision 1.14  2000/10/13 22:47:57  andyh
 * Fix bug (failure to null-terminate result) in XMLString::trim().
 * Patch contributed by Nadav Aharoni
 *
 * Revision 1.13  2000/04/12 18:42:15  roddey
 * Improved docs in terms of what 'max chars' means in the method
 * parameters.
 *
 * Revision 1.12  2000/04/06 19:42:51  rahulj
 * Clarified how big the target buffer should be in the API
 * documentation.
 *
 * Revision 1.11  2000/03/23 01:02:38  roddey
 * Updates to the XMLURL class to correct a lot of parsing problems
 * and to add support for the port number. Updated the URL tests
 * to test some of this new stuff.
 *
 * Revision 1.10  2000/03/20 23:00:46  rahulj
 * Moved the inline definition of stringLen before the first
 * use. This satisfied the HP CC compiler.
 *
 * Revision 1.9  2000/03/02 19:54:49  roddey
 * This checkin includes many changes done while waiting for the
 * 1.1.0 code to be finished. I can't list them all here, but a list is
 * available elsewhere.
 *
 * Revision 1.8  2000/02/24 20:05:26  abagchi
 * Swat for removing Log from API docs
 *
 * Revision 1.7  2000/02/16 18:51:52  roddey
 * Fixed some facts in the docs and reformatted the docs to stay within
 * a reasonable line width.
 *
 * Revision 1.6  2000/02/16 17:07:07  abagchi
 * Added API docs
 *
 * Revision 1.5  2000/02/06 07:48:06  rahulj
 * Year 2K copyright swat.
 *
 * Revision 1.4  2000/01/12 00:16:23  roddey
 * Changes to deal with multiply nested, relative pathed, entities and to deal
 * with the new URL class changes.
 *
 * Revision 1.3  1999/12/18 00:18:10  roddey
 * More changes to support the new, completely orthogonal support for
 * intrinsic encodings.
 *
 * Revision 1.2  1999/12/15 19:41:28  roddey
 * Support for the new transcoder system, where even intrinsic encodings are
 * done via the same transcoder abstraction as external ones.
 *
 * Revision 1.1.1.1  1999/11/09 01:05:52  twl
 * Initial checkin
 *
 * Revision 1.2  1999/11/08 20:45:21  rahul
 * Swat for adding in Product name and CVS comment log variable.
 *
 */

#if !defined(XMLSTRING_HPP)
#define XMLSTRING_HPP

#include <util/XercesDefs.hpp>
#include <util/RefVectorOf.hpp>

class XMLLCPTranscoder;

class XMLString
{
public:
    /* Static methods for native character mode string manipulation */

    // Conversion between binary values and text
    static void binToText
    (
        const   unsigned int    toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned int    toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static bool textToBin
    (
        const   XMLCh* const    toConvert
        ,       unsigned int&   toFill
    );

    static int parseInt
    (
        const   XMLCh* const    toConvert
    );

    // Concatenation
    static void catString
    (
                char* const     target
        , const char* const     src
    );

    static void catString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    // Comparison (the "I" variants ignore case, the "N" variants
    // compare at most count characters)
    static int compareIString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static int compareNString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static bool regionMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static bool regionIMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    // Copying
    static void copyString
    (
                char* const     target
        , const char* const     src
    );

    static void copyString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static bool copyNString
    (
                XMLCh* const    target
        , const XMLCh* const    src
        , const unsigned int    maxChars
    );

    // Hashing
    static unsigned int hash
    (
        const   char* const     toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hash
    (
        const   XMLCh* const    toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hashN
    (
        const   XMLCh* const    toHash
        , const unsigned int    numChars
        , const unsigned int    hashModulus
    );

    // Searching for a character
    static int indexOf(const char* const toSearch, const char ch);

    static int indexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int indexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int indexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf(const char* const toSearch, const char ch);

    static int lastIndexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int lastIndexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           ch
        , const unsigned int    fromIndex
    );

    // Moving a fixed number of characters
    static void moveChars
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const unsigned int    count
    );

    // Substrings
    static void subString
    (
                char* const    targetStr
        , const char* const    srcStr
        , const int            startIndex
        , const int            endIndex
    );

    static void subString
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const int             startIndex
        , const int             endIndex
    );

    // Replication (the caller owns the returned copy)
    static char* replicate(const char* const toRep);

    static XMLCh* replicate(const XMLCh* const toRep);

    // Prefix, suffix and pattern tests
    static bool startsWith
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool startsWithI
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWithI
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool endsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    suffix
    );

    static const XMLCh* findAny
    (
        const   XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static XMLCh* findAny
    (
                XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static int patternMatch
    (
                XMLCh* const    toSearch
        , const XMLCh* const    pattern
    );

    // String length
    static unsigned int stringLen(const char* const src);

    static unsigned int stringLen(const XMLCh* const src);

    // Validation of XML names
    static bool isValidNCName(const XMLCh* const name);

    static bool isValidName(const XMLCh* const name);

    static bool isValidEncName(const XMLCh* const name);

    static bool isValidQName(const XMLCh* const name);

    // Character classification
    static bool isAlpha(XMLCh const theChar);

    static bool isDigit(XMLCh const theChar);

    static bool isAlphaNum(XMLCh const theChar);

    static bool isHex(XMLCh const theChar);

    static bool isAllWhiteSpace(const XMLCh* const toCheck);

    // Cutting, transcoding, trimming and tokenizing
    static void cut
    (
                XMLCh* const    toCutFrom
        , const unsigned int    count
    );

    static char* transcode
    (
        const   XMLCh* const    toTranscode
    );

    static bool transcode
    (
        const   XMLCh* const    toTranscode
        ,       char* const     toFill
        , const unsigned int    maxChars
    );

    static XMLCh* transcode
    (
        const   char* const     toTranscode
    );

    static bool transcode
    (
        const   char* const     toTranscode
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
    );

    static void trim(char* const toTrim);

    static void trim(XMLCh* const toTrim);

    static RefVectorOf<XMLCh>* tokenizeString(const XMLCh* const tokenizeSrc);

    static bool isInList(const XMLCh* const toFind, const XMLCh* const enumList);

    // Formatting, case conversion and whitespace handling
    static XMLCh* makeUName
    (
        const   XMLCh* const    pszURI
        , const XMLCh* const    pszName
    );

    static unsigned int replaceTokens
    (
                XMLCh* const    errText
        , const unsigned int    maxChars
        , const XMLCh* const    text1
        , const XMLCh* const    text2
        , const XMLCh* const    text3
        , const XMLCh* const    text4
    );

    static void upperCase(XMLCh* const toUpperCase);

    static void lowerCase(XMLCh* const toLowerCase);

    static bool isWSReplaced(const XMLCh* const toCheck);

    static bool isWSCollapsed(const XMLCh* const toCheck);

    static void replaceWS(XMLCh* const toConvert);

    static void collapseWS(XMLCh* const toConvert);

private:

    // Constructor and destructor are private: every method is static
    XMLString();
    ~XMLString();

    // Initialization and termination, reachable only from the friend
    // class XMLPlatformUtils
    static void initString(XMLLCPTranscoder* const defToUse);
    static void termString();

    // Checks that both regions lie within their strings
    static bool validateRegion(const XMLCh* const str1, const int offset1,
                        const XMLCh* const str2, const int offset2,
                        const unsigned int charsCount);

    friend class XMLPlatformUtils;
};

// ---------------------------------------------------------------------------
//  Inline some methods that are either just passthroughs to other string
//  methods, or which are key for performance.
// ---------------------------------------------------------------------------
inline void XMLString::moveChars(       XMLCh* const    targetStr
                                , const XMLCh* const    srcStr
                                , const unsigned int    count)
{
    XMLCh* outPtr = targetStr;
    const XMLCh* inPtr = srcStr;
    for (unsigned int index = 0; index < count; index++)
        *outPtr++ = *inPtr++;
}

inline unsigned int XMLString::stringLen(const XMLCh* const src)
{
    if (src == 0 || *src == 0)
    {
        return 0;
    }
    else
    {
        const XMLCh* pszTmp = src + 1;

        while (*pszTmp)
            ++pszTmp;

        return (unsigned int)(pszTmp - src);
    }
}

inline bool XMLString::startsWith(  const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::startsWithI( const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNIString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::endsWith(const XMLCh* const toTest,
                                const XMLCh* const suffix)
{
    const unsigned int suffixLen = XMLString::stringLen(suffix);
    const unsigned int testLen = XMLString::stringLen(toTest);

    // A string shorter than the suffix cannot end with it. Checking here
    // avoids an unsigned wrap-around in the offset computed below.
    if (suffixLen > testLen)
        return false;

    return regionMatches(toTest, testLen - suffixLen,
                         suffix, 0, suffixLen);
}

inline XMLCh* XMLString::replicate(const XMLCh* const toRep)
{
    // If a null string, return a null string!
    XMLCh* ret = 0;
    if (toRep)
    {
        const unsigned int len = stringLen(toRep);
        ret = new XMLCh[len + 1];
        XMLCh* outPtr = ret;
        const XMLCh* inPtr = toRep;
        // Copy len + 1 characters so the null terminator is included
        for (unsigned int index = 0; index <= len; index++)
            *outPtr++ = *inPtr++;
    }
    return ret;
}

inline bool XMLString::validateRegion(const XMLCh* const str1,
                                      const int offset1,
                                      const XMLCh* const str2,
                                      const int offset2,
                                      const unsigned int charsCount)
{
    // Reject negative offsets and regions that run past either string
    if (offset1 < 0 || offset2 < 0 ||
        (offset1 + charsCount) > XMLString::stringLen(str1) ||
        (offset2 + charsCount) > XMLString::stringLen(str2) )
        return false;

    return true;
}

#endif
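
The revision 1.15 log entry above suggests that, for encoding names, a comparison that treats only a-z and A-Z as characters with case could replace a slow general-purpose wide-string comparison. The following is a minimal standalone sketch of that idea, not the library's implementation. The names `XMLCh16`, `foldAscii`, `compareAsciiI` and `asciiCompareSelfCheck` are hypothetical, `XMLCh16` is only a stand-in for the platform-dependent `XMLCh` typedef, and the code assumes an ASCII-compatible execution character set.

```cpp
// Stand-in for XMLCh (assumption: a 16-bit unsigned code unit). The real
// typedef comes from util/XercesDefs.hpp and varies by platform.
typedef unsigned short XMLCh16;

// Folds only ASCII 'A'..'Z' to lower case; every other code unit is
// returned unchanged, so no locale or Unicode case tables are consulted.
inline XMLCh16 foldAscii(XMLCh16 ch)
{
    return (ch >= 'A' && ch <= 'Z')
        ? static_cast<XMLCh16>(ch + ('a' - 'A'))
        : ch;
}

// Hypothetical helper, not part of XMLString. Compares two null-terminated
// strings, ignoring case for ASCII letters only. Returns a negative value,
// zero, or a positive value. Null pointers are treated as empty strings.
inline int compareAsciiI(const XMLCh16* s1, const XMLCh16* s2)
{
    static const XMLCh16 empty[] = { 0 };
    if (!s1) s1 = empty;
    if (!s2) s2 = empty;

    for (;; ++s1, ++s2)
    {
        const XMLCh16 c1 = foldAscii(*s1);
        const XMLCh16 c2 = foldAscii(*s2);
        if (c1 != c2)
            return static_cast<int>(c1) - static_cast<int>(c2);
        if (c1 == 0)
            return 0;   // both strings ended at the same position
    }
}

// Small self-check that doubles as a usage example.
inline bool asciiCompareSelfCheck()
{
    const XMLCh16 upper[] = { 'U', 'T', 'F', '-', '8', 0 };
    const XMLCh16 lower[] = { 'u', 't', 'f', '-', '8', 0 };
    const XMLCh16 other[] = { 'u', 't', 'f', '-', '1', '6', 0 };
    return compareAsciiI(upper, lower) == 0
        && compareAsciiI(upper, other) != 0;
}
```

Because the fold touches only the 26 ASCII letters, the loop needs one range check per character. That is adequate for encoding names, which the log entry notes are limited to printable ASCII, but it is not a general case-insensitive comparison for arbitrary Unicode text.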


Copyright © 2000 The Apache Software Foundation. All Rights Reserved.