API Docs for SAX and DOM
 


XMLString.hpp

/*
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 1999-2001 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Xerces" and "Apache Software Foundation" must
 *    not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact apache\@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache",
 *    nor may "Apache" appear in their name, without prior written
 *    permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation, and was
 * originally based on software copyright (c) 1999, International
 * Business Machines, Inc., http://www.ibm.com .  For more information
 * on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */

/*
 * $Log: XMLString.hpp,v $
 * Revision 1.25  2001/07/06 20:27:57  peiyongz
 * isValidQName()
 *
 * Revision 1.24  2001/07/04 14:38:20  peiyongz
 * IDDatatypeValidator: created
 * DatatypeValidatorFactory: IDDTV enabled
 * XMLString::isValidName(): to validate Name (XML [4][5])
 *
 * Revision 1.23  2001/06/13 14:07:55  peiyongz
 * isValidEncName() to validate an encoding name (EncName)
 *
 * Revision 1.22  2001/05/23 15:44:51  tng
 * Schema: NormalizedString fix.  By Pei Yong Zhang.
 *
 * Revision 1.21  2001/05/11 13:26:31  tng
 * Copyright update.
 *
 * Revision 1.20  2001/05/09 18:43:30  tng
 * Add StringDatatypeValidator and BooleanDatatypeValidator.  By Pei Yong Zhang.
 *
 * Revision 1.19  2001/05/03 20:34:35  tng
 * Schema: SchemaValidator update
 *
 * Revision 1.18  2001/05/03 19:17:35  knoaman
 * TraverseSchema Part II.
 *
 * Revision 1.17  2001/03/21 21:56:13  tng
 * Schema: Add Schema Grammar, Schema Validator, and split the DTDValidator into DTDValidator, DTDScanner, and DTDGrammar.
 *
 * Revision 1.16  2001/03/02 20:52:46  knoaman
 * Schema: Regular expression - misc. updates for error messages,
 * and additions of new functions to XMLString class.
 *
 * Revision 1.15  2001/01/15 21:26:34  tng
 * Performance Patches by David Bertoni.
 *
 * Details: (see xerces-c-dev mailing Jan 14)
 * XMLRecognizer.cpp: the internal encoding string XMLUni::fgXMLChEncodingString
 * was going through this function numerous times.  As a result, the top hot-spot
 * for the parse was _wcsicmp().  The real problem is that Microsoft's wide string
 * functions are unbelievably slow.  For things like encodings, it might be
 * better to use a special comparison function that only considers a-z and
 * A-Z as characters with case.  This works since the character set for
 * encodings is limited to printable ASCII characters.
 *
 * XMLScanner2.cpp: This also has some case-sensitive vs. insensitive compares.
 * They are also much faster.  The other tweak is to only make a copy of an attribute
 * string if it needs to be split.  And then, the strategy is to try to use a
 * stack-based buffer, rather than a dynamically-allocated one.
 *
 * SAX2XMLReaderImpl.cpp: Again, more case-sensitive vs. insensitive comparisons.
 *
 * KVStringPair.cpp & hpp: By storing the size of the allocation, the storage can
 * likely be re-used many times, cutting down on dynamic memory allocations.
 *
 * XMLString.hpp: a more efficient implementation of stringLen().
 *
 * DTDValidator.cpp: another case of using a stack-based buffer when possible
 *
 * These patches made a big difference in parse time in some of our test
 * files, especially the ones that are very attribute-heavy.
 *
 * Revision 1.14  2000/10/13 22:47:57  andyh
 * Fix bug (failure to null-terminate result) in XMLString::trim().
 * Patch contributed by Nadav Aharoni
 *
 * Revision 1.13  2000/04/12 18:42:15  roddey
 * Improved docs in terms of what 'max chars' means in the method
 * parameters.
 *
 * Revision 1.12  2000/04/06 19:42:51  rahulj
 * Clarified how big the target buffer should be in the API
 * documentation.
 *
 * Revision 1.11  2000/03/23 01:02:38  roddey
 * Updates to the XMLURL class to correct a lot of parsing problems
 * and to add support for the port number. Updated the URL tests
 * to test some of this new stuff.
 *
 * Revision 1.10  2000/03/20 23:00:46  rahulj
 * Moved the inline definition of stringLen before the first
 * use. This satisfied the HP CC compiler.
 *
 * Revision 1.9  2000/03/02 19:54:49  roddey
 * This checkin includes many changes done while waiting for the
 * 1.1.0 code to be finished. I can't list them all here, but a list is
 * available elsewhere.
 *
 * Revision 1.8  2000/02/24 20:05:26  abagchi
 * Swat for removing Log from API docs
 *
 * Revision 1.7  2000/02/16 18:51:52  roddey
 * Fixed some facts in the docs and reformatted the docs to stay within
 * a reasonable line width.
 *
 * Revision 1.6  2000/02/16 17:07:07  abagchi
 * Added API docs
 *
 * Revision 1.5  2000/02/06 07:48:06  rahulj
 * Year 2K copyright swat.
 *
 * Revision 1.4  2000/01/12 00:16:23  roddey
 * Changes to deal with multiply nested, relative pathed, entities and to deal
 * with the new URL class changes.
 *
 * Revision 1.3  1999/12/18 00:18:10  roddey
 * More changes to support the new, completely orthogonal support for
 * intrinsic encodings.
 *
 * Revision 1.2  1999/12/15 19:41:28  roddey
 * Support for the new transcoder system, where even intrinsic encodings are
 * done via the same transcoder abstraction as external ones.
 *
 * Revision 1.1.1.1  1999/11/09 01:05:52  twl
 * Initial checkin
 *
 * Revision 1.2  1999/11/08 20:45:21  rahul
 * Swat for adding in Product name and CVS comment log variable.
 *
 */

#if !defined(XMLSTRING_HPP)
#define XMLSTRING_HPP

#include <util/XercesDefs.hpp>
#include <util/RefVectorOf.hpp>

class XMLLCPTranscoder;

class XMLString
{
public:
    /* Static methods for native character mode string manipulation */

    static void binToText
    (
        const   unsigned int    toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned int    toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   unsigned long   toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   long            toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       char* const     toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static void binToText
    (
        const   int             toFormat
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
        , const unsigned int    radix
    );

    static bool textToBin
    (
        const   XMLCh* const    toConvert
        ,       unsigned int&   toFill
    );

    static int parseInt
    (
        const   XMLCh* const    toConvert
    );

    static void catString
    (
                char* const     target
        , const char* const     src
    );

    static void catString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static int compareIString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static int compareNString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   char* const     str1
        , const char* const     str2
        , const unsigned int    count
    );

    static int compareNIString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
        , const unsigned int    count
    );

    static int compareString
    (
        const   char* const     str1
        , const char* const     str2
    );

    static int compareString
    (
        const   XMLCh* const    str1
        , const XMLCh* const    str2
    );

    static bool regionMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static bool regionIMatches
    (
        const   XMLCh* const    str1
        , const int             offset1
        , const XMLCh* const    str2
        , const int             offset2
        , const unsigned int    charCount
    );

    static void copyString
    (
                char* const     target
        , const char* const     src
    );

    static void copyString
    (
                XMLCh* const    target
        , const XMLCh* const    src
    );

    static bool copyNString
    (
                XMLCh* const    target
        , const XMLCh* const    src
        , const unsigned int    maxChars
    );

    static unsigned int hash
    (
        const   char* const     tohash
        , const unsigned int    hashModulus
    );

    static unsigned int hash
    (
        const   XMLCh* const    toHash
        , const unsigned int    hashModulus
    );

    static unsigned int hashN
    (
        const   XMLCh* const    toHash
        , const unsigned int    numChars
        , const unsigned int    hashModulus
    );

    static int indexOf(const char* const toSearch, const char ch);

    static int indexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int indexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int indexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf(const char* const toSearch, const char ch);

    static int lastIndexOf(const XMLCh* const toSearch, const XMLCh ch);

    static int lastIndexOf
    (
        const   char* const     toSearch
        , const char            chToFind
        , const unsigned int    fromIndex
    );

    static int lastIndexOf
    (
        const   XMLCh* const    toSearch
        , const XMLCh           ch
        , const unsigned int    fromIndex
    );

    static void moveChars
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const unsigned int    count
    );

    static void subString
    (
                char* const    targetStr
        , const char* const    srcStr
        , const int            startIndex
        , const int            endIndex
    );

    static void subString
    (
                XMLCh* const    targetStr
        , const XMLCh* const    srcStr
        , const int             startIndex
        , const int             endIndex
    );

    static char* replicate(const char* const toRep);

    static XMLCh* replicate(const XMLCh* const toRep);

    static bool startsWith
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    static bool startsWithI
    (
        const   char* const     toTest
        , const char* const     prefix
    );

    static bool startsWithI
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    prefix
    );

    // The second parameter is a suffix, matching the inline definition below.
    static bool endsWith
    (
        const   XMLCh* const    toTest
        , const XMLCh* const    suffix
    );

    static const XMLCh* findAny
    (
        const   XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static XMLCh* findAny
    (
                XMLCh* const    toSearch
        , const XMLCh* const    searchList
    );

    static unsigned int stringLen(const char* const src);

    static unsigned int stringLen(const XMLCh* const src);

    static bool isValidNCName(const XMLCh* const name);

    static bool isValidName(const XMLCh* const name);

    static bool isValidEncName(const XMLCh* const name);

    static bool isValidQName(const XMLCh* const name);

    static bool isAlpha(XMLCh const theChar);

    static bool isDigit(XMLCh const theChar);

    static void cut
    (
                XMLCh* const    toCutFrom
        , const unsigned int    count
    );

    static char* transcode
    (
        const   XMLCh* const    toTranscode
    );

    static bool transcode
    (
        const   XMLCh* const    toTranscode
        ,       char* const     toFill
        , const unsigned int    maxChars
    );

    static XMLCh* transcode
    (
        const   char* const     toTranscode
    );

    static bool transcode
    (
        const   char* const     toTranscode
        ,       XMLCh* const    toFill
        , const unsigned int    maxChars
    );

    static void trim(char* const toTrim);

    static void trim(XMLCh* const toTrim);

    static RefVectorOf<XMLCh>* tokenizeString(const XMLCh* const tokenizeSrc);

    static bool isInList(const XMLCh* const toFind, const XMLCh* const enumList);

    static XMLCh* makeUName
    (
        const   XMLCh* const    pszURI
        , const XMLCh* const    pszName
    );

    static unsigned int replaceTokens
    (
                XMLCh* const    errText
        , const unsigned int    maxChars
        , const XMLCh* const    text1
        , const XMLCh* const    text2
        , const XMLCh* const    text3
        , const XMLCh* const    text4
    );

    static void upperCase(XMLCh* const toUpperCase);

    static void lowerCase(XMLCh* const toLowerCase);

    static bool isWSReplaced(const XMLCh* const toCheck);

    static bool isWSCollapsed(const XMLCh* const toCheck);

    static void replaceWS(XMLCh* const toConvert);

    static void collapseWS(XMLCh* const toConvert);

private :

    XMLString();
    ~XMLString();

    static void initString(XMLLCPTranscoder* const defToUse);
    static void termString();

    static bool validateRegion(const XMLCh* const str1, const int offset1,
                        const XMLCh* const str2, const int offset2,
                        const unsigned int charsCount);

    friend class XMLPlatformUtils;
};

// ---------------------------------------------------------------------------
//  Inline some methods that are either just passthroughs to other string
//  methods, or which are key for performance.
// ---------------------------------------------------------------------------
inline void XMLString::moveChars(       XMLCh* const    targetStr
                                , const XMLCh* const    srcStr
                                , const unsigned int    count)
{
    // Forward copy of exactly 'count' characters; no terminator is appended.
    XMLCh* outPtr = targetStr;
    const XMLCh* inPtr = srcStr;
    for (unsigned int index = 0; index < count; index++)
        *outPtr++ = *inPtr++;
}

inline unsigned int XMLString::stringLen(const XMLCh* const src)
{
    // A null pointer and an empty string both have length zero.
    if (src == 0 || *src == 0)
    {
        return 0;
    }
    else
    {
        const XMLCh* pszTmp = src + 1;

        while (*pszTmp)
            ++pszTmp;

        return (unsigned int)(pszTmp - src);
    }
}

inline bool XMLString::startsWith(  const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::startsWithI( const   XMLCh* const    toTest
                                    , const XMLCh* const    prefix)
{
    return (compareNIString(toTest, prefix, stringLen(prefix)) == 0);
}

inline bool XMLString::endsWith(const XMLCh* const toTest,
                                const XMLCh* const suffix)
{
    const unsigned int testLen   = XMLString::stringLen(toTest);
    const unsigned int suffixLen = XMLString::stringLen(suffix);

    // A suffix longer than the string cannot match; checking here also keeps
    // the unsigned subtraction below from wrapping around.
    if (suffixLen > testLen)
        return false;

    return regionMatches(toTest, testLen - suffixLen,
                         suffix, 0, suffixLen);
}

inline XMLCh* XMLString::replicate(const XMLCh* const toRep)
{
    // If a null string, return a null string!
    XMLCh* ret = 0;
    if (toRep)
    {
        // Allocated with new[]; the loop below also copies the terminator.
        const unsigned int len = stringLen(toRep);
        ret = new XMLCh[len + 1];
        XMLCh* outPtr = ret;
        const XMLCh* inPtr = toRep;
        for (unsigned int index = 0; index <= len; index++)
            *outPtr++ = *inPtr++;
    }
    return ret;
}

inline bool XMLString::validateRegion(const XMLCh* const str1,
                                      const int offset1,
                                      const XMLCh* const str2,
                                      const int offset2,
                                      const unsigned int charsCount)
{
    // Both offsets must be non-negative, and each region must end within
    // its own string.
    if (offset1 < 0 || offset2 < 0 ||
        (offset1 + charsCount) > XMLString::stringLen(str1) ||
        (offset2 + charsCount) > XMLString::stringLen(str2) )
        return false;

    return true;
}

#endif
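
The inline stringLen(), startsWith() and endsWith() definitions above all reduce to a length computation plus a bounded comparison. The following is a minimal standalone sketch of that idea, not the Xerces implementation: the type Ch and the functions lenOf, sameN, hasPrefix, hasSuffix and runExamples are hypothetical names introduced here, and char16_t is only an assumed stand-in for XMLCh, whose real definition comes from util/XercesDefs.hpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for XMLCh (an assumption, not the Xerces typedef).
typedef char16_t Ch;

// Length of a null-terminated string; a null pointer counts as empty.
static std::size_t lenOf(const Ch* s)
{
    if (!s) return 0;
    const Ch* p = s;
    while (*p) ++p;
    return static_cast<std::size_t>(p - s);
}

// True when the first n code units of a and b are equal.
// The caller guarantees that both buffers hold at least n units.
static bool sameN(const Ch* a, const Ch* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

static bool hasPrefix(const Ch* s, const Ch* prefix)
{
    const std::size_t preLen = lenOf(prefix);
    return preLen <= lenOf(s) && sameN(s, prefix, preLen);
}

static bool hasSuffix(const Ch* s, const Ch* suffix)
{
    const std::size_t sLen = lenOf(s);
    const std::size_t sufLen = lenOf(suffix);
    // Compare lengths first so the offset computation cannot wrap around.
    return sufLen <= sLen && sameN(s + (sLen - sufLen), suffix, sufLen);
}

// Small usage example.
static void runExamples()
{
    assert(lenOf(u"abc") == 3);
    assert(hasPrefix(u"xsd:string", u"xsd:"));
    assert(hasSuffix(u"schema.xsd", u".xsd"));
    assert(!hasSuffix(u"xsd", u"schema.xsd"));
}
```

Checking the lengths before subtracting is the design point worth noting: with unsigned lengths, a suffix longer than the string would otherwise produce a very large offset.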

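The private validateRegion() helper above accepts a region only when the offset is non-negative and offset plus count stays within the string. A hedged sketch of the same bounds check for a single string follows; regionFits and runRegionExamples are hypothetical names, and the subtraction form is an assumption chosen here so that the sum cannot overflow, not something taken from Xerces.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when [offset, offset + count) lies inside a string of
// length len. Hypothetical helper, written for illustration only.
static bool regionFits(std::size_t len, int offset, std::size_t count)
{
    if (offset < 0) return false;
    const std::size_t off = static_cast<std::size_t>(offset);
    // Written as a subtraction so that off + count is never computed.
    return off <= len && count <= len - off;
}

// Small usage example.
static void runRegionExamples()
{
    assert(regionFits(5, 0, 5));   // the whole string
    assert(regionFits(5, 5, 0));   // empty region at the end
    assert(!regionFits(5, 1, 5));  // runs one past the end
    assert(!regionFits(5, -1, 1)); // negative offset
}
```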

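Because the inline replicate() above allocates its result with new XMLCh[len + 1], the caller owns the copy and, in this version of the header, would release it with delete []. The sketch below mirrors that ownership pattern in standalone form; Ch, cloneString and runCloneExample are hypothetical names, and char16_t is again only an assumed stand-in for XMLCh.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for XMLCh (an assumption, not the Xerces typedef).
typedef char16_t Ch;

// Returns a heap copy of src including its terminator, or null for null.
// The caller owns the result and must release it with delete [].
static Ch* cloneString(const Ch* src)
{
    if (!src) return 0;

    std::size_t len = 0;
    while (src[len]) ++len;

    Ch* copy = new Ch[len + 1];
    for (std::size_t i = 0; i <= len; ++i) // <= also copies the terminator
        copy[i] = src[i];
    return copy;
}

// Small usage example showing the matching delete [].
static void runCloneExample()
{
    Ch* copy = cloneString(u"id");
    assert(copy[0] == u'i' && copy[1] == u'd' && copy[2] == 0);
    delete [] copy;
}
```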
Copyright © 2000 The Apache Software Foundation. All Rights Reserved.