Classes | Typedefs | Functions | Variables
ibis::util Namespace Reference

Organize the miscellaneous functions under the name util. More...

Classes

class  counter
 A simple shared counter. More...
class  guardBase
 A class hierarchy for cleaning up after durable resources. More...
class  guardImpl0
 A concrete class for cleanup jobs that take a function without any argument. More...
class  guardImpl1
 A concrete class for cleanup jobs that take a function with one argument. More...
class  guardImpl2
 A concrete class for cleanup jobs that take a function with two arguments. More...
class  guardObj0
 A class to work with class member functions with no arguments. More...
struct  heap
 A simple heap based on std::push_heap and std::pop_heap. More...
class  ioLock
 A global I/O lock. More...
class  logger
 A class for logging messages. More...
class  mutexLock
 An wrapper class for perform pthread_mutex_lock/unlock. More...
class  quietLock
 An wrapper class for perform pthread_mutex_lock/unlock. More...
class  readLock
 An wrapper class for perform pthread_rwlock_rdlock/unlock. More...
class  refHolder
 A template to hold a reference to an object. More...
class  sharedInt32
 A shared integer class. More...
class  sharedInt64
 A 64-bit shared integer class. More...
class  timer
 Print simply-formated timing information. More...
class  writeLock
 An wrapper class for perform pthread_rwlock_wrlock/unlock. More...

Typedefs

typedef const guardBaseguard
 The type to be used by client code.

Functions

void clear (ibis::array_t< ibis::bitvector * > &bv) throw ()
 Clear an array of bit vectors.
void clear (ibis::partList &pl) throw ()
 Deallocate the list of data partitions.
template<typename T >
void clearVec (std::vector< T * > &v)
 A template to clean up a vector of pointers.
void closeLogFile ()
int copy (const char *to, const char *from)
 Copy "from" to "to".
unsigned int gatherParts (ibis::partList &parts, const char *adir, const char *bdir, bool ro=false)
 Look for data partitions in the given pair of directories.
unsigned int gatherParts (ibis::partList &parts, const char *adir, bool ro=false)
 Look for data partitions in the given directory.
unsigned int gatherParts (ibis::partList &parts, const ibis::resource &res, bool ro=false)
 Reconstruct partitions using data directories specified in the resource list.
off_t getFileSize (const char *name)
 Return size of the file in bytes.
void getGMTime (char *str)
 Return the current GMT time in string format.
void getLocalTime (char *str)
 Return the current time in string format as asctime_r.
FILE * getLogFile ()
const char * getLogFileName ()
char * getString (const char *buf)
 Extract a string from the given buf.
const char * getToken (char *&str, const char *tok_chrs)
 Return a null-terminated string from the beginning of input string str.
int getVersionNumber ()
 Return an integer designating the version of this software.
const char * getVersionString ()
 Return a pointer to the string designating the version of this software.
long intersect (const std::vector< ibis::bitvector > &bits1, const std::vector< ibis::bitvector > &bits2, std::vector< ibis::bitvector > &res)
 Intersect two sets of bit vectors.
long intersect (const std::vector< ibis::bitvector > &bits1, const std::vector< ibis::bitvector > &bits2, const std::vector< ibis::bitvector > &bits3, std::vector< ibis::bitvector > &res)
 Intersect three sets of bit vectors.
char * itoa (int value, char *str, int)
void logMessage (const char *event, const char *fmt,...)
int makeDir (const char *dir)
 Recursively create the name directory.
template<typename F >
guardImpl0< F > makeGuard (F f)
template<typename F , typename A >
guardImpl1< F, A > makeGuard (F f, A a)
template<typename F , typename A1 , typename A2 >
guardImpl2< F, A1, A2 > makeGuard (F f, A1 a1, A2 a2)
template<class C , typename F >
guardObj0< C, F > objectGuard (C o, F f)
const ibis::bitvector64outerProduct (const ibis::bitvector &a, const ibis::bitvector &b, ibis::bitvector64 &c)
 Compute the outer product of a and b, add the result to c.
const ibis::bitvector64outerProductUpper (const ibis::bitvector &a, const ibis::bitvector &b, ibis::bitvector64 &c)
 Add the strict upper triangular portion of the outer production between a and b to c.
double rand ()
 A pseudo-random number generator (0,1).
int64_t read (int, void *, int64_t)
 A wrapper over POSIX read function.
int readDouble (double &val, const char *&str, const char *del=ibis::util::delimiters)
 Attempt to convert the incoming string into a double.
int readInt (int64_t &val, const char *&str, const char *del=ibis::util::delimiters)
 Attempt to convert the incoming string into an integer.
int readString (std::string &str, const char *&buf, const char *delim=0)
 Copy the next string to the output variable str.
int readUInt (uint64_t &val, const char *&str, const char *del=ibis::util::delimiters)
 Attempt to convert the incoming string into a unsigned integer.
template<class T >
refHolder< T > ref (T &r)
 A function template to produce refHolder.
void removeDir (const char *name, bool leaveDir=false)
 Remove the content of named directory.
void removeTail (char *str, char tail)
 Remove trailing character 'tail' from str.
template<typename T >
void reorder (array_t< T > &arr, const array_t< uint32_t > &ind)
 Reorder the array arr according to the indices given in ind.
void reorder (std::vector< std::string > &arr, const array_t< uint32_t > &ind)
 Reorder string values.
template<typename T >
void reorder (array_t< T * > &arr, const array_t< uint32_t > &ind)
 Reorder the array arr according to the indices given in ind.
void secondsToString (const time_t, char *str)
uint32_t serialNumber ()
 Return an integer that is always increasing.
int setLogFileName (const char *filename)
void setVerboseLevel (int v)
 Set the verboseness level.
template<typename T1 , typename T2 >
void sort_heap (array_t< T1 > &keys, array_t< T2 > &vals)
 Heapsort.
template<typename T1 , typename T2 >
void sort_insertion (array_t< T1 > &keys, array_t< T2 > &vals)
 Insertion sort.
template<typename T1 , typename T2 >
uint32_t sort_partition (array_t< T1 > &keys, array_t< T2 > &vals)
 Partition function for quicksort.
template<typename T1 , typename T2 >
void sort_partition3 (array_t< T1 > &keys, array_t< T2 > &vals, uint32_t &starteq, uint32_t &startgt)
 Three-way partitioning algorithm for quicksort.
template<typename T1 , typename T2 >
void sort_quick (array_t< T1 > &keys, array_t< T2 > &vals, uint32_t lvl)
 Quicksort.
template<typename T1 , typename T2 >
void sort_quick3 (array_t< T1 > &keys, array_t< T2 > &vals)
 Quicksort.
template<typename T1 , typename T2 >
void sort_shell (array_t< T1 > &keys, array_t< T2 > &vals)
 Shell sort.
template<typename T1 , typename T2 >
void sortAll (array_t< T1 > &arr1, array_t< T2 > &arr2)
 Sort two arrays together.
template<typename T1 , typename T2 >
void sortAll_quick (array_t< T1 > &arr1, array_t< T2 > &arr2)
 Quick sort. Uses both arrays as keys and Moves all records of both arrays.
template<typename T1 , typename T2 >
void sortAll_shell (array_t< T1 > &arr1, array_t< T2 > &arr2)
 Shell sort. Sort both arrays arr1 and arr2.
template<typename T1 , typename T2 >
uint32_t sortAll_split (array_t< T1 > &arr1, array_t< T2 > &arr2)
 The parititioning function for ibis::util::sortAll.
template<typename T1 , typename T2 >
void sortKeys (array_t< T1 > &keys, array_t< T2 > &vals)
 Sorting function with payload.
int64_t sortMerge (std::vector< std::string > &valR, array_t< uint32_t > &indR, std::vector< std::string > &valS, array_t< uint32_t > &indS)
 An in-memory sort merge join function with string values.
template<typename T >
int64_t sortMerge (array_t< T > &valR, array_t< uint32_t > &indR, array_t< T > &valS, array_t< uint32_t > &indS)
 An in-memory sort merge join function.
template<typename T >
int64_t sortMerge (array_t< T > &valR, array_t< uint32_t > &indR, array_t< T > &valS, array_t< uint32_t > &indS, double delta1, double delta2)
 An in-memory sort merge join function.
void sortStrings (std::vector< std::string > &keys, array_t< uint32_t > &vals)
 Sorting function with string as keys and uint32_t as payload.
void sortStrings (array_t< const char * > &keys, array_t< uint32_t > &vals)
 Sorting function with string as keys and uint32_t as payload.
uint32_t sortStrings_partition (std::vector< std::string > &keys, array_t< uint32_t > &vals, uint32_t begin, uint32_t end)
 The partitioning procedure for quick sort.
uint32_t sortStrings_partition (array_t< const char * > &keys, array_t< uint32_t > &vals, uint32_t begin, uint32_t end)
 The partitioning procedure for quick sort.
void sortStrings_quick (std::vector< std::string > &keys, array_t< uint32_t > &vals, uint32_t begin, uint32_t end)
 Quicksort for strings.
void sortStrings_quick (array_t< const char * > &keys, array_t< uint32_t > &vals, uint32_t begin, uint32_t end)
 Quicksort for strings.
void sortStrings_shell (std::vector< std::string > &keys, array_t< uint32_t > &vals, uint32_t begin, uint32_t end)
 Shell sorting procedure.
void sortStrings_shell (array_t< const char * > &keys, array_t< uint32_t > &vals, uint32_t begin, uint32_t end)
 Shell sorting procedure.
bool strMatch (const char *str, const char *pat)
 Match the string str against a simple pattern pat.
char * strnewdup (const char *s)
 Duplicate string content with C++ default new operator.
char * strnewdup (const char *s, const uint32_t n)
 Duplicate no more than n characters.
char * trim (char *str)
 Remove leading and trailing blank space.
void uniformFraction (const long unsigned idx, long unsigned &denominator, long unsigned &numerator)
void updateDatasets (void)
 Update the metadata about the data partitions.
const char * userName ()
 Return the user name.
int writeLogFileHeader (FILE *fptr, const char *fname)
uint32_t checksum (const char *str, uint32_t sz)
 Fletcher's checksum on two integers. Returns an integer.
uint32_t checksum (uint32_t a, uint32_t b)
 Fletcher's checksum on two integers. Returns an integer.
std::string shortName (const std::string &longname)
 Fletcher's checksum on two integers. Returns an integer.
std::string randName (const std::string &longname)
 Fletcher's checksum on two integers. Returns an integer.
void int2string (std::string &str, unsigned val)
void int2string (std::string &str, unsigned v1, unsigned v2)
void int2string (std::string &str, unsigned v1, unsigned v2, unsigned v3)
void int2string (std::string &str, const std::vector< unsigned > &val)
void encode64 (uint64_t, std::string &)
int decode64 (uint64_t &, const std::string &)
int decode16 (uint64_t &, const char *)
std::string groupby1000 (uint64_t)
double incrDouble (const double &)
 Functions to handle manipulation of floating-point numbers.
double decrDouble (const double &)
 Decrease the input value to the next smaller value.
void eq2range (const double &, double &, double &)
 Generate a range [left, right) that contains exactly the input value in.
double coarsen (const double in, unsigned prec=2)
 Reduce the decimal precision of the incoming floating-point value to specified precision.
double compactValue (double left, double right, double start=0.0)
 Compute a compact 64-bit floating-point value with a short decimal representation.
double compactValue2 (double left, double right, double start=0.0)
 Compute a compact 64-bit floating-point value with a short binary representation.
void setNaN (double &val)
 Set a double to NaN.
void setNaN (float &val)
 Functions to handle manipulation of floating-point numbers.
template<typename Tin , typename Tout >
void round_down (const Tin &inval, Tout &outval)
 Round the incoming value to the largest output value that is no more than the input.
template<typename Tin , typename Tout >
void round_up (const Tin &inval, Tout &outval)
 Round the incoming value to the smallest output value that is no less than the input.
template<typename Tin >
void round_up (const Tin &inval, float &)
 A specialization of round_up for the output type float.
template<typename Tin >
void round_up (const Tin &inval, double &outval)
 A specialization of round_up for the output in double.
int log2 (uint32_t x)
 Log_2 of a 32-bit integer.
int log2 (uint64_t x)
 Log_2 of a 64-bit integer.
template<typename T >
void sort_radix (array_t< char > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< signed char > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< unsigned char > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< uint16_t > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< int16_t > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< uint32_t > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< int32_t > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< uint64_t > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< int64_t > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< float > &keys, array_t< T > &vals)
 LSD Radix sort.
template<typename T >
void sort_radix (array_t< double > &keys, array_t< T > &vals)
 LSD Radix sort.
void sortRIDs (ibis::RIDSet &)
 Sort RID lists.
void sortRIDsq (ibis::RIDSet &, uint32_t, uint32_t)
 Sort a portion of the RIDSet with quick sort.
void sortRIDsi (ibis::RIDSet &, uint32_t, uint32_t)
 Sort a portion of the RIDset with insertion sort.

Variables

const short unsigned charIndex []
 charIndex maps the characters (ASCII) back to integer [0-64]
const char * charTable = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
 charTable lists the 64 printable characters to be used for names
const char * delimiters = ";, \v\b\f\r\t\n'\""
 Delimiters used to separate a string of names.
pthread_mutex_t envLock = PTHREAD_MUTEX_INITIALIZER
 A mutex for serialize operations FastBit wide.
const int log2table [256]
 log base 2 of an integer, the lookup table
const uint32_t shellgaps [16]
 Gaps for Shell sort from http://www.cs.princeton.edu/~rs/shell/shell.c by R.

Detailed Description

Organize the miscellaneous functions under the name util.


Typedef Documentation

typedef const guardBase& ibis::util::guard

The type to be used by client code.

User code uses type ibis::util::guard along with the overloaded function ibis::util::makeGuard, as in

 ibis::util::guard myguard = ibis::util::makeGuard...;
Note:
The parameters passed to the function that does the actual clean up jobs are taken as they are at the construction time. If such parameters are modified, the caller needs to either create the guard variable after the parameters take on their final values, or dismiss the old gard and create another one.

Function Documentation

double ibis::util::coarsen ( const double  in,
unsigned  prec = 2 
) [inline]

Reduce the decimal precision of the incoming floating-point value to specified precision.

In an attempt to compute small values more consistently, small values are computed through division of integer values.

Since these integer values are computed through the function pow, the accuracy of the results depend on the implementation of the math library.

The value zero is always rounded to zero. Incoming value less than 1E-300 or greater than 1E300 is rounded to zero.

Referenced by ibis::fuzz::append(), ibis::bylt::append(), ibis::zona::append(), ibis::fuge::append(), ibis::bak::mapValues(), ibis::bak2::mapValues(), ibis::tafel::preferredSize(), ibis::tafel::readCSV(), and ibis::tafel::readSQLDump().

double ibis::util::compactValue ( double  left,
double  right,
double  start = 0.0 
)
double ibis::util::compactValue2 ( double  left,
double  right,
double  start = 0.0 
)

Compute a compact 64-bit floating-point value with a short binary representation.

Referenced by ibis::part::equalWeightBins(), and ibis::part::get2DDistributionU().

int ibis::util::copy ( const char *  to,
const char *  from 
)
double ibis::util::decrDouble ( const double &  in) [inline]

Decrease the input value to the next smaller value.

See also:
ibis::util::incrDouble

Referenced by ibis::quaere::create().

void ibis::util::eq2range ( const double &  in,
double &  left,
double &  right 
) [inline]

Generate a range [left, right) that contains exactly the input value in.

This is used to transform an expression expression "A = in" into "left <= A < right".

See also:
ibis::util::incrDouble

Referenced by ibis::ambit::estimate(), ibis::pale::estimate(), ibis::pack::estimate(), and ibis::zone::estimate().

unsigned ibis::util::gatherParts ( ibis::partList tlist,
const char *  adir,
const char *  bdir,
bool  ro = false 
)

Look for data partitions in the given pair of directories.

Read the two directories, if there are matching subdirs, construct an ibis::part from them.

Will descend into the subdirectories when run on unix systems to look for matching subdirectories.

Returns the number of data partitions found.

References envLock, FASTBIT_DIRSEP, ibis::gVerbose, ibis::part::name(), ibis::part::nColumns(), ibis::part::nRows(), ibis::part::part(), ibis::part::rename(), and ibis::part::timestamp().

Referenced by ibis::mensa::addPartition(), gatherParts(), ibis::init(), and ibis::mensa::mensa().

unsigned ibis::util::gatherParts ( ibis::partList tlist,
const char *  dir1,
bool  ro = false 
)

Look for data partitions in the given directory.

Examining the given directory to look for the metadata files and to construct ibis::part.

Can only descend into subdirectories through opendir family of functions.

Returns the number of data partitions found.

References envLock, FASTBIT_DIRSEP, gatherParts(), ibis::gVerbose, ibis::part::name(), ibis::part::nColumns(), ibis::part::nRows(), ibis::part::part(), ibis::part::rename(), and ibis::part::timestamp().

unsigned ibis::util::gatherParts ( ibis::partList tables,
const ibis::resource res,
bool  ro = false 
)

Reconstruct partitions using data directories specified in the resource list.

Read the parameters dataDir1 and dataDir2 to build data partitions.

Returns the number of data partitions found.

References gatherParts(), and ibis::resource::getValue().

off_t ibis::util::getFileSize ( const char *  name)
char * ibis::util::getString ( const char *  buf)

Extract a string from the given buf.

Remove leading and trailing spaces and surrounding quotes. Returns a copy of the string allocated with the new operator.

References ibis::gVerbose, and strnewdup().

Referenced by ibis::mensa::getColumnAsStrings(), ibis::text::IDColumnForKeywordIndex(), ibis::part::readMetaData(), and ibis::part::readMetaTags().

const char * ibis::util::getToken ( char *&  str,
const char *  tok_chrs 
)

Return a null-terminated string from the beginning of input string str.

The first apparence of any character from characters in tok_chars is turned into null. The incoming argument is modified to point to the first character that is not in tok_chrs. If no character in tok_chrs is found, str is returned and the first argument is changed to null.

int ibis::util::getVersionNumber ( ) [inline]

Return an integer designating the version of this software.

The version number is composed of four segments each with two decimal digits. For example, version 1.3.0.2 will be represented as 1030002. The stable releases typically have the last segment as zero, which is generally referred to without the last ".0".

Referenced by fastbit_get_version_number().

const char* ibis::util::getVersionString ( ) [inline]

Return a pointer to the string designating the version of this software.

Referenced by fastbit_get_version_string().

double ibis::util::incrDouble ( const double &  in) [inline]

Functions to handle manipulation of floating-point numbers.

Increment the input value to the next larger value.

If the math library has nextafter, it will use nextafter, otherwise, it will use the unit round-off error to compute the next larger value. The success of this computation is highly sensitive to the definition of DBL_EPSILON, which should be the smallest value x such that (1+x) is different from x. For 64-bit IEEE floating-point number, it is approximately 2.2E-16 (2^{-52}).

Referenced by ibis::part::adaptive2DBins(), ibis::part::adaptive3DBins(), ibis::part::adaptiveFloats(), ibis::part::adaptiveFloatsDetailed(), ibis::bak::binBoundaries(), ibis::quaere::create(), ibis::ambit::estimate(), ibis::pale::estimate(), ibis::pack::estimate(), ibis::zone::estimate(), ibis::part::get1DDistribution(), ibis::part::get2DDistributionI(), ibis::part::get2DDistributionU(), ibis::qContinuousRange::qContinuousRange(), and ibis::bin::scanAndPartition().

long ibis::util::intersect ( const std::vector< ibis::bitvector > &  bits1,
const std::vector< ibis::bitvector > &  bits2,
std::vector< ibis::bitvector > &  res 
)

Intersect two sets of bit vectors.

    res[jj*bits2.size()+ii] = bits1[jj] & bits2[ii]

References ibis::fileManager::bytesInUse(), ibis::bitvector::compress(), ibis::gVerbose, and ibis::fileManager::instance().

Referenced by ibis::part::get2DBins(), and ibis::part::get3DBins().

long ibis::util::intersect ( const std::vector< ibis::bitvector > &  bits1,
const std::vector< ibis::bitvector > &  bits2,
const std::vector< ibis::bitvector > &  bits3,
std::vector< ibis::bitvector > &  res 
)

Intersect three sets of bit vectors.

    res[(kk*bits2.size()+jj)*bits3.size()+ii] = bits1[kk] & bits2[jj] & bits3[ii]

References ibis::fileManager::bytesInUse(), ibis::bitvector::compress(), ibis::gVerbose, and ibis::fileManager::instance().

int ibis::util::makeDir ( const char *  dir)

Recursively create the name directory.

Recursivly create directory "dir".

Returns zero (0) to indicate success, a negative number to indicate error. If the directory already exists, it immediately returns 0.

References FASTBIT_DIRSEP, and strnewdup().

Referenced by ibis::bord::backup(), and ibis::part::part().

Compute the outer product of a and b, add the result to c.

This implementation only uses public functions of ibis::bitvector and ibis::bitvector64.

This should make it possible to use both WAH and BBC compress bitvector classes. It is not likely that we can gain much performance by directly using member variables of ibit::bitvector because the unit of compressed data from ibit::vector do not fit neatly into ibis::bitvector64::word_t.

Return a const reference of c. If the input does not have the correct size, it will be replaced by the outer product.

References ibis::bitvector64::adjustSize(), ibis::bitvector64::appendFill(), ibis::bitvector64::bytes(), ibis::bitvector64::cnt(), ibis::bitvector::cnt(), ibis::gVerbose, ibis::bitvector::indexSet::nIndices(), ibis::bitvector64::setBit(), ibis::bitvector64::size(), ibis::bitvector::size(), and ibis::horometer::start().

Referenced by ibis::index::estimate(), ibis::query::processJoin(), ibis::relic::speedTest(), and ibis::bin::speedTest().

Add the strict upper triangular portion of the outer production between a and b to c.

The result contains only the strict upper triangular portion of the full outer product.

References ibis::bitvector64::adjustSize(), ibis::bitvector64::appendFill(), ibis::bitvector64::bytes(), ibis::bitvector64::cnt(), ibis::bitvector::cnt(), ibis::gVerbose, ibis::bitvector::indexSet::nIndices(), ibis::bitvector64::setBit(), ibis::bitvector64::size(), ibis::bitvector::size(), and ibis::horometer::start().

double ibis::util::rand ( ) [inline]

A pseudo-random number generator (0,1).

A Linear Congruential Generator of pseudo-random numbers.

It produces a floating-point in the range of [0, 1). It is very simple and fast, however, it does not produce high-quality random numbers. It may not be thread-safe. Since the actual computation only involves two arithmetic operations, it is very unlikely to have thread-safty issues.

Referenced by ibis::part::buildQueryList(), ibis::index::mapValues(), ibis::part::queryTest(), ibis::part::quickTest(), ibis::part::selfTest(), and ibis::part::testRangeOperators().

int64_t ibis::util::read ( int  fdes,
void *  buf,
int64_t  nbytes 
)

A wrapper over POSIX read function.

When a large chunk is requested by the user, the read function may return one piece at a time, typically a piece is no larger than 2^31 bytes. However, this maximum chunk size is not consistently defined to be SSIZE_MAX as the POSIX standard demands. This function attempts use the return value from read when it is a posive value.

References ibis::gVerbose.

Referenced by ibis::fileManager::roFile::doRead(), ibis::fileManager::storage::read(), ibis::range::read(), ibis::ambit::read(), ibis::pale::read(), ibis::pack::read(), ibis::zone::read(), ibis::fuge::read(), ibis::bak::read(), and ibis::bak2::read().

int ibis::util::readDouble ( double &  val,
const char *&  str,
const char *  del = ibis::util::delimiters 
)

Attempt to convert the incoming string into a double.

The format recodnized is the following

    [+-]?\d*\.\d*[[eE][+-]?\d+]
Note:
the decimal point is the period (.)! This function does not understand any other notation.
This function leaves str at the first character that does not follow the above pattern.

References readInt().

Referenced by ibis::resource::getNumber(), and ibis::tafel::parseLine().

int ibis::util::readInt ( int64_t &  val,
const char *&  str,
const char *  del = ibis::util::delimiters 
)

Attempt to convert the incoming string into an integer.

It skips leading space and converts an optional +/- sign followed by a list of decimal digits to an integer. On successful completion of this function, the argument val contains the converted value and str points to the next unused character in the input string. In this case, it returns 0. It returns -1 if the input string is a null string, is an empty string, has only blank spaces or have one of the delimiters immediately following the leading blank spaces. It returns -2 to indicate overflow, in which case, it resets val to 0 and moves str to the next character that is not a decimal digit.

References ibis::gVerbose, and readUInt().

Referenced by ibis::tafel::parseLine(), ibis::qIntHod::qIntHod(), and readDouble().

int ibis::util::readString ( std::string &  str,
const char *&  buf,
const char *  delim = 0 
)

Copy the next string to the output variable str.

Leading blank spaces are skipped. The content of str will be empty if buf is nil or an empty string. If the string is quoted, only spaces before the quote is skipped, and the content of the string will be everything after the first quote to the last character before the matching quote or end of buffer. If delim is not provided (i.e., is 0), and the 1st nonblank character is not a quote, then string will terminate at the 1st space character following the nonblank character.

Note:
A unquoted empty string is considered a null value which is indicated by a negative return value. A quoted empty string is a valid string, which is indicated by a return value of 0.
This function uses backslash as the escape character for allowing quotes to be inside the quoted strings. Unfortunately, this also means to have a single backslash in a string, one has to input two of them right next to each other.
Input strings starting with an apostrophe, such as "'twas", must be quoted as "\"'twas"" or the apostrophe must be escaped as "\'twas". Otherwise, the leading apostrophe would be interested as a unmatched quote which will cause the next string value to extend beyond the intended word.

References ibis::gVerbose.

Referenced by ibis::tafel::assignDefaultValue(), ibis::category::getString(), ibis::tafel::parseLine(), ibis::tablex::readNamesAndTypes(), ibis::tafel::readSQLDump(), ibis::text::selectStrings(), ibis::bin::setBoundaries(), and ibis::tafel::SQLCreateTable().

int ibis::util::readUInt ( uint64_t &  val,
const char *&  str,
const char *  del = ibis::util::delimiters 
)

Attempt to convert the incoming string into a unsigned integer.

It skips leading space and converts a list of decimal digits to an integer. On successful completion of this function, the argument val contains the converted value and str points to the next unused character in the input string. In this case, it returns 0. It returns -1 if the input string is a null string, is an empty string, has only blank spaces or have one of the delimiters immediately following the leading blank spaces. It returns -2 to indicate overflow, in which case, it resets val to 0 and moves str to the next character that is not a decimal digit.

References ibis::gVerbose.

Referenced by ibis::tafel::parseLine(), ibis::qUIntHod::qUIntHod(), readInt(), and ibis::keywords::readTDLine().

void ibis::util::removeDir ( const char *  name,
bool  leaveDir = false 
)

Remove the content of named directory.

If this function is run on a unix-type system and the second argument is true, it will leave all the subdirectories intact as well.

The directory itself is removed unless the second argument is true.

References FASTBIT_DIRSEP, ibis::gVerbose, and strnewdup().

Referenced by ibis::part::append2(), ibis::query::clear(), ibis::part::doBackup(), ibis::part::part(), and ibis::part::rollback().

template<typename T >
void ibis::util::reorder ( array_t< T > &  arr,
const array_t< uint32_t > &  ind 
)
void ibis::util::reorder ( std::vector< std::string > &  arr,
const array_t< uint32_t > &  ind 
)

Reorder string values.

This function keeps the actual strings in their input positions by using the function swap. This procedure should avoid most of the memory allocations.

References ibis::gVerbose, and ibis::array_t< T >::size().

template<typename Tin , typename Tout >
void ibis::util::round_down ( const Tin &  inval,
Tout &  outval 
)

Round the incoming value to the largest output value that is no more than the input.

Both Tin and Tout must be elementary data types, and Tout must be an elementary integral type.

Referenced by ibis::part::doCount(), and ibis::part::doScan().

template<typename Tin , typename Tout >
void ibis::util::round_up ( const Tin &  inval,
Tout &  outval 
)

Round the incoming value to the smallest output value that is no less than the input.

Both Tin and Tout must be elementary data types, and Tout must be an elementary integral type.

Referenced by ibis::column::searchSortedICC(), and ibis::column::searchSortedOOCC().

template<typename Tin >
void ibis::util::round_up ( const Tin &  inval,
float &  outval 
) [inline]

A specialization of round_up for the output type float.

This function uses nextafterf if the macro HAVE_NEXTAFTER is defined, otherwise it uses FLT_EPSILON to compute outval as (float)(inval)*(1+FLT_EPSILON).

void ibis::util::setNaN ( float &  val)

Functions to handle manipulation of floating-point numbers.

Increment the input value to the next larger value.

If the math library has nextafter, it will use nextafter, otherwise, it will use the unit round-off error to compute the next larger value. The success of this computation is highly sensitive to the definition of DBL_EPSILON, which should be the smallest value x such that (1+x) is different from x. For 64-bit IEEE floating-point number, it is approximately 2.2E-16 (2^{-52}).

void ibis::util::setVerboseLevel ( int  v) [inline]

Set the verboseness level.

Unless the code is compiled with DEBUG macro set, the default verboseness level is 0, which will only print out information about major errors.

References ibis::gVerbose.

template<typename T1 , typename T2 >
void ibis::util::sort_heap ( array_t< T1 > &  keys,
array_t< T2 > &  vals 
)

Heapsort.

Sort the keys only. Move the vals along with the keys.

References ibis::gVerbose, and ibis::array_t< T >::size().

Referenced by sort_quick().

template<typename T1 , typename T2 >
void ibis::util::sort_insertion ( array_t< T1 > &  keys,
array_t< T2 > &  vals 
)

Insertion sort.

It has relatively straightforward memory access pattern and may be useful to sort a few numbers at the end of a recursive procedure.

References ibis::gVerbose, and ibis::array_t< T >::size().

template<typename T1 , typename T2 >
uint32_t ibis::util::sort_partition ( array_t< T1 > &  keys,
array_t< T2 > &  vals 
)

Partition function for quicksort.

The return value p separates keys into two parts, keys[..:p-1] < keys[p:..]. A return value equal to the size of keys indicates all keys are sorted.

References ibis::gVerbose, ibis::array_t< T >::size(), and sort_shell().

Referenced by sort_quick().

template<typename T1 , typename T2 >
void ibis::util::sort_partition3 ( array_t< T1 > &  keys,
array_t< T2 > &  vals,
uint32_t &  starteq,
uint32_t &  startgt 
)

Three-way partitioning algorithm for quicksort.

Upon return from this function, keys satisfying the following order keys[0:starteq] < keys[starteq:stargt-1] < keys[startgt:..]. The keys are ordered if starteq = startgt = keys.size().

References ibis::gVerbose, ibis::array_t< T >::size(), and sort_shell().

Referenced by sort_quick3().

template<typename T1 , typename T2 >
void ibis::util::sort_quick ( array_t< T1 > &  keys,
array_t< T2 > &  vals,
uint32_t  lvl 
)

Quicksort.

Quick sort with introspection.

Sort the keys only. Use the standard two-way partitioning.

It will switch to heap sort after FASTBIT_QSORT_MAX_DEPTH levels of recursion. Performs recursive call only on the smaller half, while iterate over the larger half.

References ibis::gVerbose, ibis::array_t< T >::size(), sort_heap(), sort_partition(), sort_quick(), and sort_shell().

Referenced by sort_quick(), and sortKeys().

template<typename T1 , typename T2 >
void ibis::util::sort_quick3 ( array_t< T1 > &  keys,
array_t< T2 > &  vals 
)

Quicksort.

Sort the keys only. Use a nonstandard three-way partitioning.

References ibis::gVerbose, ibis::array_t< T >::size(), sort_partition3(), sort_quick3(), and sort_shell().

Referenced by sort_quick3().

template<typename T >
void ibis::util::sort_radix ( array_t< char > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

Referenced by sortKeys().

template<typename T >
void ibis::util::sort_radix ( array_t< signed char > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< unsigned char > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< uint16_t > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< int16_t > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< uint32_t > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< int32_t > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< uint64_t > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< int64_t > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< float > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T >
void ibis::util::sort_radix ( array_t< double > &  keys,
array_t< T > &  vals 
)

LSD Radix sort.

Allocates buffers needed for copying data.

References ibis::gVerbose, ibis::array_t< T >::size(), and ibis::array_t< T >::swap().

template<typename T1 , typename T2 >
void ibis::util::sort_shell ( array_t< T1 > &  keys,
array_t< T2 > &  vals 
)

Shell sort.

It has relatively straightforward memory access pattern and may be useful to sort a few numbers at the end of a recursive sorting function.

References ibis::gVerbose, shellgaps, and ibis::array_t< T >::size().

Referenced by sort_partition(), sort_partition3(), sort_quick(), and sort_quick3().

template<typename T1 , typename T2 >
void ibis::util::sortAll ( array_t< T1 > &  arr1,
array_t< T2 > &  arr2 
)

Sort two arrays together.

Order arr1 in ascending order first, then when arr1 has the same value, order arr2 in ascending order as well.

References ibis::array_t< T >::nosharing(), ibis::array_t< T >::size(), sortAll_quick(), and sortAll_shell().

template<typename T1 , typename T2 >
uint32_t ibis::util::sortAll_split ( array_t< T1 > &  arr1,
array_t< T2 > &  arr2 
)

The parititioning function for ibis::util::sortAll.

Uses the standard two-way partitioning.

References ibis::gVerbose, and ibis::array_t< T >::size().

Referenced by sortAll_quick().

template<typename T1 , typename T2 >
void ibis::util::sortKeys ( array_t< T1 > &  keys,
array_t< T2 > &  vals 
)

Sorting function with payload.

Sort keys in ascending order, move the vals accordingly.

References ibis::array_t< T >::nosharing(), ibis::array_t< T >::size(), sort_quick(), and sort_radix().

Referenced by ibis::direkte::keys(), sortMerge(), and ibis::bord::sortValues().

template<typename T >
int64_t ibis::util::sortMerge ( array_t< T > &  valR,
array_t< uint32_t > &  indR,
array_t< T > &  valS,
array_t< uint32_t > &  indS 
)

An in-memory sort merge join function.

Sort the input arrays, valR and valS. Count the number of results from join.

Note:
This implementation is for elementary numberical data types only.
On input, if the size of indR is the same as that of valR, its content is preserved, otherwise it is reset to 0..valR.size()-1. The array indS is similarly set to 0..valS.size()-1 if its size is different from that of valS.

References ibis::gVerbose, ibis::array_t< T >::nosharing(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and sortKeys().

template<typename T >
int64_t ibis::util::sortMerge ( array_t< T > &  valR,
array_t< uint32_t > &  indR,
array_t< T > &  valS,
array_t< uint32_t > &  indS,
double  delta1,
double  delta2 
)

An in-memory sort merge join function.

Sort the input arrays, valR and valS. Count the number of results satisfying ValR-Vals between delta1 and delta2.

Note:
This implementation is for elementary numberical data types only.
On input, if the size of indR is the same as that of valR, its content is preserved, otherwise it is reset to 0..valR.size()-1. The array indS is similarly set to 0..valS.size()-1 if its size is different from that of valS.

References ibis::gVerbose, ibis::array_t< T >::nosharing(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and sortKeys().

Sort RID lists.

None of them are stable.

Sort the given list of RIDs with quick sort.

References ibis::array_t< T >::nosharing(), ibis::array_t< T >::size(), sortRIDsi(), and sortRIDsq().

Referenced by ibis::part::evaluateRIDSet(), and ibis::bundle::sortRIDs().

void ibis::util::sortRIDsi ( ibis::RIDSet rids,
uint32_t  i,
uint32_t  j 
)

Sort a portion of the RIDset with insertion sort.

Sort RIDs in the range of [i, j).

Referenced by sortRIDs(), and sortRIDsq().

void ibis::util::sortRIDsq ( ibis::RIDSet rids,
uint32_t  i,
uint32_t  j 
)

Sort a portion of the RIDSet with quick sort.

Sort RIDs in the range of [i, j).

References sortRIDsi().

Referenced by sortRIDs().

void ibis::util::sortStrings ( std::vector< std::string > &  keys,
array_t< uint32_t > &  vals 
)

Sorting function with string as keys and uint32_t as payload.

It uses quick sort if there are more than FASTBIT_QSORT_MIN elements to sort, otherwise, it uses shell sort.

Note:
This function operates completely in-memory; all arrays and whatever auxiliary data must fit in memory. Furthermore, FastBit does not track the memory usage of std::vector nor std::string. In case this function runs out of memory, the two input arrays are left in undefined states. Normally, one should not have lost any data values, but the values are in undetermined orders. However, this is not guaranteed to be the case.

References ibis::gVerbose, ibis::array_t< T >::size(), sortStrings_quick(), and sortStrings_shell().

Referenced by ibis::dictionary::dictionary(), ibis::dictionary::readRaw(), ibis::bord::reorder(), sortMerge(), and ibis::bord::sortStrings().

void ibis::util::sortStrings ( ibis::array_t< const char * > &  keys,
array_t< uint32_t > &  vals 
)

Sorting function with string as keys and uint32_t as payload.

It uses quick sort if there are more than FASTBIT_QSORT_MIN elements to sort, otherwise, it uses shell sort.

Note:
This function operates completely in-memory; all arrays and whatever auxiliary data must fit in memory.

References ibis::array_t< T >::size(), sortStrings_quick(), and sortStrings_shell().

uint32_t ibis::util::sortStrings_partition ( std::vector< std::string > &  keys,
array_t< uint32_t > &  vals,
uint32_t  begin,
uint32_t  end 
)

The partitioning procedure for quick sort.

Median-of-3 partitioning algorithm.

It implements the standard two-way partitioning with the median-of-three pivot.

References ibis::gVerbose, ibis::array_t< T >::size(), and sortStrings_shell().

Referenced by sortStrings_quick().

uint32_t ibis::util::sortStrings_partition ( ibis::array_t< const char * > &  keys,
array_t< uint32_t > &  vals,
uint32_t  begin,
uint32_t  end 
)

The partitioning procedure for quick sort.

Median-of-3 partitioning algorithm.

It implements the standard two-way partitioning with the median-of-three pivot.

References ibis::gVerbose, ibis::array_t< T >::size(), and sortStrings_shell().

void ibis::util::sortStrings_quick ( std::vector< std::string > &  keys,
array_t< uint32_t > &  vals,
uint32_t  begin,
uint32_t  end 
)

Quicksort for strings.

Quick-sort for strings with shell sort as clean-up procedure.

References ibis::gVerbose, ibis::array_t< T >::size(), sortStrings_partition(), sortStrings_quick(), and sortStrings_shell().

Referenced by sortStrings(), and sortStrings_quick().

void ibis::util::sortStrings_quick ( ibis::array_t< const char * > &  keys,
array_t< uint32_t > &  vals,
uint32_t  begin,
uint32_t  end 
)

Quicksort for strings.

Quick-sort for strings with shell sort as clean-up procedure.

References ibis::gVerbose, ibis::array_t< T >::size(), sortStrings_partition(), sortStrings_quick(), and sortStrings_shell().

void ibis::util::sortStrings_shell ( std::vector< std::string > &  keys,
array_t< uint32_t > &  vals,
uint32_t  begin,
uint32_t  end 
)

Shell sorting procedure.

To clean up after the quick sort procedure.

References ibis::gVerbose, shellgaps, and ibis::array_t< T >::size().

Referenced by sortStrings(), sortStrings_partition(), and sortStrings_quick().

void ibis::util::sortStrings_shell ( ibis::array_t< const char * > &  keys,
array_t< uint32_t > &  vals,
uint32_t  begin,
uint32_t  end 
)

Shell sorting procedure.

To clean up after the quick sort procedure.

References ibis::gVerbose, shellgaps, and ibis::array_t< T >::size().

bool ibis::util::strMatch ( const char *  str,
const char *  pat 
)

Match the string str against a simple pattern pat.

The pattern may use two wild characters defined for SQL function LIKE, '_' and ''.

Referenced by ibis::part::matchNameValuePair(), ibis::dictionary::patternSearch(), and ibis::mensa::select2().

void ibis::util::updateDatasets ( void  )

Update the metadata about the data partitions.

Loop through all known data partitions and check for any update in the metadata files.

References ibis::datasets, and ibis::part::updateData().


Variable Documentation

const short unsigned ibis::util::charIndex
Initial value:
 {
    64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
    64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
    64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64,  0, 64, 64,
     1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 64, 64, 64, 63, 64, 64,
    64, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
    26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 64, 64, 64, 64, 37,
    64, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
    53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64,
}

charIndex maps the characters (ASCII) back to integer [0-64]

Maps back from ASCII to positions of the characters in ibis::util::charTable.

Referenced by ibis::query::isValidToken().

const char * ibis::util::charTable = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"

charTable lists the 64 printable characters to be used for names

A list of 65 printable ASCII characters that are not special to most of the command interpreters.

The first 64 of them are basically the same as specified in RFC 3548 for base-64 numbers, but appear in different order. A set of numbers represented in this base-64 representation will be sorted in the same order as their decimal representations.

const char * ibis::util::delimiters = ";, \v\b\f\r\t\n'\""
pthread_mutex_t ibis::util::envLock = PTHREAD_MUTEX_INITIALIZER

A mutex for serialize operations FastBit wide.

Currently it is used by the functions that generating user name, asking for password, backing up active tables, cleaning up the list of tables. It is also used extensively in the implementation of C API functions to ensure the cache maintained for C users are manipulated by one user at a time.

Referenced by ibis::fileManager::adjustCacheSize(), ibis::part::doBackup(), ibis::query::evaluate(), ibis::column::evaluateRange(), fastbit_add_values(), fastbit_cleanup(), fastbit_flush_buffer(), fastbit_init(), ibis::findDataset(), gatherParts(), ibis::init(), ibis::array_t< T >::nosharing(), and ibis::part::part().

Initial value:
 {
#define X(i) 
    -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    X(4), X(5), X(5), X(6), X(6), X(6), X(6),
    X(7), X(7), X(7), X(7), X(7), X(7), X(7), X(7)

}

log base 2 of an integer, the lookup table

Referenced by log2().

const uint32_t ibis::util::shellgaps[16]
Initial value:
 {1, 3, 7, 21, 48, 112, 336, 861, 1968,
                                        4592, 12776, 33936, 86961, 198768,
                                        463792, 1391376}

Gaps for Shell sort from http://www.cs.princeton.edu/~rs/shell/shell.c by R.

Sedgewick.

Referenced by sort_shell(), sortAll_shell(), and sortStrings_shell().

Make It A Bit Faster
Contact us
Disclaimers
FastBit source code
FastBit mailing list archive