ibis::dictionary Class Reference

Provide a dual-directional mapping between strings and integers.

#include <dictionary.h>

Public Member Functions

void clear ()
 Clear the allocated memory. Leave only the NULL entry.
void copy (const dictionary &rhs)
 Copy function. Use copy constructor and swap the content.
 dictionary (const dictionary &dic)
 Copy constructor. Places all the strings in one contiguous buffer.
 dictionary ()
 Default constructor. Generates one (NULL) entry.
bool equal_to (const ibis::dictionary &) const
 Compare whether this dictionary and the other are equal in content.
const char * find (const char *str) const
 Find the given string in the dictionary.
uint32_t insert (const char *str)
 Insert a string into the dictionary.
uint32_t insertRaw (char *str)
 Non-copying insert.
int merge (const dictionary &)
 Merge the incoming dictionary with this one.
int morph (const dictionary &, array_t< uint32_t > &) const
 Produce an array that maps the integers in the old dictionary to the new one.
const char * operator[] (uint32_t i) const
 Return a string corresponding to the integer.
uint32_t operator[] (const char *str) const
 Convert a string to its integer code.
void patternSearch (const char *pat, array_t< uint32_t > &matches) const
 Find all codes that match the SQL LIKE pattern.
int read (const char *name)
 Read the content of the named file.
uint32_t size () const
 Return the number of valid (not null) strings in the dictionary.
void sort (array_t< uint32_t > &)
 Reassign the integer values to the strings.
void swap (dictionary &)
 Swap the content of two dictionaries.
int write (const char *name) const
 Write the content of the dictionary to the named file.

Protected Member Functions

int readKeys (const char *, FILE *)
 Read the ordered strings.
int readRaw (const char *, FILE *)
 Read the raw strings.

Protected Attributes

array_t< char * > buffer_
 Member variable buffer_ contains a list of pointers to the memory that holds the strings.
array_t< uint32_t > code_
 Member variable code_ contains the integer code for each string in key_.
array_t< const char * > key_
 Member variable key_ contains the string values in alphabetic order.
array_t< const char * > raw_
 Member variable raw_ contains the string values in the order of the code assignment.

Detailed Description

Provide a dual-directional mapping between strings and integers.

A utility class used by ibis::category. Both the NULL string and the empty string are mapped to 0.

Note:
If FASTBIT_CS_PATTERN_MATCH is defined to be 0, the values stored in a dictionary will be folded to upper case. This allows the words in the dictionary to be stored in a simple sorted order. By default, the dictionary is case sensitive.

Member Function Documentation

bool ibis::dictionary::equal_to ( const ibis::dictionary &  other) const

Compare whether this dictionary and the other are equal in content.

The two dictionaries are considered the same only if they have the same keys and the same integer representations.

References code_, key_, and ibis::array_t< T >::size().

Referenced by ibis::bord::bord().

const char * ibis::dictionary::find ( const char *  str) const [inline]

Find the given string in the dictionary.

If the input string is found in the dictionary, it returns the string. Otherwise it returns a null pointer. This function makes it a little easier to determine whether a string is in a dictionary.

uint32_t ibis::dictionary::insert ( const char *  str)

Insert a string into the dictionary.

Returns the integer value assigned to the string. A copy of the string is stored internally.

References ibis::util::copy(), ibis::gVerbose, and ibis::util::strnewdup().

Referenced by ibis::category::category(), and ibis::column::string2int().

uint32_t ibis::dictionary::insertRaw ( char *  str)

Non-copying insert.

Do not make a copy of the input string. Transfers the ownership of str to the dictionary. Caller needs to check whether it is a new word in the dictionary. If it is not a new word in the dictionary, the dictionary does not take ownership of the string argument.

References ibis::gVerbose.
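
The ownership rule above can be sketched with a stand-in class. This is an illustration under stated assumptions, not the FastBit implementation: ToyRawDictionary, insert_raw, and add_word are hypothetical names, the toy frees its strings with delete[], and the caller detects a new word by comparing size() before and after the call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for illustration only; this is not ibis::dictionary.
class ToyRawDictionary {
public:
    ToyRawDictionary() = default;
    ToyRawDictionary(const ToyRawDictionary&) = delete;
    ToyRawDictionary& operator=(const ToyRawDictionary&) = delete;
    ~ToyRawDictionary() {
        for (char* p : owned_) delete[] p;  // free only the strings the toy owns
    }

    uint32_t size() const { return static_cast<uint32_t>(owned_.size()); }

    // Non-copying insert: takes ownership of str only when the word is new.
    uint32_t insert_raw(char* str) {
        assert(str != nullptr);
        for (uint32_t i = 0; i < owned_.size(); ++i) {
            if (std::strcmp(owned_[i], str) == 0) {
                return i + 1;  // duplicate: the caller still owns str
            }
        }
        owned_.push_back(str);
        return size();
    }

private:
    std::vector<char*> owned_;
};

// Caller side: allocate a string, hand it over, and free it if it was not accepted.
uint32_t add_word(ToyRawDictionary& d, const char* text) {
    char* copy = new char[std::strlen(text) + 1];
    std::strcpy(copy, text);
    const uint32_t before = d.size();
    const uint32_t code = d.insert_raw(copy);
    if (d.size() == before) {
        delete[] copy;  // not a new word, so ownership stayed with the caller
    }
    return code;
}
```

Without the check in add_word, inserting a word that is already present would leak the caller's buffer.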

int ibis::dictionary::merge ( const dictionary &  rhs)

Merge the incoming dictionary with this one.

It produces a dictionary that combines the words in both dictionaries and keeps the words in ascending order.

Upon successful completion of this function, the return value will be the new size of the dictionary, i.e., the number of non-empty words. It returns a negative value to indicate error.

References ibis::gVerbose, key_, ibis::array_t< T >::push_back(), ibis::array_t< T >::reserve(), ibis::array_t< T >::size(), ibis::util::strnewdup(), and ibis::array_t< T >::swap().

Referenced by ibis::mensa::combineCategories().
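
The effect of combining two sorted word lists while keeping ascending order can be sketched with the standard library. This is only an illustration of the outcome, assuming both inputs are already sorted and free of duplicates; merge_sorted is a hypothetical helper and says nothing about how merge() is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: union of two sorted word lists, still in ascending order.
std::vector<std::string> merge_sorted(const std::vector<std::string>& a,
                                      const std::vector<std::string>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<std::string> out;
    out.reserve(a.size() + b.size());
    // set_union copies each word once, even when it appears in both inputs.
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

The size of the returned vector plays the role of the non-negative return value described above.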

int ibis::dictionary::morph ( const dictionary &  old,
ibis::array_t< uint32_t > &  o2n 
) const

Produce an array that maps the integers in the old dictionary to the new one.

The incoming dictionary represents the old dictionary; this dictionary represents the new one.

Upon successful completion of this function, the array o2n will have old.size()+1 elements, where the new value for the old code i is stored as o2n[i].

References code_, ibis::gVerbose, key_, ibis::array_t< T >::resize(), and ibis::array_t< T >::size().

Referenced by ibis::category::setDictionary().
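
A minimal sketch of how such an old-to-new array could be built follows. The helper build_o2n is hypothetical, each dictionary is modeled as a vector in which words[i] is the string with code i and words[0] is the null entry, and mapping a word that is missing from the new dictionary to new.size()+1 is an assumption borrowed from the operator[] convention, not something stated for morph().

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: words[i] is the string with code i; words[0] is the null entry.
std::vector<uint32_t> build_o2n(const std::vector<std::string>& old_words,
                                const std::vector<std::string>& new_words) {
    assert(!old_words.empty() && !new_words.empty());

    // Index the new dictionary by string.
    std::unordered_map<std::string, uint32_t> new_code;
    for (uint32_t c = 1; c < new_words.size(); ++c) {
        new_code[new_words[c]] = c;
    }

    // One slot per old code, including code 0, i.e. old.size()+1 elements.
    std::vector<uint32_t> o2n(old_words.size(), 0);
    // Assumption: words absent from the new dictionary map to new.size()+1.
    const uint32_t unknown = static_cast<uint32_t>(new_words.size());
    for (uint32_t c = 1; c < old_words.size(); ++c) {
        auto it = new_code.find(old_words[c]);
        o2n[c] = (it != new_code.end()) ? it->second : unknown;
    }
    return o2n;
}
```

Stored integer values can then be translated with a single lookup, new_value = o2n[old_value].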

const char * ibis::dictionary::operator[] ( uint32_t  i) const [inline]

Return a string corresponding to the integer.

If the index is beyond the valid range, i.e., i > size(), then a null pointer will be returned.

uint32_t ibis::dictionary::operator[] ( const char *  str) const

Convert a string to its integer code.

Returns 0 for empty (null) strings, a value from 1 through size() for strings in the dictionary, and size()+1 for unknown values.

References ibis::gVerbose.
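
The return conventions of the two lookups can be illustrated with a small stand-in class. This is a sketch, not the FastBit implementation: ToyDictionary, code_of, and string_of are hypothetical names, and the toy assumes that codes are handed out in insertion order starting at 1.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for illustration only; this is not ibis::dictionary.
class ToyDictionary {
public:
    ToyDictionary() : raw_(1) {}  // slot 0 is reserved for the null/empty entry

    // Number of valid (non-null) strings.
    uint32_t size() const { return static_cast<uint32_t>(raw_.size() - 1); }

    // Assign the next free code to a new string; return the existing code otherwise.
    uint32_t insert(const char* str) {
        if (str == nullptr || *str == 0) return 0;
        auto it = code_.find(str);
        if (it != code_.end()) return it->second;
        raw_.push_back(str);
        const uint32_t c = size();
        code_.emplace(str, c);
        return c;
    }

    // String to code: 0 for null/empty, 1..size() for known, size()+1 for unknown.
    uint32_t code_of(const char* str) const {
        if (str == nullptr || *str == 0) return 0;
        auto it = code_.find(str);
        return (it != code_.end()) ? it->second : size() + 1;
    }

    // Code to string: null pointer for code 0 and for codes beyond size().
    // The returned pointer is only valid until the next insert.
    const char* string_of(uint32_t i) const {
        return (i >= 1 && i <= size()) ? raw_[i].c_str() : nullptr;
    }

private:
    std::vector<std::string> raw_;                    // strings indexed by code
    std::unordered_map<std::string, uint32_t> code_;  // code for each string
};
```

The toy uses named member functions rather than two operator[] overloads because a literal 0 converts equally well to uint32_t and to const char *, so an expression such as d[0] would be ambiguous with that overload pair.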

void ibis::dictionary::patternSearch ( const char *  pat,
array_t< uint32_t > &  matches 
) const

Find all codes that match the SQL LIKE pattern.

If the pattern is null or empty, matches is not changed.

References ibis::gVerbose, ibis::array_t< T >::push_back(), and ibis::util::strMatch().
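
The following sketch shows one way a LIKE-style search over dictionary codes could work. It is not ibis::util::strMatch: like_match and pattern_search are hypothetical helpers, only the wildcards '%' (any run of characters) and '_' (exactly one character) are handled, escape characters are ignored, and the dictionary is modeled as a vector in which words[i] is the string with code i.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical matcher: '%' matches any run of characters (including none),
// '_' matches exactly one character, everything else must match literally.
bool like_match(const char* s, const char* p) {
    while (*p != 0) {
        if (*p == '%') {
            while (*p == '%') ++p;     // collapse consecutive '%'
            if (*p == 0) return true;  // trailing '%' matches the rest of s
            for (; *s != 0; ++s) {
                if (like_match(s, p)) return true;  // try every suffix of s
            }
            return false;
        }
        if (*s == 0) return false;     // pattern still needs a character
        if (*p != '_' && *p != *s) return false;
        ++s;
        ++p;
    }
    return *s == 0;                    // both must be exhausted together
}

// Append the code of every matching word to matches.
// A null or empty pattern leaves matches unchanged.
void pattern_search(const std::vector<std::string>& words, const char* pat,
                    std::vector<uint32_t>& matches) {
    if (pat == nullptr || *pat == 0) return;
    for (uint32_t c = 1; c < words.size(); ++c) {  // code 0 is the null entry
        if (like_match(words[c].c_str(), pat)) matches.push_back(c);
    }
}
```

Because the results are appended, a caller that wants only the matches of the latest pattern should clear the array first.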

int ibis::dictionary::read ( const char *  name)

Read the content of the named file.

The file content is read into the buffer in one shot and then digested.

References ibis::gVerbose.

int ibis::dictionary::readKeys ( const char *  evt,
FILE *  fptr 
) [protected]

Read the ordered strings.

This function processes the data produced by the write function. On successful completion, it returns 0.

References ibis::util::clear(), and ibis::gVerbose.

int ibis::dictionary::readRaw ( const char *  evt,
FILE *  fptr 
) [protected]

Read the raw strings.

This is the older style dictionary that contains the raw strings. On successful completion, this function returns 1.

References ibis::util::clear(), ibis::gVerbose, and ibis::util::sortStrings().

void ibis::dictionary::sort ( ibis::array_t< uint32_t > &  o2n)

Reassign the integer values to the strings.

Upon successful completion of this function, the integer values assigned to the strings will be in ascending order. In other words, string values that are lexicographically smaller will have smaller integer representations.

The argument to this function carries the permutation information needed to turn the previous integer assignments into the new ones. If the previous assignment was k, the new assignment will be o2n[k]. Note that the name o2n is shorthand for old-to-new.

References ibis::array_t< T >::resize().
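
The relationship between the reordered strings and o2n can be sketched as follows. The helper sort_codes is hypothetical and models the dictionary as a vector in which words[k] is the string with code k and words[0] is the null entry; it is a sketch of the idea, not the FastBit code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: reorder words so that codes ascend with the strings,
// and return o2n, where o2n[k] is the new code of the word that had code k.
std::vector<uint32_t> sort_codes(std::vector<std::string>& words) {
    if (words.empty()) return {};

    // order[n] is the old code of the word that will receive new code n.
    std::vector<uint32_t> order(words.size());
    std::iota(order.begin(), order.end(), 0u);
    std::sort(order.begin() + 1, order.end(),  // code 0 (null entry) stays put
              [&](uint32_t a, uint32_t b) { return words[a] < words[b]; });

    std::vector<uint32_t> o2n(words.size());
    std::vector<std::string> sorted(words.size());
    for (uint32_t n = 0; n < order.size(); ++n) {
        o2n[order[n]] = n;            // invert the permutation
        sorted[n] = words[order[n]];
    }
    words.swap(sorted);
    assert(std::is_sorted(words.begin() + 1, words.end()));
    return o2n;
}
```

Any integer values recorded under the previous assignment need to be rewritten as o2n[k] afterwards, otherwise they refer to the wrong strings.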

int ibis::dictionary::write ( const char *  name) const

Write the content of the dictionary to the named file.

The existing content in the named file is overwritten. The content of the dictionary file is as follows.

  • Signature "#IBIS Dictionary " and version number (currently 0). (20 bytes)
  • N = Number of strings in the file. (4 bytes)
  • uint32_t[N]: the integer values assigned to the strings.
  • uint32_t[N+1]: the starting positions of the strings in this file.
  • the string values one after the other with nil terminators.

References ibis::gVerbose.

Referenced by ibis::category::category().
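
The layout listed above determines the file size once the strings are known. The sketch below simply adds up the five sections; dictionary_file_size is a hypothetical helper, not part of FastBit, and it assumes the caller passes exactly the N strings that are written to the file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: size in bytes of a dictionary file holding these strings,
// following the section sizes listed in the documentation of write().
std::size_t dictionary_file_size(const std::vector<std::string>& words) {
    const std::size_t n = words.size();
    std::size_t total = 20;   // signature and version number
    total += 4;               // N, the number of strings
    total += 4 * n;           // uint32_t[N]: integer values of the strings
    total += 4 * (n + 1);     // uint32_t[N+1]: starting positions
    for (const std::string& w : words) {
        total += w.size() + 1;  // string bytes plus the nil terminator
    }
    return total;
}
```

For the two strings "apple" and "kiwi" this gives 20 + 4 + 8 + 12 + 6 + 5 = 55 bytes.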


Member Data Documentation

array_t<char*> ibis::dictionary::buffer_ [protected]

Member variable buffer_ contains a list of pointers to the memory that holds the strings.

Referenced by dictionary(), and swap().

array_t<uint32_t> ibis::dictionary::code_ [protected]

Member variable code_ contains the integer code for each string in key_.

Referenced by dictionary(), equal_to(), morph(), and swap().

array_t<const char*> ibis::dictionary::raw_ [protected]

Member variable raw_ contains the string values in the order of the code assignment.

Referenced by dictionary(), and swap().


The documentation for this class was generated from the following files:
