Provide a dual-directional mapping between strings and integers.
#include <dictionary.h>
Public Member Functions | |
void | clear () |
Clear the allocated memory. Leave only the NULL entry. | |
void | copy (const dictionary &rhs) |
Copy function. Use copy constructor and swap the content. | |
dictionary (const dictionary &dic) | |
Copy constructor. Places all the string in one contiguous buffer. | |
dictionary () | |
Default constructor. Generates one (NULL) entry. | |
bool | equal_to (const ibis::dictionary &) const |
Test whether this dictionary and the other are equal in content. | |
const char * | find (const char *str) const |
Find the given string in the dictionary. | |
uint32_t | insert (const char *str) |
Insert a string into the dictionary. | |
uint32_t | insertRaw (char *str) |
Non-copying insert. | |
int | merge (const dictionary &) |
Merge the incoming dictionary with this one. | |
int | morph (const dictionary &, array_t< uint32_t > &) const |
Produce an array that maps the integers in the old dictionary to the new one. | |
const char * | operator[] (uint32_t i) const |
Return a string corresponding to the integer. | |
uint32_t | operator[] (const char *str) const |
Convert a string to its integer code. | |
void | patternSearch (const char *pat, array_t< uint32_t > &matches) const |
Find all codes that match the SQL LIKE pattern. | |
int | read (const char *name) |
Read the content of the named file. | |
uint32_t | size () const |
Return the number of valid (not null) strings in the dictionary. | |
void | sort (array_t< uint32_t > &) |
Reassign the integer values to the strings. | |
void | swap (dictionary &) |
Swap the content of two dictionaries. | |
int | write (const char *name) const |
Write the content of the dictionary to the named file. | |
Protected Member Functions | |
int | readKeys (const char *, FILE *) |
Read the ordered strings. | |
int | readRaw (const char *, FILE *) |
Read the raw strings. | |
Protected Attributes | |
array_t< char * > | buffer_ |
Member variable buffer_ contains a list of pointers to the memory that holds the strings. | |
array_t< uint32_t > | code_ |
Member variable code_ contains the integer code for each string in key_. | |
array_t< const char * > | key_ |
Member variable key_ contains the string values in alphabetic order. | |
array_t< const char * > | raw_ |
Member variable raw_ contains the string values in the order of the code assignment. |
Provide a dual-directional mapping between strings and integers.
A utility class used by ibis::category. Both the NULL string and the empty string are mapped to 0.
bool ibis::dictionary::equal_to | ( | const ibis::dictionary & | other | ) | const |
Test whether this dictionary and the other are equal in content.
The two dictionaries are considered the same only if they have the same keys and the same integer representations.
References code_, key_, and ibis::array_t< T >::size().
Referenced by ibis::bord::bord().
const char * ibis::dictionary::find | ( | const char * | str | ) | const [inline] |
Find the given string in the dictionary.
If the input string is found in the dictionary, this function returns the string; otherwise it returns a null pointer. This function makes it a little easier to determine whether a string is in a dictionary.
uint32_t ibis::dictionary::insert | ( | const char * | str | ) |
Insert a string into the dictionary.
Returns the integer value assigned to the string. A copy of the string is stored internally.
References ibis::util::copy(), ibis::gVerbose, and ibis::util::strnewdup().
Referenced by ibis::category::category(), and ibis::column::string2int().
uint32_t ibis::dictionary::insertRaw | ( | char * | str | ) |
Non-copying insert.
Does not make a copy of the input string. Transfers the ownership of str to the dictionary. The caller needs to check whether str is a new word in the dictionary: if it is not a new word, the dictionary does not take ownership of the string argument.
References ibis::gVerbose.
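The ownership rule above can be sketched with a small stand-in class. This is only an illustration: OwningDictionary and insertOwned are hypothetical names, not part of the library, and comparing size() before and after the call is an assumed way for the caller to detect whether the word was new.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in that mimics the ownership rule of a non-copying insert.
class OwningDictionary {
public:
    OwningDictionary() = default;
    OwningDictionary(const OwningDictionary&) = delete;
    OwningDictionary& operator=(const OwningDictionary&) = delete;
    ~OwningDictionary() { for (char* p : owned_) delete[] p; }

    // Number of valid (non-null) strings.
    uint32_t size() const { return static_cast<uint32_t>(owned_.size()); }

    // Takes ownership of str only when it is a new word.
    uint32_t insertRaw(char* str) {
        if (str == nullptr || *str == '\0') return 0;
        auto it = code_.find(str);
        if (it != code_.end()) return it->second;  // caller still owns str
        owned_.push_back(str);                     // dictionary now owns str
        code_.emplace(str, size());
        return size();
    }

private:
    std::vector<char*> owned_;
    std::unordered_map<std::string, uint32_t> code_;
};

// Caller-side pattern (an assumption): compare size() before and after the
// call to learn whether the dictionary kept the pointer.
uint32_t insertOwned(OwningDictionary& dict, const char* word) {
    char* buf = new char[std::strlen(word) + 1];
    std::strcpy(buf, word);
    const uint32_t before = dict.size();
    const uint32_t code = dict.insertRaw(buf);
    if (dict.size() == before) delete[] buf;  // not a new word: still ours to free
    return code;
}
```

In this sketch, a repeated word leaves size() unchanged, so the caller frees its own buffer and no memory is leaked or freed twice.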
int ibis::dictionary::merge | ( | const dictionary & | rhs | ) |
Merge the incoming dictionary with this one.
It produces a dictionary that combines the words in both dictionaries and keeps the words in ascending order.
Upon successful completion of this function, the return value will be the new size of the dictionary, i.e., the number of non-empty words. It returns a negative value to indicate an error.
References ibis::gVerbose, key_, ibis::array_t< T >::push_back(), ibis::array_t< T >::reserve(), ibis::array_t< T >::size(), ibis::util::strnewdup(), and ibis::array_t< T >::swap().
Referenced by ibis::mensa::combineCategories().
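The merge described above amounts to a sorted union of two word lists. A minimal sketch using standard containers follows; mergeKeys is a hypothetical name, the -1 error code is an assumption, and the sketch says nothing about how the library reassigns integer codes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Both inputs hold distinct, non-empty words in ascending order.
// On return, keys holds the sorted union; the return value is the new word count.
int mergeKeys(std::vector<std::string>& keys, const std::vector<std::string>& rhs) {
    if (!std::is_sorted(keys.begin(), keys.end()) ||
        !std::is_sorted(rhs.begin(), rhs.end()))
        return -1;  // hypothetical error code for unsorted input
    std::vector<std::string> merged;
    merged.reserve(keys.size() + rhs.size());
    // set_union keeps one copy of any word present in both inputs.
    std::set_union(keys.begin(), keys.end(), rhs.begin(), rhs.end(),
                   std::back_inserter(merged));
    keys.swap(merged);
    return static_cast<int>(keys.size());
}
```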
int ibis::dictionary::morph | ( | const dictionary & | old, |
ibis::array_t< uint32_t > & | o2n | ||
) | const |
Produce an array that maps the integers in the old dictionary to the new one.
The incoming dictionary represents the old dictionary; this dictionary represents the new one.
Upon successful completion of this function, the array o2n will have old.size()+1 elements, where the new value for the old code i is stored as o2n[i].
References code_, ibis::gVerbose, key_, ibis::array_t< T >::resize(), and ibis::array_t< T >::size().
Referenced by ibis::category::setDictionary().
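The old-to-new mapping can be illustrated with standard containers. The function buildOldToNew and its parameters are hypothetical. How the library treats an old word that is absent from the new dictionary is not stated here, so the sketch assumes it maps to the "unknown" code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// oldRaw[i] is the string with code i in the old dictionary (oldRaw[0] is the
// null entry). newCode maps each string to its code in the new dictionary.
// Returns o2n with oldRaw.size() elements, i.e. old.size()+1.
std::vector<uint32_t> buildOldToNew(
    const std::vector<std::string>& oldRaw,
    const std::unordered_map<std::string, uint32_t>& newCode) {
    assert(!oldRaw.empty());  // slot 0 (the null entry) must be present
    std::vector<uint32_t> o2n(oldRaw.size(), 0);  // o2n[0] stays 0: null maps to null
    for (std::size_t i = 1; i < oldRaw.size(); ++i) {
        auto it = newCode.find(oldRaw[i]);
        // Assumption: a word missing from the new dictionary maps to
        // newCode.size()+1, mirroring the "unknown value" convention.
        o2n[i] = (it != newCode.end())
                     ? it->second
                     : static_cast<uint32_t>(newCode.size() + 1);
    }
    return o2n;
}
```

With o2n in hand, a column of old codes can be rewritten in place by replacing each code c with o2n[c].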
const char * ibis::dictionary::operator[] | ( | uint32_t | i | ) | const [inline] |
Return a string corresponding to the integer.
If the index is beyond the valid range, i.e., i > size(), then a null pointer will be returned.
uint32_t ibis::dictionary::operator[] | ( | const char * | str | ) | const |
Convert a string to its integer code.
Returns 0 for empty (null) strings, a value in the range 1 to size() for strings in the dictionary, and size()+1 for unknown values.
References ibis::gVerbose.
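The code conventions of the two operator[] overloads can be modeled with a small stand-in class. ToyDictionary is a hypothetical name and stores std::string values rather than raw buffers. Returning a null pointer for code 0 is an assumption based on the NULL entry described for the default constructor.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the dictionary; models only the code conventions.
class ToyDictionary {
public:
    ToyDictionary() : raw_(1) {}  // slot 0 is reserved for the null entry

    // Number of valid (non-null) strings.
    uint32_t size() const { return static_cast<uint32_t>(raw_.size() - 1); }

    // Returns the existing code, or assigns the next unused one.
    uint32_t insert(const char* str) {
        if (str == nullptr || *str == '\0') return 0;
        auto it = code_.find(str);
        if (it != code_.end()) return it->second;
        const uint32_t code = size() + 1;
        raw_.push_back(str);
        code_.emplace(str, code);
        return code;
    }

    // String -> code: 0 for null/empty, size()+1 for unknown strings.
    uint32_t operator[](const char* str) const {
        if (str == nullptr || *str == '\0') return 0;
        auto it = code_.find(str);
        return it == code_.end() ? size() + 1 : it->second;
    }

    // Code -> string: null pointer when i > size(); code 0 is the null entry.
    // The pointer is valid only until the next insert.
    const char* operator[](uint32_t i) const {
        if (i == 0 || i > size()) return nullptr;
        return raw_[i].c_str();
    }

private:
    std::vector<std::string> raw_;                    // strings in code order
    std::unordered_map<std::string, uint32_t> code_;  // string -> code
};
```

Because one overload takes an integer and the other a pointer, a literal 0 is ambiguous as an argument; an explicitly typed value such as uint32_t(0) avoids the ambiguity.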
void ibis::dictionary::patternSearch | ( | const char * | pat, |
array_t< uint32_t > & | matches | ||
) | const |
Find all codes that match the SQL LIKE pattern.
If the pattern is null or empty, matches is not changed.
References ibis::gVerbose, ibis::array_t< T >::push_back(), and ibis::util::strMatch().
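For illustration, a minimal LIKE-style search over a list of strings might look like the following. The names likeMatch and patternSearchSketch are hypothetical. The sketch assumes only the two standard wildcards ('%' and '_'), case-sensitive comparison, and no escape character; the library's ibis::util::strMatch may behave differently. It also assumes matching codes are appended to matches.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Minimal SQL LIKE matcher: '%' matches any run of characters (including none),
// '_' matches exactly one character. No escape handling, case-sensitive.
bool likeMatch(const char* s, const char* p) {
    const char* star = nullptr;   // position of the last '%' seen in the pattern
    const char* retry = nullptr;  // where in s to resume after backtracking
    while (*s != '\0') {
        if (*p == '%') {
            star = p++;
            retry = s;
        } else if (*p == '_' || *p == *s) {
            ++p;
            ++s;
        } else if (star != nullptr) {
            p = star + 1;   // let the last '%' absorb one more character
            s = ++retry;
        } else {
            return false;
        }
    }
    while (*p == '%') ++p;  // trailing '%' can match the empty remainder
    return *p == '\0';
}

// raw[i] is the string with code i; raw[0] is the null entry.
void patternSearchSketch(const std::vector<std::string>& raw, const char* pat,
                         std::vector<uint32_t>& matches) {
    if (pat == nullptr || *pat == '\0') return;  // matches is left unchanged
    for (uint32_t i = 1; i < raw.size(); ++i)
        if (likeMatch(raw[i].c_str(), pat)) matches.push_back(i);
}
```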
int ibis::dictionary::read | ( | const char * | name | ) |
Read the content of the named file.
The file content is read into the buffer in one shot and then parsed.
References ibis::gVerbose.
int ibis::dictionary::readKeys | ( | const char * | evt, |
FILE * | fptr | ||
) | [protected] |
Read the ordered strings.
This function processes the data produced by the write function. On successful completion, it returns 0.
References ibis::util::clear(), and ibis::gVerbose.
int ibis::dictionary::readRaw | ( | const char * | evt, |
FILE * | fptr | ||
) | [protected] |
Read the raw strings.
This is the older style dictionary that contains the raw strings. On successful completion, this function returns 1.
References ibis::util::clear(), ibis::gVerbose, and ibis::util::sortStrings().
void ibis::dictionary::sort | ( | ibis::array_t< uint32_t > & | o2n | ) |
Reassign the integer values to the strings.
Upon successful completion of this function, the integer values assigned to the strings will be in ascending order. In other words, string values that are lexicographically smaller will have smaller integer representations.
The argument to this function carries the permutation information needed to turn the previous integer assignments into the new ones. If the previous assignment was k, the new assignment will be o2n[k]. Note that the name o2n is shorthand for old-to-new.
References ibis::array_t< T >::resize().
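The permutation that sort reports through o2n can be sketched as follows. The function sortedCodes is a hypothetical name, and std::string comparison stands in for whatever ordering the library applies to its C strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// raw[k] is the string that currently has code k; raw[0] is the null entry.
// Returns o2n such that the string with old code k gets new code o2n[k],
// with new codes following ascending string order.
std::vector<uint32_t> sortedCodes(const std::vector<std::string>& raw) {
    assert(!raw.empty());  // slot 0 (the null entry) must be present
    std::vector<uint32_t> order(raw.size() - 1);
    std::iota(order.begin(), order.end(), 1u);  // old codes 1..size()
    std::sort(order.begin(), order.end(),
              [&raw](uint32_t a, uint32_t b) { return raw[a] < raw[b]; });
    std::vector<uint32_t> o2n(raw.size(), 0);   // code 0 is never reassigned
    for (uint32_t rank = 0; rank < order.size(); ++rank)
        o2n[order[rank]] = rank + 1;            // smallest string gets code 1
    return o2n;
}
```

For example, if codes 1, 2, 3 were assigned to "pear", "apple", "kiwi" in that order, the result is o2n = {0, 3, 1, 2}: "pear" moves from code 1 to code 3, "apple" from 2 to 1, and "kiwi" from 3 to 2.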
int ibis::dictionary::write | ( | const char * | name | ) | const |
Write the content of the dictionary to the named file.
The existing content in the named file is overwritten. The content of the dictionary file is as follows.
References ibis::gVerbose.
Referenced by ibis::category::category().
array_t<char*> ibis::dictionary::buffer_ [protected] |
Member variable buffer_ contains a list of pointers to the memory that holds the strings.
Referenced by dictionary(), and swap().
array_t<uint32_t> ibis::dictionary::code_ [protected] |
Member variable code_ contains the integer code for each string in key_.
Referenced by dictionary(), equal_to(), morph(), and swap().
array_t<const char*> ibis::dictionary::raw_ [protected] |
Member variable raw_ contains the string values in the order of the code assignment.
Referenced by dictionary(), and swap().