Classes | Public Types | Public Member Functions | Static Public Member Functions | Protected Member Functions | Friends
ibis::bitvector Class Reference

A data structure to represent a sequence of bits. More...

#include <bitvector.h>

List of all members.

Classes

struct  active_word
 The struct active_word stores the last few bits that do not fill a whole word.
class  const_iterator
 The const_iterator class. It iterates on the individual bits. More...
class  indexSet
 The indexSet stores positions of bits that are one. More...
class  iterator
 The iterator that allows modification of bits. More...
struct  run
 An internal struct used during logical operations to track the usage of fill words.

Public Types

typedef uint32_t word_t
 The basic unit of data storage.

Public Member Functions

void adjustSize (word_t nv, word_t nt)
 Adjust the size of the bit sequence.
void appendFill (int val, word_t n)
 Append n bits of val.
void appendWord (word_t w)
 Append a WAH word.
iterator begin ()
const_iterator begin () const
 bitvector ()
 Default constructor. Creates a new empty bitvector.
 bitvector (const array_t< word_t > &arr)
 Construct a bitvector from an array.
 bitvector (const char *file)
 Constructor. Reconstruct a bitvector from a file.
 bitvector (const bitvector &bv)
 Copy constructor.
uint32_t bytes () const throw ()
 Return the number of bytes used by the bitvector object in memory.
void clear ()
 Remove the existing content of a bitvector.
word_t cnt () const
 Return the number of bits that are one.
void compress ()
 Merge fills into fill words.
word_t compressible () const
 Return the number of word saved if the function compress is called.
bitvectorcopy (const bitvector &bv)
 Performs deep copy.
word_t count (const bitvector &mask) const
 Count the number of bits that are 1 that also marked 1.
void decompress ()
 Turn all fill words into literal words.
bool empty () const
 Is the bitvector empty? For efficiency concerns, this funciton only works correctly on a properly compressed bitvector.
iterator end ()
const_iterator end () const
void erase (word_t i, word_t j)
 Remove the bits in the range of [i, j).
indexSet firstIndexSet () const
void flip ()
 Complement all bits of a bit sequence.
uint32_t getSerialSize () const throw ()
 Compute the number of bytes in the serialized version of this bitvector object.
word_t numFillWords () const
 Return the number of fill words.
bitvectoroperator& (const bitvector &) const
 Perform bitwise AND between this bitvector and rhs, return the result as a new bitvector.
void operator&= (const bitvector &rhs)
 Perform bitwise AND between this bitvector and rhs.
bitvectoroperator+= (const bitvector &bv)
 Append a bitvector.
bitvectoroperator+= (int b)
 Append a single bit.
bitvectoroperator- (const bitvector &) const
 Perform bitwise subtraction and return the result as a new bitvector.
void operator-= (const bitvector &rhs)
 Perform bitwise subtraction (a & !b).
const bitvectoroperator= (const bitvector &bv)
 The assignment operator.
int operator== (const bitvector &rhs) const
 Return 1 if two bit sequences have the same content, 0 otherwise.
bitvectoroperator^ (const bitvector &) const
 Perform bitwise XOR and return the result as a new bitvector.
void operator^= (const bitvector &rhs)
 Perform bitwise exclusive or (XOR).
bitvectoroperator| (const bitvector &) const
 Perform bitwise OR and return the result as a new bitvector.
void operator|= (const bitvector &rhs)
 Perform bitwise OR.
std::ostream & print (std::ostream &) const
 The print function.
void read (const char *fn)
 Read a bit vector from the file.
void reserve (unsigned nb, unsigned nc, double cf=0.0)
 Reserve enough space for a bit vector.
void set (int val, word_t n)
 Create a vector with n bits of value val (cf.
void setBit (const word_t i, int val)
 Replace a single bit at position i with val.
word_t size () const throw ()
 Return the total number of bits in the bit sequence.
void sloppySize (word_t n) const
 Explicitly set the size of the bitvector.
void subset (const bitvector &mask, bitvector &res) const
 Select a subset of the bits.
bitvectorswap (bitvector &bv)
void turnOnRawBit (const word_t i)
 Turn on a single bit in a uncompressed bitvector.
void write (array_t< word_t > &arr) const
 Write the bit vector to an array_t<word_t>.
void write (int fdes) const
 Write to a file that is opened by the caller.
void write (const char *fn) const
 Write the bit vector to a file.
 ~bitvector ()
 Destructor.

Static Public Member Functions

static word_t bitsPerLiteral ()
 Return the number of bits in a literal word.
static double clusteringFactor (word_t nb, word_t nc, word_t sz)
 Estimate clustering factor based on the size.
static double markovSize (word_t nb, word_t nc, double f)
 Compute the expected size (number of bytes) of a random sequence generated from a Markov process.
static double randomSize (word_t nb, word_t nc)
 Compute the expected number of bytes required to store a random sequence.

Protected Member Functions

bool all0s () const
 Are all bits in regular words 0?
bool all1s () const
 Are all bits in regular words 1?
bool isCompressed () const

Friends

struct active_word
class const_iterator
class indexSet
class iterator
struct run

Detailed Description

A data structure to represent a sequence of bits.

Key features

A bitvector object stores a sequence of bits and provides fast bitwise logical operations. In addition, it supports operations to append new bits from the end, read bits at arbitrary location and set bits at arbitrary location. It also supports an iterator, a const_iterator and an indexSet.

Encoding format <http://lbl.gov/~kwu/ps/LBNL-49626.html> <http://portal.acm.org/citation.cfm?doid=1132863.1132864>

Incoming bits are organized into words (bitvector::word_t). A word is a literal word if its Most Significant Bit (MSB) is 0, it is a fill word if its MSB is 1. A literal word stores literal bit values in the bit position following the MSB and a fill word stores a sequence of consecutive bits that are of the same value, i.e., a fill. The second most significant bit of the fill word is the bit value, the remaining bits of the word is a unsigned integer that stores the length of the fill as number of equivalent literal words, i.e., how many literal words it will take if the fill is stored in literal words.

Restrictions


Constructor & Destructor Documentation

ibis::bitvector::bitvector ( const bitvector bv)

Copy constructor.

The underlying storage (m_vec) is constructed through a copy constructor as well.

References ibis::gVerbose.

ibis::bitvector::bitvector ( const array_t< word_t > &  arr) [explicit]

Construct a bitvector from an array.

Because the array copy constructor performs shallow copy, this bitvector is not using any new space for the underlying vector.

References clear(), ibis::gVerbose, and ibis::array_t< T >::size().


Member Function Documentation

void ibis::bitvector::adjustSize ( word_t  nv,
word_t  nt 
)

Adjust the size of the bit sequence.

If current size is less than nv, append enough 1s so that it has nv bits. If the resulting total number of bits is less than nt, append 0 bits so that there are nt total bits. If there are more than nt bits, only the first nt bits are kept. The final result always contains nt bits.

References ibis::gVerbose.

Referenced by ibis::index::addBins(), ibis::tafel::append(), ibis::column::append(), ibis::blob::append(), ibis::category::append(), ibis::tafel::appendString(), ibis::column::appendStrings(), ibis::column::appendValues(), ibis::bin::binning(), ibis::bin::binningT(), ibis::egale::construct(), ibis::part::doCompare(), ibis::part::doCount(), ibis::part::doScan(), ibis::countQuery::estimate(), ibis::column::estimateRange(), ibis::countQuery::evaluate(), ibis::column::evaluateRange(), ibis::bord::column::evaluateRange(), ibis::part::evaluateRIDSet(), ibis::mensa::cursor::fillBuffer(), ibis::column::getNullMask(), ibis::bord::column::keywordSearch(), ibis::roster::locate(), ibis::index::mapValues(), ibis::bak2::mapValues(), ibis::bak::mapValues(), ibis::tafel::normalize(), ibis::part::numbersToBitvector(), operator&(), operator&=(), operator-(), operator-=(), operator^(), operator^=(), operator|(), operator|=(), ibis::part::part(), ibis::keywords::readTermDocFile(), ibis::column::searchSortedICC(), ibis::column::searchSortedICD(), ibis::column::searchSortedOOCC(), ibis::column::searchSortedOOCD(), sloppySize(), ibis::text::stringSearch(), ibis::bord::column::stringSearch(), ibis::column::truncateData(), ibis::part::writeColumn(), ibis::column::writeData(), ibis::blob::writeData(), ibis::part::writeRaw(), ibis::part::writeString(), and ibis::text::writeStrings().

void ibis::bitvector::appendFill ( int  val,
word_t  n 
) [inline]

Append n bits of val.

The value n may be arbitrary integer as long as the resulting size is still representable by a ibis::bitvector::word_t, however, the value val must be either 0 or 1.

Referenced by ibis::tafel::append(), ibis::tafel::appendString(), erase(), ibis::column::evaluateRange(), ibis::mensa::cursor::fillBuffer(), ibis::index::mapValues(), ibis::part::numbersToBitvector(), ibis::column::searchSortedICC(), ibis::column::searchSortedOOCC(), and subset().

void ibis::bitvector::appendWord ( word_t  w)

Append a WAH word.

The incoming argument w is assumed to be a WAH compressed word.

Referenced by erase(), and subset().

void ibis::bitvector::clear ( )

Remove the existing content of a bitvector.

The underlying storage is not released until the object is actual freed.

References ibis::gVerbose.

Referenced by ibis::bord::backup(), bitvector(), ibis::tafel::capacity(), ibis::bin::checkBin(), ibis::tafel::clearData(), ibis::part::doCompare(), ibis::query::doEstimate(), ibis::countQuery::doEstimate(), ibis::tafel::doReserve(), ibis::part::doScan(), ibis::zone::estimate(), ibis::pale::estimate(), ibis::pack::estimate(), ibis::fuge::estimate(), ibis::ambit::estimate(), ibis::slice::estimate(), ibis::relic::estimate(), ibis::range::estimate(), ibis::mesa::estimate(), ibis::direkte::estimate(), ibis::moins::estimate(), ibis::entre::estimate(), ibis::egale::estimate(), ibis::bin::estimate(), ibis::column::estimateRange(), ibis::column::evaluateRange(), ibis::part::get2DDistribution(), ibis::part::get3DDistribution(), ibis::part::getCumulativeDistribution(), ibis::query::getExpandedHits(), ibis::part::getJointDistribution(), ibis::bord::column::keywordSearch(), ibis::roster::locate(), ibis::part::negativeCompare(), ibis::part::old2DDistribution(), ibis::category::patternSearch(), ibis::keywords::search(), ibis::column::searchSortedICC(), ibis::column::searchSortedICD(), ibis::column::searchSortedOOCC(), ibis::column::searchSortedOOCD(), ibis::query::sequentialScan(), ibis::keywords::setBits(), ibis::text::stringSearch(), ibis::category::stringSearch(), ibis::bord::column::stringSearch(), ibis::part::stringToBitvector(), subset(), ibis::relic::undecidable(), ibis::keywords::undecidable(), and ibis::direkte::undecidable().

double ibis::bitvector::clusteringFactor ( word_t  nb,
word_t  nc,
word_t  sz 
) [static]

Estimate clustering factor based on the size.

The size is measured as the number of bytes.

See also:
markovSize.

References ibis::gVerbose.

Referenced by ibis::query::evaluate(), ibis::query::processJoin(), ibis::relic::speedTest(), and ibis::bin::speedTest().

void ibis::bitvector::compress ( )
ibis::bitvector::word_t ibis::bitvector::count ( const bitvector mask) const

Count the number of bits that are 1 that also marked 1.

A straightforward implement of this is to perform a bitwise AND and then count the number of bits that are 1. However, such an approach will generate a bitvector that is only used for counting. This is an attempt to do better.

References all0s(), all1s(), and ibis::array_t< T >::size().

void ibis::bitvector::decompress ( )
bool ibis::bitvector::empty ( ) const [inline]

Is the bitvector empty? For efficiency concerns, this funciton only works correctly on a properly compressed bitvector.

Referenced by ibis::category::patternSearch().

void ibis::bitvector::flip ( )
uint32_t ibis::bitvector::getSerialSize ( ) const throw () [inline]

Compute the number of bytes in the serialized version of this bitvector object.

This would be the number of bytes this bitvector needs on disk or in an array_t<word_t>.

Referenced by ibis::fuzz::getSerialSize(), and ibis::egale::getSerialSize().

double ibis::bitvector::markovSize ( word_t  nb,
word_t  nc,
double  f 
) [inline, static]

Compute the expected size (number of bytes) of a random sequence generated from a Markov process.

The bit sequence is to have nb total bits, nc bits of one, and f consecutive ones on the average. The argument f is known as the clustering factor.

ibis::bitvector * ibis::bitvector::operator& ( const bitvector rhs) const

Perform bitwise AND between this bitvector and rhs, return the result as a new bitvector.

This version of bitwise logical operator produces a new bitvector as the result and return a pointer to the new object.

Note:
The caller is responsible for deleting the bitvector returned.
See also:
ibis::bitvector::operator&=

References adjustSize(), all0s(), all1s(), copy(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and size().

void ibis::bitvector::operator&= ( const bitvector rhs)

Perform bitwise AND between this bitvector and rhs.

The in-place version of the bitwise logical AND operator.

It performs the bitwise logical AND operation between this bitvector and rhs, then stores the result back to this bitvector.

Note:
If the two bit vectors are not of the same length, the shorter one is implicitly padded with 0 bits so the two are of the same length.

References adjustSize(), all0s(), all1s(), bytes(), copy(), ibis::gVerbose, ibis::array_t< T >::size(), and size().

ibis::bitvector * ibis::bitvector::operator- ( const bitvector rhs) const

Perform bitwise subtraction and return the result as a new bitvector.

The operands of the bitwise minus operation are not modified, instead a new bitvector object is generated.

See also:
ibis::bitvector::operator&

References adjustSize(), all0s(), all1s(), copy(), flip(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and size().

void ibis::bitvector::operator-= ( const bitvector rhs)

Perform bitwise subtraction (a & !b).

The in-place version of the bitwise minus (-) operator.

This bitvector is modified to store the result of the operation.

See also:
ibis::bitvector::operator&=

References adjustSize(), all0s(), all1s(), bytes(), ibis::util::copy(), ibis::gVerbose, ibis::array_t< T >::size(), and size().

const ibis::bitvector & ibis::bitvector::operator= ( const bitvector bv) [inline]

The assignment operator.

Use deep copy. Wanted to use shallow copy for efficiency considerations, but SHALLOW copy causes unexpected problem in test program bitty.cpp.

ibis::bitvector * ibis::bitvector::operator^ ( const bitvector rhs) const

Perform bitwise XOR and return the result as a new bitvector.

This bitvector is not modified, instead a new bitvector object is generated to store the result.

See also:
ibis::bitvector::operator&

References adjustSize(), copy(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and size().

void ibis::bitvector::operator^= ( const bitvector rhs)

Perform bitwise exclusive or (XOR).

The in-place version of the bitwise XOR (^) operator.

This bitvector is modified to store the result.

See also:
ibis::bitvector::operator&=

References adjustSize(), bytes(), ibis::gVerbose, ibis::array_t< T >::size(), and size().

ibis::bitvector * ibis::bitvector::operator| ( const bitvector rhs) const

Perform bitwise OR and return the result as a new bitvector.

This bitvector is not modified, instead a new bitvector is generated.

See also:
ibis::bitvector::operator&

References adjustSize(), all0s(), all1s(), copy(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and size().

void ibis::bitvector::operator|= ( const bitvector rhs)

Perform bitwise OR.

The is the in-place version of the bitwise OR (|) operator.

This bitvector is modified to store the result of the operation.

See also:
ibis::bitvector::operator&=

References adjustSize(), all0s(), all1s(), bytes(), copy(), ibis::gVerbose, ibis::array_t< T >::size(), and size().

std::ostream & ibis::bitvector::print ( std::ostream &  o) const

The print function.

Print each word in bitvector on a line.

References ibis::gVerbose.

Referenced by ibis::query::sequentialScan().

double ibis::bitvector::randomSize ( word_t  nb,
word_t  nc 
) [inline, static]

Compute the expected number of bytes required to store a random sequence.

The random bit sequence is to have nb total bits and nc bits of one.

Referenced by ibis::query::evaluate().

void ibis::bitvector::read ( const char *  fn)

Read a bit vector from the file.

Purge current contents before read.

References ibis::fileManager::getFile(), ibis::gVerbose, and ibis::fileManager::instance().

Referenced by ibis::column::append(), ibis::blob::append(), bitvector(), ibis::part::part(), and ibis::query::readHits().

void ibis::bitvector::reserve ( unsigned  nb,
unsigned  nc,
double  cf = 0.0 
)

Reserve enough space for a bit vector.

The caller needs to specify the number of total bits, nb, and the number of set bits, nc. The caller may additional specify the clustering factor, cf, if there is a reasonable estimate for it.

References ibis::gVerbose.

Referenced by ibis::part::doCompare(), ibis::part::doScan(), ibis::part::negativeCompare(), ibis::column::searchSortedICD(), and ibis::column::searchSortedOOCD().

void ibis::bitvector::set ( int  val,
word_t  n 
)

Create a vector with n bits of value val (cf.

memset()).

Note:
val must be either 0 or 1.

Referenced by ibis::index::addBins(), ibis::index::addBits(), ibis::relic::append(), ibis::category::append(), ibis::bord::backup(), ibis::bin::binning(), ibis::bin::binningT(), ibis::bord::bord(), ibis::bundles::bundles(), ibis::egale::construct(), ibis::bin::construct(), ibis::part::doCompare(), ibis::part::doCompare0(), ibis::query::doEstimate(), ibis::countQuery::doEstimate(), ibis::query::doEvaluate(), ibis::countQuery::doEvaluate(), ibis::query::doScan(), ibis::part::doScan(), ibis::countQuery::doScan(), ibis::query::estimate(), ibis::zone::estimate(), ibis::pale::estimate(), ibis::pack::estimate(), ibis::fuge::estimate(), ibis::ambit::estimate(), ibis::slice::estimate(), ibis::range::estimate(), ibis::index::estimate(), ibis::mesa::estimate(), ibis::direkte::estimate(), ibis::moins::estimate(), ibis::entre::estimate(), ibis::egale::estimate(), ibis::bin::estimate(), ibis::part::estimateMatchAny(), ibis::part::estimateRange(), ibis::column::estimateRange(), ibis::zona::evaluate(), ibis::fuzz::evaluate(), ibis::bylt::evaluate(), ibis::slice::evaluate(), ibis::sbiad::evaluate(), ibis::sapid::evaluate(), ibis::relic::evaluate(), ibis::range::evaluate(), ibis::mesa::evaluate(), ibis::fade::evaluate(), ibis::direkte::evaluate(), ibis::bin::evaluate(), ibis::part::evaluateRange(), ibis::column::evaluateRange(), ibis::bord::evaluateTerms(), ibis::part::get1DBins(), ibis::part::get2DBins(), ibis::part::get3DBins(), ibis::part::getDistribution(), ibis::column::getNullMask(), ibis::bord::groupby(), ibis::index::initBitmaps(), ibis::roster::locate(), ibis::part::lookforString(), ibis::index::mapValues(), ibis::bak2::mapValues(), ibis::bak::mapValues(), ibis::part::matchAny(), ibis::part::negativeCompare(), ibis::part::negativeScan(), ibis::tafel::normalize(), ibis::part::part(), ibis::keywords::readTermDocFile(), ibis::query::removeComplexConditions(), ibis::keywords::search(), ibis::column::searchSortedICC(), ibis::column::searchSortedOOCC(), ibis::text::stringSearch(), ibis::category::stringSearch(), ibis::bord::column::stringSearch(), subset(), ibis::index::sumBins(), ibis::index::sumBits(), ibis::zone::undecidable(), ibis::pale::undecidable(), ibis::pack::undecidable(), ibis::ambit::undecidable(), ibis::range::undecidable(), ibis::mesa::undecidable(), ibis::egale::undecidable(), ibis::bin::undecidable(), and ibis::tafel::write().

void ibis::bitvector::setBit ( const word_t  i,
int  val 
)

Replace a single bit at position i with val.

If the value of val is not 0, it is assumed to be 1. This function can be used to extend the length of the bit sequence. When the given index (ind) is beyond the end of the current sequence, the unspecified bits in the range of [size(), ind) are assumed to be 0.

This function may uncompress the object if the bit to be changed is in the middle of a long bitvector. This is to improve the speed of operation at the cost of some additional space. The bitvector is considered long if the cube of the number of words is more than the number of bits in the bitvector.

References ibis::gVerbose.

Referenced by ibis::part::doCompare(), ibis::part::doScan(), ibis::part::evaluateRIDSet(), ibis::part::fill1DBins(), ibis::part::fill1DBinsWeighted(), ibis::part::fill2DBins(), ibis::part::fill2DBinsWeighted(), ibis::part::fill3DBins(), ibis::part::fill3DBinsWeighted(), ibis::bord::column::keywordSearch(), ibis::roster::locate(), ibis::index::mapValues(), ibis::bak2::mapValues(), ibis::bak::mapValues(), ibis::part::negativeCompare(), ibis::part::numbersToBitvector(), ibis::keywords::parseTextFile(), ibis::column::searchSortedICD(), ibis::column::searchSortedOOCD(), ibis::keywords::setBits(), ibis::text::stringSearch(), and ibis::bord::column::stringSearch().

void ibis::bitvector::sloppySize ( word_t  n) const [inline]

Explicitly set the size of the bitvector.

This is intended to be used by indexing functions to avoid counting the number of bits. Caller is responsible for ensuring the size assigned is actually correct. It does not the number of bits actually represented by this data structure. To change the number of bits represented by this data structure use the function adjustSize instead.

References adjustSize(), and ibis::gVerbose.

Referenced by ibis::index::initBitmaps().

void ibis::bitvector::subset ( const bitvector mask,
ibis::bitvector res 
) const

Select a subset of the bits.

Bits whose positions are marked 1 in mask are put together to form a new bitvector res.

References appendFill(), appendWord(), clear(), cnt(), ibis::gVerbose, set(), and size().

Referenced by ibis::column::saveSelected(), and ibis::text::writeStrings().

void ibis::bitvector::write ( array_t< word_t > &  arr) const

Write the bit vector to an array_t<word_t>.

The serialize version of the bit vector may be passed to another I/O function or sent through networks.

References ibis::array_t< T >::push_back(), ibis::array_t< T >::reserve(), and ibis::array_t< T >::resize().


The documentation for this class was generated from the following files:

Make It A Bit Faster
Contact us
Disclaimers
FastBit source code
FastBit mailing list archive