ibis::text Class Reference

A minimalistic structure for storing arbitrary text fields. More...

#include <category.h>

Inheritance diagram for ibis::text:

ibis::column ibis::category

List of all members.

Public Member Functions

virtual long append (const char *dt, const char *df, const uint32_t nold, const uint32_t nnew, const uint32_t nbuf, char *buf)
 Append the data file stored in directory df to the corresponding data file in directory dt.
virtual double estimateCost (const ibis::qDiscreteRange &cmp) const
virtual double estimateCost (const ibis::qContinuousRange &cmp) const
 Estimate the cost of evaluating the query expression.
virtual double estimateCost (const ibis::qMultiString &cmp) const
virtual double estimateCost (const ibis::qString &cmp) const
virtual const char * findString (const char *str) const
 If the input string is found in the data file, it is returned; otherwise this function returns 0.
virtual void getString (uint32_t i, std::string &val) const
 Return the string value for the ith row.
const column * IDColumnForKeywordIndex () const
virtual long keywordSearch (const char *str) const
virtual long keywordSearch (const char *str, ibis::bitvector &hits) const
virtual void print (std::ostream &out) const
virtual long search (const std::vector< std::string > &strs) const
virtual long search (const char *str) const
virtual long search (const std::vector< std::string > &strs, ibis::bitvector &hits) const
 Given a group of string literals, return a bitvector that marks the strings matching any one of the input strings.
virtual long search (const char *str, ibis::bitvector &hits) const
 Given a string literal, return a bitvector that marks the strings that match it.
virtual std::vector< std::string > * selectStrings (const bitvector &mask) const
virtual array_t< uint32_t > * selectUInts (const bitvector &mask) const
 Return the integer values of the records marked 1 in the mask.
 text (const ibis::column &col)
 text (const part *tbl, const char *name, ibis::TYPE_T t=ibis::TEXT)
 text (const part *tbl, FILE *file)
virtual void write (FILE *file) const
 Write the current content to the TDC file.

Protected Member Functions

int readString (std::string &, int, long, long, char *, uint32_t, uint32_t &, off_t &) const
 Read one string from an open file.
void readString (uint32_t i, std::string &val) const
 Read the string value of the ith row.
void startPositions (const char *dir, char *buf, uint32_t nbuf) const
 Locate the starting position of each string and write the positions as unsigned integers to a file with .sp as extension.


Detailed Description

A minimalistic structure for storing arbitrary text fields.

The keyword search operation is implemented through a boolean term-document matrix (ibis::keywords) that is actually generated externally.
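The idea of a boolean term-document matrix can be illustrated with a minimal, self-contained sketch. It does not use ibis::keywords or any FastBit type: the class TermDocMatrix, its whitespace tokenization, and the use of std::vector<bool> in place of ibis::bitvector are all assumptions made for illustration. In FastBit the matrix is generated externally rather than built from the text as done here.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a boolean term-document matrix: each keyword
// maps to one row of bits, with bit i set when row i contains the keyword.
// This is NOT ibis::keywords; it only illustrates the idea.
class TermDocMatrix {
public:
    explicit TermDocMatrix(const std::vector<std::string>& docs)
        : ndocs_(docs.size()) {
        for (std::size_t i = 0; i < docs.size(); ++i) {
            std::istringstream words(docs[i]);  // assumed: whitespace-separated terms
            std::string w;
            while (words >> w) {
                std::vector<bool>& row = rows_[w];
                if (row.empty()) row.resize(ndocs_, false);
                row[i] = true;
            }
        }
    }

    // Copy the bit row for `term` into `hits` and return the number of
    // matching rows (0 if the term was never seen).
    long keywordSearch(const std::string& term, std::vector<bool>& hits) const {
        hits.assign(ndocs_, false);
        std::map<std::string, std::vector<bool> >::const_iterator it =
            rows_.find(term);
        if (it == rows_.end()) return 0;
        hits = it->second;
        long count = 0;
        for (bool b : hits) count += b ? 1 : 0;
        return count;
    }

private:
    std::size_t ndocs_;
    std::map<std::string, std::vector<bool> > rows_;
};
```

With such a matrix, a keyword search reduces to a lookup of one precomputed bit row, so the text itself does not need to be scanned at query time.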


Member Function Documentation

long ibis::text::append ( const char *  dt,
const char *  df,
const uint32_t  nold,
const uint32_t  nnew,
const uint32_t  nbuf,
char *  buf 
) [virtual]

Append the data file stored in directory df to the corresponding data file in directory dt.

Use the buffer buf to copy data in large chunks.

Note:
No error checking is performed.

This function does not check for missing entries, which may cause records to be misaligned.

Reimplemented from ibis::column.

Reimplemented in ibis::category.

References ibis::gVerbose, and startPositions().

const char * ibis::text::findString ( const char *  str  )  const [virtual]

If the input string is found in the data file, it is returned; otherwise this function returns 0.

It needs to keep both the data file and the starting position file open at the same time.

Reimplemented from ibis::column.

References ibis::util::buffer< T >::address(), ibis::part::currentDataDir(), ibis::gVerbose, ibis::fileManager::instance(), ibis::part::name(), ibis::part::nRows(), ibis::fileManager::recordPages(), ibis::util::buffer< T >::size(), and startPositions().

virtual void ibis::text::getString ( uint32_t  i,
std::string &  val 
) const [inline, virtual]

Return the string value for the ith row.

Only valid for ibis::text and ibis::category.

Reimplemented from ibis::column.

Reimplemented in ibis::category.

References readString().

int ibis::text::readString ( std::string &  res,
int  fdes,
long  be,
long  en,
char *  buf,
uint32_t  nbuf,
uint32_t &  inbuf,
off_t &  boffset 
) const [protected]

Read one string from an open file.

The string starts at position be and ends at en. The content may be in the array buf.

References ibis::gVerbose.

void ibis::text::readString ( uint32_t  i,
std::string &  ret 
) const [protected]

Read the string value of the ith row.

Reading is a two-stage process involving two files: the .sp file is read first to obtain the position of the string, and the data file, which contains the actual string values (with nil terminators), is then read at that position.

This can be quite slow!
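The two-stage lookup can be sketched as follows. This is an illustration under stated assumptions, not the FastBit implementation: the function readRow is hypothetical, both files are modelled as in-memory buffers (starts for the .sp file, data for the data file), and the offsets are assumed to be 64-bit unsigned integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of FastBit.
// Stage 1: look up the start offset of row i (the role of the .sp file).
// Stage 2: read bytes from that offset up to the terminating '\0'
// (the role of the data file). Both "files" are in-memory here.
bool readRow(const std::vector<std::uint64_t>& starts,
             const std::string& data, std::size_t i, std::string& val) {
    val.clear();
    if (i >= starts.size() || starts[i] > data.size()) return false;
    const std::size_t begin = static_cast<std::size_t>(starts[i]);
    std::size_t end = data.find('\0', begin);
    if (end == std::string::npos) end = data.size();  // unterminated last string
    val.assign(data, begin, end - begin);
    return true;
}
```

With real files, each stage typically costs a seek and a read, which is one plausible reason a row-by-row lookup is slow.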

References ibis::part::currentDataDir(), ibis::fileManager::instance(), ibis::part::nRows(), and ibis::fileManager::recordPages().

Referenced by getString().

long ibis::text::search ( const std::vector< std::string > &  strs,
ibis::bitvector &  hits 
) const [virtual]

long ibis::text::search ( const char *  str,
ibis::bitvector &  hits 
) const [virtual]

array_t< uint32_t > * ibis::text::selectUInts ( const bitvector &  mask  )  const [virtual]

Return the integer values of the records marked 1 in the mask.

Return the positions of the bits that are marked 1.

This indicates to ibis::bundle that every string value is distinct. It also forces the sorting procedure to produce an order following the order of the entries in the table. This makes the printout of an ibis::text field much less useful than that of other column types.
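Returning the positions of the set bits amounts to the following minimal sketch. The function setBitPositions is hypothetical, and std::vector<bool> and std::vector<std::uint32_t> stand in for ibis::bitvector and array_t< uint32_t >, whose interfaces are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of FastBit:
// return the row numbers whose mask bit is 1, in increasing order.
std::vector<std::uint32_t> setBitPositions(const std::vector<bool>& mask) {
    std::vector<std::uint32_t> rows;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i]) rows.push_back(static_cast<std::uint32_t>(i));
    }
    return rows;
}
```

Because each selected row yields its own row number, no two rows share a value, and sorting by these values reproduces the table order.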

Reimplemented from ibis::column.

Reimplemented in ibis::category.

References ibis::bitvector::firstIndexSet(), and array_t< T >::push_back().

void ibis::text::startPositions ( const char *  dir,
char *  buf,
uint32_t  nbuf 
) const [protected]

Locate the starting position of each string and write the positions as unsigned integers to a file with .sp as extension.

This function uses the data file located in the named directory dir.

If dir is a nil pointer, the directory defaults to the current working directory of the data partition.

Argument buf (with nbuf bytes) is used as temporary work space. If nbuf = 0, this function allocates its own working space.
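Locating the start of each string can be sketched as a single scan for nil terminators. The function scanStartPositions below is hypothetical and works on an in-memory copy of the data; it returns the offsets instead of writing them to a .sp file, and the 64-bit offset type is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of FastBit:
// record the byte offset at which each nil-terminated string begins.
std::vector<std::uint64_t> scanStartPositions(const std::string& data) {
    std::vector<std::uint64_t> starts;
    std::size_t pos = 0;
    while (pos < data.size()) {
        starts.push_back(pos);
        const std::size_t end = data.find('\0', pos);
        if (end == std::string::npos) break;  // last string is unterminated
        pos = end + 1;                        // next string starts after '\0'
    }
    return starts;
}
```

An empty string occupies a single nil byte, so it still receives its own start position and the row numbering stays aligned with the data.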

References ibis::util::buffer< T >::address(), ibis::part::currentDataDir(), ibis::gVerbose, ibis::part::nRows(), and ibis::util::buffer< T >::size().

Referenced by append(), findString(), and search().


The documentation for this class was generated from the following files: