org.apache.lucene.benchmark.byTask.feeds
Class TrecDocMaker
java.lang.Object
org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
org.apache.lucene.benchmark.byTask.feeds.TrecDocMaker
- All Implemented Interfaces: DocMaker
public class TrecDocMaker extends BasicDocMaker
A DocMaker that reads its input from the (compressed) TREC collection.
Config properties:
- work.dir=<path to the root of the docs and indexes dirs> (default: work)
- docs.dir=<path to the docs dir> (default: trec)
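The following Java sketch illustrates how the two properties above might combine into a single data directory. It uses java.util.Properties rather than the benchmark Config class, and the helper name TrecDirsSketch is hypothetical. Only the property names and their defaults come from this page; the rule that a relative docs.dir is resolved against work.dir is an assumption.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical helper, not part of Lucene: one way the two properties could combine.
final class TrecDirsSketch {

    /** Returns the docs directory, applying the documented defaults ("work" and "trec"). */
    static File resolveDocsDir(Properties props) {
        File workDir = new File(props.getProperty("work.dir", "work"));
        File docsDir = new File(props.getProperty("docs.dir", "trec"));
        // Assumption: a relative docs.dir is interpreted relative to work.dir,
        // while an absolute docs.dir is used unchanged.
        return docsDir.isAbsolute() ? docsDir : new File(workDir, docsDir.getPath());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("work.dir", "bench"); // docs.dir left unset, so "trec" applies
        System.out.println(resolveDocsDir(props).getPath());
    }
}
```

With neither property set, the sketch resolves to the trec directory under work, which matches the defaults listed above.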
Fields inherited from class org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
BODY_FIELD, BYTES_FIELD, config, DATE_FIELD, forever, ID_FIELD, indexVal, NAME_FIELD, storeVal, termVecVal, TITLE_FIELD
Method Summary
- protected void closeInputs()
- protected java.text.DateFormat getDateFormat(int n)
- protected DocData getNextDocData()
  Return the data of the next document.
- int numUniqueTexts()
  Return how many real unique texts are available, 0 if not applicable.
- protected void openNextFile()
- protected java.util.Date parseDate(java.lang.String dateStr)
- protected java.lang.StringBuffer read(java.lang.String prefix, java.lang.StringBuffer sb, boolean collectMatchLine, boolean collectAll)
- void resetInputs()
  Reset inputs so that the test run would behave, input wise, as if it just started.
- void setConfig(Config config)
  Set the properties
Methods inherited from class org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker
addBytes, addUniqueBytes, collectFiles, getByteCount, getCount, getHtmlParser, makeDocument, makeDocument, numUniqueBytes, printDocStatistics, resetUniqueBytes, setHTMLParser
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
dateFormat
protected java.lang.ThreadLocal dateFormat
dataDir
protected java.io.File dataDir
inputFiles
protected java.util.ArrayList inputFiles
nextFile
protected int nextFile
iteration
protected int iteration
reader
protected java.io.BufferedReader reader
TrecDocMaker
public TrecDocMaker()
setConfig
public void setConfig(Config config)
- Description copied from interface: DocMaker
- Set the properties
- Specified by: setConfig in interface DocMaker
- Overrides: setConfig in class BasicDocMaker
openNextFile
protected void openNextFile() throws NoMoreDataException, java.lang.Exception
- Throws:
NoMoreDataException
java.lang.Exception
closeInputs
protected void closeInputs()
read
protected java.lang.StringBuffer read(java.lang.String prefix, java.lang.StringBuffer sb, boolean collectMatchLine, boolean collectAll) throws java.lang.Exception
- Throws:
java.lang.Exception
getNextDocData
protected DocData getNextDocData() throws NoMoreDataException, java.lang.Exception
- Description copied from class: BasicDocMaker
- Return the data of the next document.
All current implementations can create docs forever: when the input data is exhausted, the input files are iterated again from the start.
To avoid this re-iteration, set doc.maker.forever to false (default is true).
- Specified by: getNextDocData in class BasicDocMaker
- Returns: data of the next document.
- Throws:
NoMoreDataException - if data is exhausted (and 'forever' is set to false).
java.lang.Exception
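The re-iteration behavior described for getNextDocData can be sketched with a small self-contained Java class. Every name here (CyclingFeedSketch, ExhaustedException, nextInput, reset) is a hypothetical stand-in and not one of the Lucene classes; the sketch only mirrors the documented contract: wrap around when forever is true, signal exhaustion otherwise, and allow a reset to the initial state.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; these are not the Lucene benchmark classes.
final class CyclingFeedSketch {

    /** Stand-in for NoMoreDataException. */
    static final class ExhaustedException extends Exception {
        ExhaustedException() {
            super("input exhausted and forever=false");
        }
    }

    private final List<String> inputs;
    private final boolean forever;
    private int next = 0;      // index of the next input to hand out
    private int iteration = 0; // number of completed passes over the inputs

    CyclingFeedSketch(List<String> inputs, boolean forever) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no inputs");
        }
        this.inputs = inputs;
        this.forever = forever;
    }

    /** Returns the next input, wrapping around to the first one when forever is true. */
    String nextInput() throws ExhaustedException {
        if (next >= inputs.size()) {
            if (!forever) {
                throw new ExhaustedException();
            }
            next = 0;
            iteration++;
        }
        return inputs.get(next++);
    }

    int iteration() {
        return iteration;
    }

    /** Rewinds so that the feed behaves as if the run had just started. */
    void reset() {
        next = 0;
        iteration = 0;
    }

    public static void main(String[] args) throws Exception {
        CyclingFeedSketch feed = new CyclingFeedSketch(Arrays.asList("a", "b"), true);
        for (int i = 0; i < 5; i++) {
            System.out.print(feed.nextInput() + " ");
        }
        System.out.println("passes completed: " + feed.iteration());
    }
}
```

A caller that wants a single pass over the data would construct the feed with forever set to false and treat the exception as the normal end-of-input signal.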
getDateFormat
protected java.text.DateFormat getDateFormat(int n)
parseDate
protected java.util.Date parseDate(java.lang.String dateStr)
resetInputs
public void resetInputs()
- Description copied from interface: DocMaker
- Reset inputs so that the test run would behave, input wise, as if it just started.
- Specified by: resetInputs in interface DocMaker
- Overrides: resetInputs in class BasicDocMaker
numUniqueTexts
public int numUniqueTexts()
- Description copied from interface: DocMaker
- Return how many real unique texts are available, 0 if not applicable.
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.