public class EnwikiContentSource extends ContentSource
A ContentSource which reads the English Wikipedia dump. You can read the .bz2 file directly (it will be decompressed on the fly). Config properties are supplied through setConfig(Config).
Fields inherited from class ContentSource: BUFFER_SIZE, encoding, forever, logStep, verbose
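The class reads the dump as a stream of XML pages. As a rough illustration of the XML side only, here is a minimal sketch that pulls page titles and text out of a MediaWiki-style export using the JDK's StAX API. It is not how EnwikiContentSource itself is implemented. The class WikiDumpSketch and its Page record are hypothetical, the element names (page, title, text) are assumed from the MediaWiki export format, and bzip2 decompression is left out because the JDK ships no bzip2 codec (a library such as Apache Commons Compress would be needed).

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical sketch: extracts <page> elements from a MediaWiki-style XML dump.
 * Not EnwikiContentSource's actual implementation; .bz2 decompression is omitted.
 */
public class WikiDumpSketch {

    /** One parsed page: its title and raw wiki-markup text. */
    public record Page(String title, String text) {}

    /** Walks the XML event by event and collects one Page per <page> element. */
    public static List<Page> parse(Reader in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<Page> pages = new ArrayList<>();
        String title = null;
        String text = null;
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        case "page" -> { title = null; text = null; } // start a fresh page
                        case "title" -> title = r.getElementText();
                        case "text" -> text = r.getElementText();
                        default -> { } // ignore revision metadata, ids, etc.
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getLocalName().equals("page")) {
                    pages.add(new Page(title, text == null ? "" : text));
                }
            }
        } finally {
            r.close(); // releases parser resources; does not close the underlying Reader
        }
        return pages;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<mediawiki><page><title>Apache Lucene</title>"
                + "<revision><text>'''Lucene''' is a search library.</text></revision>"
                + "</page></mediawiki>";
        for (Page p : parse(new StringReader(xml))) {
            System.out.println(p.title() + " -> " + p.text());
        }
    }
}
```

For brevity the sketch collects every page into a list; a content source would instead hand out one page per getNextDocData call so that the whole dump never has to fit in memory.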
Constructor and Description |
---|
EnwikiContentSource() |
Modifier and Type | Method and Description |
---|---|
void | close(): Called when reading from this content source is no longer required. |
DocData | getNextDocData(DocData docData): Returns the next DocData from the content source. |
void | resetInputs(): Resets the input for this content source, so that the test would behave as if it was just started, input-wise. |
void | setConfig(Config config): Sets the Config for this content source. |
Methods inherited from class ContentSource: addBytes, addDoc, collectFiles, getBytesCount, getConfig, getDocsCount, getInputStream, getTotalBytesCount, getTotalDocsCount, shouldLog
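public void close() throws IOException
Description copied from class: ContentSource
Called when reading from this content source is no longer required.
Specified by: close in class ContentSource
Throws: IOException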
public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException
Description copied from class: ContentSource
Returns the next DocData from the content source.
Specified by: getNextDocData in class ContentSource
Throws: NoMoreDataException, IOException
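getNextDocData takes a DocData argument and signals exhaustion with NoMoreDataException. The following is a minimal sketch of a consumer, assuming the caller reuses one DocData instance across calls and treats NoMoreDataException as the normal end of input. DocData, NoMoreDataException and SimpleSource here are simplified hypothetical stand-ins, not the Lucene benchmark classes, so that the snippet runs on its own.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch of the getNextDocData contract. DocData, NoMoreDataException
 * and SimpleSource are simplified stand-ins, NOT the Lucene benchmark classes.
 */
public class DrainSourceSketch {

    /** Stand-in for DocData: a mutable holder that the source refills on each call. */
    public static final class DocData {
        public String name;
        public String body;
    }

    /** Stand-in for the exception that signals the source is exhausted. */
    public static final class NoMoreDataException extends Exception {}

    /** Stand-in source over an in-memory list of {name, body} pairs. */
    public static final class SimpleSource implements AutoCloseable {
        private final Iterator<String[]> it;
        public boolean closed;

        public SimpleSource(List<String[]> docs) { this.it = docs.iterator(); }

        public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
            if (!it.hasNext()) {
                throw new NoMoreDataException();
            }
            String[] d = it.next();
            docData.name = d[0]; // overwrite the caller's instance instead of allocating
            docData.body = d[1];
            return docData;
        }

        @Override
        public void close() { closed = true; }
    }

    /** Reads every document with one reused DocData and always closes the source. */
    public static List<String> drain(SimpleSource source) throws IOException {
        List<String> names = new ArrayList<>();
        DocData reuse = new DocData();
        try (source) {
            while (true) {
                try {
                    names.add(source.getNextDocData(reuse).name);
                } catch (NoMoreDataException e) {
                    break; // normal end of input, not an error
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        SimpleSource src = new SimpleSource(List.of(
                new String[] {"a", "first body"},
                new String[] {"b", "second body"}));
        System.out.println(drain(src)); // prints [a, b]
    }
}
```

Reusing the DocData instance avoids one allocation per document, which matters when draining a dump with millions of pages; the try-with-resources block makes sure close() runs even if reading fails with an IOException.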
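public void resetInputs() throws IOException
Description copied from class: ContentSource
Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and documents generated since the last reset, so it's important to call super.resetInputs in case you override this method.
Overrides: resetInputs in class ContentSource
Throws: IOException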
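The NOTE above is easy to violate silently. A minimal sketch of the pattern, assuming a simplified hypothetical base class CountingSource whose addBytes/addDoc/getBytesCount/getDocsCount names are modelled loosely on the inherited methods listed earlier (their signatures here are assumptions), and a hypothetical ReplayingSource subclass that also has its own input position to rewind:

```java
/**
 * Hypothetical sketch of why an override of resetInputs() must call super.
 * CountingSource is a simplified stand-in for ContentSource's bookkeeping;
 * ReplayingSource stands in for a concrete source.
 */
public class ResetSketch {

    public static class CountingSource {
        private long bytesCount;
        private int docsCount;

        protected void addBytes(long numBytes) { bytesCount += numBytes; }
        protected void addDoc() { docsCount++; }

        public long getBytesCount() { return bytesCount; }
        public int getDocsCount() { return docsCount; }

        /** Default implementation: forget the bytes and docs counted since the last reset. */
        public void resetInputs() {
            bytesCount = 0;
            docsCount = 0;
        }
    }

    public static class ReplayingSource extends CountingSource {
        private final String[] docs;
        private int pos;

        public ReplayingSource(String... docs) { this.docs = docs; }

        /** Returns the next document; string length stands in for its byte size. */
        public String next() {
            String d = docs[pos++];
            addDoc();
            addBytes(d.length());
            return d;
        }

        @Override
        public void resetInputs() {
            super.resetInputs(); // without this, counters would carry over into the next run
            pos = 0;             // rewind this subclass's own input position
        }
    }

    public static void main(String[] args) {
        ReplayingSource src = new ReplayingSource("ab", "cde");
        src.next();
        src.next();
        System.out.println(src.getDocsCount() + " docs, " + src.getBytesCount() + " bytes"); // 2 docs, 5 bytes
        src.resetInputs();
        System.out.println(src.getDocsCount() + " docs, " + src.getBytesCount() + " bytes"); // 0 docs, 0 bytes
    }
}
```

If the override dropped the super call, the second run would report statistics inflated by the first run even though the input itself was rewound.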
public void setConfig(Config config)
Description copied from class: ContentSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
Overrides: setConfig in class ContentSource
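The same delegation rule applies to setConfig. A minimal sketch, assuming a hypothetical Config stand-in backed by java.util.Properties (its get overloads are modelled loosely on Lucene's Config, not copied from it) and assumed property names "content.source.forever" and "docs.file" chosen for illustration only:

```java
import java.util.Properties;

/**
 * Hypothetical sketch of the "call super.setConfig" rule. Config, BaseSource and
 * DumpSource are simplified stand-ins; the property names are assumptions.
 */
public class ConfigSketch {

    /** Stand-in for Config: a thin wrapper over java.util.Properties. */
    public static class Config {
        private final Properties props;

        public Config(Properties props) { this.props = props; }

        public String get(String name, String dflt) { return props.getProperty(name, dflt); }

        public boolean get(String name, boolean dflt) {
            String v = props.getProperty(name);
            return v == null ? dflt : Boolean.parseBoolean(v);
        }
    }

    /** Stand-in base class: stores the Config and reads a shared setting from it. */
    public static class BaseSource {
        protected Config config;
        protected boolean forever;

        public void setConfig(Config config) {
            this.config = config;
            this.forever = config.get("content.source.forever", true);
        }

        public Config getConfig() { return config; }
        public boolean isForever() { return forever; }
    }

    /** Subclass following the documented rule: delegate first, then read its own keys. */
    public static class DumpSource extends BaseSource {
        private String docsFile;

        @Override
        public void setConfig(Config config) {
            super.setConfig(config); // without this, getConfig() and forever stay unset
            docsFile = config.get("docs.file", null);
        }

        public String getDocsFile() { return docsFile; }
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("docs.file", "enwiki.xml.bz2");
        DumpSource src = new DumpSource();
        src.setConfig(new Config(p));
        System.out.println(src.getDocsFile() + ", forever=" + src.isForever()); // enwiki.xml.bz2, forever=true
    }
}
```

Calling super first means the subclass can rely on any shared state the base class derives from the Config before it reads its own, source-specific keys.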
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.