de.l3s.boilerpipe.filters.english
Class MinFulltextWordsFilter
java.lang.Object
de.l3s.boilerpipe.filters.english.MinFulltextWordsFilter
- All Implemented Interfaces:
- BoilerpipeFilter
public final class MinFulltextWordsFilter
- extends java.lang.Object
- implements BoilerpipeFilter
Keeps only those content blocks that contain at least k full-text words
(measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)). k is 30 by default.
- Author:
- Christian Kohlschütter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_INSTANCE
public static final MinFulltextWordsFilter DEFAULT_INSTANCE
MinFulltextWordsFilter
public MinFulltextWordsFilter(int minWords)
getDefaultInstance
public static MinFulltextWordsFilter getDefaultInstance()
process
public boolean process(TextDocument doc)
throws BoilerpipeProcessingException
- Description copied from interface: BoilerpipeFilter
- Processes the given document doc.
- Specified by:
- process in interface BoilerpipeFilter
- Parameters:
- doc - The TextDocument that is to be processed.
- Returns:
- true if changes have been made to the TextDocument.
- Throws:
BoilerpipeProcessingException
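The filtering rule in the class description and the return contract of process can be illustrated with a small, self-contained sketch. It does not use boilerpipe itself: the Block class, its whitespace-based word count, and the idea that a dropped block is simply flagged as non-content are all assumptions made for illustration, and the real getNumFullTextWords heuristic may count words differently.

```java
import java.util.ArrayList;
import java.util.List;

public class MinWordsSketch {

    /** Hypothetical stand-in for a text block; this is NOT boilerpipe's TextBlock. */
    static final class Block {
        final String text;
        boolean content = true; // every block starts out as content in this sketch

        Block(String text) {
            this.text = text;
        }

        /** Simplified word count (assumption): whitespace-separated tokens. */
        int numWords() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /**
     * Flags content blocks with fewer than minWords words as non-content.
     * Returns true if at least one block was changed, mirroring the
     * documented return value of process.
     */
    static boolean process(List<Block> blocks, int minWords) {
        boolean changed = false;
        for (Block b : blocks) {
            if (b.content && b.numWords() < minWords) {
                b.content = false;
                changed = true;
            }
        }
        return changed;
    }

    /** Builds a string of n copies of "word", separated by spaces. */
    static String words(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("word ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("Home | About | Contact")); // short, navigation-like block
        blocks.add(new Block(words(40)));                // long, article-like block

        boolean changed = process(blocks, 30); // 30 matches the documented default k
        System.out.println("changed=" + changed);
        for (Block b : blocks) {
            System.out.println(b.numWords() + " words, content=" + b.content);
        }
    }
}
```

Running process a second time on the same list would return false in this sketch, since the short block is no longer flagged as content and nothing else falls below the threshold.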
getNumFullTextWords
protected static int getNumFullTextWords(TextBlock tb)
getNumFullTextWords
protected static int getNumFullTextWords(TextBlock tb,
float minTextDensity)