de.l3s.boilerpipe.filters.english
Class DensityRulesClassifier
java.lang.Object
de.l3s.boilerpipe.filters.english.DensityRulesClassifier
- All Implemented Interfaces:
- BoilerpipeFilter
public class DensityRulesClassifier
- extends java.lang.Object
- implements BoilerpipeFilter
Classifies TextBlocks as content/not-content through rules that have
been determined using the C4.8 machine learning algorithm, as described in the
paper "Boilerplate Detection using Shallow Text Features", particularly using
text densities and link densities.
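To make the idea of density-based rules concrete, here is a minimal sketch of such a decision. It is not the rule set shipped with this class: the `Block` type, the method name `isContent`, and both thresholds are assumptions chosen for illustration only.

```java
// Illustrative sketch only: names and thresholds are assumptions,
// not the rules learned for DensityRulesClassifier.
final class DensityRuleSketch {

    /** Hypothetical stand-in holding the two features named in the description. */
    static final class Block {
        final double textDensity;
        final double linkDensity;

        Block(double textDensity, double linkDensity) {
            this.textDensity = textDensity;
            this.linkDensity = linkDensity;
        }
    }

    // Placeholder thresholds, picked for the example.
    static final double MAX_LINK_DENSITY = 0.33;
    static final double MIN_TEXT_DENSITY = 10.0;

    /** Returns true if curr looks like content, given its neighbours. */
    static boolean isContent(Block prev, Block curr, Block next) {
        // Link-heavy blocks (navigation, footers) are rejected first.
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            return false;
        }
        // Dense text is accepted on its own.
        if (curr.textDensity >= MIN_TEXT_DENSITY) {
            return true;
        }
        // Sparse text is accepted only when a neighbour is dense.
        return prev.textDensity >= MIN_TEXT_DENSITY
                || next.textDensity >= MIN_TEXT_DENSITY;
    }
}
```

The nesting mirrors how a learned decision tree typically reads once flattened into code: each branch tests one feature of the previous, current, or next block against a constant.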
- Author:
- Christian Kohlschütter
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INSTANCE
public static final DensityRulesClassifier INSTANCE
DensityRulesClassifier
public DensityRulesClassifier()
getInstance
public static DensityRulesClassifier getInstance()
- Returns the singleton instance of DensityRulesClassifier.
process
public boolean process(TextDocument doc)
throws BoilerpipeProcessingException
- Description copied from interface: BoilerpipeFilter
- Processes the given document doc.
- Specified by:
- process in interface BoilerpipeFilter
- Parameters:
- doc - The TextDocument that is to be processed.
- Returns:
- true if changes have been made to the TextDocument.
- Throws:
- BoilerpipeProcessingException
classify
protected boolean classify(TextBlock prev,
TextBlock curr,
TextBlock next)
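
The three-argument signature suggests that each block is judged together with its immediate neighbours. The sketch below shows one way a `process`-style method could walk a block list and call a `classify`-style method for each position. The `Block` type, the `EMPTY` sentinel used at the list edges, and the placeholder rule are all assumptions, not the library's implementation.

```java
import java.util.List;

// Illustrative sketch only: types, sentinel, and rule are assumptions.
final class NeighbourWindowSketch {

    /** Hypothetical mutable block; not boilerpipe's TextBlock. */
    static final class Block {
        final double textDensity;
        final double linkDensity;
        boolean content; // starts as false

        Block(double textDensity, double linkDensity) {
            this.textDensity = textDensity;
            this.linkDensity = linkDensity;
        }
    }

    /** Hypothetical sentinel standing in for a missing neighbour. */
    static final Block EMPTY = new Block(0.0, 0.0);

    /** Sets the content flag on curr; returns true if the flag changed. */
    static boolean classify(Block prev, Block curr, Block next) {
        // Placeholder rule; prev and next are unused here but kept
        // to match the shape of the three-argument signature.
        boolean isContent = curr.linkDensity <= 0.33 && curr.textDensity >= 10.0;
        boolean changed = curr.content != isContent;
        curr.content = isContent;
        return changed;
    }

    /** Classifies every block; returns true if any flag changed. */
    static boolean process(List<Block> blocks) {
        boolean changed = false;
        for (int i = 0; i < blocks.size(); i++) {
            Block prev = i > 0 ? blocks.get(i - 1) : EMPTY;
            Block next = i + 1 < blocks.size() ? blocks.get(i + 1) : EMPTY;
            changed |= classify(prev, blocks.get(i), next);
        }
        return changed;
    }
}
```

Returning whether anything changed matches the documented contract of `process`, which reports `true` only if the document was modified.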