de.l3s.boilerpipe.extractors
Class CommonExtractors

java.lang.Object
  extended by de.l3s.boilerpipe.extractors.CommonExtractors

public final class CommonExtractors
extends java.lang.Object

Provides quick access to common BoilerpipeExtractors.

Author:
Christian Kohlschütter

Field Summary
static ArticleExtractor ARTICLE_EXTRACTOR
          Works very well for most types of article-like HTML.
static CanolaExtractor CANOLA_EXTRACTOR
          Trained on krdwrd Canola (different definition of "boilerplate").
static DefaultExtractor DEFAULT_EXTRACTOR
          Usually performs worse than ArticleExtractor, but is simpler and uses no heuristics.
static KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
          Dummy Extractor; should return the input text.
static LargestContentExtractor LARGEST_CONTENT_EXTRACTOR
          Like DefaultExtractor, but keeps the largest text block only.
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ARTICLE_EXTRACTOR

public static final ArticleExtractor ARTICLE_EXTRACTOR
Works very well for most types of article-like HTML.


DEFAULT_EXTRACTOR

public static final DefaultExtractor DEFAULT_EXTRACTOR
Usually performs worse than ArticleExtractor, but is simpler and uses no heuristics.


LARGEST_CONTENT_EXTRACTOR

public static final LargestContentExtractor LARGEST_CONTENT_EXTRACTOR
Like DefaultExtractor, but keeps the largest text block only.


CANOLA_EXTRACTOR

public static final CanolaExtractor CANOLA_EXTRACTOR
Trained on krdwrd Canola (different definition of "boilerplate"). You may give it a try.


KEEP_EVERYTHING_EXTRACTOR

public static final KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
Dummy Extractor; should return the input text. Use this to check whether a problem lies within a particular BoilerpipeExtractor or somewhere else.
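
The check described for KEEP_EVERYTHING_EXTRACTOR can be sketched as a two-step comparison: run the pass-through extractor first, then the extractor under suspicion, and see which stage first loses a piece of text you expect to find. The sketch below does not call boilerpipe itself. TextExtractor, ExtractorDiagnosis, Verdict and the toy lambdas in main are hypothetical stand-ins introduced only for illustration, so the example compiles without the library on the classpath.

```java
/** Hypothetical stand-in for the text-returning method of an extractor (assumption, not boilerpipe API). */
interface TextExtractor {
    String getText(String html) throws Exception;
}

final class ExtractorDiagnosis {

    /** Where the expected text was first lost. */
    enum Verdict { INPUT_OR_PARSING, EXTRACTOR, NO_PROBLEM_SEEN }

    /**
     * Runs the pass-through extractor first, then the suspect extractor,
     * and reports the first stage whose output lacks expectedSnippet.
     */
    static Verdict diagnose(TextExtractor keepEverything, TextExtractor suspect,
                            String html, String expectedSnippet) {
        String all = safeText(keepEverything, html);
        if (!all.contains(expectedSnippet)) {
            // Even the pass-through output lacks the text:
            // look at fetching, encoding or parsing rather than the extractor.
            return Verdict.INPUT_OR_PARSING;
        }
        String extracted = safeText(suspect, html);
        if (!extracted.contains(expectedSnippet)) {
            // The text survives the pass-through but not the suspect extractor.
            return Verdict.EXTRACTOR;
        }
        return Verdict.NO_PROBLEM_SEEN;
    }

    /** Treats a null result or a thrown exception as empty output. */
    private static String safeText(TextExtractor extractor, String html) {
        try {
            String text = extractor.getText(html);
            return text == null ? "" : text;
        } catch (Exception e) {
            return "";
        }
    }

    public static void main(String[] args) {
        // Toy stand-ins, not boilerpipe's algorithms.
        TextExtractor keepEverything = html -> html.replaceAll("<[^>]*>", " ");
        TextExtractor dropsEverything = html -> "";

        String html = "<html><body><p>Hello article</p></body></html>";
        // Prints EXTRACTOR: the text is present after the pass-through step only.
        System.out.println(diagnose(keepEverything, dropsEverything, html, "Hello article"));
    }
}
```

With boilerpipe available, the two stand-ins would presumably be replaced by method references to the real extractors, for example KEEP_EVERYTHING_EXTRACTOR and ARTICLE_EXTRACTOR, assuming the library version in use offers a text-returning method that accepts an HTML string.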