java.lang.Object
  de.l3s.boilerpipe.extractors.ExtractorBase
public abstract class ExtractorBase
The base class of Extractors. Also provides some helper methods to quickly retrieve the text that remained after processing.
Constructor Summary | |
---|---|
ExtractorBase()
|
Method Summary | |
---|---|
java.lang.String | getText(org.xml.sax.InputSource is): Extracts text from the HTML code available from the given InputSource. |
java.lang.String | getText(java.io.Reader r): Extracts text from the HTML code available from the given Reader. |
java.lang.String | getText(java.lang.String html): Extracts text from the HTML code given as a String. |
java.lang.String | getText(TextDocument doc): Extracts text from the given TextDocument object. |
java.lang.String | getText(java.net.URL url): Extracts text from the HTML code available from the given URL. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface de.l3s.boilerpipe.BoilerpipeFilter |
---|
process |
Constructor Detail |
---|
public ExtractorBase()
Method Detail |
---|
public java.lang.String getText(java.lang.String html) throws BoilerpipeProcessingException
Extracts text from the HTML code given as a String.
Specified by: getText in interface BoilerpipeExtractor
Parameters: html - The HTML code as a String.
Throws: BoilerpipeProcessingException
public java.lang.String getText(org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given InputSource.
Specified by: getText in interface BoilerpipeExtractor
Parameters: is - The InputSource containing the HTML.
Throws: BoilerpipeProcessingException
public java.lang.String getText(java.net.URL url) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given URL.
NOTE: This method is intended mainly for demonstration purposes. If you are going to crawl the Web, consider using getText(InputSource) instead.
Parameters: url - The URL pointing to the HTML code.
Throws: BoilerpipeProcessingException
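The note above suggests getText(InputSource) for crawling. A crawler typically already holds the fetched response body as bytes, so one plausible approach is to wrap those bytes in an InputSource yourself. The sketch below uses only JDK classes; the class name CrawlInputSketch and the helper toInputSource are hypothetical and are not part of boilerpipe.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the boilerpipe API.
class CrawlInputSketch {

    /**
     * Wraps an already-fetched response body in an InputSource.
     * charsetName may be null when the encoding is unknown.
     */
    static InputSource toInputSource(byte[] body, String charsetName) {
        InputSource is = new InputSource(new ByteArrayInputStream(body));
        if (charsetName != null) {
            // Records the encoding declared by the caller (for example,
            // taken from an HTTP Content-Type header).
            is.setEncoding(charsetName);
        }
        return is;
    }

    public static void main(String[] args) {
        byte[] body = "<html><body><p>Hi</p></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        InputSource is = toInputSource(body, "UTF-8");
        // With boilerpipe on the classpath, `is` would then be passed to
        // getText(InputSource) on a concrete ExtractorBase subclass.
        System.out.println(is.getEncoding());
    }
}
```

Fetching the bytes separately keeps timeouts, redirects, and encoding detection under the crawler's control, which is presumably why the note steers crawlers away from the URL overload.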
public java.lang.String getText(java.io.Reader r) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given Reader.
Specified by: getText in interface BoilerpipeExtractor
Parameters: r - The Reader containing the HTML.
Throws: BoilerpipeProcessingException
public java.lang.String getText(TextDocument doc) throws BoilerpipeProcessingException
Extracts text from the given TextDocument object.
Specified by: getText in interface BoilerpipeExtractor
Parameters: doc - The TextDocument.
Throws: BoilerpipeProcessingException
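The getText overloads accept the same HTML in increasingly convenient forms. This page does not describe how they are implemented. The sketch below shows one way such helper overloads could be layered, with each convenience form delegating to a more general one. It is an assumption for illustration only: OverloadSketch, TagStrippingSketch, and extract are hypothetical names, the tag-stripping step is a toy stand-in for real extraction, and an unchecked exception replaces BoilerpipeProcessingException so that the block compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

// Hypothetical stand-in; NOT the boilerpipe implementation.
abstract class OverloadSketch {

    /** Core step supplied by each concrete subclass (hypothetical name). */
    protected abstract String extract(String rawHtml);

    /** String form: delegates to the Reader form. */
    public String getText(String html) {
        return getText(new StringReader(html));
    }

    /** Reader form: delegates to the InputSource form. */
    public String getText(Reader r) {
        return getText(new InputSource(r));
    }

    /** Most general form in this sketch: reads all characters, then extracts. */
    public String getText(InputSource is) {
        Reader r = is.getCharacterStream();
        if (r == null) {
            throw new IllegalArgumentException("sketch only handles character streams");
        }
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return extract(sb.toString());
    }
}

// Toy subclass: replaces tags with spaces and collapses whitespace.
class TagStrippingSketch extends OverloadSketch {
    @Override
    protected String extract(String rawHtml) {
        return rawHtml.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints "Hello world"
        System.out.println(new TagStrippingSketch().getText("<p>Hello <b>world</b></p>"));
    }
}
```

Layering the overloads this way means a subclass only has to supply the core step, while every input form shares the same reading logic.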