Package org.mozilla.universalchardet
Class UniversalDetector
- java.lang.Object
-
- org.mozilla.universalchardet.UniversalDetector
-
public class UniversalDetector extends java.lang.Object
-
-
Nested Class Summary
Nested Classes (modifier and type, class)
static class UniversalDetector.InputState
-
Field Summary
Fields (modifier and type, field)
private java.lang.String detectedCharset
private boolean done
private CharsetProber escCharsetProber
private boolean gotData
private UniversalDetector.InputState inputState
private byte lastChar
private CharsetListener listener
static float MINIMUM_THRESHOLD
private boolean onlyPrintableASCII
private CharsetProber[] probers
static float SHORTCUT_THRESHOLD
private boolean start
-
Constructor Summary
Constructors
UniversalDetector()
UniversalDetector(CharsetListener listener)
-
Method Summary
Methods (modifier and type, method, description)
void dataEnd()
- Marks end of data reading.
static java.lang.String detectCharset(java.io.File file)
- Gets the charset of a File.
static java.lang.String detectCharset(java.io.InputStream inputStream)
- Gets the charset of content from InputStream.
static java.lang.String detectCharset(java.nio.file.Path path)
- Gets the charset of a Path.
static java.lang.String detectCharsetFromBOM(byte[] buf)
private static java.lang.String detectCharsetFromBOM(byte[] buf, int offset)
java.lang.String getDetectedCharset()
CharsetListener getListener()
void handleData(byte[] buf)
- Feed the detector with more data.
void handleData(byte[] buf, int offset, int length)
- Feed the detector with more data.
boolean isDone()
void reset()
- Resets detector to be used again.
void setListener(CharsetListener listener)
-
-
-
Field Detail
-
SHORTCUT_THRESHOLD
public static final float SHORTCUT_THRESHOLD
- See Also:
- Constant Field Values
-
MINIMUM_THRESHOLD
public static final float MINIMUM_THRESHOLD
- See Also:
- Constant Field Values
-
inputState
private UniversalDetector.InputState inputState
-
done
private boolean done
-
start
private boolean start
-
gotData
private boolean gotData
-
onlyPrintableASCII
private boolean onlyPrintableASCII
-
lastChar
private byte lastChar
-
detectedCharset
private java.lang.String detectedCharset
-
probers
private CharsetProber[] probers
-
escCharsetProber
private CharsetProber escCharsetProber
-
listener
private CharsetListener listener
-
-
Constructor Detail
-
UniversalDetector
public UniversalDetector()
-
UniversalDetector
public UniversalDetector(CharsetListener listener)
- Parameters:
listener
- a listener object that is notified of the detected encoding. Can be null.
-
-
Method Detail
-
isDone
public boolean isDone()
-
getDetectedCharset
public java.lang.String getDetectedCharset()
- Returns:
- The detected encoding, or null if the detector could not determine which encoding was used.
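Because the return value can be null, callers usually need a fallback before decoding text. The following is a minimal sketch of one way to do that with the standard library only. The class name `DetectedCharsets` and the method `toCharsetOrDefault` are hypothetical helpers, not part of this package, and the sketch assumes (without confirmation from this page) that a detected name may not always be one the running JVM supports.

```java
import java.nio.charset.Charset;

class DetectedCharsets {
    /**
     * Converts a detected charset name into a Charset, or returns the
     * fallback when the name is null or not usable on this JVM.
     */
    static Charset toCharsetOrDefault(String detectedName, Charset fallback) {
        if (detectedName == null) {
            // Detection was inconclusive, so use the caller's default.
            return fallback;
        }
        try {
            return Charset.forName(detectedName);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException
            // are both subclasses of IllegalArgumentException.
            return fallback;
        }
    }
}
```

In real code the first argument would be the string returned by getDetectedCharset() or by one of the static detectCharset methods.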
-
setListener
public void setListener(CharsetListener listener)
-
getListener
public CharsetListener getListener()
-
handleData
public void handleData(byte[] buf)
Feed the detector with more data.
- Parameters:
buf
- The buffer containing the data
-
handleData
public void handleData(byte[] buf, int offset, int length)
Feed the detector with more data.
- Parameters:
buf
- Buffer with the data
offset
- initial position of data in buf
length
- length of data
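The method names on this page suggest a chunked-feeding pattern: call handleData for each chunk, stop early if isDone reports true, then call dataEnd and read the result. The sketch below illustrates that pattern under stated assumptions. To compile without the library on the classpath it is written against `Feedable`, a hypothetical stand-in interface that only mirrors four method signatures documented here. `ChunkedFeed`, `detect`, and the 4096-byte buffer size are likewise illustrative choices, and treating isDone as an early-exit signal is an assumption, since this page gives it no description.

```java
import java.io.IOException;
import java.io.InputStream;

class ChunkedFeed {
    /**
     * Hypothetical stand-in mirroring four documented UniversalDetector
     * method signatures; it is not part of the library.
     */
    interface Feedable {
        void handleData(byte[] buf, int offset, int length);
        boolean isDone();
        void dataEnd();
        String getDetectedCharset();
    }

    /** Reads the stream in chunks and feeds each chunk to the detector. */
    static String detect(InputStream in, Feedable detector) throws IOException {
        byte[] buf = new byte[4096];
        int read;
        // Stop reading once the detector reports that it is done.
        while (!detector.isDone() && (read = in.read(buf)) != -1) {
            // Only the first `read` bytes of buf belong to this chunk.
            detector.handleData(buf, 0, read);
        }
        // Signal end of input so that the final result can be computed.
        detector.dataEnd();
        return detector.getDetectedCharset();
    }
}
```

With the real class, the same four calls would be made directly on a UniversalDetector instance, followed by reset() if the instance is to be used for another input.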
-
detectCharsetFromBOM
public static java.lang.String detectCharsetFromBOM(byte[] buf)
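This page does not describe which byte order marks detectCharsetFromBOM recognizes or what names it returns. As general background only, the sketch below shows what BOM sniffing typically looks like for the standard Unicode marks. `BomSniffer`, `sniff`, and the returned names are hypothetical and are not taken from the library's implementation.

```java
class BomSniffer {
    /** Returns an encoding name for a leading BOM, or null if none matches. */
    static String sniff(byte[] buf) {
        if (buf == null) {
            return null;
        }
        // Check 4-byte marks first: the UTF-32LE mark (FF FE 00 00)
        // begins with the UTF-16LE mark (FF FE).
        if (startsWith(buf, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(buf, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(buf, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(buf, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(buf, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] buf, int... prefix) {
        if (buf.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((buf[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that FF FE 00 00 is ambiguous in principle (it could also be a UTF-16LE mark followed by a NUL character); this sketch resolves it in favor of UTF-32LE, which is a design choice rather than a rule.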
-
detectCharsetFromBOM
private static java.lang.String detectCharsetFromBOM(byte[] buf, int offset)
-
dataEnd
public void dataEnd()
Marks the end of data reading and finishes the calculations.
-
reset
public final void reset()
Resets detector to be used again.
-
detectCharset
public static java.lang.String detectCharset(java.io.File file) throws java.io.IOException
Gets the charset of a File.
- Parameters:
file
- The file to check the charset of
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
java.io.IOException
- if some IO error occurs
-
detectCharset
public static java.lang.String detectCharset(java.nio.file.Path path) throws java.io.IOException
Gets the charset of a Path.
- Parameters:
path
- The path of the file to check the charset of
- Returns:
- The charset of the file, or null if it cannot be determined
- Throws:
java.io.IOException
- if some IO error occurs
-
detectCharset
public static java.lang.String detectCharset(java.io.InputStream inputStream) throws java.io.IOException
Gets the charset of content from InputStream.
- Parameters:
inputStream
- InputStream containing a text file
- Returns:
- The charset of the content, or null if it cannot be determined
- Throws:
java.io.IOException
- if some IO error occurs
-
-