info.aduna.language.identification.ngram
Class NGramLanguageIdentifier
java.lang.Object
info.aduna.language.identification.ngram.NGramLanguageIdentifier
- All Implemented Interfaces:
- LanguageIdentifier
public class NGramLanguageIdentifier
- extends Object
- implements LanguageIdentifier
An NGramLanguageIdentifier uses n-gram-based heuristics to determine the
language of a given text, based on the n-gram profiles of a large collection
of texts.
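As a rough illustration of what an n-gram profile is, the sketch below counts the character n-grams of a text. It is a generic example, not the implementation used by this class: the class name NGramProfileSketch and the method profile are hypothetical, and the way LanguageProfiles are actually built and compared is not documented here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of info.aduna.language.identification.
public class NGramProfileSketch {

    /** Counts how often each character n-gram of length n occurs in the text. */
    public static Map<String, Integer> profile(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window of n characters over the text and tally each window.
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Bigram counts for a short sample string.
        System.out.println(profile("banana", 2));
    }
}
```

An identifier of this kind would typically compare such a profile of the input text against profiles precomputed per language; the comparison measure is an assumption left out of this sketch.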
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
NGramLanguageIdentifier
public NGramLanguageIdentifier()
- Constructs an NGramLanguageIdentifier. Each NGramLanguageIdentifier has
its own set of LanguageProfiles. Because an identifier is stateless and
its profiles consume a non-trivial amount of memory, it is generally a
good idea to share a single identifier instance rather than create many.
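One way to share an instance is the initialization-on-demand holder idiom, sketched below. To keep the sketch compilable on its own, LanguageIdentifier is redeclared locally and StubIdentifier stands in for NGramLanguageIdentifier; both stand-ins, and the names SharedIdentifierSketch and shared, are assumptions of this example. In real code the holder would construct a NGramLanguageIdentifier instead.

```java
// Hypothetical sketch; the nested types are local stand-ins for the library types.
public class SharedIdentifierSketch {

    /** Local stand-in mirroring the identify(String) method of LanguageIdentifier. */
    interface LanguageIdentifier {
        String identify(String text);
    }

    /** Stand-in for NGramLanguageIdentifier that records how often it is constructed. */
    static final class StubIdentifier implements LanguageIdentifier {
        static int instancesCreated = 0;

        StubIdentifier() {
            instancesCreated++;
        }

        @Override
        public String identify(String text) {
            return null; // the stub never determines a language
        }
    }

    /** The JVM initializes this class, and thus INSTANCE, once on first use. */
    private static final class Holder {
        static final LanguageIdentifier INSTANCE = new StubIdentifier();
    }

    /** Returns the single shared identifier; the profiles are loaded only once. */
    public static LanguageIdentifier shared() {
        return Holder.INSTANCE;
    }
}
```

Callers obtain the identifier through SharedIdentifierSketch.shared() rather than calling the constructor, so the memory cost of the profiles is paid once per JVM.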
identify
public String identify(String text)
- Description copied from interface:
LanguageIdentifier
- Identifies the language of the supplied text.
- Specified by:
identify in interface LanguageIdentifier
- Parameters:
text - The text whose language is to be determined.
- Returns:
- The determined language, encoded as a 2-letter ISO 639 code, or
null when the language could not be determined.
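Because identify may return null, callers should check the result before using it. The sketch below shows one way to do so. LanguageIdentifier is redeclared locally so that the example compiles without the library, and the names IdentifyUsageSketch and describeLanguage are hypothetical.

```java
import java.util.Locale;

// Hypothetical usage sketch; LanguageIdentifier is a local stand-in for the library interface.
public class IdentifyUsageSketch {

    /** Local stand-in mirroring the documented identify(String) signature. */
    interface LanguageIdentifier {
        String identify(String text);
    }

    /** Returns the English name of the detected language, or "unknown" if none was determined. */
    static String describeLanguage(LanguageIdentifier identifier, String text) {
        String code = identifier.identify(text);
        if (code == null) {
            // The language could not be determined.
            return "unknown";
        }
        // Interpret the 2-letter ISO 639 code as a language tag.
        return Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
    }
}
```

With the real library, the first argument would be a NGramLanguageIdentifier, ideally a shared instance.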
Copyright © 1997-2008 Aduna. All Rights Reserved.