info.aduna.language.identification.ngram
Class NGramLanguageIdentifier

java.lang.Object
  extended by info.aduna.language.identification.ngram.NGramLanguageIdentifier
All Implemented Interfaces:
LanguageIdentifier

public class NGramLanguageIdentifier
extends Object
implements LanguageIdentifier

An NGramLanguageIdentifier uses n-gram-based heuristics to determine the language of a given text, based on n-gram profiles built from a large collection of texts.
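
To illustrate what an n-gram profile is, the following minimal sketch counts the character trigrams in a text. It is not taken from this class: the name TrigramProfileSketch, the choice of n = 3, and the lowercasing step are all assumptions made for illustration, and the actual profile format and scoring heuristics used by NGramLanguageIdentifier are not documented here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only: shows the general idea of an n-gram profile.
// This is NOT the implementation used by NGramLanguageIdentifier.
class TrigramProfileSketch {

    // Returns a map from each 3-character substring of the text
    // to the number of times it occurs.
    static Map<String, Integer> trigramCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercasing is an assumption; real profiles may normalize differently.
        String normalized = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 3 <= normalized.length(); i++) {
            String gram = normalized.substring(i, i + 3);
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "banana" contains the trigrams ban, ana, nan, ana.
        System.out.println(trigramCounts("banana"));
    }
}
```

A language identifier of this kind would typically compare such counts for the input text against precomputed counts per language and pick the closest match.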


Constructor Summary
NGramLanguageIdentifier()
          Constructs an NGramLanguageIdentifier.
 
Method Summary
 String identify(String text)
          Identifies the language of the supplied text.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NGramLanguageIdentifier

public NGramLanguageIdentifier()
Constructs an NGramLanguageIdentifier. Each NGramLanguageIdentifier has its own set of LanguageProfiles. Because the identifier is stateless and its profiles consume a non-trivial amount of memory, it is generally a good idea to share a single identifier instance rather than construct a new one for each use.
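
One way to follow the sharing advice is to hold a single instance in a static field and hand it out to all callers. The sketch below is self-contained, so it declares a local stand-in for the LanguageIdentifier interface and a hypothetical StubIdentifier; with the library on the classpath, the stub would be replaced by new NGramLanguageIdentifier(). The holder class SharedIdentifierSketch is also hypothetical. Thread safety is not stated in this documentation, so sharing across threads is an assumption that follows from the identifier being described as stateless.

```java
// Local stand-in for the documented LanguageIdentifier interface,
// declared here only so that the sketch compiles without the library.
interface LanguageIdentifier {
    String identify(String text);
}

// Hypothetical placeholder for NGramLanguageIdentifier.
class StubIdentifier implements LanguageIdentifier {
    @Override
    public String identify(String text) {
        return null; // the stub never determines a language
    }
}

// Holds one identifier for the whole application so that the
// memory-heavy language profiles are loaded only once.
class SharedIdentifierSketch {

    // With the real library: new NGramLanguageIdentifier()
    private static final LanguageIdentifier SHARED = new StubIdentifier();

    static LanguageIdentifier shared() {
        return SHARED;
    }

    public static void main(String[] args) {
        // Every caller receives the same instance.
        System.out.println(shared() == shared());
    }
}
```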

Method Detail

identify

public String identify(String text)
Description copied from interface: LanguageIdentifier
Identifies the language of the supplied text.

Specified by:
identify in interface LanguageIdentifier
Parameters:
text - The text whose language is to be determined.
Returns:
The determined language, encoded as a 2-letter ISO 639 code, or null when the language could not be determined.
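
Because identify can return null, callers should check the result before using it. The following sketch shows one way to do that; it declares a local stand-in for the LanguageIdentifier interface so that it compiles on its own, and the class IdentifyResultSketch, the method describe, and the "undetermined" fallback string are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Local stand-in for the documented LanguageIdentifier interface,
// declared here only so that the sketch compiles without the library.
interface LanguageIdentifier {
    String identify(String text);
}

class IdentifyResultSketch {

    // Returns an English display name for the identified language,
    // or "undetermined" when identify returns null.
    static String describe(LanguageIdentifier identifier, String text) {
        String code = identifier.identify(text);
        if (code == null) {
            return "undetermined";
        }
        // Interpret the 2-letter ISO 639 code as a language tag.
        return Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Hypothetical identifiers; with the real library, pass an
        // NGramLanguageIdentifier instead.
        LanguageIdentifier alwaysDutch = t -> "nl";
        LanguageIdentifier neverSure = t -> null;

        System.out.println(describe(alwaysDutch, "Dit is een zin."));
        System.out.println(describe(neverSure, "???"));
    }
}
```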


Copyright © 1997-2008 Aduna. All Rights Reserved.