info.aduna.infosource.crawl.index
Class CustomAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by info.aduna.infosource.crawl.index.CustomAnalyzer

public class CustomAnalyzer
extends org.apache.lucene.analysis.Analyzer

A CustomAnalyzer mimics Lucene's StandardAnalyzer but does not remove stopwords. It also inserts extra "subtokens" to improve searchability, and it keeps track of token frequencies to facilitate significant-term extraction.
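The description above does not say how subtokens are formed. The following self-contained Java sketch assumes, purely for illustration, that a token containing `.`, `-` or `_` (such as a package name) is indexed both whole and as its separate parts, and that every emitted token is counted. `SubtokenSketch`, `tokenize` and `countFrequencies` are hypothetical names; the sketch does not use Lucene and is not the actual CustomAnalyzer implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: SubtokenSketch is NOT part of the documented CustomAnalyzer API.
public class SubtokenSketch {

    // Lowercases the text and splits it on whitespace. For every token that
    // contains '.', '-' or '_', the pieces between those characters are also
    // emitted as extra "subtokens" (an assumed rule). No stopwords are removed.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Strip leading/trailing characters that are neither letters nor digits.
            String token = raw.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (token.isEmpty()) {
                continue;
            }
            out.add(token);
            String[] parts = token.split("[._-]+");
            if (parts.length > 1) {
                for (String part : parts) {
                    if (!part.isEmpty()) {
                        out.add(part);
                    }
                }
            }
        }
        return out;
    }

    // Counts how often each emitted token (whole tokens and subtokens) occurs.
    static Map<String, Integer> countFrequencies(List<String> tokens) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String t : tokens) {
            freqs.merge(t, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("See info.aduna.infosource and the Aduna site.");
        System.out.println(tokens);
        System.out.println(countFrequencies(tokens).get("aduna"));
    }
}
```

Running `main` prints `[see, info.aduna.infosource, info, aduna, infosource, and, the, aduna, site]` followed by `2`: the subtoken `aduna` from the dotted name and the standalone word are counted together, and the stopword `the` is kept.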


Nested Class Summary
static class CustomAnalyzer.TermFrequency
 
Constructor Summary
CustomAnalyzer()
 
Method Summary
 void clear()
 HashMap<String,CustomAnalyzer.TermFrequency> getLastTermFrequencies()
          Returns the term frequencies of the last analyzed stream and then resets the internally stored frequencies to null.
 org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CustomAnalyzer

public CustomAnalyzer()
Method Detail

clear

public void clear()

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(String fieldName,
                                                          Reader reader)
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer

getLastTermFrequencies

public HashMap<String,CustomAnalyzer.TermFrequency> getLastTermFrequencies()
Returns the term frequencies of the last analyzed stream and then resets the internally stored frequencies to null. In other words, this data can be retrieved only once; dropping the internal reference prevents memory leaks.
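The retrieve-once contract can be illustrated with a minimal, self-contained Java sketch. It assumes a simplified map of plain `Integer` counts instead of `CustomAnalyzer.TermFrequency`, whose fields are not documented here; `ReadOnceFrequencies` and `record` are hypothetical names, not part of CustomAnalyzer.

```java
import java.util.HashMap;

// Hypothetical sketch of the "retrieve once" contract; ReadOnceFrequencies and
// record() are illustrative names, and Integer stands in for TermFrequency.
public class ReadOnceFrequencies {

    // Frequencies gathered while the most recent stream was analyzed.
    private HashMap<String, Integer> lastFrequencies;

    // Records one occurrence of a term, creating the map on first use.
    public void record(String term) {
        if (lastFrequencies == null) {
            lastFrequencies = new HashMap<>();
        }
        lastFrequencies.merge(term, 1, Integer::sum);
    }

    // Hands the map to the caller and drops the internal reference, so this
    // object no longer keeps it reachable. In this sketch a second call returns null.
    public HashMap<String, Integer> getLastTermFrequencies() {
        HashMap<String, Integer> result = lastFrequencies;
        lastFrequencies = null;
        return result;
    }

    public static void main(String[] args) {
        ReadOnceFrequencies f = new ReadOnceFrequencies();
        f.record("lucene");
        f.record("lucene");
        System.out.println(f.getLastTermFrequencies()); // {lucene=2}
        System.out.println(f.getLastTermFrequencies()); // null
    }
}
```

Handing over the map and nulling the field means a long-lived analyzer never pins the frequencies of an already-processed document in memory; the caller becomes the sole owner of the data.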



Copyright © 1997-2008 Aduna. All Rights Reserved.