hibernatingworkert wrote:
As I said in your previous thread, you have to match the analyzers both for indexing and searching. Choosing which analyzer to use for searching depends on how your WordSplitterTokenizer was implemented.
Do you really have to tokenize your database field this way, using commas? Why not use SimpleAnalyzer, for example, to analyze your fields? This way you would be able to query for "blah blah United States blah blah ".
Thanks for your reply.
Yes, I have to use commas in the database field.
If I use SimpleAnalyzer, Lucene will index "United" and "States" as separate tokens, which I don't want.
I would then get results for both "blah blah United States blah blah" and "blah blah States blah blah".
I need results only for "blah blah United States blah blah".
My Analyzer class is
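import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

public class WordSplitterAnalyzer extends Analyzer {

    // Creates a fresh tokenizer for every call.
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new WordSplitterTokenizer(reader);
    }

    // Reuses the previously saved tokenizer, if any, and points it at the new reader.
    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
        if (tokenizer == null) {
            tokenizer = new WordSplitterTokenizer(reader);
            setPreviousTokenStream(tokenizer);
        } else {
            tokenizer.reset(reader);
        }
        return tokenizer;
    }
}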
and the tokenizer class is
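import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

public class WordSplitterTokenizer extends CharTokenizer {

    protected static final char[] DEFAULT_WORD_SPLITTER = new char[] {','};
    private char[] wordSplitter = DEFAULT_WORD_SPLITTER;

    public WordSplitterTokenizer(Reader in) {
        super(in);
    }

    // A character belongs to a token unless it is one of the splitter characters.
    // The check must cover every splitter, not only the first one in the array.
    @Override
    protected boolean isTokenChar(char c) {
        for (char ws : wordSplitter) {
            if (ws == c) {
                return false;
            }
        }
        return true;
    }
}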
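To make the difference concrete, here is a minimal plain-Java sketch (no Lucene involved) of how I understand the two approaches split a field value. The class and method names are made up for illustration. splitOnCommas mimics a tokenizer whose isTokenChar accepts everything except a comma, and splitOnNonLetters only approximates SimpleAnalyzer (split at non-letters, then lowercase).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Lucene.
class TokenSplitSketch {

    // Every maximal run of non-comma characters becomes one token.
    // Spaces and case are kept exactly as they appear in the input.
    public static List<String> splitOnCommas(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c != ',') {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Every maximal run of letters becomes one lowercased token,
    // an approximation of what SimpleAnalyzer produces.
    public static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String field = "India,United States,Canada";
        System.out.println(splitOnCommas(field));     // [India, United States, Canada]
        System.out.println(splitOnNonLetters(field)); // [india, united, states, canada]
    }
}
```

With comma splitting, "United States" stays a single token, so a search for "States" alone cannot match it. One thing to watch for: a value written as "India, United States" yields the token " United States" with its leading space and original case, and as far as I can tell the real tokenizer does the same, so the search term has to be produced in exactly the same form.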
In order to get results only for text containing "United States", which analyzer do I need to use?