hardy.ferentschik wrote:
If you want a decent answer to your question, you should try to explain your problem better. What does your CommaSplitterAnalyzer look like? How do you use it? The text you are quoting: is this your query or the text you want to index?
--Hardy
Here is my CommaSplitterAnalyzer class:
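import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WordlistLoader;

public class CommaSplitterAnalyzer extends Analyzer {
    // Creates a fresh tokenizer that splits the input on commas.
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new CommaSplitterTokenizer(reader);
    }

    // Not used anywhere in this class.
    WordlistLoader loader = new WordlistLoader();

    // Reuses the previously saved tokenizer if there is one, otherwise creates and saves a new one.
    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
        if (tokenizer == null) {
            tokenizer = new CommaSplitterTokenizer(reader);
            setPreviousTokenStream(tokenizer);
        } else {
            tokenizer.reset(reader);
        }
        return tokenizer;
    }
}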
And the tokenizer class is:
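import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

public class CommaSplitterTokenizer extends CharTokenizer {
    protected static final char[] DEFAULT_WORD_SPLITTER = new char[] {','};
    private char[] wordSplitter = DEFAULT_WORD_SPLITTER;

    public CommaSplitterTokenizer(Reader in) {
        super(in);
    }

    // A character belongs to a token unless it matches one of the splitter characters.
    @Override
    protected boolean isTokenChar(char c) {
        for (char ws : wordSplitter) {
            if (ws == c) {
                return false;
            }
        }
        return true;
    }
}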
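To make the splitting behaviour concrete without needing Lucene on the classpath, here is a minimal plain-Java sketch of the same idea. The class `CommaSplitSketch` and its methods are hypothetical names used only for illustration; they are not part of Lucene and only approximate what a CharTokenizer subclass does (collect runs of token characters, drop the separators).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the tokenizer above; not a Lucene class.
class CommaSplitSketch {

    // Same rule as isTokenChar: everything except a separator is part of a token.
    static boolean isTokenChar(char c, char[] separators) {
        for (char s : separators) {
            if (s == c) {
                return false;
            }
        }
        return true;
    }

    // Collects runs of token characters; separators end the current token.
    static List<String> split(String text) {
        char[] separators = {','};
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c, separators)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("Micheal Jackson, Barack Obama"));
    }
}
```

In this sketch the second token keeps the space that follows the comma, because only the comma is treated as a separator. Whether the real index contains the same leading space is worth checking in Luke, since it would affect exact matching.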
So when I add the tags field containing the text
"Micheal Jackson, Barack Obama" to the database, the words are split on the comma using this mapping:
@Field(index=Index.TOKENIZED,analyzer=@Analyzer(impl=CommaSplitterAnalyzer.class), store=Store.YES)
private String tags;
and I am able to find the following tokens using the Luke tool:
>> Micheal Jackson
>> Barack Obama
So here I have successfully split the text on the comma.
My problem comes when I search with the following text:
"Monday afternoon on the steps of Town Hall about the controversy
surrounding President
Barack Obama back-to-school speech, framing the issue blah blah ..."
I am unable to fetch the row
"Micheal Jackson, Barack Obama" from the database, since the standard analyzer splits the search text into
"Barack" and "Obama" separately and searches for those in the index.
So it is unable to match the combined token "Barack Obama".
Is there any analyzer that can search for more than one word in the index, apart from KeywordAnalyzer (since I can't put quotes around "Barack Obama" in the large text)?
So, to match "Barack Obama" (2 words) in the index, which analyzer do I need to use?
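To illustrate the mismatch and one possible direction, here is a minimal plain-Java sketch that builds word n-grams ("shingles") from the search text and compares them with the indexed tags. Everything in it is an assumption made for illustration: the class `TagShingleSketch`, its methods, the lower-casing and trimming, and the limit of two words per shingle are all hypothetical. Lucene's contrib analyzers include a ShingleFilter that builds similar word n-grams on a token stream, but this sketch does not use it and has not been tested against a real index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not a Lucene or Hibernate Search class.
class TagShingleSketch {

    // Assumed normalization: trim surrounding whitespace and lower-case.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Returns every run of 1..maxWords consecutive words in the text.
    static List<String> shingles(String text, int maxWords) {
        List<String> words = new ArrayList<>();
        // Split on anything that is not a letter or a digit.
        for (String w : text.split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxWords && i + n < words.size(); n++) {
                if (n > 0) {
                    sb.append(' ');
                }
                sb.append(words.get(i + n));
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Returns the normalized tags that occur as a shingle of the text.
    static Set<String> matchTags(String text, Set<String> indexedTags, int maxWords) {
        Set<String> normalizedTags = new HashSet<>();
        for (String tag : indexedTags) {
            normalizedTags.add(normalize(tag));
        }
        Set<String> hits = new LinkedHashSet<>();
        for (String shingle : shingles(text, maxWords)) {
            if (normalizedTags.contains(shingle)) {
                hits.add(shingle);
            }
        }
        return hits;
    }
}
```

With the tags "Micheal Jackson" and "Barack Obama" and the search text above, the two-word shingle "barack obama" lines up with the tag, whereas single-word tokens never can. The design choice here is to do the multi-word grouping on the query side, so the long text needs no quotes.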
Any help is appreciated.
Thanks in advance.