
All times are UTC - 5 hours [ DST ]



 Post subject: Search for more than one token from lucene index
PostPosted: Tue Sep 08, 2009 9:04 am 
Newbie

Joined: Tue Aug 04, 2009 6:46 am
Posts: 17
Hi all,

I have indexed a field containing "Micheal Jackson, Barack Obama" using a custom CommaSplitterAnalyzer.

Using the Luke tool I can see the following tokens in the index:

>> Micheal Jackson
>> Barack Obama

But when I search with the following text

"Monday afternoon on the steps of Town Hall about the controversy
surrounding President Barack Obama back-to-school speech, framing the issue blah blah ..."

I get no results.

As far as I understand, Hibernate Search with Lucene matches only single-word terms in the index.

Is there an Analyzer that can match more than one word against the index, other than KeywordAnalyzer? (I can't put quotes around "Barack Obama" inside the large text.)

Which Analyzer must I use to pick out Barack Obama from the text?

Thanks in advance.


 Post subject: Re: Search for more than one token from lucene index
PostPosted: Wed Sep 09, 2009 3:07 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
If you want a decent answer to your question, you should try to explain your problem better. What does your CommaSplitterAnalyzer look like? How do you use it? And the text you are quoting: is it your query or the text you want to index?

--Hardy


 Post subject: Re: Search for more than one token from lucene index
PostPosted: Thu Sep 17, 2009 9:22 am 
Newbie

Joined: Tue Aug 04, 2009 6:46 am
Posts: 17
hardy.ferentschik wrote:
If you want a decent answer to your question, you should try to explain your problem better. What does your CommaSplitterAnalyzer look like? How do you use it? And the text you are quoting: is it your query or the text you want to index?
--Hardy

Here is my CommaSplitterAnalyzer class:

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;

public class CommaSplitterAnalyzer extends Analyzer {

    // Creates a fresh comma tokenizer for every call.
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new CommaSplitterTokenizer(reader);
    }

    // Reuses the tokenizer saved by an earlier call, pointing it at the new reader.
    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
        if (tokenizer == null) {
            tokenizer = new CommaSplitterTokenizer(reader);
            setPreviousTokenStream(tokenizer);
        } else {
            tokenizer.reset(reader);
        }
        return tokenizer;
    }
}


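And the tokenizer class is:

import java.io.Reader;

import org.apache.lucene.analysis.CharTokenizer;

public class CommaSplitterTokenizer extends CharTokenizer {

    protected static final char[] DEFAULT_WORD_SPLITTER = new char[] {','};
    private char[] wordSplitter = DEFAULT_WORD_SPLITTER;

    public CommaSplitterTokenizer(Reader in) {
        super(in);
    }

    // A character belongs to a token unless it is one of the splitter characters.
    // Note that whitespace is a token character here, so a space after a comma
    // stays at the start of the next token.
    @Override
    protected boolean isTokenChar(char c) {
        for (char ws : wordSplitter) {
            if (ws == c) {
                return false;
            }
        }
        return true;
    }
}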

So when I save the tags field containing the text "Micheal Jackson, Barack Obama" to the database, the value is split on commas using this mapping:

@Field(index=Index.TOKENIZED,analyzer=@Analyzer(impl=CommaSplitterAnalyzer.class), store=Store.YES)
private String tags;

and I can see the following tokens using the Luke tool:
>> Micheal Jackson
>> Barack Obama

So the text has been split on the comma successfully.
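
One detail may be worth checking in Luke: a tokenizer that treats every character except the comma as a token character keeps the space that follows the comma. The following is a minimal plain-Java imitation of that splitting rule, not the Lucene classes themselves; the class name CommaSplitDemo and the method split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of a comma-only tokenizer; it does not use Lucene.
class CommaSplitDemo {

    // Every character except ',' is treated as part of a token,
    // so spaces next to a comma stay inside the token.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == ',') {
                // A comma ends the current token (empty tokens are skipped).
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Brackets make leading or trailing whitespace visible.
        for (String token : split("Micheal Jackson, Barack Obama")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Under this rule the second token comes out as " Barack Obama" with a leading space, so an exact term lookup for "Barack Obama" could miss it unless the token is trimmed at index time.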

My problem comes when I search with the following text:
"Monday afternoon on the steps of Town Hall about the controversy
surrounding President Barack Obama back-to-school speech, framing the issue blah blah ..."
I am unable to fetch the row "Micheal Jackson, Barack Obama" from the database, because the standard analyzer splits the query text into the separate terms Barack and Obama and looks those up in the index.
So it cannot match the combined term Barack Obama.
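
The mismatch described above can be reproduced without Lucene. This is a rough sketch that assumes the index holds the two whole-phrase terms shown in Luke and that the query text is broken into lowercase single words; the words method is only a crude stand-in for a word-level analyzer, and all names here are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Shows why single-word query terms never equal whole-phrase index terms.
class TermMismatchDemo {

    // Crude stand-in for word-level analysis: lowercase, split on non-alphanumerics.
    static Set<String> words(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                terms.add(w);
            }
        }
        return terms;
    }

    // An exact term lookup succeeds only if a query term equals an indexed term.
    static boolean anyTermIndexed(Set<String> indexedTerms, String queryText) {
        for (String term : words(queryText)) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms as produced by splitting the tags field on commas only.
        Set<String> indexed = new HashSet<>(Arrays.asList("Micheal Jackson", "Barack Obama"));
        String query = "surrounding President Barack Obama back-to-school speech";
        System.out.println(words(query));
        System.out.println(anyTermIndexed(indexed, query));
    }
}
```

The query yields terms such as "barack" and "obama", and neither equals the indexed term "Barack Obama", so the lookup finds nothing.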

Is there an Analyzer that can match more than one word against the index, other than KeywordAnalyzer? (I can't put quotes around "Barack Obama" inside the large text.)

So which analyzer do I need to use to match Barack Obama (two words) in the index?
Any help is appreciated.

Thanks in advance.


 Post subject: Re: Search for more than one token from lucene index
PostPosted: Thu Sep 17, 2009 4:02 pm 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

I just don't understand why you want to use this CommaSplitterAnalyzer at all, especially since you want to index some sort of article ("Monday afternoon on the steps of Town Hall about the controversy...").

Remember that you want to use the same analyzer for indexing and searching. In your case the StandardAnalyzer should do, unless you also want language-specific analysis, for example.

I think your problem is more the actual search query. If you want to search for an exact phrase you have to use quotes, for example "Barack Obama". Check the Lucene query syntax - http://lucene.apache.org/java/2_3_2/que ... yntax.html
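
To illustrate the two points above, here is a small plain-Java sketch of what a quoted phrase means when both sides use the same word-level analysis: the phrase matches when its words appear consecutively in the analyzed text. It does not use Lucene, the analyze method is only a crude stand-in for a real analyzer, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Phrase matching over word tokens, with the same analysis on both sides.
class PhraseMatchDemo {

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                terms.add(w);
            }
        }
        return terms;
    }

    // True if the analyzed phrase occurs as consecutive terms in the analyzed document.
    static boolean containsPhrase(String document, String phrase) {
        List<String> docTerms = analyze(document);
        List<String> phraseTerms = analyze(phrase);
        if (phraseTerms.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(docTerms, phraseTerms) >= 0;
    }

    public static void main(String[] args) {
        String article = "the controversy surrounding President Barack Obama back-to-school speech";
        System.out.println(containsPhrase(article, "Barack Obama"));  // words are adjacent
        System.out.println(containsPhrase(article, "Obama Barack"));  // wrong order
        System.out.println(containsPhrase("Micheal Jackson, Barack Obama", "Barack Obama"));
    }
}
```

With word-level tokens on both sides, the phrase is found in the article and in the tags value alike, which is why a custom comma analyzer is not needed for this kind of lookup.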

--Hardy


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.