All times are UTC - 5 hours [ DST ]



 Post subject: Stop words and fuzzy search
Posted: Fri Dec 04, 2009 6:34 am
Newbie

Joined: Thu Dec 03, 2009 10:07 am
Posts: 5
Hello,

I am trying to search on a street field which can contain words such as "street" (or "rue", "de", "la" in French).

These words are discarded by the analyzer, and my index is built correctly without them. I use a stop filter in my analyzer to do this.

For example, the following fields:
Code:
rue de la paix
neighbour street

are indexed as follows:
Code:
paix
neighbour


I'd like to let users search this field by typing "rue de la paie" or "naighbour street" and still retrieve the correct record through a fuzzy search.

How can I build a query that leaves out the words the analyzer would skip?


Here are more details and some examples of what I have tried.

If I do something like this:
Code:
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("personAnalyzer");
QueryParser p = new QueryParser("person.street", analyzer);
Query q = p.parse( "rue de la paix" );


I get the following query:
Code:
person.street:paix


But now, if I add the fuzzy operator to each word, like this:
Code:
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("personAnalyzer");
QueryParser p = new QueryParser("person.street", analyzer);
Query q = p.parse( "rue~ de~ la~ paix~" );


I get this query:
Code:
person.street:rue~0.5 person.street:de~0.5 person.street:la~0.5 person.street:paix~0.5
which is not what I want: the stop words are no longer removed.

I'd like to get the following query
Code:
person.street:paix~0.5
when the user types
Code:
rue de la paix

How can I get that result? Is it possible?


[EDIT] I also have another problem with fuzzy search and accented characters.

I search the firstname field with the user input "jérô", and my code appends a fuzzy operator to the text, resulting in "jérô~".

The query returned by the parser is:
Code:
person.firstname:jérô~0.5


I use ISOLatin1AccentFilter in my analyzer to remove these accents, but when this query goes through the parser, nothing is removed.

How can I remove the accents in this case?

Thanks in advance for any responses.


 Post subject: Re: Stop words and fuzzy search
Posted: Mon Dec 07, 2009 4:38 am
Newbie

Joined: Thu Dec 03, 2009 10:07 am
Posts: 5
It seems that the analyzer isn't used when fuzzy search is enabled. Is there a workaround for that?


 Post subject: Re: Stop words and fuzzy search
Posted: Fri Dec 11, 2009 8:50 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, yes, I can confirm I have had this same problem. It is Lucene's QueryParser which doesn't apply the Analyzer to fuzzy terms, as you have also verified.
The solution is to avoid the QueryParser and build the equivalent Query programmatically.
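
As a rough illustration of that idea, here is a minimal sketch in plain Java that analyzes the user input first and only then attaches the fuzzy operator to the terms that survive. It is not Lucene code: the class `FuzzyStreetQuery`, its method names, and the stop list are all hypothetical stand-ins for the real "personAnalyzer" chain, and accent removal is done with `java.text.Normalizer` instead of ISOLatin1AccentFilter. The sketch does not escape query-syntax characters. With Lucene itself, the surviving terms would presumably go into `FuzzyQuery` objects combined in a `BooleanQuery` rather than into a string.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for the analysis chain; all names here are hypothetical.
class FuzzyStreetQuery {

    // Assumed stop list; a real setup must use the same list as the index-time analyzer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("rue", "de", "la", "street"));

    // Lower-cases the text and strips accents: decompose to NFD, then drop combining marks.
    static String normalize(String text) {
        String decomposed =
                Normalizer.normalize(text.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Splits on whitespace, normalizes each token, and drops stop words.
    static List<String> analyze(String input) {
        List<String> terms = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            String term = normalize(token);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Emits one fuzzy clause per surviving term, written in query-parser syntax.
    static String buildFuzzyQuery(String field, String input, float minSimilarity) {
        StringBuilder query = new StringBuilder();
        for (String term : analyze(input)) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(field).append(':').append(term).append('~').append(minSimilarity);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildFuzzyQuery("person.street", "rue de la paie", 0.5f));
        System.out.println(buildFuzzyQuery("person.firstname", "jérô", 0.5f));
    }
}
```

Under these assumptions, the input "rue de la paie" yields the single clause person.street:paie~0.5, and "jérô" yields person.firstname:jero~0.5, which is the shape of query asked for above.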

_________________
Sanne
http://in.relation.to/

