All times are UTC - 5 hours [ DST ]

 Post subject: Double Metaphone scoring in Hibernate Search
Posted: Mon Mar 08, 2010 6:33 am
Regular

Joined: Mon Mar 10, 2008 6:40 pm
Posts: 114
I've implemented Double Metaphone indexing on a name field. It works great, except that the results are not sorted at all by how closely a name matches what's in the database. So I created two indexed fields for the name: a regular one with a higher boost, and the Double Metaphone version. I then use a MultiFieldQueryParser to check both fields on a query. This works in that the exact matches come up first and the inexact matches come up next. But still, for a search of "John Smith", the name "John Smyth" will come up last while a name like "Yawm Xnidt" will come up first. I just made up that example, but in reality it can be even worse than that.

Maybe this is a limitation of Double Metaphone? Or a limitation of Lucene's implementation? I'm hoping it's just a simple configuration option I haven't enabled...

Here's my code:
Code:
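// Package names as of Hibernate Search 3.x, which uses the Solr analysis factories.
import org.apache.solr.analysis.PhoneticFilterFactory;
import org.apache.solr.analysis.StandardFilterFactory;
import org.apache.solr.analysis.StandardTokenizerFactory;
import org.hibernate.search.annotations.*;

// "phonetic" analyzer: standard tokenization, then each token is replaced
// (inject = false) by its Double Metaphone code.
@AnalyzerDef(name = "phonetic",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = PhoneticFilterFactory.class,
            params = {
                @Parameter(name = "encoder", value = "DoubleMetaphone"),
                @Parameter(name = "inject", value = "false")
            })
    })
public class Name {
    // Indexed twice: as "full" (default analyzer, boosted) and as
    // "fullPhonetic" (phonetic analyzer).
    @Fields({@Field(boost = @Boost(2.0f)),
             @Field(name = "fullPhonetic", analyzer = @Analyzer(definition = "phonetic"))})
    private String full;

    // rest of the class omitted
}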

How can I make closer matches appear closer to the top of the search results?

 Post subject: Re: Double Metaphone scoring in Hibernate Search
Posted: Wed Mar 10, 2010 11:32 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

What does your query look like? Do you use the phonetic analyzer across all fields or only for the fullPhonetic field? Are you using a PerFieldAnalyzerWrapper? I would make sure that the phonetic analyzer is only applied to the fullPhonetic field. Together with boosting (either at index build time, as in your example, or at query time), you should get the result you are after.

If not, I recommend you check the Lucene forum about Double Metaphone. It is really more of a Lucene "problem" than a Hibernate Search one.

--Hardy

 Post subject: Re: Double Metaphone scoring in Hibernate Search
Posted: Thu Mar 11, 2010 10:10 pm
Regular

Joined: Mon Mar 10, 2008 6:40 pm
Posts: 114
hardy.ferentschik wrote:
What does your query look like? Do you use the phonetic analyzer across all fields or only for the fullPhonetic field? Are you using a PerFieldAnalyzerWrapper? I would make sure that the phonetic analyzer is only applied to the fullPhonetic field. Together with boosting (either at index build time, as in your example, or at query time), you should get the result you are after.

Thank you for your insight, Hardy. I wasn't aware of the PerFieldAnalyzerWrapper, just expecting or hoping Hibernate Search would take care of this for me. I was calling getSearchFactory().getAnalyzer(searchedEntity) to get the Analyzer to pass into the MultiFieldQueryParser. So I guess I have to pass in a PerFieldAnalyzerWrapper instead, one that encapsulates the proper Analyzer for every field someone may search on (taking advantage of a default, of course)... Shouldn't this be built automatically from my annotations? Otherwise I'll have to remember to copy the analyzer I use for each field to the global PerFieldAnalyzerWrapper instance whenever I add or update a field...

Ok, so how do I pass in the "phonetic" analyzer I defined through the annotation above? Can I get a reference to the analyzer object for it somewhere?

 Post subject: Re: Double Metaphone scoring in Hibernate Search
Posted: Fri Mar 12, 2010 4:46 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hello,
Quote:
I wasn't aware of the PerFieldAnalyzerWrapper, just expecting or hoping Hibernate Search would take care of this for me.

Yes, it should take care of this: if you debug it, you should see that you receive an org.hibernate.search.util.ScopedAnalyzer containing the analyzer override for the specific field name.

Quote:
Can I get a reference to the analyzer object for it somewhere?
You can look up an analyzer by its name instead of by class type:
Code:
getSearchFactory().getAnalyzer("phonetic")


Getting back to your first question:
Quote:
How can I make closer matches appear closer to the top of the search results?

What is your definition of "closer"? It might well be that all is fine: Double Metaphone does compute proximity based on phonetic symbols, and the names in your example might indeed be encoded in very similar forms; the y-to-i mismatch might be considered a more relevant change than the other letters. I'd suggest you play with the analyzer and look at its low-level output. You might want to use a different analyzer, or ask the Lucene experts.

_________________
Sanne
http://in.relation.to/
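
To make the point about encoded forms concrete, here is a minimal sketch in plain Java. It uses classic Soundex rather than Double Metaphone, only because Soundex is short enough to show in full; the class name PhoneticKeyDemo is made up for this illustration and is not part of Lucene or Hibernate Search. The idea it shows applies to both encoders: once two names reduce to the same key, a field that stores only the key (as with inject = false in the mapping above) can no longer tell which spelling was closer to the query.

```java
public class PhoneticKeyDemo {

    // Digit for each letter a..z; '0' means the letter is not coded.
    private static final String CODES = "01230120022455012623010202";

    // American Soundex: first letter plus up to three digits, zero-padded.
    // NOTE: this is Soundex, a simpler stand-in, not Double Metaphone.
    static String soundex(String name) {
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char ch : name.toLowerCase().toCharArray()) {
            if (ch < 'a' || ch > 'z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(ch - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(ch)); // keep the first letter
            } else if (code != '0' && code != prev) {
                out.append(code); // skip repeats of the same code
            }
            if (ch != 'h' && ch != 'w') {
                prev = code; // h and w do not separate equal codes
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Smith")); // S530
        System.out.println(soundex("Smyth")); // S530, same key as "Smith"
        System.out.println(soundex("Smart")); // S563, a different key
    }
}
```

Every indexed name whose key equals the query's key matches the phonetic field equally well, so the ordering among those names comes from other scoring factors, not from spelling similarity.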
 
 Post subject: Re: Double Metaphone scoring in Hibernate Search
Posted: Fri Mar 12, 2010 6:08 am
Regular

Joined: Mon Mar 10, 2008 6:40 pm
Posts: 114
s.grinovero wrote:
Getting back to your first question:
Quote:
How can I make closer matches appear closer to the top of the search results?

What is your definition of "closer"? It might well be that all is fine: Double Metaphone does compute proximity based on phonetic symbols, and the names in your example might indeed be encoded in very similar forms; the y-to-i mismatch might be considered a more relevant change than the other letters. I'd suggest you play with the analyzer and look at its low-level output. You might want to use a different analyzer, or ask the Lucene experts.

Ok, so I implemented the PerFieldAnalyzerWrapper and it had no effect whatsoever. Is that good, in that Hibernate Search can use the annotations to figure out which analyzers to use for the query? Or did I miss something? Double Metaphone does seem to produce two levels of scores. Within those levels, though, there can be some pretty big differences, so it looks very odd to see an almost identical name come way after many seemingly totally different names in the search results...

Thanks for your help.
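
One possible workaround, not suggested anywhere in this thread, is to keep the phonetic query for recall and then re-order the returned names by plain edit distance to the search text. Below is a rough sketch in plain Java, assuming the hits have already been loaded as strings. EditDistanceReRanker and reRank are hypothetical names, not Hibernate Search or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EditDistanceReRanker {

    // Classic Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns a copy of the hits, closest spelling first (case-insensitive).
    static List<String> reRank(String query, List<String> hits) {
        final String q = query.toLowerCase();
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(h -> levenshtein(q, h.toLowerCase())));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical hit list, in the order a phonetic query might return it.
        List<String> hits = Arrays.asList("Yawm Xnidt", "Jon Smithe", "John Smyth");
        System.out.println(reRank("John Smith", hits));
    }
}
```

This only re-orders the names that were actually fetched, so it suits a single page of results rather than the whole index.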