
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: [search] Use of @Boost annotation on a field?
PostPosted: Mon Jul 02, 2012 11:46 am 
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
Section 4.2.1 from the reference manual discusses field boosting and states the following:

'The text field will be 1.2 times more important than the isbn field'

Code:
...
    @Lob
    @Field(boost=@Boost(1.2f))
    public String getText() { return text; }

    @Field
    public String getISBN() { return isbn; }
...


Lucene says the following in their FAQ about index time boosting:
Quote:
Index time field boosts (field.setBoost(boost)) are a way to express things like "this document's title is worth twice as much as the title of most documents". Query time boosts (query.setBoost(boost)) are a way to express "I care about matches on this clause of my query twice as much as I do about matches on other clauses of my query".

Index time field boosts are worthless if you set them on every document.

Index time document boosts (doc.setBoost(float)) are equivalent to setting a field boost on every field in that document.


I'm not 100% sure how field boosts are implemented in Hibernate Search, but after a quick look at the source code I think the field boost of a document is set through a LuceneOptions object, which probably does the same as field.setBoost(...).

In that case, based on the Lucene FAQ, a @Boost annotation on a field seems rather useless: the boost would only be effective when querying multiple types of entity classes.

Or can a field boost still be effective in ranking a document with [title:'foo' desc:'bar'] higher than a document with [title:'bar' desc:'foo'] when searching for 'foo' on all search fields, with 'title' being index-time boosted to 10f? In that case, the Lucene FAQ seems wrong.


 Post subject: Re: [search] Use of @Boost annotation on a field?
PostPosted: Wed Jul 04, 2012 9:27 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

I think the comment on the Lucene wiki is meant for the case where you boost every single field AND only search on this single field. In this case the boost is of course useless, because all fields have the same boost.

However, if you search across multiple fields, as in the documentation or in your example, boosting does have an effect. There is also the use case where you search several entity types that share the same field. In that case you can get different results even if you search on just this one field.
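
To make the distinction concrete, here is a toy sketch in plain Java (no Lucene involved). The class `BoostRankingSketch`, the two documents and their per-field scores are all hypothetical; the model simply multiplies the title score by a uniform boost and adds the description score when that field is searched too. It only illustrates the reasoning above and is not how Lucene computes scores internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BoostRankingSketch {

    /** Toy document with one unboosted match score per field (made-up numbers). */
    static final class Doc {
        final String name;
        final double titleScore;
        final double descriptionScore;

        Doc(String name, double titleScore, double descriptionScore) {
            this.name = name;
            this.titleScore = titleScore;
            this.descriptionScore = descriptionScore;
        }
    }

    /** Hypothetical sample data: DocA matches mostly in description, DocB only in title. */
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("DocA", 1.0, 3.0));
        docs.add(new Doc("DocB", 2.0, 0.0));
        return docs;
    }

    /** Total score: boosted title score, plus the description score if that field is searched. */
    static double score(Doc d, double titleBoost, boolean searchDescription) {
        double total = titleBoost * d.titleScore;
        if (searchDescription) {
            total += d.descriptionScore;
        }
        return total;
    }

    /** Returns document names ordered by descending score. */
    static List<String> rank(List<Doc> docs, double titleBoost, boolean searchDescription) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (Doc d) -> score(d, titleBoost, searchDescription)).reversed());
        List<String> names = new ArrayList<>();
        for (Doc d : sorted) {
            names.add(d.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        // Title only: the same boost on every document scales all scores equally.
        System.out.println(rank(docs, 1.0, false));   // [DocB, DocA]
        System.out.println(rank(docs, 10.0, false));  // [DocB, DocA]
        // Title and description: the boost changes the relative weight of the fields.
        System.out.println(rank(docs, 1.0, true));    // [DocA, DocB]
        System.out.println(rank(docs, 10.0, true));   // [DocB, DocA]
    }
}
```

In this toy model the uniform boost cancels out for the single-field search, which matches the FAQ's point, while the multi-field search flips its ordering once the title boost is large enough.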

If you need a more flexible type of index-time boosting, you should look at the @DynamicBoost feature. There you can assign dynamic boost values for each field value.

All of this is of course theory to a certain degree. I recommend writing a little unit test and experimenting yourself. You can use the explain functionality to inspect why certain documents matched and which score they received.

Hope this helps.

--Hardy


 Post subject: Re: [search] Use of @Boost annotation on a field?
PostPosted: Thu Jul 05, 2012 5:53 am 
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
As suggested, I just tested the field boost, and it is indeed effective, so the Lucene FAQ seems wrong.

For testing I used an existing test project with 1.3M+ publications (sorry, I don't have time to create a reusable unit test).
Each publication instance has a title and description field. The description is an ordinary non-boosted field, title has a field boost of 1000f:

Code:
@Fields({
    @Field(name = "title", boost = @Boost(1000.0f), analyzer = @Analyzer(definition = "withoutStopWordFilter")),
    @Field(index = Index.UN_TOKENIZED, name = "suggest")
})
protected String _title = "";

@Fields({@Field(analyzer = @Analyzer(definition = "default"), name = "description")})
protected String _description = "";


I created 2 docs with unique terms in them:
DocA[title="barbar", description="foofoo and foofoo, but also barbar"]
DocB[title="foofoo and foofoo, but also barbar", description="barbar"]

Searching publications with the query:
'title:foofoo description:foofoo'
results in this ranked list:
  • DocB[title="foofoo and foofoo, but also barbar", description="barbar"]
    Code:
    2511.4648 = (MATCH) sum of:

        2511.4648 = (MATCH) weight(title:foofoo in 0), product of:
            0.32124138 = queryWeight(title:foofoo), product of:
                14.396252 = idf(docFreq=1, maxDocs=1315068)
                0.022314237 = queryNorm
            7817.999 = (MATCH) fieldWeight(title:foofoo in 0), product of:
                1.4142135 = tf(termFreq(title:foofoo)=2)
                14.396252 = idf(docFreq=1, maxDocs=1315068)
                384.0 = fieldNorm(field=title, doc=0)
  • DocA[title="barbar", description="foofoo and foofoo, but also barbar"]
    Code:
    3.2701366 = (MATCH) sum of:

        3.2701366 = (MATCH) weight(description:foofoo in 1), product of:
            0.32124138 = queryWeight(description:foofoo), product of:
                14.396252 = idf(docFreq=1, maxDocs=1315068)
                0.022314237 = queryNorm
            10.179687 = (MATCH) fieldWeight(description:foofoo in 1), product of:
                1.4142135 = tf(termFreq(description:foofoo)=2)
                14.396252 = idf(docFreq=1, maxDocs=1315068)
                0.5 = fieldNorm(field=description, doc=1)
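
The two explain trees can be re-derived by hand from the factors they list. The sketch below (plain Java; the class name `ExplainArithmetic` and its helper methods are made up for illustration) multiplies the factors the same way the explain output nests them: queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, score = queryWeight * fieldWeight. It takes tf as sqrt(termFreq), which agrees with the reported 1.4142135 for termFreq=2.

```java
class ExplainArithmetic {

    /** queryWeight as shown in the explain output: idf * queryNorm. */
    static double queryWeight(double idf, double queryNorm) {
        return idf * queryNorm;
    }

    /** fieldWeight as shown in the explain output: tf * idf * fieldNorm. */
    static double fieldWeight(double tf, double idf, double fieldNorm) {
        return tf * idf * fieldNorm;
    }

    /** Score of a single term clause: queryWeight * fieldWeight. */
    static double score(double tf, double idf, double queryNorm, double fieldNorm) {
        return queryWeight(idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        double tf = Math.sqrt(2.0);        // termFreq=2 in both documents
        double idf = 14.396252;            // idf(docFreq=1, maxDocs=1315068)
        double queryNorm = 0.022314237;

        double docB = score(tf, idf, queryNorm, 384.0); // fieldNorm of the boosted title
        double docA = score(tf, idf, queryNorm, 0.5);   // fieldNorm of the plain description

        System.out.println("DocB score: " + docB);
        System.out.println("DocA score: " + docA);
        System.out.println("ratio: " + (docB / docA));
    }
}
```

Since tf, idf and queryNorm are identical for both documents, the whole difference in score comes from fieldNorm: 384.0 / 0.5 = 768.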

I first tested with a field boost of 10f, which wasn't enough to push DocB to the top. With the higher boost of 1000f it's clear that boosting a field for all docs isn't useless. I'll poke the Lucene folks to update their FAQ.
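
As for where the fieldNorm values 384.0 and 0.5 come from, the following is a rough sketch under several assumptions: that the default similarity of the Lucene version in use computes the norm as boost / sqrt(number of terms), that the norm is stored in a single byte with a lossy encoding, and that the title was indexed as 6 terms (no stop word filter) and the description as 4 terms (with "and" and "but" dropped as stop words). The class `FieldNormSketch` and its methods are hypothetical stand-ins written for this post, not Lucene API.

```java
class FieldNormSketch {

    // Assumed offset of the one-byte norm format: (63 - 15) << 3.
    private static final int ZERO_OFFSET = (63 - 15) << 3;

    /** Assumed length normalization: 1 / sqrt(number of terms in the field). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Lossy float-to-byte encoding that keeps only the top bits of the float. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        if (small <= ZERO_OFFSET) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero, negative or underflow
        }
        if (small >= ZERO_OFFSET + 0x100) {
            return (byte) -1;                          // overflow: largest value
        }
        return (byte) (small - ZERO_OFFSET);
    }

    /** Inverse of encode(); the precision lost during encoding stays lost. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = ((b & 0xff) + ZERO_OFFSET) << 21;
        return Float.intBitsToFloat(bits);
    }

    /** Norm as it would be read back from the index under the assumptions above. */
    static float storedNorm(float boost, int numTerms) {
        return decode(encode(boost * lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        System.out.println(1000f * lengthNorm(6)); // roughly 408.25 before encoding
        System.out.println(storedNorm(1000f, 6));  // prints 384.0
        System.out.println(storedNorm(1f, 4));     // prints 0.5
    }
}
```

Under these assumptions the boosted title norm of about 408 is rounded down to 384.0 by the one-byte encoding, and the unboosted description norm is exactly 0.5, which matches the explain output.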


 Post subject: Re: [search] Use of @Boost annotation on a field?
PostPosted: Thu Jul 05, 2012 6:40 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Thanks for the update :-)


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.