
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
 Post subject: Weird Scoring Problem
Posted: Fri Jul 15, 2011 10:12 am
Newbie

Joined: Fri Jul 15, 2011 9:56 am
Posts: 3
For testing purposes, I manually inserted 40,000 log messages with the same "message" field into our database, incrementing the time by 1 second for each one. When I search with the query string "+<<message:bla message:las message:ase>>", the messages are returned, but only sometimes do all of them have the same score. This is a very serious problem for us because we are trying to sort by score and then by time.

Here are the score explanations for two identical log messages with different scores:
1-------
ScoreExplanation:
0.48027912 = (MATCH) sum of:
  0.16198143 = (MATCH) weight(message:bla in 64243) product of:
    0.5807454 = queryWeight(message:bla) product of:
      1.4875727 = idf(docFreq=40043 maxDocs=65206)
      0.390398 = queryNorm
    0.27891988 = (MATCH) fieldWeight(message:bla in 64243) product of:
      1.0 = tf(termFreq(message:bla)=1)
      1.4875727 = idf(docFreq=40043 maxDocs=65206)
      0.1875 = fieldNorm(field=message doc=64243)
  0.16097377 = (MATCH) weight(message:las in 64243) product of:
    0.5789362 = queryWeight(message:las) product of:
      1.4829385 = idf(docFreq=40229 maxDocs=65206)
      0.390398 = queryNorm
    0.27805096 = (MATCH) fieldWeight(message:las in 64243) product of:
      1.0 = tf(termFreq(message:las)=1)
      1.4829385 = idf(docFreq=40229 maxDocs=65206)
      0.1875 = fieldNorm(field=message doc=64243)
  0.15732393 = (MATCH) weight(message:ase in 64243) product of:
    0.5723353 = queryWeight(message:ase) product of:
      1.4660304 = idf(docFreq=40915 maxDocs=65206)
      0.390398 = queryNorm
    0.2748807 = (MATCH) fieldWeight(message:ase in 64243) product of:
      1.0 = tf(termFreq(message:ase)=1)
      1.4660304 = idf(docFreq=40915 maxDocs=65206)
      0.1875 = fieldNorm(field=message doc=64243)

2-------
0.480276 = (MATCH) sum of:
  0.16198035 = (MATCH) weight(message:bla in 64235) product of:
    0.5807453 = queryWeight(message:bla) product of:
      1.487563 = idf(docFreq=40044 maxDocs=65207)
      0.39040047 = queryNorm
    0.27891806 = (MATCH) fieldWeight(message:bla in 64235) product of:
      1.0 = tf(termFreq(message:bla)=1)
      1.487563 = idf(docFreq=40044 maxDocs=65207)
      0.1875 = fieldNorm(field=message doc=64235)
  0.16097271 = (MATCH) weight(message:las in 64235) product of:
    0.57893616 = queryWeight(message:las) product of:
      1.482929 = idf(docFreq=40230 maxDocs=65207)
      0.39040047 = queryNorm
    0.27804917 = (MATCH) fieldWeight(message:las in 64235) product of:
      1.0 = tf(termFreq(message:las)=1)
      1.482929 = idf(docFreq=40230 maxDocs=65207)
      0.1875 = fieldNorm(field=message doc=64235)
  0.15732296 = (MATCH) weight(message:ase in 64235) product of:
    0.57233536 = queryWeight(message:ase) product of:
      1.4660212 = idf(docFreq=40916 maxDocs=65207)
      0.39040047 = queryNorm
    0.27487898 = (MATCH) fieldWeight(message:ase in 64235) product of:
      1.0 = tf(termFreq(message:ase)=1)
      1.4660212 = idf(docFreq=40916 maxDocs=65207)
      0.1875 = fieldNorm(field=message doc=64235)

Any help will be greatly appreciated.
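
The factors printed in explanation 1 can be multiplied back together to see how the total is built: each term contributes queryWeight (idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the three contributions are summed. The following is only a sketch of that arithmetic in plain Java; the class and method names are made up for illustration and are not Lucene or Hibernate Search API.

```java
// Recomputes explanation 1 from its printed factors, using float arithmetic.
// All names here are hypothetical helpers, not Lucene API.
final class ExplainArithmetic {

    // queryWeight = idf * queryNorm
    static float queryWeight(float idf, float queryNorm) {
        return idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // Contribution of one term: queryWeight * fieldWeight
    static float termWeight(float tf, float idf, float queryNorm, float fieldNorm) {
        return queryWeight(idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    // Sum of the term contributions, with tf = 1.0 as in the explanation
    static float score(float[] idfs, float queryNorm, float fieldNorm) {
        float sum = 0f;
        for (float idf : idfs) {
            sum += termWeight(1.0f, idf, queryNorm, fieldNorm);
        }
        return sum;
    }

    public static void main(String[] args) {
        // idf values for message:bla, message:las, message:ase from explanation 1
        float[] idfs = {1.4875727f, 1.4829385f, 1.4660304f};
        System.out.println("recomputed score = " + score(idfs, 0.390398f, 0.1875f));
    }
}
```

Because idf enters each term twice and queryNorm once, any small change in those index-wide statistics shows up in every document's score.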


 
 Post subject: Re: Weird Scoring Problem
Posted: Fri Jul 15, 2011 12:05 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Each log message has the same text, right?

It seems that docFreq and maxDocs each differ by 1 between the two results. I'm not sure whether that happened because a document was added or deleted between the two queries, or whether Lucene is revealing a +1/-1 inconsistency.

If you're testing with equal strings in all documents, the effect of a +1/-1 difference is extremely small compared to other factors, and it will be totally negligible as soon as you have different messages. Also consider that the score is a float number, so subtle changes in rounding might skew it a bit.

_________________
Sanne
http://in.relation.to/
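
The size of that one-document difference can be estimated directly. The sketch below assumes the index uses Lucene's default similarity, whose idf is 1 + ln(maxDocs / (docFreq + 1)); that formula reproduces the idf values printed in both explanations. The class name `IdfDrift` is made up for illustration.

```java
// Shows how much idf moves when one more matching document is indexed.
// Assumes the default-similarity formula: idf = 1 + ln(maxDocs / (docFreq + 1)).
final class IdfDrift {

    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // message:bla statistics from explanation 1 and explanation 2
        double before = idf(40043, 65206);
        double after = idf(40044, 65207);
        System.out.println("idf before = " + before);
        System.out.println("idf after  = " + after);
        System.out.println("difference = " + (after - before));
    }
}
```

The change is on the order of 1e-5, which is consistent with the two total scores differing only from the sixth decimal place onward.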


 
 Post subject: Re: Weird Scoring Problem
Posted: Fri Jul 15, 2011 3:14 pm
Newbie

Joined: Fri Jul 15, 2011 9:56 am
Posts: 3
Thank you for the quick response.

Yes, each log message has identical message text. A document being added or deleted between the queries makes sense and does seem to be the cause. Is there a good solution for this that you know of?

Thanks


 
 Post subject: Re: Weird Scoring Problem
Posted: Sat Jul 16, 2011 7:56 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
No, I would ignore it, as it's not a problem in real cases (which differ from your synthetic test).
As I said, the score is a float number: floats are approximations, so you must make sure your application can tolerate small skews from the expected values.

_________________
Sanne
http://in.relation.to/
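
One possible way to tolerate such skews when sorting by score and then by time is to round the score to a fixed precision before comparing, so that float noise lands in the same bucket and the time becomes the tie-breaker. This is only a sketch in plain Java applied to results after they are fetched; `Hit`, `bucket`, and `byBucketedScoreThenTime` are hypothetical names, not Hibernate Search or Lucene API, and the 4-decimal precision is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sorts hits by rounded score (descending), then by time (descending).
final class ScoreThenTimeSort {

    // Hypothetical holder for one search result
    static final class Hit {
        final float score;
        final long timeMillis;

        Hit(float score, long timeMillis) {
            this.score = score;
            this.timeMillis = timeMillis;
        }
    }

    // Rounds the score to the given number of decimals and returns it as a long
    static long bucket(float score, int decimals) {
        return Math.round(score * Math.pow(10, decimals));
    }

    // Higher bucket first; within a bucket, newer time first
    static Comparator<Hit> byBucketedScoreThenTime(int decimals) {
        return Comparator
                .comparingLong((Hit h) -> bucket(h.score, decimals)).reversed()
                .thenComparing(Comparator.comparingLong((Hit h) -> h.timeMillis).reversed());
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0.480276f, 1000L));   // score from explanation 2
        hits.add(new Hit(0.48027912f, 3000L)); // score from explanation 1
        hits.add(new Hit(0.48027912f, 2000L));
        hits.sort(byBucketedScoreThenTime(4));
        for (Hit h : hits) {
            System.out.println(h.score + " @ " + h.timeMillis);
        }
    }
}
```

Rounding is not a complete fix: two scores that sit on either side of a bucket boundary still end up in different buckets, so the precision has to be chosen with the expected score spread in mind.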


 
 Post subject: Re: Weird Scoring Problem
Posted: Mon Jul 18, 2011 10:00 am
Newbie

Joined: Fri Jul 15, 2011 9:56 am
Posts: 3
Thank you. Unfortunately, I thought scoring was causing my sorting problems, and that no longer appears to be the case.
I'm not sure whether I should start a new thread or just ask here. We give the user the option to select "exact text search", which requires all the n-grams to match by ANDing them together. With this selected, the sorting is perfect (by score, then time). When it is not selected, the scores are all the same (or close enough), but the times are not ordered properly.

To clarify how the order is skewed: each log message was given a number when it was logged, 1 through 10,000, so a correctly sorted result should count down from 10,000. Without "exact text match" selected, it instead goes 998, 995, 980, 975, 966, ...

Here is an example of the exact text match query string (the default one is given in the first post):
"+<<+message:bla +message:las +message:ase>>"

Thank you again for your continued help.


 
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.