All times are UTC - 5 hours [ DST ]



 Post subject: Bizarre results from query with date range and stop words
Posted: Sat Apr 11, 2009 6:09 pm
Newbie

Joined: Thu May 08, 2008 2:34 pm
Posts: 6
This is probably Lucene goodness, but...

I'm using the StandardAnalyzer for both indexing and querying.
Given a query like:
with AND publishDate:[20090404 TO 20090411]

Because "with" is a stop-word, the query returns all items in the given date range, even though the word "with" is not even in the index. Replacing "with" with a non-stop-word gives the expected result.

My current workaround for this is to use SimpleAnalyzer for my QueryParser instead of StandardAnalyzer (since it doesn't include a StopFilter).

Can anyone confirm and/or explain this?
Does it make any sense?

Thanks!
Phil


Posted: Sun Apr 12, 2009 2:38 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Yes, this is expected: stop words are never inserted into the index, and provided you analyze the query the same way, they are also removed from the query conditions.
You can customize the definition of stop words, so if "with" is an important term you can remove it from your stop-word list.

You may prefer to define your own analyzer using the very handy "analyzerdef" feature of Hibernate Search: usually SimpleAnalyzer is too limited and StandardAnalyzer too standard :-)
See example 1.10 in the reference docs.

If the "with" is coming from user input and you want to filter on more than just the date range, run the input through the analyzer before executing the query and check whether it removed all significant words. If it did, show a form validation error like "query too vague, add some more terms".
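A minimal sketch of that pre-check, in plain Java so it runs without any libraries. It does not call Lucene: the stop-word set and the tokenizing regex below are stand-ins I made up for illustration, and in a real application you would run the input through the same analyzer you use for querying and see whether any tokens come out. The class and method names (VagueQueryCheck, significantTerms, isTooVague) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class VagueQueryCheck {
    // Assumption: a small illustrative stop-word list. The real list is
    // whatever the analyzer used for indexing and querying is configured with.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "in", "of", "or", "the", "to", "with"));

    // Lowercases the input, splits on anything that is not a letter or digit,
    // and keeps only the tokens that are not stop words.
    static List<String> significantTerms(String userInput) {
        List<String> kept = new ArrayList<>();
        String[] tokens = userInput.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // True when nothing would survive analysis, i.e. the free-text part of
    // the query would add no condition at all.
    static boolean isTooVague(String userInput) {
        return significantTerms(userInput).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isTooVague("with"));           // true
        System.out.println(isTooVague("with hibernate")); // false
    }
}
```

When isTooVague returns true, the form can reject the input with the "query too vague" message instead of silently falling back to the date range alone.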

_________________
Sanne
http://in.relation.to/


Posted: Tue Apr 14, 2009 11:55 am
Newbie

Joined: Thu May 08, 2008 2:34 pm
Posts: 6
Thanks Sanne!

It all makes sense to the enlightened mind :)

For the benefit of others... it really helps to log query strings before and after parsing, e.g., log the raw input next to queryParser.parse(input).toString(), where queryParser is your QueryParser instance.

For this example, the input is:
with AND publishDate:[20090314 TO 20090414]

The parsed query becomes:
+publishDate:[20090314 TO 20090414]

Since "with" is a stop word, the entire clause is removed, and only the date range is left to match.
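To make the before/after comparison concrete without needing Lucene on the classpath, here is a toy model of what happens to the clauses. It is not Lucene's QueryParser and handles only simple "x AND y" input: the stop-word set, the toyParse method and the default field name "title" are all assumptions for illustration. Fielded clauses (anything containing a colon) are kept as-is, and a bare term that is a stop word produces no clause at all.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class StopWordClauseDemo {
    // Assumption: illustrative subset of a stop-word list.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "with"));

    // Toy stand-in for parsing: splits on AND, drops bare terms that are
    // stop words, and renders each surviving clause as a required clause.
    static String toyParse(String defaultField, String input) {
        StringBuilder parsed = new StringBuilder();
        for (String clause : input.split("\\s+AND\\s+")) {
            String trimmed = clause.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String rendered;
            if (trimmed.contains(":")) {
                // Fielded clause such as a range: kept verbatim.
                rendered = trimmed;
            } else {
                String term = trimmed.toLowerCase(Locale.ROOT);
                if (STOP_WORDS.contains(term)) {
                    // The analyzer would emit no token, so no clause is built.
                    continue;
                }
                rendered = defaultField + ":" + term;
            }
            if (parsed.length() > 0) {
                parsed.append(' ');
            }
            parsed.append('+').append(rendered);
        }
        return parsed.toString();
    }

    public static void main(String[] args) {
        String input = "with AND publishDate:[20090314 TO 20090414]";
        System.out.println("before: " + input);
        System.out.println("after:  " + toyParse("title", input));
    }
}
```

Logging both lines side by side is what makes the missing clause obvious: the "after" line contains only the publishDate range.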

Phil

