All times are UTC - 5 hours [ DST ]



 Post subject: Prefix searches only work with small data sets?
Posted: Mon Mar 22, 2010 7:32 pm
Regular

Joined: Mon Mar 10, 2008 6:40 pm
Posts: 114
Prefix searches like the following don't seem to work except for small data sets:
ar*
a*

In our test environment, with a very limited amount of data, the above searches work as expected. But once the data set grows beyond a very small size (to thousands of records), we get the following exception:
Code:
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
   at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:163)
   at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:154)
   at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:54)
   at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:383)
   at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:383)
   at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
   at org.apache.lucene.search.Query.weight(Query.java:94)
   at org.apache.lucene.search.Searcher.createWeight(Searcher.java:185)
   at org.apache.lucene.search.Searcher.search(Searcher.java:136)
   at org.hibernate.search.query.QueryHits.updateTopDocs(QueryHits.java:100)
   at org.hibernate.search.query.QueryHits.<init>(QueryHits.java:61)
   at org.hibernate.search.query.FullTextQueryImpl.getQueryHits(FullTextQueryImpl.java:376)
   at org.hibernate.search.query.FullTextQueryImpl.getResultSize(FullTextQueryImpl.java:767)

Are prefix searches not really allowed with Hibernate Search or Lucene except for test environments? Or am I missing a flag somewhere? I'm guessing by the error that Hibernate Search or Lucene basically queries first for terms matching the prefix expression and then uses those terms in the query for documents? So if a* matches more than 1024 terms, then Lucene is going to throw an exception? Are we expected to raise that limit dramatically?

It's very important that I allow our users to do wildcard searches. Hibernate Search and Lucene have to allow this kind of searching, right? Even if it's severely limited, I'd love to check first and tell the user, "sorry, our search mechanism can't search for a* because you need to be more specific; try another letter after a" rather than "sorry, there was a problem."


 Post subject: Re: Prefix searches only work with small data sets?
Posted: Tue Mar 23, 2010 6:27 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi mueller,
yes, that's a strong limitation of all information retrieval systems, not just Lucene.
As you guessed, the prefix query is rewritten into a boolean query containing every indexed term that matches the prefix. maxClauseCount defaults to 1024; you could set it higher, but that won't solve the problem in all cases and may make the system perform badly.
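To make the expansion concrete, here is a minimal sketch in plain Java. It is not Lucene code: PrefixExpansionSketch, TooManyClausesSketch and sampleTerms are hypothetical names made up for illustration, and only the 1024 limit is taken from the stack trace above. It mimics the idea of collecting one clause per indexed term that starts with the prefix and failing once the limit is exceeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative stand-in, NOT Lucene code: mimics how a prefix query is
// turned into one clause per matching indexed term.
class PrefixExpansionSketch {

    // Same value as the maxClauseCount reported in the stack trace.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Hypothetical exception standing in for BooleanQuery.TooManyClauses.
    static class TooManyClausesSketch extends RuntimeException {
        TooManyClausesSketch(int limit) {
            super("maxClauseCount is set to " + limit);
        }
    }

    // Walks the sorted term dictionary starting at the prefix and collects
    // one clause per term that starts with it.
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match the prefix
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClausesSketch(maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Toy term dictionary: "a00000", "a00001", ... plus a few real words.
    static SortedSet<String> sampleTerms(int count) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            terms.add(String.format("a%05d", i));
        }
        terms.add("arch");
        terms.add("army");
        terms.add("art");
        terms.add("bar");
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = sampleTerms(2000);
        // "ar*" matches only three terms, so it expands without trouble.
        System.out.println(expand(terms, "ar", MAX_CLAUSE_COUNT));
        try {
            // "a*" matches more than 1024 terms and hits the limit.
            expand(terms, "a", MAX_CLAUSE_COUNT);
        } catch (TooManyClausesSketch e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the failure depends on the number of distinct indexed terms matching the prefix, not on the number of documents, which is why a* breaks long before ar* does.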

The usual solution in these cases is exactly what you suggest:
Quote:
It's very important that I allow our users to do wildcard searches. Hibernate Search and Lucene have to allow this kind of searching, right? Even if it's severely limited, I'd love to check first and tell the user, "sorry, our search mechanism can't search for a* because you need to be more specific; try another letter after a" rather than "sorry, there was a problem."

That's right, so you should catch the TooManyClauses exception and show the user a message politely asking them to refine their search terms; you might have noticed this on many online services.
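A rough sketch of that pattern, kept free of Lucene so it runs on its own. Everything here is an assumption for illustration: FriendlyPrefixSearch, Outcome, the two-character minimum and the backend function are hypothetical, and TooManyClausesSketch stands in for org.apache.lucene.search.BooleanQuery.TooManyClauses. In a real application the try block would wrap whichever call triggers the query (getResultSize in the trace above).

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Illustrative wrapper: check the prefix first, then catch the
// "too many clauses" failure and turn it into a user-facing message.
class FriendlyPrefixSearch {

    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClausesSketch extends RuntimeException {
    }

    // Either a list of hits or a message to show to the user.
    static final class Outcome {
        final List<String> hits;
        final String userMessage; // null when the search succeeded

        private Outcome(List<String> hits, String userMessage) {
            this.hits = hits;
            this.userMessage = userMessage;
        }

        static Outcome ok(List<String> hits) {
            return new Outcome(hits, null);
        }

        static Outcome rejected(String message) {
            return new Outcome(Collections.<String>emptyList(), message);
        }
    }

    // Assumed policy, not a Lucene setting: require this many characters before '*'.
    static final int MIN_PREFIX_LENGTH = 2;

    static Outcome search(String userInput, Function<String, List<String>> backend) {
        String query = userInput.trim();
        if (query.endsWith("*")) {
            String prefix = query.substring(0, query.length() - 1);
            if (prefix.length() < MIN_PREFIX_LENGTH) {
                // Cheap check before the query ever reaches the index.
                return Outcome.rejected("Please type at least " + MIN_PREFIX_LENGTH
                        + " characters before the *.");
            }
        }
        try {
            return Outcome.ok(backend.apply(query));
        } catch (TooManyClausesSketch e) {
            // The prefix passed the length check but still matched too many terms.
            return Outcome.rejected("Your search \"" + query
                    + "\" matches too many terms. Please be more specific.");
        }
    }
}
```

The length check is only a cheap first filter; the catch block is still needed, because a prefix of any length can match more terms than the limit on a large enough index.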

Google does some complex workarounds: you might notice that the simple query "A*" returns the same results as "A", while using a star in a more complex text gives somewhat unexpected results. There's no general solution; you can set maxClauseCount very high (BooleanQuery.setMaxClauseCount) and measure how much performance degrades.

_________________
Sanne
http://in.relation.to/

