
All times are UTC - 5 hours [ DST ]



This topic is locked; you cannot edit posts or make further replies. [5 posts]
 Post subject: [Search] Capping returned search results?
Posted: Mon Feb 02, 2009 2:10 pm
Regular

Joined: Wed Dec 17, 2003 1:58 pm
Posts: 102
Hi all,
I have an application that could potentially return 100k+ search results depending on the search type, and it gives the end user a UI to sort the various columns of those results. Obviously this is a problem, as sorting that many results is going to kill Lucene. Is there a way to cap the number of search results you can work with? I am currently using setFirstResult and setMaxResults (for pagination), but that only protects the database, not Lucene, which still has to sort columns across all 100k results. The ideal (and what I see frequently) is websites that report the total number of results but only let you work with a subset of them (say 200): you can only sort and paginate through those 200, and you have to add further search terms to drill down.
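A minimal sketch of the paging arithmetic behind that kind of UI, assuming a hypothetical `CappedPager` helper (not part of Hibernate Search or Lucene) and zero-based page numbers: the real total is still reported, but pages are clamped so that the offset/limit pair handed to `setFirstResult`/`setMaxResults` never reaches past the cap. On its own this only bounds what the user can browse, not the work the engine does to rank the matches.

```java
// Hypothetical helper: maps a zero-based page number to the offset/limit pair
// one would pass to setFirstResult/setMaxResults, never paging past `cap`.
final class CappedPager {
    private final int totalHits; // total matches reported by the search
    private final int cap;       // most results the user may work with
    private final int pageSize;

    CappedPager(int totalHits, int cap, int pageSize) {
        if (totalHits < 0 || cap <= 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits >= 0, cap > 0 and pageSize > 0 required");
        }
        this.totalHits = totalHits;
        this.cap = cap;
        this.pageSize = pageSize;
    }

    /** Results the user can actually browse: the smaller of the real total and the cap. */
    int workableHits() {
        return Math.min(totalHits, cap);
    }

    /** Number of pages over the workable hits (ceiling division). */
    int pageCount() {
        return (workableHits() + pageSize - 1) / pageSize;
    }

    /** Offset of the given page, i.e. the value for setFirstResult. */
    int firstResult(int page) {
        checkPage(page);
        return page * pageSize;
    }

    /** Page length for setMaxResults, shortened on the last page so it never crosses the cap. */
    int maxResults(int page) {
        checkPage(page);
        return Math.min(pageSize, workableHits() - firstResult(page));
    }

    private void checkPage(int page) {
        if (page < 0 || page >= pageCount()) {
            throw new IndexOutOfBoundsException("page " + page + " of " + pageCount());
        }
    }
}
```

For example, with 100,000 matches, a cap of 200 and pages of 30, the user sees 7 pages, and the last one starts at offset 180 with only 20 results.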

Thanks!
David


 Post subject:
Posted: Mon Feb 02, 2009 2:50 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Did you try it?
I am using it on thousands of results, found among millions of documents; CPU usage spikes, but the matches are returned in milliseconds.
Is it different when you return millions of results? I can't really test it, as I can't think of a wide enough query.

Anyway, I don't think you can paginate a sorted result set without considering all the results; otherwise you can't identify which results belong on the requested page.
If there is an optimization we are missing, I'll be very glad to implement it.
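To illustrate why sorting cannot skip results, here is a sketch using plain Java collections rather than Lucene's own classes (`BoundedTopK.page` is a hypothetical helper): to serve one sorted page, every match is still visited once, but only `offset + limit` of them have to be held in a bounded priority queue, so the cost is roughly O(n log k) instead of a full O(n log n) sort. As far as I know, Lucene's top-N collectors rely on a similar bounded queue, but treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BoundedTopK {
    /**
     * Returns the slice [offset, offset + limit) that a full sort of `matches` by `order`
     * would produce, while holding at most offset + limit elements at any time.
     */
    static <T> List<T> page(Iterable<T> matches, Comparator<T> order, int offset, int limit) {
        if (offset < 0 || limit <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and limit > 0");
        }
        int k = offset + limit;
        // Reversed comparator: the head of the queue is the worst of the k best seen so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k, order.reversed());
        for (T match : matches) {               // every match is still visited once
            if (heap.size() < k) {
                heap.add(match);
            } else if (order.compare(match, heap.peek()) < 0) {
                heap.poll();                    // evict the current worst...
                heap.add(match);                // ...and keep the better match
            }
        }
        List<T> best = new ArrayList<>(heap);
        best.sort(order);                       // sorts at most k elements, not all matches
        if (offset >= best.size()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(best.subList(offset, best.size()));
    }
}
```

The deeper the requested page, the larger `k` grows, which is one more reason to cap how far users can page.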

_________________
Sanne
http://in.relation.to/


 Post subject:
Posted: Mon Feb 02, 2009 3:22 pm
Regular

Joined: Wed Dec 17, 2003 1:58 pm
Posts: 102
I don't have the numbers just yet. The most I have returned in my app so far is around 35k; I was just thinking that if Lucene matched 100k+ results and then had to sort them, it might be pretty slow, but I suppose I'll wait and see.

The optimization I was considering (assuming Lucene can even do this) was to match, say, the 200 most relevant results, and then sort and paginate over only those. Does that make sense?
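That two-phase idea can be sketched in plain Java, with a hypothetical `Hit` type holding a relevance score and one sortable column: first keep the `cap` most relevant hits, then sort only that subset by the user's chosen column and cut out a page. For brevity, phase 1 here uses a full stream sort; a bounded heap would avoid sorting everything. Note the trade-off: a page sorted this way is ordered within the top `cap` hits, not across all matches.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RelevanceCappedSort {
    /** Hypothetical search hit: a relevance score plus one user-sortable column. */
    static final class Hit {
        final String title;
        final float score;

        Hit(String title, float score) {
            this.title = title;
            this.score = score;
        }

        @Override
        public String toString() {
            return title;
        }
    }

    /**
     * Phase 1: keep only the `cap` most relevant hits.
     * Phase 2: sort that subset by the user's column and cut out one page.
     */
    static List<Hit> page(List<Hit> hits, int cap, Comparator<Hit> userSort, int offset, int limit) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed()) // best score first
                .limit(cap)                                                        // the capped working set
                .sorted(userSort)                                                  // user's column, subset only
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

With hits A (0.9), B (0.1), C (0.5) and D (0.7), a cap of 3 and a sort by title, the result is A, C, D: B never reaches phase 2, even though it would sort second by title across all matches.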


 Post subject:
Posted: Mon Feb 02, 2009 4:57 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Yes, that sounds good; but how are you going to define an appropriate relevance threshold? Just by limiting to the first 200?
I'm not sure Lucene supports something like that; you should ask on the Lucene forum.
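Two ways such a cutoff could be defined, sketched over a list of scores already sorted in descending order (the `RelevanceCutoff` class and its relative-to-best rule are hypothetical illustrations, not Lucene features): a fixed count, as in "the first 200", or a threshold relative to the best score. A relative threshold is used here because, as far as I know, raw Lucene scores are not comparable across different queries, so an absolute score cutoff would be fragile.

```java
import java.util.ArrayList;
import java.util.List;

final class RelevanceCutoff {
    /** Fixed-count cutoff: the first n scores of a descending list. */
    static List<Float> firstN(List<Float> sortedScores, int n) {
        return new ArrayList<>(sortedScores.subList(0, Math.min(n, sortedScores.size())));
    }

    /** Hypothetical relative cutoff: keep scores of at least `ratio` times the best score. */
    static List<Float> relativeToBest(List<Float> sortedScores, float ratio) {
        List<Float> kept = new ArrayList<>();
        if (sortedScores.isEmpty()) {
            return kept;
        }
        float threshold = sortedScores.get(0) * ratio;
        for (float score : sortedScores) {
            if (score < threshold) {
                break; // input is descending, so nothing after this can qualify
            }
            kept.add(score);
        }
        return kept;
    }
}
```

The fixed count bounds the work predictably; the relative threshold adapts to the query but can still keep an unbounded number of near-equal hits.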

_________________
Sanne
http://in.relation.to/


 Post subject:
Posted: Mon Feb 02, 2009 4:58 pm
Regular

Joined: Wed Dec 17, 2003 1:58 pm
Posts: 102
I wouldn't think of it as a relevance threshold, more as a performance threshold: you say you're not interested in spending the resources to process, and let the user interact with, more than X results, which they likely couldn't do anything meaningful with anyway.
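A performance threshold of that kind could look like the following sketch, where `CollectionBudget` is a hypothetical helper: stop pulling matches once a fixed budget is reached and remember whether more existed, so the UI can still say the list was truncated. Unlike a relevance cutoff, which hits survive here depends only on the order in which matches arrive, not on their scores.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class CollectionBudget {
    /** Result of a budgeted collection: the hits kept and whether more matches existed. */
    static final class Capped<T> {
        final List<T> hits;
        final boolean truncated;

        Capped(List<T> hits, boolean truncated) {
            this.hits = hits;
            this.truncated = truncated;
        }
    }

    /** Collects at most `budget` matches, then stops pulling from the iterator. */
    static <T> Capped<T> collect(Iterator<T> matches, int budget) {
        List<T> kept = new ArrayList<>();
        while (kept.size() < budget && matches.hasNext()) {
            kept.add(matches.next());
        }
        // hasNext() tells us whether the budget cut anything off, without consuming it.
        return new Capped<>(kept, matches.hasNext());
    }
}
```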


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.