
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 7 posts ] 
 Post subject: Filter Ordering?
Posted: Mon Dec 27, 2010 2:01 pm
Newbie

Joined: Mon Jan 11, 2010 8:34 pm
Posts: 6
I'm writing a custom org.apache.lucene.search.Filter to narrow down results based on spatial information. Due to the work it is doing, it's important that the filter is run last in the chain of filters (processor intensive, etc.). I want to run the filter on the smallest possible result set, i.e. last.

I've searched the docs, forums, HSIA, and can't seem to find any reference to specifying an execution order for filters that have been enabled in a search. Looking at the code, at least in 3.1.1, I see that multiple filters are added to a ChainedFilter internally, but any ordering is lost because it is populated from an unordered map for filter definitions.
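
To illustrate the point about the unordered map, here is a minimal sketch in plain Java. It is not Hibernate Search code: the class name and the filter names ("customer", "security", "spatial") are made up for the example. It only shows that a HashMap makes no promise about iteration order, while a LinkedHashMap returns keys in the order they were registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FilterOrderDemo {
    // Registers three hypothetical filter names in a fixed order and
    // returns the order in which the given map iterates over them.
    static List<String> iterationOrder(Map<String, Object> registry) {
        registry.put("customer", new Object());
        registry.put("security", new Object());
        registry.put("spatial", new Object());
        return new ArrayList<>(registry.keySet());
    }

    public static void main(String[] args) {
        // HashMap: iteration order depends on hashing, not on insertion order.
        System.out.println("HashMap:       " + iterationOrder(new HashMap<>()));
        // LinkedHashMap: iteration order is the insertion order.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>()));
    }
}
```

Whether an insertion-ordered map would be the right fix inside Hibernate Search is a separate question; the sketch only shows why the declared order cannot survive a plain hash map.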

I was thinking the order might be based on the order of the annotations inside the @FullTextFilterDefs or the order in which they were enabled, but that doesn't appear to be the case.

Is there a way to specify the execution order of filters that I'm missing? Thanks!

We're using Hibernate Search 3.1.1, JBoss 5.1


 Post subject: Re: Filter Ordering?
Posted: Mon Dec 27, 2010 2:33 pm
Newbie

Joined: Mon Jan 11, 2010 8:34 pm
Posts: 6
FYI, I figured out that calling query.setFilter(customSpatialFilter) (in addition to the typical query.enableFullTextFilter()) at least adds my custom filter to the _end_ of the internal filter chain. That method is noted as "semi-deprecated", but it provides a reasonable workaround for me for now.

The subject of filter ordering is still of interest as using this workaround means not being able to take advantage of Hibernate Search's caching mechanism that you get for free when using @FullTextFilterDef. So please fill me in if I'm missing something. Thanks.


 Post subject: Re: Filter Ordering?
Posted: Mon Dec 27, 2010 6:10 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Richard,
When I worked on the filters I took great care to exploit the performance gains a clever ordering can give, so if the order is not easy to define we should definitely fix that.
Please open a JIRA issue and suggest how you think the API should look (for example, whether respecting the order of the annotations is the way to go, or whether something else would be more practical).
Patches are always welcome, too.
Thanks

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Filter Ordering?
Posted: Tue Dec 28, 2010 3:32 pm
Newbie

Joined: Mon Jan 11, 2010 8:34 pm
Posts: 6
Hi, thanks for chiming in. I'm getting a better handle on this now. I was incorrectly assuming (based on some javadoc comments in Lucene) that a ChainedFilter in Lucene / HS would progressively narrow down the set of documents that each subsequent filter has to consider. If that were true, ordering would be significant. However, each filter in the chain has to process the full index, even if previous filters in the chain have already eliminated documents from the current search (I'm not sure I understand that decision, but I digress).

So please ignore my previous implication that HS wasn't ordering chained filters. It doesn't appear that ordering even matters in Lucene, nor is that the core issue for me.

I see that "FilteredDocIdSet" was added in Lucene 2.9 which basically accomplishes what I'm after. It narrows down the set of documents a secondary filter needs to consider, based on the results of a previously executed filter. Example usage: http://grepcode.com/file/repo1.maven.or ... ilter.java
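
The difference between the two approaches can be modelled without any Lucene classes. The sketch below uses java.util.BitSet as a stand-in for a doc id set and an IntPredicate as a stand-in for the expensive per-document check; the class and method names are assumptions made for this illustration, not Lucene or Hibernate Search API. One method evaluates the expensive check against every document and then intersects, the other only checks documents that a cheaper filter has already accepted.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

class NarrowingFilterDemo {
    // Models a filter that scans the whole index: the expensive check
    // runs once per document, and the result is intersected afterwards.
    static BitSet fullScanThenAnd(BitSet cheap, IntPredicate expensive, int maxDoc) {
        BitSet expensiveBits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (expensive.test(doc)) {
                expensiveBits.set(doc);
            }
        }
        BitSet result = (BitSet) cheap.clone();
        result.and(expensiveBits);
        return result;
    }

    // Models the narrowing idea: the expensive check runs only for
    // documents already accepted by the cheaper filter.
    static BitSet narrowed(BitSet cheap, IntPredicate expensive) {
        BitSet result = new BitSet(cheap.length());
        for (int doc = cheap.nextSetBit(0); doc >= 0; doc = cheap.nextSetBit(doc + 1)) {
            if (expensive.test(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        BitSet customerDocs = new BitSet(maxDoc);
        customerDocs.set(0, 10); // pretend documents 0..9 belong to one customer

        AtomicInteger calls = new AtomicInteger();
        // Placeholder for an expensive check; counts how often it is invoked.
        IntPredicate expensive = doc -> {
            calls.incrementAndGet();
            return doc % 2 == 0;
        };

        BitSet a = fullScanThenAnd(customerDocs, expensive, maxDoc);
        System.out.println("full scan: " + a + ", expensive calls = " + calls.getAndSet(0));

        BitSet b = narrowed(customerDocs, expensive);
        System.out.println("narrowed:  " + b + ", expensive calls = " + calls.get());
    }
}
```

Both methods return the same documents; only the number of expensive invocations differs, which is the property I'm after.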

Similarly, 'Local Lucene' originally included something called a SerialChainFilter, which did the same thing (the only online link I could find): http://code.google.com/p/digmap/source/ ... java?r=170

I don't currently see how one would take advantage of the "FilteredDocIdSet" concept in HS. I can see, in our code, defining a filter @Factory that manually instantiates multiple filters and then passes them into another filter that uses the "FilteredDocIdSet". However, at that point you've lost the benefits of declaring your filters in annotations, along with everything else HS brings.

I do see in HS where the internal ChainedFilter is instantiated in FullTextQueryImpl to chain the filters that are configured via @FullTextFilterDefs (http://grepcode.com/file/repository.jbo ... l.java#435). Would it make sense to provide a configurable version of that class (HS' ChainedFilter) that utilized the FilteredDocIdSet concept? Or is there a simpler way to accomplish what I'm after?

Thanks!


 Post subject: Re: Filter Ordering?
Posted: Tue Dec 28, 2010 3:44 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
Have a look at org.hibernate.search.filter.AndDocIdSet, which ChainedFilter uses when more than one filter is enabled; please note that this implementation is order sensitive, and was meant to be so.
That said, we built this during the Lucene 2.3 & 2.4 era, when Lucene did not provide anything similar. FilteredDocIdSet sounds new to me. After looking into AndDocIdSet, please let me know whether it makes sense to merge this functionality: patches welcome.

Consider that "lazily iterating" is not great for caching, which is the real killer feature of filters; I consider it far more important to do the full iteration in a very efficient way, so I definitely agree that order is important. Last I looked, our implementation was a bit faster because it only supports the AND operation (and takes advantage of that in its design), while Lucene's implementation can perform all boolean operations.
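
As a rough picture of what an AND-only intersection can look like, here is a generic "leapfrog" sketch over sorted doc id arrays. It is not the actual AndDocIdSet code, and the class and method names are invented for the example. It assumes each array is sorted in ascending order. Whichever list last disagreed proposes the next candidate, and the other lists skip forward to it.

```java
import java.util.ArrayList;
import java.util.List;

class LeapfrogAndDemo {
    // Returns the first index >= from whose value is >= target,
    // or a.length when no such value exists.
    static int advance(int[] a, int from, int target) {
        int i = from;
        while (i < a.length && a[i] < target) {
            i++;
        }
        return i;
    }

    // Intersects sorted doc id arrays by visiting them round-robin.
    static List<Integer> intersect(int[][] sets) {
        List<Integer> out = new ArrayList<>();
        if (sets.length == 0 || sets[0].length == 0) {
            return out;
        }
        int[] pos = new int[sets.length];
        int candidate = sets[0][0];
        int agreed = 0; // consecutive sets known to contain the candidate
        int i = 0;
        while (true) {
            pos[i] = advance(sets[i], pos[i], candidate);
            if (pos[i] == sets[i].length) {
                return out; // one set is exhausted: no further matches
            }
            int value = sets[i][pos[i]];
            if (value == candidate) {
                agreed++;
                if (agreed == sets.length) {
                    out.add(candidate); // every set contains it
                    candidate++;        // look for the next match
                    agreed = 0;
                }
            } else {
                candidate = value; // this set proposes a larger candidate
                agreed = 1;
            }
            i = (i + 1) % sets.length;
        }
    }

    public static void main(String[] args) {
        int[][] sets = {
            {1, 3, 5, 7, 9},
            {3, 4, 5, 9},
            {0, 3, 9, 10}
        };
        System.out.println(intersect(sets)); // prints [3, 9]
    }
}
```

Because only AND is supported, a single miss in any list is enough to discard a candidate, which is what lets the other lists jump ahead.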

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Filter Ordering?
Posted: Tue Dec 28, 2010 4:57 pm
Newbie

Joined: Mon Jan 11, 2010 8:34 pm
Posts: 6
Thanks Sanne. Side note: org.hibernate.search.filter.ChainedFilter and AndDocIdSet are written with ordering in mind. However, in org.hibernate.search.query.FullTextQueryImpl.buildFilters(), the order of filters defined via @FullTextFilterDefs has already been lost, because they were stored in a map. I'm assuming the order of the items in a @FullTextFilterDefs is intended to define the order of the filters. If that is a bug, I'd be happy to write it up.

That said, I think what I'm trying to do is a bit odd, which is why I don't see a clear answer, and why other similar projects had to implement their own filter chaining mechanism. I'm looking to run an expensive filtering operation on the smallest possible subset of documents, and the input to the filter will change for almost every search. We have, say, 1,000,000 documents in our index, representing documents from several different customers. During a search with spatial criteria, we want to execute all other filters first to narrow down the result set before doing the expensive spatial filtering; e.g. we want to filter the 1,000 documents from customer A, not the 1M from all customers. The filtering operation is basically: given a bounding box (as the filter input), does this document's location intersect with, or is it contained in, the bounding box?
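
For concreteness, the per-document check could look roughly like the sketch below. The BoundingBox class and its field names are hypothetical, written for this post rather than taken from our code or from any spatial library, and it assumes plain latitude/longitude ranges that do not wrap around the antimeridian.

```java
class BoundingBox {
    final double minLat;
    final double minLon;
    final double maxLat;
    final double maxLon;

    BoundingBox(double minLat, double minLon, double maxLat, double maxLon) {
        if (minLat > maxLat || minLon > maxLon) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.minLat = minLat;
        this.minLon = minLon;
        this.maxLat = maxLat;
        this.maxLon = maxLon;
    }

    // True when a point location lies inside this box (edges included).
    boolean containsPoint(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }

    // True when the other box lies entirely inside this box.
    boolean contains(BoundingBox other) {
        return other.minLat >= minLat && other.maxLat <= maxLat
            && other.minLon >= minLon && other.maxLon <= maxLon;
    }

    // True when the two boxes overlap or touch in both dimensions.
    boolean intersects(BoundingBox other) {
        return other.minLat <= maxLat && other.maxLat >= minLat
            && other.minLon <= maxLon && other.maxLon >= minLon;
    }

    public static void main(String[] args) {
        BoundingBox query = new BoundingBox(0, 0, 10, 10);
        System.out.println(query.containsPoint(5, 5));                        // true
        System.out.println(query.intersects(new BoundingBox(5, 5, 15, 15)));  // true
        System.out.println(query.intersects(new BoundingBox(11, 11, 12, 12))); // false
    }
}
```

The check itself is cheap here; in practice the cost comes from loading each document's location, which is why the number of documents it runs against matters.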

I've come across this posting which illustrates the state of flux that Lucene spatial search is in: http://blog.jteam.nl/2010/12/22/ssp-2-0/

For now I will write a filter that factors in our spatial criteria plus enough other information to narrow down the scope of the filtering. It will involve a bit of duplication, but it sounds like the best option currently, and other options are on the horizon. We're stuck on Hibernate Search 3.1.1, but are working on upgrading to the latest version with JBoss AS 6. I appreciate the input.


 Post subject: Re: Filter Ordering?
Posted: Tue Dec 28, 2010 7:35 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
You should be able to cache some filters and skip caching where it doesn't make sense. Sorry: in my previous post I stated that AndDocIdSet was designed with ordering in mind, but I just reread it and now remember that this is no longer the case (it used to be). The order is determined by a cool last-failed-match pattern, which has proven quite effective at minimizing filter invocations globally without favoring or avoiding any specific filter implementation. Still, I expect that most filters will actually be implemented by an OpenBitSet, so they will be pre-merged (see org.hibernate.search.filter.FilterOptimizationHelper.mergeByBitAnds()), and the result of that merge is then combined with your dynamic bounding-box processing. In this case the AndDocIdSet will make the absolute minimum number of invocations on your filter, regardless of the order.
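
The combination described above can be modelled in a few lines of plain Java. This is only a sketch of the idea, with java.util.BitSet standing in for OpenBitSet and invented names throughout (CachedFilterDemo, cached, mergeByAnd, applyDynamic); it is not the FilterOptimizationHelper code. Cacheable filters are computed once and merged with a bitwise AND, and the per-query check then runs only on the surviving documents.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntPredicate;

class CachedFilterDemo {
    private final Map<String, BitSet> cache = new HashMap<>();
    int cacheLoads = 0; // how many times a bit set had to be computed

    // Returns the bit set for a key, computing it only on first use.
    BitSet cached(String key, Function<String, BitSet> loader) {
        BitSet bits = cache.get(key);
        if (bits == null) {
            bits = loader.apply(key);
            cache.put(key, bits);
            cacheLoads++;
        }
        return bits;
    }

    // Merges bit sets with AND on a copy, so cached sets stay untouched.
    static BitSet mergeByAnd(BitSet... sets) {
        if (sets.length == 0) {
            throw new IllegalArgumentException("at least one bit set is required");
        }
        BitSet merged = (BitSet) sets[0].clone();
        for (int i = 1; i < sets.length; i++) {
            merged.and(sets[i]);
        }
        return merged;
    }

    // Runs the per-query (uncached) check only on surviving documents.
    static int[] applyDynamic(BitSet candidates, IntPredicate dynamic) {
        return candidates.stream().filter(dynamic).toArray();
    }

    public static void main(String[] args) {
        CachedFilterDemo demo = new CachedFilterDemo();
        Function<String, BitSet> customerLoader = key -> {
            BitSet b = new BitSet();
            b.set(0, 10); // pretend documents 0..9 belong to this customer
            return b;
        };
        BitSet published = new BitSet();
        published.set(5, 20); // a second cheap, cacheable criterion

        for (int query = 0; query < 3; query++) {
            BitSet merged = mergeByAnd(demo.cached("customerA", customerLoader), published);
            int[] hits = applyDynamic(merged, doc -> doc % 2 == 1); // stand-in for the spatial check
            System.out.println("query " + query + ": " + java.util.Arrays.toString(hits));
        }
        System.out.println("cache loads: " + demo.cacheLoads);
    }
}
```

The per-query predicate is never cached, since its input changes with almost every search, while the customer bit set is loaded once and reused.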

So the order is not relevant, but the implementation of each filter does affect how well the combination can be optimized. If you see odd performance results, please let us know (and sorry for the confusion in my previous post).
Also, have you done any performance measurements yet? If you can, please share them. I'd expect good numbers: filtering by customer significantly reduces the number of documents and should cache very well.

As for upgrading, I definitely recommend it: JBoss AS 6 will be out very soon (it has already been tagged) and works with HSearch 3.3 and Lucene 3.x, both of which enable more nice performance tricks; more are to come in v3.4, which also targets AS 6. As usual we try hard not to break backwards compatibility, so please report any migration issues other than the minor Lucene 2 vs 3 API differences.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.