
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 9 posts ] 
 Post subject: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 6:47 am
Newbie

Joined: Tue Dec 15, 2009 3:33 am
Posts: 13
Location: Melbourne, Australia
Hello,

I'm having some issues with using filters in Hibernate Search.

My idea is to have a query run and then have the filter applied after the initial search to perform some further filtering (as is the idea behind filters, I guess ;)

My issue is that when the query is run and the logic moves into the filter, I'm not sure whether I'm getting the results from the initial query or all docs from the index. I ask because, while debugging, I've noticed that some docs that reach the filter shouldn't be returned by the initial query.

As well as this, with some simple logging I've noticed that the filter is actually being invoked 5 times. In the filter code below, the line
Code:
log.debug("Bit Set before Filter size [" + bits.size() + "]");
is run 5 times, as the message appears 5 times in my log file. Once the 5th message is logged, execution moves out of the filter.

Can I get some help on this, please? Below are extracts of the code, including the model being searched with its filter defs, the Hibernate Search object which initiates the search (including enabling the filters), the filter factory and the filter itself. I've tried to remove business-logic code to make it clearer.

I'm using Hibernate Search 3.1.1.GA release, including the Lucene 2.4.1 release jars.

Thanks,

Christian

Media Model with FullTextFilterDefs
Code:
@FullTextFilterDefs( {
      @FullTextFilterDef(name = "mediaTypeFilter", impl = MediaTypeFilterFactory.class),
      @FullTextFilterDef(name = "activeStatusFilter", impl = ActiveStatusFilterFactory.class),
      @FullTextFilterDef(name = "mediaScheduleFilter", impl = MediaSchedulesFilterFactory.class, cache = FilterCacheModeType.NONE)})
public class Media implements Serializable {
...
}


Hibernate Search object
Code:
Session session = HibernateUtils.currentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);

      try {
         FullTextQuery fullTextQuery = null;

         fullTextQuery = fullTextSession.createFullTextQuery(
               queryBuilder.parseQuery(fullTextSession), queryBuilder.getEntityClass())
                     .setFirstResult(offset)
                     .setMaxResults(limit);

         queryBuilder.loadFilters(fullTextQuery);

         List<T> results = (List<T>) fullTextQuery
                              .setCacheable(true)
                              .list();

         queryBuilder.unloadFilters(fullTextQuery);

         return results;
      } catch (Exception e) {
      ...
      }


Query Builder Object
Code:
   public void loadFilters(FullTextQuery fullTextQuery) {
      if (mediaType != null) {
         fullTextQuery.enableFullTextFilter("mediaTypeFilter").setParameter("mediaType", mediaType);
      }
      
      fullTextQuery.enableFullTextFilter("mediaScheduleFilter")
         .setParameter("session", session)
         .setParameter("userDate", userDate);
   }


Filter Factory
Code:
public class MediaSchedulesFilterFactory {

    private Session session;
    private Date userDate;
   
    @Key
    public FilterKey getKey() {
        StandardFilterKey key = new StandardFilterKey();
        key.addParameter(session);
        key.addParameter(userDate);
        return key;
    }

    @Factory
    public Filter getFilter() {
        Filter mediaSchedulesFilter = new MediaSchedulesFilter(session, userDate);
        return mediaSchedulesFilter;
    }

   public void setSession(Session session) {
      this.session = session;
   }

   public void setUserDate(Date userDate) {
      this.userDate = userDate;
   }
}


Filter object
Code:
public class MediaSchedulesFilter extends Filter {
   private static final long serialVersionUID = 1L;

   public static Log log = LogFactory.getLog(MediaSchedulesFilter.class);

   private Session session;
   private Date userDate;
   
   public MediaSchedulesFilter(Session session, Date userDate) {
      super();
      this.session = session;
      this.userDate = userDate;
   }

   @Override
   public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
      BitSet bits = getBitSet(reader.maxDoc());

      log.debug("Bit Set before Filter size [" + bits.size() + "]");
      
      Term msTerm = new Term(DocumentBuilder.CLASS_FIELDNAME, Media.class.getName());
      
      TermDocs td = reader.termDocs(msTerm);
      
      while(td.next()) {
         Document termDoc = reader.document(td.doc());
         
         Field termDocMediaSchedule = termDoc.getField("mediaSchedules");
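         // getField returns null when the document has no stored "mediaSchedules" field
         String termDocMediaScheduleValue = termDocMediaSchedule == null ? null : termDocMediaSchedule.stringValue();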
         
         if(termDocMediaScheduleValue != null && !termDocMediaScheduleValue.isEmpty()) {
            boolean valid = // Logic to determine is valid to be returned in search results
            
            if(valid) {
               bits.set(td.doc());
            }
         }
      }
      
      log.debug("Bit Set after Filter size [" + bits.size() + "]");
      
      DocIdSet docIdSet = new DocIdBitSet( bits );
      return docIdSet;
   }

   private BitSet getBitSet(int maxDoc) {
      BitSet bitSet = new BitSet( maxDoc );
      return bitSet;
   }   
}
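A side note on the two log lines in the filter above: `java.util.BitSet.size()` returns the number of bits of space currently allocated, not the number of bits that are set, so the "before" and "after" messages will normally print the same value. `cardinality()` is the count of set bits. A small standalone check (the class and method names are made up for illustration):

```java
import java.util.BitSet;

public class BitSetSizeVsCardinality {

    // Reports both numbers so they can be compared side by side.
    public static String describe(BitSet bits) {
        return "size = " + bits.size() + ", cardinality = " + bits.cardinality();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(1000); // room for at least 1000 bits, none set yet
        System.out.println(describe(bits));
        bits.set(3);
        bits.set(42);
        // size() is unchanged (no growth was needed); cardinality() is now 2.
        System.out.println(describe(bits));
    }
}
```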


Hopefully this is enough information. If more is required please let me know.

Regards,

Christian


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 7:26 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi Christian,

Let's see if I can answer some of your questions.
First, in your filter implementation you are not dealing with any query results. You do indeed operate over all documents. Your task in the filter is to create a bit set which Lucene will then overlay with the search results at search time. This is the reason why filter caching works, for example: you couldn't cache any filter results if you saw a different set of documents on each call to the filter.
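To make the overlay idea concrete, here is a minimal sketch in plain Java, using `java.util.BitSet` as a stand-in for Lucene's `DocIdSet`. The filter's bits are computed over every document id, independently of any query, and are then intersected with the query's matches. The class name, document ids and predicates are invented for illustration; this is not Hibernate Search's actual code path.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class FilterOverlaySketch {

    // Builds a bit set over ALL document ids [0, maxDoc), the way a filter does:
    // it only looks at the index, never at the query.
    public static BitSet bitsWhere(int maxDoc, IntPredicate accept) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (accept.test(doc)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Overlays the filter bits on the query matches: only documents in both survive.
    // Works on a copy so the caller's (possibly cached) bit sets are not modified.
    public static BitSet overlay(BitSet queryMatches, BitSet filterBits) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filterBits);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        // Hypothetical query: matches even document ids.
        BitSet queryMatches = bitsWhere(maxDoc, doc -> doc % 2 == 0);
        // Hypothetical filter: accepts ids below 5. It does not depend on the query,
        // so the same bit set could be cached and reused by other queries.
        BitSet filterBits = bitsWhere(maxDoc, doc -> doc < 5);
        System.out.println(overlay(queryMatches, filterBits)); // prints {0, 2, 4}
    }
}
```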

Your code looks OK at first glance. You don't have to unload the filters, though; I am not sure what you are trying to achieve with that.

Regarding the number of times the filter is invoked, it should be once per query. I am not sure why you seem to get 5 invocations. Is this the exact code you are executing? Is there something you haven't posted yet?

--Hardy


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 5:06 pm
Newbie

Joined: Tue Dec 15, 2009 3:33 am
Posts: 13
Location: Melbourne, Australia
Hi Hardy,

Thanks for the reply.

That now makes sense of how the filter processes all docs in the index. After some more reading of the Hibernate Search in Action book, I can also see how caching works in that respect.

I haven't yet tested again why the filter appears to be processed 5 times per request. It may be an error on my part in the logging; I'll investigate further and post the results.

The reason I'm looking at using a filter is that I have results that may only be returned at specific times (a schedule). The schedule is based on the user's location and the current time. I'm wondering whether I'm storing the details in the index in a way that lets filters be used efficiently.

Currently I'm indexing all the schedules (each search result can have multiple schedules) as a JSON array using a field bridge. Within the filter, I extract each schedule from the indexed schedules and process it against the user's location and current date (stored within a custom Session object that is injected into the filter) to determine whether the result is valid.

With your explanation of filters, I'm now wondering whether I should store each part of the schedule (country, region, start date, end date) as a separate field in the index. Then I could have a filter per field (e.g. a country filter) that filters depending on that field and the injected details. I could then cache the country and region filters, which I guess would be more efficient than not caching at all and processing every doc each time a search occurs to determine schedule validity.

If you can, what is your opinion on this, based on the use case I've given above? Hopefully that's enough information as well.

Regards,

Christian


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 6:34 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Christian,
yes it makes perfect sense to store this information in different fields and implement different filters.
Having separate filters gets you more flexibility in enabling a subset of restrictions, and as you say caching will be more effective: way more chances to match the same filter parameter again.
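Why separate, narrower filters cache better can be illustrated with a toy cache keyed by filter name and parameter value. This is a hypothetical sketch, not Hibernate Search's filter cache: `FilterCacheSketch`, `FilterKey` and the country codes are invented for illustration. A country-only filter sees few distinct parameter values, so repeated searches reuse the same entry; a key that also contained the exact user date would rarely repeat. In this sketch the key type must implement `equals`/`hashCode` consistently, otherwise every lookup would miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

public class FilterCacheSketch {

    // Hypothetical cache key: filter name plus the parameter value it was built with.
    static final class FilterKey {
        final String filterName;
        final Object parameter;

        FilterKey(String filterName, Object parameter) {
            this.filterName = filterName;
            this.parameter = parameter;
        }

        // equals/hashCode make two lookups with equal values hit the same entry.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FilterKey)) return false;
            FilterKey other = (FilterKey) o;
            return filterName.equals(other.filterName) && Objects.equals(parameter, other.parameter);
        }

        @Override
        public int hashCode() {
            return Objects.hash(filterName, parameter);
        }
    }

    private final Map<FilterKey, Object> cache = new HashMap<>();
    public int hits = 0;
    public int misses = 0;

    // Returns the cached filter for (filterName, parameter), building it only on a miss.
    public Object get(String filterName, Object parameter, Supplier<Object> builder) {
        FilterKey key = new FilterKey(filterName, parameter);
        Object filter = cache.get(key);
        if (filter != null) {
            hits++;
        } else {
            misses++;
            filter = builder.get();
            cache.put(key, filter);
        }
        return filter;
    }

    public static void main(String[] args) {
        FilterCacheSketch cache = new FilterCacheSketch();
        // Five searches from users in only two countries: the country filter is built twice.
        String[] countries = {"AU", "SE", "AU", "AU", "SE"};
        for (String c : countries) {
            cache.get("countryFilter", c, () -> "bits for " + c);
        }
        System.out.println(cache.hits + " hits, " + cache.misses + " misses"); // prints "3 hits, 2 misses"
    }
}
```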

While implementing a filter, remember two more things:
1) the filter code must be very efficient: you'll potentially process each document, so it's definitely a hot spot. I'd avoid logging, for example, unless strictly needed: the per-document code can run millions of times per query, so even the way you deal with basic operations such as string matching makes a significant difference.
2) you are not guaranteed to process each document, documents which are already excluded by other filters might not be submitted to another filter. Exact behaviour depends on how enabled filters can be combined.
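Following point 1, a sketch of what a tight per-document loop can look like: work that depends only on the user (here, normalising the user's country) is done once before the loop, and the per-document work is a single comparison against a value read from an array, with no logging, no JSON parsing and no `reader.document(...)` call inside the loop. The `countryByDoc` array is an assumption: in a real Lucene 2.4 filter it would have to be read from the index (for example from Lucene's `FieldCache` on an un-tokenized field), which is not shown here. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Locale;

public class TightFilterLoopSketch {

    // countryByDoc[doc] holds the indexed country for each document id (null if none).
    // How it is populated from the index is outside this sketch.
    public static BitSet countryFilterBits(String[] countryByDoc, String userCountry) {
        BitSet bits = new BitSet(countryByDoc.length);
        // Per-user work is done once, outside the per-document loop.
        String wanted = userCountry.toUpperCase(Locale.ROOT);
        for (int doc = 0; doc < countryByDoc.length; doc++) {
            // Per-document work: one cheap comparison, no logging.
            if (wanted.equals(countryByDoc[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] countryByDoc = {"AU", "SE", null, "AU"};
        System.out.println(countryFilterBits(countryByDoc, "au")); // prints {0, 3}
    }
}
```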

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 7:35 pm
Newbie

Joined: Tue Dec 15, 2009 3:33 am
Posts: 13
Location: Melbourne, Australia
Hi Sanne,

Thanks for the information. Question though on your point

Quote:
2) you are not guaranteed to process each document, documents which are already excluded by other filters might not be submitted to another filter. Exact behaviour depends on how enabled filters can be combined.


I'm a little confused about how this would happen, as in my mind each filter processes the index data (depending on the criteria within it) and turns bits on or off. How is it that a filter will not process a particular document if another has already excluded it when it's processed?

Eg.
1000 docs in index

Filter 1 -> processes 1000 docs and turns off 20 bits
Filter 2 -> processes 1000 docs and turns off 400 bits
Filter 3 -> processes 1000 docs and turns off 100 bits

Once the query results are returned, each filter is applied and the off bits remove the particular results from the search result.

Am I correct in this understanding?

Thanks,

Christian


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 7:50 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Once the query results are returned, each filter is applied and the off bits remove the particular results from the search result.

Not exactly, but the result is the same: each filter is applied only to the remaining set of documents, after the previous one has removed some.
But this depends on whether they can be chained, whether they are being cached, and on the exact type of filter.

I didn't mean to confuse you: the chaining strategy is fairly complex, I just meant to point out that you are not guaranteed to get each document to process in the filter, just the ones you need to process. It might happen that a filter is requested to process each document, but it's possible that some documents have already been removed by another active filter, in this case unneeded filter invocations are avoided.

Quote:
How is it that a filter will not process a particular document if another has already excluded it when it's processed?

Well, all filters have to agree for a document to be included in the result. It's like a big AND operator, and for each document it's optimized the same way Java's "&&" short-circuits where "&" does not.
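The `&&` analogy can be shown with a small plain-Java sketch: three hypothetical per-document predicates are combined with `&&`, so each one is only evaluated for documents that every earlier predicate accepted, and the invocation counts shrink down the chain. This illustrates the principle only; it is not how Hibernate Search's `ChainedFilter` is implemented, and, as noted above, the real behaviour depends on whether filters can be chained and cached. The predicates loosely follow the 1000-document example and are invented for illustration.

```java
import java.util.function.IntPredicate;

public class ShortCircuitFilterSketch {

    // Wraps a predicate and counts how many documents it was asked about.
    static final class CountingPredicate implements IntPredicate {
        private final IntPredicate delegate;
        int invocations = 0;

        CountingPredicate(IntPredicate delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean test(int doc) {
            invocations++;
            return delegate.test(doc);
        }
    }

    // Runs three hypothetical filters over doc ids [0, maxDoc) and returns
    // {accepted, invocations of filter 1, filter 2, filter 3}.
    public static int[] run(int maxDoc) {
        CountingPredicate f1 = new CountingPredicate(doc -> doc >= 20);    // rejects ids 0..19
        CountingPredicate f2 = new CountingPredicate(doc -> doc % 2 == 0); // rejects odd ids
        CountingPredicate f3 = new CountingPredicate(doc -> doc < 900);    // rejects ids >= 900
        int accepted = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            // && stops at the first filter that rejects the document,
            // so later filters never see it.
            if (f1.test(doc) && f2.test(doc) && f3.test(doc)) {
                accepted++;
            }
        }
        return new int[] {accepted, f1.invocations, f2.invocations, f3.invocations};
    }

    public static void main(String[] args) {
        int[] r = run(1000);
        // prints "440 accepted; invocations: 1000, 980, 490"
        System.out.println(r[0] + " accepted; invocations: " + r[1] + ", " + r[2] + ", " + r[3]);
    }
}
```

With `&` instead of `&&`, all three counts would be 1000 while the accepted count stays the same, which is the "same result, less work" point made above.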

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 7:59 pm
Newbie

Joined: Tue Dec 15, 2009 3:33 am
Posts: 13
Location: Melbourne, Australia
Sanne,

Again, thanks for the help. I'm understanding the process of filters a lot more now.

I'll go away and do some more work on the filtering for our application and if I have any issues, I'll post again.

Again thanks for all the help, great to get your time.

Christian


 Post subject: Re: Filters in Hibernate Search
Posted: Wed Jan 12, 2011 9:07 pm
Newbie

Joined: Tue Dec 15, 2009 3:33 am
Posts: 13
Location: Melbourne, Australia
Hey guys,

Just to go back to the other initial issue with filters: I'm still seeing the custom filter being invoked 5 times.

Below is the filter code
Code:
public class MediaSchedulesFilter extends Filter {
   private static final long serialVersionUID = 1L;

   public static Log log = LogFactory.getLog(MediaSchedulesFilter.class);

   private Session session;
   private Date userDate;
   
   public MediaSchedulesFilter(Session session, Date userDate) {
      super();
      this.session = session;
      this.userDate = userDate;
   }

   @Override
   public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
      log.debug("In Media Schedules Filter");

      int bitclearcnt = 0;
      
      BitSet bits = getAllPositiveBitSet(reader.maxDoc());
      log.debug("Bit Set size in MS Filter [" + bits.size() + "]");
      
      Term msTerm = new Term(DocumentBuilder.CLASS_FIELDNAME, Media.class.getName());
      
      TermDocs td = reader.termDocs(msTerm);

      if (td.next()) {
         do {
            Document termDoc = reader.document(td.doc());
            
            Field termDocMediaSchedule = termDoc.getField("mediaSchedules");
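            // getField returns null when the document has no stored "mediaSchedules" field
            String termDocMediaScheduleValue = termDocMediaSchedule == null ? null : termDocMediaSchedule.stringValue();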
            
            if(termDocMediaScheduleValue != null && !termDocMediaScheduleValue.isEmpty()) {
               List<MediaSchedule> s = MediaScheduleUtil.parseMediaSchedulesLuceneIndex(termDocMediaScheduleValue);
               
               Media tempMediaForFilter = new Media();
               tempMediaForFilter.setMediaSchedules(s);
               
               boolean valid = isMediaScheduleValid(tempMediaForFilter, session.getLocation(), session.getIpaddress(), userDate);
               
               // clear out the invalid media by schedule
               if(!valid) {
                  bits.clear(td.doc());
                  bitclearcnt++;
               }
            }
         } while (td.next());
      }
      
      DocIdSet docIdSet = new DocIdBitSet( bits );
      
      log.debug("MediaSchedulesFilter completed and returning - bits cleared [" + bitclearcnt + "]");
      return docIdSet;
   }

   private BitSet getAllPositiveBitSet(int maxDoc) {
      BitSet bitSet = new BitSet( maxDoc );
      // toIndex is exclusive: this sets doc ids 0..maxDoc-1, i.e. every document
      bitSet.set( 0, maxDoc );
      return bitSet;
   }   
}
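About the all-positive bit set: `java.util.BitSet.set(fromIndex, toIndex)` sets bits from `fromIndex` inclusive to `toIndex` exclusive, so marking every document id from 0 to `maxDoc - 1` requires `set(0, maxDoc)`. Calling `set(0, maxDoc - 1)` leaves the last document's bit clear, silently excluding that document from every filtered search. A standalone check (the class name is made up for illustration):

```java
import java.util.BitSet;

public class AllPositiveBitSetCheck {

    // Marks every document id in [0, maxDoc) as accepted; the upper bound is exclusive.
    public static BitSet allPositive(int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(0, maxDoc);
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 5;
        BitSet offByOne = new BitSet(maxDoc);
        offByOne.set(0, maxDoc - 1); // sets ids 0..3 only
        System.out.println(offByOne.cardinality());             // prints 4
        System.out.println(allPositive(maxDoc).cardinality());  // prints 5
        System.out.println(offByOne.get(maxDoc - 1));           // prints false
    }
}
```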


The search is invoked via
Code:
Session session = HibernateUtils.currentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);

FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(queryBuilder.parseQuery(fullTextSession),queryBuilder.getEntityClass());

fullTextQuery.enableFullTextFilter("mediaScheduleFilter")
         .setParameter("session", session)
         .setParameter("userDate", userDate);

List<T> results = (List<T>) fullTextQuery.setCacheable(cacheEnabled)
                     .list();


While debugging in Eclipse, I can see that the call comes from
Code:
ChainedFilter.getDocIdSet(IndexReader)
before entering the custom filter.

I can't see why this would be happening; I just wonder if you can see anything wrong with the code. The model with the filter defs is at the top of this thread, in my initial post.

Thanks,

Christian


 Post subject: Re: Filters in Hibernate Search
Posted: Thu Jan 13, 2011 10:23 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
It's not expected that the same filter is invoked multiple times for a single query. Your code looks fine; could you create a unit test so that we can reproduce your issue?
Check out the source code and look, for example, at how org.hibernate.search.test.filter.BestDriversFilter is tested; it's a good starting point.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.