
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 8 posts ] 
 Post subject: Massindexer limit selected records
PostPosted: Wed Mar 10, 2010 10:20 am 
Pro

Joined: Wed Nov 05, 2003 7:22 pm
Posts: 211
Hi,

My indexable objects have a validity lifetime. Consequently, I would like to prevent indexing objects that are no longer valid. It just wastes space and processing power weeding them out later.

Will the MassIndexer offer an option to pass it a list of ids of the objects to include?

Kind regards,

Marc


 Post subject: Re: Massindexer limit selected records
PostPosted: Thu Mar 11, 2010 9:47 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Marc,
the MassIndexer currently available in beta1 cannot accept a list of ids, mainly because internally it doesn't work in terms of id lists.

For details about how these identifiers are produced, have a look at org.hibernate.search.batchindexing.IdentifierProducer:
http://fisheye.jboss.org/browse/Hibernate/search/trunk/src/main/java/org/hibernate/search/batchindexing/IdentifierProducer.java?r=18710
The important point is that it never builds one full list: it produces short lists of identifiers and feeds them to a queue. When the queue is full it blocks the producer, so you don't run out of memory while the operations are still properly buffered.
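The batching and back-pressure described above can be sketched with a bounded queue from the JDK. This is only an illustration under assumptions: the class BatchedIdProducer and its method produce are hypothetical names and are not the code of IdentifierProducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

/** Hypothetical illustration of batched id production; not the real IdentifierProducer. */
final class BatchedIdProducer {

    /**
     * Splits the ids into lists of at most batchSize elements and puts each list
     * on the queue. Returns the number of batches that were enqueued.
     */
    static int produce(Iterable<Long> ids, int batchSize, BlockingQueue<List<Long>> queue) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batches = 0;
        List<Long> batch = new ArrayList<>(batchSize);
        try {
            for (Long id : ids) {
                batch.add(id);
                if (batch.size() == batchSize) {
                    queue.put(batch); // blocks while a bounded queue is full
                    batches++;
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                queue.put(batch); // last, possibly shorter, batch
                batches++;
            }
        } catch (InterruptedException e) {
            // Stop producing and preserve the interrupt flag for the caller.
            Thread.currentThread().interrupt();
        }
        return batches;
    }
}
```

With a bounded queue such as ArrayBlockingQueue, the call to put is what stalls the producer until a consumer thread has taken a batch, so at most a fixed number of short id lists are held in memory at once.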

I think we could introduce a pluggable factory for the two Criteria queries, the one used on line 115 to count the number of identifiers and the one on line 127. To better define how this should work, I'd be happy to help you solve your specific problem and then see how we can abstract that into a general API which anyone could use.

I guess being able to add a Criterion would be good enough for your use case?
You would need to define a specific Criterion for each indexed type, so we would have to add a map from type to Criterion, or from type to a CriteriaFactory.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massindexer limit selected records
PostPosted: Sat May 01, 2010 10:13 am 
Pro

Joined: Wed Nov 05, 2003 7:22 pm
Posts: 211
Well, I started thinking about using the enableFilter option for this, with a filter defined on the class to be indexed. But I'm not sure whether the MassIndexer actually applies filters when executing its queries. Initial tests suggest it doesn't.


 Post subject: Re: Massindexer limit selected records
PostPosted: Sat May 01, 2010 10:20 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
What would you think of introducing a method on MassIndexer to set the HQL used to load all primary keys?
Some requirements:
* it must be usable by all types being reindexed, or we should provide a map (a different HQL string per type)
* if you need to set parameters, they should be collected and passed along
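
The two requirements above could be modelled as a small per-type registry. This is a minimal sketch under assumptions: IdQuerySpec and IdQueryRegistry are hypothetical names, nothing like them exists in the MassIndexer API, and the sketch only collects the HQL string and its named parameters so they could be passed along later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for one identifier-loading HQL string and its named parameters. */
final class IdQuerySpec {
    final String hql;
    final Map<String, Object> parameters;

    IdQuerySpec(String hql, Map<String, Object> parameters) {
        this.hql = hql;
        // Defensive copy, so later changes by the caller do not leak into the spec.
        this.parameters = Collections.unmodifiableMap(new HashMap<>(parameters));
    }
}

/** Hypothetical registry holding a different IdQuerySpec per indexed type. */
final class IdQueryRegistry {
    private final Map<Class<?>, IdQuerySpec> specs = new HashMap<>();

    IdQueryRegistry register(Class<?> type, IdQuerySpec spec) {
        specs.put(type, spec);
        return this;
    }

    /** Returns the spec registered for the type, or null if the type has no restriction. */
    IdQuerySpec lookup(Class<?> type) {
        return specs.get(type);
    }
}
```

Keeping the parameters next to the HQL string means each worker session could bind them itself instead of relying on a Query object tied to the caller's session.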

What do you think of my initial idea of using a Criterion?
Feel free to experiment and propose a strategy or a patch, or join the discussion on the developers mailing list.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massindexer limit selected records
PostPosted: Tue Oct 05, 2010 4:09 am 
Newbie

Joined: Wed Jan 28, 2009 9:46 am
Posts: 10
Hi Sanne,

I realise I am reviving a slightly old thread, but I was wondering whether there has been any more discussion on this issue. I have a similar requirement: I want to selectively index items in my DB using the MassIndexer. I noticed that an issue http://opensource.atlassian.com/project ... SEARCH-499 exists. Are you still looking for patches and/or discussing this issue? If so, I would be willing to help.

Cheers,
Ben

_________________
http://www.fotegrafik.com


 Post subject: Re: Massindexer limit selected records
PostPosted: Tue Oct 05, 2010 4:46 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Ben,
sure, if you have patches to propose or approaches to discuss, they're always welcome; for implementation brainstorming make sure to use the mailing list.
I like it best if you can start with a (failing) unit test and attach it to the issue, so we can clarify exactly what we're looking for and what the API would look like, but any help is welcome.
Cheers

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massindexer limit selected records
PostPosted: Thu Oct 07, 2010 12:16 am 
Newbie

Joined: Wed Jan 28, 2009 9:46 am
Posts: 10
Hi Sanne,

Let's start with brainstorming, as you (and your team) will have far more knowledge about the issues than I do.

Let me preface this by saying that I have not yet looked at the code, so I am not sure what is and is not possible. However, I was thinking that the MassIndexer could take one of the following to restrict the selection when indexing:
    SQL
    HQL
    Criteria Object

The next issue would be to enforce that Query, SQLQuery and Criteria are mutually exclusive when executed by the MassIndexer, so perhaps the best approach is to add them as parameters to the "createIndexer" method. So the code would look something like:

Code:
// Using a HQL Query
final FullTextSession fullTextSession = Search.getFullTextSession(session);
Query query = session.createQuery("from Example where flag = :flag");
query.setBoolean("flag", true);
fullTextSession.createIndexer(query).
       cacheMode(CacheMode.NORMAL).
       ...
       startAndWait();

// Or using a SQL Query
final FullTextSession fullTextSession = Search.getFullTextSession(session);
SQLQuery sqlQuery = session.createSQLQuery("select * from Example where flag = ?");
sqlQuery.setBoolean(0, false);
fullTextSession.createIndexer(sqlQuery).
       cacheMode(CacheMode.NORMAL).
       ...
       startAndWait();

// Or finally using a Criteria Object
final FullTextSession fullTextSession = Search.getFullTextSession(session);
Criteria criteria = session.createCriteria(Example.class);
criteria.add(Restrictions.naturalId());
fullTextSession.createIndexer(criteria).
       cacheMode(CacheMode.NORMAL).
       ...
       startAndWait();


What do you think? Is this a reasonable approach?

Cheers,
Ben

_________________
http://www.fotegrafik.com


 Post subject: Re: Massindexer limit selected records
PostPosted: Sun Oct 10, 2010 4:20 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Ben,
thanks for starting this.

The first thing to consider is that the filter is going to be applied on other sessions: the MassIndexer is multi-threaded, so it won't use the current session. You have to provide a means of applying the filtering to a session, but the API doesn't expose the Session instance.

Have a look at org.hibernate.search.batchindexing.IdentifierProducer, method loadAllIdentifiers. That method is responsible for producing the identifiers of all objects which are going to be indexed; note that it streams them rather than loading them all into memory, so it can cope with huge datasets.
Two things are done: a count, to know the final number of results to be indexed, then a select on the identifiers.

Only one instance of IdentifierProducer will be created per type, but when indexing several types, several IdentifierProducers might be active at the same time.

To properly change the criteria, the restrictions must be applied to both the count and the select statements in a consistent way. Please try with the Criteria API first; I guess that to cover the SQL and HQL cases we could add some alternative implementations of IdentifierProducer.
I'd suggest trying DetachedCriteria, which should cover the use case quite well, but we're open to other suggestions.

If you think it's important to also provide SQL and HQL alternatives, then you should make sure that both the count query and the identifier-loading query are provided.
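
For the HQL case, one way to keep the count query and the identifier-loading query consistent is to derive both from a single restriction. The sketch below is an assumption-laden illustration: FilteredIdQueries is a hypothetical helper, the alias e and the property names are made up, and it only builds the two HQL strings without touching a Session.

```java
/** Hypothetical helper deriving the count and id-select HQL from one shared restriction. */
final class FilteredIdQueries {
    final String countHql;
    final String idsHql;

    /**
     * @param entityName  entity name as used in HQL, e.g. "Example"
     * @param idProperty  name of the identifier property, e.g. "id"
     * @param whereClause restriction on alias "e", or null/blank for no restriction
     */
    FilteredIdQueries(String entityName, String idProperty, String whereClause) {
        String from = " from " + entityName + " e";
        boolean unrestricted = whereClause == null || whereClause.trim().isEmpty();
        String where = unrestricted ? "" : " where " + whereClause.trim();
        // Both statements share the same from and where parts, so they cannot drift apart.
        this.countHql = "select count(e." + idProperty + ")" + from + where;
        this.idsHql = "select e." + idProperty + from + where;
    }
}
```

For example, new FilteredIdQueries("Example", "id", "e.flag = :flag") yields a count query and an id query that both end in "from Example e where e.flag = :flag", so the reported total matches the identifiers that are streamed afterwards.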

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.