These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 7 posts ] 
 Post subject: Asynchronous MassIndexer
Posted: Fri Jan 06, 2012 9:37 am
Newbie

Joined: Fri Jan 06, 2012 9:12 am
Posts: 3
I currently use Lucene to index about 17 million entries in an Oracle database. This is done with a simple ScrollableResult over a projection and takes about 2 hours for the whole database. In the meantime the old index stays in use, so I don't need to shut down the application. Once the rebuild is finished, I delete the old index and let Lucene reopen the newly built one, so the search outage is really short.
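The rebuild-then-swap approach described above can be sketched like this. It is a minimal, hypothetical model: an immutable in-memory `Map` stands in for a Lucene index directory, and `SwappableIndex` is an illustrative name, not a Lucene or Hibernate Search class. With real Lucene, the swap would correspond to opening a reader on the newly built index and closing the old one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a search index: maps entity id -> indexed text.
// Readers always see one complete snapshot; a rebuild never modifies it.
final class SwappableIndex {
    private final AtomicReference<Map<Long, String>> current =
            new AtomicReference<>(Collections.<Long, String>emptyMap());

    // Searches run against whatever snapshot is current when they start.
    List<Long> search(String term) {
        Map<Long, String> snapshot = current.get();
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, String> e : snapshot.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits);
        return hits;
    }

    // Builds a complete new index off to the side (the slow part), then
    // publishes it with a single atomic reference swap, so readers move from
    // the old snapshot to the new one at once.
    void rebuildFrom(Map<Long, String> databaseRows) {
        Map<Long, String> fresh = new HashMap<>(databaseRows);
        current.set(Collections.unmodifiableMap(fresh));
        // The old snapshot becomes garbage once in-flight searches finish;
        // with Lucene, this is where the old reader/directory would be closed.
    }
}
```

Searches keep answering from the old snapshot for the whole duration of `rebuildFrom`, which is the behaviour being asked of the MassIndexer here.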

I am currently considering moving to Hibernate Search. I let the MassIndexer index the database; extrapolating from the time it took for the first 500k rows, the full run would take more than 9 hours. After tuning some parameters it indexed about 6 million rows in 1 hour, but then slowed down a bit. Since Hibernate Search takes a lot longer to re-index, my request becomes even more important: is it possible to do the rebuild asynchronously? That is, keep using the old index while the MassIndexer re-indexes the whole database, and switch to the new index (maybe in another location) when it's finished. I know this means losing the database changes made while indexing is in progress, but that would be OK.

Any ideas?


 Post subject: Re: Asynchronous MassIndexer
Posted: Sat Jan 07, 2012 5:55 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
regarding performance, Hibernate Search's MassIndexer is generally expected to be faster than the simple ScrollableResult option, but I guess you've tried 4.0.0.Final, which is affected by excessive background commits that slow it down considerably: https://hibernate.onjira.com/browse/HSEARCH-1019.

It's normal that it slows down a bit, as the cost of segment merges grows with the index size.

Currently there's an "async" option, but all it does is run in the background: it will still wipe out the index, so your application can be used in the meantime but will initially be missing most results, gradually returning more of them as the indexer approaches the end.

In Search 3.x indexing was expected to happen "offline", so using the engine while the index was being rebuilt was not allowed; this mostly targeted use cases such as upgrading an application, initial deployments, and recovery after maintenance. Only since 4.0 is it allowed to use the application while a background MassIndexer is working, but the general pattern is still:
1) wipe out the index (this can optionally be skipped)
2) re-add all entities from the database, loading and processing them with multiple threads
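The two phases can be sketched as follows. This is a hypothetical, in-memory model, not the MassIndexer's actual implementation: `ToyIndex` stands in for the Lucene index, a `Map` of rows stands in for the database, and a plain `ExecutorService` provides the worker threads. Like a Lucene "add", `ToyIndex.add` never checks whether a document for the same id already exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an index: "add" appends a document id unconditionally.
final class ToyIndex {
    final ConcurrentLinkedQueue<Long> docs = new ConcurrentLinkedQueue<>();
    void add(long id) { docs.add(id); }
    void wipe() { docs.clear(); }
}

final class TwoPhaseRebuild {
    // Phase 1: optionally wipe; phase 2: re-add every row using several threads.
    static void rebuild(ToyIndex index, Map<Long, String> rows,
                        int threads, boolean wipeFirst) {
        if (wipeFirst) {
            index.wipe(); // searches now see an empty, then partial, index
        }
        List<Long> ids = new ArrayList<>(rows.keySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                // Each worker handles every threads-th id, splitting the work evenly.
                for (int i = offset; i < ids.size(); i += threads) {
                    index.add(ids.get(i));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
    }
}
```

Calling `rebuild` again with `wipeFirst` set to `false` leaves two documents per id, which is why skipping phase 1 is only safe if phase 2 stops using plain "Add".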

So what I'd like to do is add an option for phase 2) to use "Update" operations instead of "Add" operations, so that when you skip the initial wipe you don't end up with duplicates.
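The difference between the two operations can be sketched with a toy index. In Lucene, an update is `IndexWriter.updateDocument(Term, Document)`, which deletes the documents matching the term and then adds the new one; `ToyDocIndex` below is a hypothetical, single-threaded in-memory model of that delete-then-add behaviour keyed on the entity id, not Hibernate Search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory index holding (id, text) documents in insertion order.
final class ToyDocIndex {
    static final class Doc {
        final long id;
        final String text;
        Doc(long id, String text) { this.id = id; this.text = text; }
    }

    final List<Doc> docs = new ArrayList<>();

    // "Add": append unconditionally, so re-adding an id creates a duplicate.
    void add(long id, String text) {
        docs.add(new Doc(id, text));
    }

    // "Update": delete every document with this id, then add the new version.
    void update(long id, String text) {
        docs.removeIf(d -> d.id == id);
        docs.add(new Doc(id, text));
    }

    long countId(long id) {
        return docs.stream().filter(d -> d.id == id).count();
    }

    // Re-index all rows without wiping first, using either operation.
    void reindex(Map<Long, String> rows, boolean useUpdate) {
        for (Map.Entry<Long, String> row : rows.entrySet()) {
            if (useUpdate) {
                update(row.getKey(), row.getValue());
            } else {
                add(row.getKey(), row.getValue());
            }
        }
    }
}
```

With `useUpdate` set, each id keeps its old document until the new version replaces it, so results stay complete during the rebuild; ids that were deleted from the database are simply never visited.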

The limitation of this approach is that it will fail to delete entities from the index which are no longer in the database. Do you think this would still be acceptable for your case?
Otherwise we must add a third phase:
3) verify the existence of all entities referenced in the index, deleting those not found in the database.
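Phase 3) amounts to a set difference between the ids present in the index and the ids still present in the database. A minimal sketch, assuming both id collections fit in memory (at 17 million rows a real implementation would more likely stream ids in batches); `OrphanPurge` and `purgeOrphans` are hypothetical names, not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

final class OrphanPurge {
    // Removes from indexedIds every id that no longer exists in the database
    // and returns the removed ids (the documents that would be deleted from the index).
    static List<Long> purgeOrphans(Collection<Long> indexedIds, Set<Long> databaseIds) {
        List<Long> orphans = new ArrayList<>();
        for (Long id : indexedIds) {
            if (!databaseIds.contains(id)) {
                orphans.add(id);
            }
        }
        // Remove after iterating to avoid modifying the collection mid-loop.
        indexedIds.removeAll(orphans);
        return orphans;
    }
}
```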

I think I could easily add the "Update" variant, but would need some more time for the third phase. What do you think of it?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Asynchronous MassIndexer
Posted: Sat Jan 07, 2012 5:56 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
P.S.
When running queries via Hibernate Search, index matches that are not found in the database are not a problem: they are discarded from the results before being returned. This is obviously suboptimal in terms of performance and index size maintenance, though.
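Conceptually, discarding stale matches works like the filter below: each id returned by the index is looked up in the database, and ids whose row is gone are skipped. This is a hypothetical sketch with a `Map` standing in for the database; Hibernate Search performs this loading internally and its real code will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HitLoader {
    // Turns index hits (entity ids) into loaded entities, silently dropping ids
    // whose rows no longer exist. Each stale hit still costs an index match and
    // a lookup, which is the performance cost of leaving orphans in the index.
    static List<String> loadHits(List<Long> indexHits, Map<Long, String> database) {
        List<String> results = new ArrayList<>();
        for (Long id : indexHits) {
            String entity = database.get(id);
            if (entity != null) {
                results.add(entity);
            }
        }
        return results;
    }
}
```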

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Asynchronous MassIndexer
Posted: Mon Jan 09, 2012 4:17 am
Newbie

Joined: Fri Jan 06, 2012 9:12 am
Posts: 3
Wiping the index is acceptable when you can rebuild it in a couple of minutes or even seconds. Even if I get the rebuild down to 1 hour, I still have the problem of missing results while the index is being rebuilt, which is not an option for me. So your update solution is really interesting for me. I don't delete much data; I just mark it as deleted (still viewable in the application, but read-only). Deleting from the database happens only a couple of times a year, and in that case I could rebuild the index from scratch. Phase 3) is not a top priority for me, but an update mechanism without it might keep other users from adopting it. So it would be great to see such a feature in Hibernate Search. As long as it doesn't exist, I won't be able to switch to Hibernate Search.

Should I open a feature request on jira?


 Post subject: Re: Asynchronous MassIndexer
Posted: Mon Jan 09, 2012 4:28 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Yes, please open a request on Jira.
I've already started some experiments and it's definitely doable, but I won't be able to work on it until next week.

Also, the MassIndexer performance problem was solved yesterday; you could try version 4.1.0-SNAPSHOT to get a feel for the new speed.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Asynchronous MassIndexer
Posted: Mon Jan 09, 2012 5:55 am
Newbie

Joined: Fri Jan 06, 2012 9:12 am
Posts: 3
done: https://hibernate.onjira.com/browse/HSEARCH-1032


 Post subject: Re: Asynchronous MassIndexer
Posted: Mon Jan 09, 2012 6:27 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
perfect, thanks.

_________________
Sanne
http://in.relation.to/



© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.