
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 8 posts ] 
 Post subject: Hibernate Search - MassIndexer recovery from failure
PostPosted: Thu Mar 15, 2012 3:06 pm 
Newbie

Joined: Thu Mar 15, 2012 2:58 pm
Posts: 4
I am using the MassIndexer to index many entities with significant document counts. In my case, our data is read-only except for a monthly refresh of the entire database, so after the refresh we just rebuild all of the indexes. During this process we sometimes hit failures (DB connection pooling failures, network failures, etc.). It is maddening to get through 20 million records after 3 hours only to have to start from scratch again.

Is it possible to restart the MassIndexer where it left off? What "best practices" or other strategies do people use to handle mass indexing failures?

If it doesn't exist yet, a resume method would be really great.

Thanks in advance.


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 11:25 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
interesting. I don't think we have many "best practices" around this, as connections and databases are usually reliable enough to make this problem unlikely in practice; but that doesn't seem to be the case for you.

There is a possible solution, but it depends on what kind of errors you have:

The docs describe the option to register a custom ErrorHandler: http://docs.jboss.org/hibernate/search/ ... e/#d0e2458
You could use it to collect information about which entities failed to index and, assuming there are only a few of them, index them individually with the FullTextSession#index method once the MassIndexer is done.
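A minimal sketch of the bookkeeping side of that suggestion, written in plain Java so it runs without Hibernate on the classpath. `FailedEntityCollector`, `markFailed` and `retryAll` are hypothetical names; calling `markFailed` from a real custom `ErrorHandler`, and passing a callback that loads the entity and calls `FullTextSession#index`, is assumed and not shown.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper, not a Hibernate Search API: remembers which entity IDs
// failed during mass indexing so they can be retried one by one afterwards.
final class FailedEntityCollector {
    // Concurrent set, because mass indexing reports failures from several threads.
    private final Set<Serializable> failedIds = ConcurrentHashMap.newKeySet();

    // Intended to be called from a custom error handler (wiring assumed).
    void markFailed(Serializable entityId) {
        failedIds.add(entityId);
    }

    int size() {
        return failedIds.size();
    }

    // After the mass indexing run: re-index each failed entity individually.
    // reindexOne stands in for "load entity, call FullTextSession#index on it".
    // Successful IDs are removed; IDs that fail again are returned.
    List<Serializable> retryAll(Consumer<Serializable> reindexOne) {
        List<Serializable> stillFailing = new ArrayList<>();
        for (Serializable id : new ArrayList<>(failedIds)) { // iterate over a snapshot
            try {
                reindexOne.accept(id);
                failedIds.remove(id);
            } catch (RuntimeException e) {
                stillFailing.add(id);
            }
        }
        return stillFailing;
    }
}
```

This only helps when a handful of entities fail; it does nothing for a run that dies altogether.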

However, you won't be able to resume an indexing process that crashed: if the JVM is killed, for example, you'll have to restart from the beginning.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 11:25 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
BTW, feel free to describe this further and propose improvements.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 12:15 pm 
Newbie

Joined: Thu Mar 15, 2012 2:58 pm
Posts: 4
Thanks Sanne.

Our problem is not that single entities fail, but that the whole indexing process fails. Regardless of why it fails, it would be useful to be able to resume indexing, or even to start indexing while preserving the existing entities. I know we could just use the regular index method, but suppose we have a table with 50 million rows and need to add 20 million more (or resume indexing after it fails at 30 million for some reason). I would like to use the MassIndexer for that, because indexing those 20 million rows otherwise could take 2 days, and it doesn't make sense to reindex the 50 million existing rows if nothing about them has changed.

I suppose one would need to somehow exclude or filter the existing entities from the batch load. I may take a look at the source code this afternoon and see if I have any ideas.
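One cheap way to "exclude existing entities", sketched under a strong assumption: numeric primary keys that only ever grow, so every ID above the highest one already indexed belongs to a new row. `IncrementalIdFilter` and `idsToIndex` are hypothetical. This covers the "append 20 million rows" case; it does not by itself make a failed run resumable unless IDs are known to have been indexed strictly in order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical helper; assumes monotonically increasing numeric primary keys,
// so "not yet indexed" simply means "greater than the highest indexed ID".
final class IncrementalIdFilter {
    private IncrementalIdFilter() {}

    // Keeps only candidate IDs newer than what the index already holds.
    static List<Long> idsToIndex(LongStream candidateIds, long highestIndexedId) {
        return candidateIds
                .filter(id -> id > highestIndexedId)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

With tens of millions of rows the same bound would more sensibly go into the loading query than into an in-memory filter; the sketch only shows the predicate.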


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 12:34 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
You're very welcome to comment on the sources and propose improvements.

This could be a good read as well: https://hibernate.onjira.com/browse/HSEARCH-499

The MassIndexer doesn't currently write any persistent state to disk or to the database, which a "resume" kind of task would need. If we go that way, it should be a) very fast and b) thread-safe.
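To make "persistent, very fast, thread-safe" concrete, here is a hypothetical checkpoint store (not part of Hibernate Search). It assumes the work is split into numbered batches that worker threads may finish out of order, so it persists only how many batches are complete contiguously from batch 0. A restarted job resumes there, possibly redoing a few batches that had finished out of order. A tiny file is rewritten only when that number advances.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.BitSet;

// Hypothetical checkpoint store for a resumable indexing job; not a Hibernate Search API.
final class BatchCheckpoint {
    private final Path file;
    private final BitSet completed = new BitSet(); // batches finished so far
    private int contiguousDone;                    // batches [0, contiguousDone) are all finished

    BatchCheckpoint(Path file) {
        this.file = file;
        try {
            if (Files.exists(file)) { // restarting: load the last persisted value
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                contiguousDone = Integer.parseInt(text);
                completed.set(0, contiguousDone);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // First batch a restarted job should process.
    synchronized int resumeFrom() {
        return contiguousDone;
    }

    // Called by worker threads once a batch is fully indexed.
    synchronized void markDone(int batch) {
        completed.set(batch);
        int next = completed.nextClearBit(contiguousDone);
        if (next != contiguousDone) { // the contiguous prefix grew: persist it
            contiguousDone = next;
            persist();
        }
    }

    // Write a temp file, then atomically swap it in, so a crash mid-write
    // cannot leave a truncated checkpoint behind.
    private void persist() {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            Files.write(tmp, Integer.toString(contiguousDone).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The single lock is deliberately coarse: with batches of thousands of entities, one small file write per batch should be negligible next to the indexing work itself.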

Also consider that extracting most IDs from the index to "check for existence" in the database is likely more expensive than throwing the index away and reindexing everything.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 1:40 pm 
Newbie

Joined: Thu Mar 15, 2012 2:58 pm
Posts: 4
Interesting discussion there. I looked at the code briefly and can definitely see how this is more complex than it appears on the surface. I can see the benefits of all the strategies that have been discussed, but the notion of a user-provided stream of IDs strikes me as a pretty flexible solution.


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 1:46 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Agreed, but would it solve your problem? Please comment on the issue, I'm going to give it a try *this weekend* ;-)

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search - MassIndexer recovery from failure
PostPosted: Fri Mar 16, 2012 2:38 pm 
Newbie

Joined: Thu Mar 15, 2012 2:58 pm
Posts: 4
I can see the general use of it, but in my specific case it may be possible to set purgeAllOnStart(false) and then programmatically "batch" the MassIndexer by streaming in batches of IDs, say 5 million at a time. Maybe I could then use a custom ErrorHandler to catch the failure point, flush the index, and pick up the batching from there.
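A rough sketch of that batching loop, assuming contiguous numeric IDs. `BatchedIndexingDriver`, `RangeIndexer` and `run` are hypothetical: `RangeIndexer` stands in for whatever indexes one slice of IDs (here, a MassIndexer run with `purgeAllOnStart(false)` restricted to that slice, which per this thread the MassIndexer does not support yet). The loop stops at the first failing slice and returns that slice's first ID as the resume point.

```java
// Hypothetical driver for "index in batches, resume at the failure point".
final class BatchedIndexingDriver {

    // Stand-in for whatever indexes IDs in [fromInclusive, toExclusive).
    @FunctionalInterface
    interface RangeIndexer {
        void index(long fromInclusive, long toExclusive) throws Exception;
    }

    private BatchedIndexingDriver() {}

    // Indexes [startId, endId) in slices of batchSize. Returns endId if every
    // slice succeeded, otherwise the first ID of the slice that failed.
    static long run(long startId, long endId, long batchSize, RangeIndexer indexer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (long from = startId; from < endId; from += batchSize) {
            long to = Math.min(from + batchSize, endId);
            try {
                indexer.index(from, to);
            } catch (Exception e) {
                // A real job would log e and persist 'from' before giving up.
                return from;
            }
        }
        return endId;
    }
}
```

Persisting the returned ID somewhere durable is what would let a fresh JVM continue where the previous run stopped, rather than only a retry within the same process.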

That may be totally naive thinking, since I haven't looked nearly closely enough at the code.

Another option may be to handle the batches myself as described above and then to rename the index directories as they are created and then merge the individual indexes at the end.

My goal here would be to sacrifice some runtime in favor of being able to automate the recovery or the updates as much as possible.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.