
All times are UTC - 5 hours [ DST ]



Author Message
 Post subject: rebuild a large index in a distributed way?
PostPosted: Tue Jun 23, 2009 1:59 pm 
Newbie

Joined: Tue Jun 23, 2009 1:36 pm
Posts: 2
Hi,

I need to rebuild a large Lucene index after installing a new version of a third-party DB. I have multiple nodes to distribute the indexing work to. The result should be a single index (or it may be sharded).

What is the best way to set this up with Hibernate Search? I've read "Hibernate Search in Action", but I didn't find a solution for this kind of problem.

I am thinking of creating one central program that puts primary key values on a queue; each slave reads PK values from the queue, looks up the data in the DB, and writes to a local index. Finally, the separate indexes are sent to a central node, which merges them into one final index (or a sharded index), again using JMS as described in the book.
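
The flow described above can be sketched roughly as follows. This is only a single-process illustration under stated assumptions: threads stand in for the indexing nodes, an in-memory BlockingQueue stands in for the JMS queue, and a list of keys stands in for each local Lucene index. The class and method names (QueueIndexingSketch, distribute, merge) are hypothetical and not part of Hibernate Search or Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: threads play the role of indexing nodes, and an in-memory
// queue plays the role of the JMS queue. All names are hypothetical.
class QueueIndexingSketch {
    // Sentinel value telling a worker that no more keys will arrive.
    static final long POISON_PILL = -1L;

    // Puts primary keys 1..totalKeys on a shared queue and lets `workers`
    // consumers drain it. Returns one "local index" (list of keys) per worker.
    static List<List<Long>> distribute(int totalKeys, int workers) {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        List<List<Long>> localIndexes = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            List<Long> local = new ArrayList<>();
            localIndexes.add(local);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        long pk = queue.take();
                        if (pk == POISON_PILL) {
                            break;
                        }
                        // A real worker would load the row for pk from the DB
                        // and add a document to its local index here.
                        local.add(pk);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            // The "central program": publish every key, then one pill per worker.
            for (long pk = 1; pk <= totalKeys; pk++) {
                queue.put(pk);
            }
            for (int w = 0; w < workers; w++) {
                queue.put(POISON_PILL);
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while distributing keys", e);
        }
        return localIndexes;
    }

    // Stand-in for the final merge on the central node.
    static List<Long> merge(List<List<Long>> localIndexes) {
        List<Long> merged = new ArrayList<>();
        for (List<Long> local : localIndexes) {
            merged.addAll(local);
        }
        Collections.sort(merged);
        return merged;
    }
}
```

Because every key is taken from the queue by exactly one worker, the local indexes are disjoint, so the merge step only has to concatenate them. How evenly the keys spread over the workers depends on scheduling and is not guaranteed.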

All suggestions are welcome!

Bram


 Post subject: Re: rebuild a large index in a distributed way?
PostPosted: Tue Jun 23, 2009 2:10 pm 
Newbie

Joined: Tue Jun 23, 2009 1:36 pm
Posts: 2
I found some code to merge multiple indexes into one:
http://www.asteriosk.gr/blog/2009/03/31 ... e-indexes/

Or just combine the files, as described here:
http://www.opensubscriber.com/message/l ... 03308.html

So I could let each "indexing slave" build its own index, then bring the files together and merge them with the code mentioned above.

The question is: which architecture (this one or the one in my previous post) performs best?


 Post subject: Re: rebuild a large index in a distributed way?
PostPosted: Fri Jun 26, 2009 4:56 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi bshf,
I've seen your situation several times and have long been working on patches for Hibernate Search to rebuild the indexes in the most efficient way I could find, but I never considered the "multiple nodes" idea.
I'm doing it with threads on a single node, so I don't have merging problems. I got the idea when profiling the job: the database round-trip is clearly the major bottleneck, so even on a dual-core machine, using something like 30 parallel threads makes sense and gives a noticeable speed-up.
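
To illustrate why far more threads than cores can help when the work is dominated by waiting on the database, here is a minimal sketch. It is not the patch itself: Thread.sleep stands in for the database round-trip, and the names ParallelLoadSketch, loadEntity and loadAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: simulates I/O-bound entity loading with a fixed thread pool.
class ParallelLoadSketch {
    // Stand-in for a database round-trip: mostly waiting, almost no CPU work.
    static String loadEntity(long pk, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "entity-" + pk;
    }

    // Loads entities 1..count using `threads` workers and returns them
    // in primary-key order.
    static List<String> loadAll(int count, int threads, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (long pk = 1; pk <= count; pk++) {
                final long key = pk;
                Callable<String> task = () -> loadEntity(key, latencyMillis);
                futures.add(pool.submit(task));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> f : futures) {
                // Futures are read in submission order, so the result is ordered.
                result.add(f.get());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("loading failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

While one thread sleeps in the simulated round-trip, the others can issue their own requests, so the waits overlap instead of adding up. In a real setup the useful thread count is also limited by the connection pool and by what the database can serve concurrently.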

The code is ready and working fine on our production systems, but it is not yet committed because I'm still missing some unit tests (damn hard to design...).
If you would like to try it, I can send you the patch, or I'll attach it to JIRA.
It requires the latest trunk of both Search and its dependencies (so hibernate-core 3.5-SNAPSHOT).

It would be very cool if you could beta-test it and give me some feedback: as I said, it works in our environment with our model, but nobody else has tested it. I'll be glad to improve it if needed.

Sanne

_________________
Sanne
http://in.relation.to/


 Post subject: Re: rebuild a large index in a distributed way?
PostPosted: Mon Jun 29, 2009 6:29 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
It's committed in trunk now, so there is no need to send patches.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.