
All times are UTC - 5 hours [ DST ]



 Post subject: advice on alternative master-slave deployment
PostPosted: Wed Nov 24, 2010 4:46 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
My project manager has requested a trimmed-down master/slave deployment without the JMS component (or any mode of communication between master and slave). The organization is worried about 'too many moving parts' and possible points of failure. We have mutually come up with the following alternative design:

    1. As before, the slave would have a read-only local copy of the index, but it would no longer attempt to post index-update messages on any JMS queue.
    2. A database trigger would be set up on the appropriate tables to set a modified-date timestamp on a row whenever that row is updated by the slave (or by any external process, of which there is one).
    3. A separate master process would wake up on a cron-schedule, get all the indexable Hibernate entities that have been modified since its last run-time and reindex them. It would not attempt to connect to any JMS queue.
    4. The master process would switch its master copy to its source copy, as before, at the end of a given run.
    5. A separate rsync job would be set up to periodically copy the master source index to a source location local to the slave, so that the slave does not have to make a network hop when refreshing its local index.

What would be the easiest way to implement this approach (i.e., which classes should we reuse or subclass)? Are there any potential pitfalls?

Thanks in advance,

Michael


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Thu Nov 25, 2010 5:23 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi Michael,

I don't think your solution has fewer moving parts, just different ones. Nevertheless, here are some tips/ideas that come to mind. I haven't tried this myself, so you will still need to do your own investigation.
Let's start with the master. You could just use the existing FSMasterDirectoryProvider. This directory provider is independent of JMS. The JMS part is handled via a MessageListener, which is often a subclass of AbstractJMSHibernateSearchController. Instead of implementing a message listener you just implement your own indexing logic. Note though that your step 4 is not really what happens today. Index switching does not occur after an index run, but rather after a refresh period. If you want to trigger the switch explicitly you need to look at FSMasterDirectoryProvider and do something similar, just without a timer. Just add a method to trigger the index switch. You can get hold of the directory provider in your indexing code via the SearchFactory.getDirectoryProviders method. There might even be a way to extend FSMasterDirectoryProvider.
On the slave side you can use FSSlaveDirectoryProvider. Again, there is no connection to JMS. Instead of the JMS backend, use the default Lucene one. Set the property hibernate.search.indexing_strategy to manual. This way there is no automatic indexing, which is pretty much what you want.
For your rsync copy job you will probably need a whole script, since you will have to figure out which directory to copy. You will have to look for the marker files and act accordingly. That's a bit of a weak link here.
Well, I hope this gives you a few ideas of the flexibility of the Search configuration. Keep in mind that some network communication will always be necessary if you have multiple machines. If you don't like JMS, have you considered the JGroups alternative? I think it fits your requirements well, and the only thing JGroups is there for is to distribute the indexes. In this case you can skip the database triggers, which are another unnecessary "moving" part in my eyes.
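
As a rough illustration of the directory-provider settings discussed above, here is a minimal Java sketch that assembles the configuration for each node as java.util.Properties. The property keys and provider class names are the ones documented for Hibernate Search 3.x and should be checked against the version in use; the class name SearchNodeSettings, the method names, and the refresh value are assumptions made up for this example.

```java
import java.util.Properties;

/** Hypothetical helper; not part of Hibernate Search. */
final class SearchNodeSettings {

    private static final String PREFIX = "hibernate.search.default.";

    /** Settings for the node that writes the index and publishes it to sourceBase. */
    static Properties master(String sourceBase, String indexBase) {
        Properties p = new Properties();
        p.setProperty(PREFIX + "directory_provider",
                "org.hibernate.search.store.FSMasterDirectoryProvider");
        p.setProperty(PREFIX + "sourceBase", sourceBase); // where the published copy goes
        p.setProperty(PREFIX + "indexBase", indexBase);   // the master's working index
        p.setProperty(PREFIX + "refresh", "1800");        // copy period in seconds (example value)
        return p;
    }

    /** Settings for a read-only node that refreshes its local index from sourceBase. */
    static Properties slave(String sourceBase, String indexBase) {
        Properties p = new Properties();
        p.setProperty(PREFIX + "directory_provider",
                "org.hibernate.search.store.FSSlaveDirectoryProvider");
        p.setProperty(PREFIX + "sourceBase", sourceBase);
        p.setProperty(PREFIX + "indexBase", indexBase);
        p.setProperty(PREFIX + "refresh", "1800");
        // No event-driven indexing on the slave; the default Lucene backend stays in place.
        p.setProperty("hibernate.search.indexing_strategy", "manual");
        return p;
    }
}
```

These properties would be merged into the usual Hibernate configuration of each node; the paths passed in are deployment-specific.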

--Hardy


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Thu Nov 25, 2010 5:27 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Not to mention that there is also an Infinispan clustering solution which you could try.


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Fri Nov 26, 2010 4:03 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
Hi Hardy,

Thanks for your detailed reply.

I agree that this design might really be a trade-off of 'moving parts'. The issue is partly one of available in-house expertise. The infrastructure team at this organization doesn't seem to have much experience maintaining messaging systems, and has even less knowledge of JGroups. However, they do have a database team that's proficient with Oracle. Also, it seems like an Oracle database trigger that simply modifies a timestamp in a column should be less likely to fail than a long-running ActiveMQ process.

Just to give a little more background, in this project, we have a webservice performing the search and also possibly updating some of the data. However, there is also a separate hourly process that runs to sync data over directly to the db from another external datasource. This process is some script written and maintained by the db team and totally independent of the webservice. It seems like the trigger would be a good solution in that it could detect modifications to the underlying data, no matter where that update comes from. Also, it's important to note that the size of the dataset is fairly small (about 5000 rows), and the data is not anticipated to change significantly from hour to hour (on the order of 10 rows at the most).

So given that, maybe even this design is overkill. What would you think about just having a timer task running inside the webservice itself that periodically sweeps through the data and reindexes modified entities? Any disadvantages to that, other than it slowing down the webservice during the task execution?

Thanks for your input!

Michael


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Sat Nov 27, 2010 2:00 pm 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

If you have processes making changes directly to the database, it of course makes sense to have some sort of last-modified field to determine which rows/entities have changed.

I'm not sure whether you are saying that your whole dataset is 5000 rows or that the changeset is. Either way it is such a small number that having a timer in the webservice itself seems the easiest solution. You could start this way and build in a clustering solution later.
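
A minimal sketch of such a timer-driven sweep, assuming a last-modified column is available. The interface ModifiedEntitySource and the class ReindexSweeper are hypothetical names introduced for this example; in a real application findModifiedSince would be a Hibernate query on the timestamp column and reindex would delegate to something like FullTextSession.index(entity).

```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical abstraction over the DAO and the Hibernate Search calls. */
interface ModifiedEntitySource<T> {
    /** Entities whose last-modified timestamp is after 'since'. */
    List<T> findModifiedSince(Instant since);

    /** Re-indexes one entity. */
    void reindex(T entity);
}

final class ReindexSweeper<T> {
    private final ModifiedEntitySource<T> source;
    private Instant lastRun;

    ReindexSweeper(ModifiedEntitySource<T> source, Instant startFrom) {
        this.source = source;
        this.lastRun = startFrom;
    }

    /**
     * Runs one sweep. 'sweepStart' should be captured before querying, so rows
     * modified while the sweep runs are picked up again next time rather than missed.
     * Returns the number of entities re-indexed.
     */
    synchronized int sweepOnce(Instant sweepStart) {
        List<T> changed = source.findModifiedSince(lastRun);
        for (T entity : changed) {
            source.reindex(entity);
        }
        lastRun = sweepStart; // only advanced when every reindex call succeeded
        return changed.size();
    }

    /** Schedules the sweep once per hour on a single background thread. */
    ScheduledExecutorService startHourly() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                sweepOnce(Instant.now());
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs; log and retry next hour.
                e.printStackTrace();
            }
        }, 0, 1, TimeUnit.HOURS);
        return executor;
    }
}
```

Note that the sketch compares application-side time against database timestamps; if the two clocks can drift, it is safer to take the watermark from the database itself.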

--Hardy


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Sat Nov 27, 2010 2:50 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
I'm saying the whole dataset is 5000 rows. Thanks, Hardy.


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Mon Nov 29, 2010 4:30 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
Hardy, regarding your comment about the rsync job being a weak link: I had envisioned it simply copying the remote source to the local source, so why would a script be necessary? The slave would then perform its usual logic when refreshing from the local source.


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Tue Nov 30, 2010 5:42 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
If you have a look at an existing master/slave setup, or at the code for FSMasterDirectoryProvider, you will see that there is not just one index/directory. The master as well as the slave creates two index directories underneath the directories you specify in the configuration. One directory/index is the current one, whereas the other one is used for copying. When the master updates the master index, it does so in the non-current one. If the update completes successfully, the master switches the current marker file.
This is needed so that index searches are possible while the sync is in progress. For your rsync job to work properly you probably need to take this into account. For example, how do you synchronize the index switching with the rsync job? You don't want to run rsync while the master is in the process of switching the indexes. In fact, the safest option would be to trigger the rsync job from within the directory provider.
Do you see now what I am getting at?
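
To make the marker-file check concrete, here is a small Java sketch of the decision a copy script would have to make before syncing. It assumes the layout used by the filesystem directory providers in Hibernate Search 3.x as far as I recall it (subdirectories named 1 and 2 plus an empty marker file named current1 or current2); verify this against the FSMasterDirectoryProvider source of your version. The class CurrentIndexResolver is a hypothetical name for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper; the marker and directory names are assumptions to verify. */
final class CurrentIndexResolver {

    /**
     * Given the entry names found in the source root, returns the name of the
     * subdirectory that is safe to copy, or empty if the state is ambiguous
     * (no marker or both markers, e.g. while a switch is in progress).
     */
    static Optional<String> currentSubdirectory(Set<String> entryNames) {
        boolean one = entryNames.contains("current1");
        boolean two = entryNames.contains("current2");
        if (one == two) {
            return Optional.empty(); // skip this sync round and try again later
        }
        return Optional.of(one ? "1" : "2");
    }

    /** Filesystem variant: resolves the directory to copy under sourceRoot. */
    static Optional<Path> currentIndexDirectory(Path sourceRoot) {
        Set<String> markers = new HashSet<>();
        if (Files.exists(sourceRoot.resolve("current1"))) {
            markers.add("current1");
        }
        if (Files.exists(sourceRoot.resolve("current2"))) {
            markers.add("current2");
        }
        return currentSubdirectory(markers).map(sourceRoot::resolve);
    }
}
```

Even with this check there is still a window in which the marker can change while the copy is running, which is why triggering the copy from within the directory provider remains the safer option.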

--Hardy


 Post subject: Re: advice on alternative master-slave deployment
PostPosted: Tue Nov 30, 2010 2:29 pm 
Beginner

Joined: Thu Nov 20, 2003 10:16 pm
Posts: 28
Location: Los Angeles, CA
I see your point now, and it would make sense to trigger it from within the directory provider right after a switch.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.