Hibernate Search JMS-based clustering approach?

jadler · **Joined:** Tue Oct 30, 2012 10:44 am **Posts:** 2

I am trying to implement a clustered environment using Hibernate Search 3.4.2. I have a requirement for real-time updating of indexes, and a shared file system is not a solution for us. I attempted to install Infinispan, but I had all sorts of trouble getting it to work. This might be related to the versions of Hibernate, Hibernate Search and Infinispan I tried using (hibernate 3.6.10 / hibernate-search 3.4.2.Final / hibernate-search-infinispan 3.4.2.Final / infinispan-core 4.2.1.FINAL), but I need to stick with these versions for the time being.

So I thought why not try configuring every node as follows:

1. An embedded JMS server (HornetQ in my case), set up with durable queues in a cluster
2. A JMS slave backend (hibernate.search.worker.backend=jms)
3. A JMS MessageListener to process messages from all nodes (including the producer node, which may have generated the message via the JMS slave backend)
4. The default filesystem directory (as all nodes will keep mirrored indexes that are updated simultaneously)

In other words, all nodes publish messages to the queue and all nodes subscribe to messages on the queue.
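As a rough illustration of items 2 and 4, the per-node settings could be collected like this. This is only a sketch: the property keys are the ones I recall from the Hibernate Search 3.x reference documentation and should be checked against your version, and the JNDI names and the `NodeSearchConfig` class are hypothetical.

```java
import java.util.Properties;

/** Sketch of the per-node settings; JNDI names and the index path are assumptions. */
final class NodeSearchConfig {

    static Properties nodeProperties(String indexBase) {
        Properties p = new Properties();
        // Item 2: hand index work to JMS instead of applying it directly.
        p.setProperty("hibernate.search.worker.backend", "jms");
        // Assumed JNDI names for the embedded HornetQ resources.
        p.setProperty("hibernate.search.worker.jms.connection_factory", "/ConnectionFactory");
        p.setProperty("hibernate.search.worker.jms.queue", "queue/hibernatesearch");
        // Item 4: every node keeps its own local filesystem index.
        p.setProperty("hibernate.search.default.directory_provider", "filesystem");
        p.setProperty("hibernate.search.default.indexBase", indexBase);
        return p;
    }
}
```

The resulting `Properties` would be merged into whatever configuration mechanism the application already uses (for example `hibernate.properties` or `persistence.xml`).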

The problem I have is that I cannot seem to access the LuceneBackendQueueProcessor in the JMS MessageListener in order to update the filesystem, because I have already specified a JMS slave backend. Can anyone see a way around my issue or offer a better approach?

Kind Regards,
Jonathon.

sanne.grinovero · **Posted:** Tue Oct 30, 2012 3:04 pm

Hi,
Good idea. All you need to do is create a BackendQueueProcessor which wraps both the Lucene and the JMS implementations and implements each method by forwarding to both. You can then plug it in easily by specifying its fully qualified class name instead of just "jms".
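The forwarding idea can be sketched without any Hibernate classes. `WorkProcessorFactory` below is a hypothetical stand-in for the real backend SPI (in the 3.4 line I believe the pluggable type is a factory that returns a `Runnable` per work queue, but check the actual interface in your version), and the work items are plain strings rather than Lucene work objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the backend SPI: turns a queue of index work into a task. */
interface WorkProcessorFactory {
    Runnable getProcessor(List<String> workQueue);
}

/** Forwards every work queue to each delegate in order, e.g. local Lucene first, then JMS. */
final class CompositeProcessorFactory implements WorkProcessorFactory {
    private final List<WorkProcessorFactory> delegates;

    CompositeProcessorFactory(WorkProcessorFactory... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public Runnable getProcessor(List<String> workQueue) {
        // Read-only snapshot, so every delegate sees the same work list.
        final List<String> snapshot =
                Collections.unmodifiableList(new ArrayList<>(workQueue));
        final List<Runnable> tasks = new ArrayList<>();
        for (WorkProcessorFactory delegate : delegates) {
            tasks.add(delegate.getProcessor(snapshot));
        }
        // Run the delegates' tasks sequentially when the returned task executes.
        return () -> {
            for (Runnable task : tasks) {
                task.run();
            }
        };
    }
}
```

A real implementation would wrap the Lucene and JMS backends and also forward the lifecycle methods (initialization and shutdown) to both; those are omitted here.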

The tricky part is then to make sure:
- you don't send messages to yourself
- you don't forward again messages received from other nodes
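One way to satisfy both conditions is to tag each message with the id of the node that produced it. The sketch below uses hypothetical `TaggedWork` and `ClusterNode` types and plain callbacks in place of the Lucene backend and the JMS producer; it only illustrates the filtering logic, not the actual Hibernate Search or JMS wiring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical envelope: index work tagged with the id of the node that produced it. */
final class TaggedWork {
    final String originNodeId;
    final List<String> work;

    TaggedWork(String originNodeId, List<String> work) {
        this.originNodeId = originNodeId;
        this.work = new ArrayList<>(work);
    }
}

final class ClusterNode {
    private final String nodeId;
    private final Consumer<List<String>> localIndex; // applies work to the local index only
    private final Consumer<TaggedWork> publisher;    // sends work to the shared destination

    ClusterNode(String nodeId, Consumer<List<String>> localIndex, Consumer<TaggedWork> publisher) {
        this.nodeId = nodeId;
        this.localIndex = localIndex;
        this.publisher = publisher;
    }

    /** Local change: index here, then publish once, tagged with our own id. */
    void onLocalChange(List<String> work) {
        localIndex.accept(work);
        publisher.accept(new TaggedWork(nodeId, work));
    }

    /** Incoming message: returns true if applied, false if it was our own echo. */
    boolean onMessage(TaggedWork message) {
        if (nodeId.equals(message.originNodeId)) {
            return false; // already indexed in onLocalChange
        }
        localIndex.accept(message.work); // local only, never re-published
        return true;
    }
}
```

With real JMS the tag could travel as a string property on the message (`Message.setStringProperty`), and a message selector on the consumer could drop a node's own messages before they are delivered.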

If you want, you could contribute that and we'll include the concept in the next release, adapted of course to the latest version.

Regarding the Infinispan issues: yes, it's likely that you have an old version, and that was one of the first versions available. Still, I'd like to know what the problem is, if you can provide some more details. It would be good to make sure it was indeed solved in the more recent versions.

jadler · **Joined:** Tue Oct 30, 2012 10:44 am **Posts:** 2

sanne.grinovero wrote:

Hi,
Good idea. All you need to do is create a BackendQueueProcessor which wraps both the Lucene and the JMS implementations and implements each method by forwarding to both. You can then plug it in easily by specifying its fully qualified class name instead of just "jms".

Great idea Sanne. That was the pointer I needed! I have now fleshed something out that seems to do the job nicely. It was actually pretty easy once I started. :)

sanne.grinovero wrote:

The tricky part is then to make sure:
- you don't send messages to yourself
- you don't forward again messages received from other nodes

Spot on. Was pretty easy in the end.

sanne.grinovero wrote:

If you want, you could contribute that and we'll include the concept in the next release, adapted of course to the latest version.

I'll put something together shortly. How would you like it? My code is coupled to my framework and you will want it in a more pure form.

sanne.grinovero wrote:

Regarding the Infinispan issues: yes, it's likely that you have an old version, and that was one of the first versions available. Still, I'd like to know what the problem is, if you can provide some more details. It would be good to make sure it was indeed solved in the more recent versions.

To start with I had the wrong version of JGroups, which really mucked things up... but even once I got the JGroups configuration worked out (we needed TCPPING as the network admins didn't like the sound of the default multicast option) I had issues. I couldn't get the first node up in the cluster (presumably the coordinator) to leave the group and come back with all the data intact (even if another node was still running with all the data replicated). Maybe this would be solved with a store on the master and using a JMS backend as suggested (to solve the single Lucene writer issue, with JMS ensuring no data loss). But then I thought, why use two technologies (with twice the admin and config) when one can do it? If HS/Infinispan was able to take care of the writer across the cluster automatically (meaning no JMS required) and self-heal regardless of which node goes down (obviously only possible when replication is happening), then it would be a great solution, i.e. a great offering compared to Elastic Search. Basically what I wanted was a simple-to-configure clustering option (like Elastic Search) but with the automatic indexing that Hibernate Search provides. Is it there yet in a later version? Please don't take this as harsh criticism of your great work, just letting you know my experiences...

Thanks again for your help. I will report back after testing...

Cheers,
Jonathon Adler.

sanne.grinovero · **Posted:** Thu Nov 01, 2012 8:23 am

Quote:

It was actually pretty easy once I started. :)

Nice!

Quote:

I'll put something together shortly. How would you like it? My code is coupled to my framework and you will want it in a more pure form.

Why is your code coupled to your framework? Do you think you could make it "clean"?

First, you should open a new JIRA issue [1], both to track your cool idea and the changes which are going to happen; you can open it as a "new feature".
Then you can either attach a patch file to the JIRA issue or send a pull request on GitHub. We generally prefer patches formatted consistently with the Hibernate style, with unit tests and a short description in the documentation (which is in source control as well, under the _hibernate-search-documentation_ Maven module). But all of this is optional; if there is something you can't do, we can help out. Better to receive a half-baked patch than nothing, but in that case we might reject it or take more time to look at it.
Of course you can ask for advice or better instructions about anything, either here or, better, on the hibernate-dev mailing list [2] or the IRC chat [3].

1 - https://hibernate.onjira.com/browse/HSEARCH
2 - http://www.hibernate.org/community/mailinglists
3 - http://www.hibernate.org/community/irc

Quote:

To start with I had the wrong version of JGroups, which really mucked things up... but even once I got the JGroups configuration worked out (we needed TCPPING as the network admins didn't like the sound of the default multicast option) I had issues. I couldn't get the first node up in the cluster (presumably the coordinator) to leave the group and come back with all the data intact (even if another node was still running with all the data replicated). Maybe this would be solved with a store on the master and using a JMS backend as suggested (to solve the single Lucene writer issue, with JMS ensuring no data loss). But then I thought, why use two technologies (with twice the admin and config) when one can do it? If HS/Infinispan was able to take care of the writer across the cluster automatically (meaning no JMS required) and self-heal regardless of which node goes down (obviously only possible when replication is happening), then it would be a great solution, i.e. a great offering compared to Elastic Search. Basically what I wanted was a simple-to-configure clustering option (like Elastic Search) but with the automatic indexing that Hibernate Search provides. Is it there yet in a later version? Please don't take this as harsh criticism of your great work, just letting you know my experiences...

Much appreciated feedback! Yes, we have several ideas to improve on that. I actually have a branch which uses just Infinispan to do both command forwarding and storage, and requires just a single configuration property: "indexmanager = infinispan". Master election is automatic and differentiated per index/shard. It is all looking good, but I couldn't merge it yet because of open issues in Infinispan. If they get fixed quickly we'll have that in Hibernate Search 4.2, but it seems like we'll have to move on with the release without it.

I personally much prefer embedded Lucene as it's more flexible and can get better performance, but we're not against contributions to integrate remote services like Elastic Search or Solr; we just need someone motivated enough to help with that. So if you think your project would strongly benefit from such an integration, or know someone who wants it, we can work together and make it happen.

sanne.grinovero · **Posted:** Thu Nov 01, 2012 8:26 am

Forgot the reference to the new IndexManager code:
https://hibernate.onjira.com/browse/HSEARCH-882
https://github.com/Sanne/hibernate-search/commits/HSEARCH-882-thinkpad

It doesn't support automatic master failover yet because of
https://issues.jboss.org/browse/ISPN-2435

LuceneUser · **Joined:** Mon Jun 10, 2013 4:31 pm **Posts:** 2

Could you share some configuration, please? I'm mainly interested in which directory provider you chose, as filesystem-slave has not worked for me.