Hi,
Quote:
Thanks for the reply, and I understand the complexity of dealing with split brain in distributed environments.
Great! It's not easy to find a brainstorming partner on this subject.
Quote:
The problem with the JMS solution in our case is that the master and slave are static, and that makes it impossible to promote a new master on a new running node.
I'm definitely not a JMS guru but I'm told that this is not true. It might not be part of the JMS standard API, but most implementations do offer configurations which guarantee single consumers, and for others it's possible to define the consumer as an HA singleton.
I would agree though that this is a pain to set up; one way of improving this could be for us to provide some configuration examples (and test them), but since I just heard from the WildFly developers that version 10 is expected to have a simplified setup for single consumers, I'm actually trying to find out more about that.
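For example, a minimal sketch of what such a configuration example could look like, assuming ActiveMQ's exclusive-consumer destination option (the broker URL and queue name are illustrative, and the option syntax is broker-specific, so treat this as an illustration rather than a recommended setup):
Code:
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ExclusiveIndexConsumer {
    public static void main(String[] args) throws Exception {
        // Broker URL and queue name are illustrative only
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // The ?consumer.exclusive=true option asks the broker to dispatch to a single
        // consumer at a time; other consumers act as hot standbys and take over on failure.
        Queue queue = session.createQueue("hsearch.index.backend?consumer.exclusive=true");
        MessageConsumer consumer = session.createConsumer(queue);
        connection.start();
        // consumer.receive() or setMessageListener(...) would then process the index work messages
    }
}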
Also happy to do some R&D around JGroups and similar alternatives, but my priority would be to offload this problem to a different library as it's not the core business for Hibernate Search. Apache Kafka also seems fitting; I'm very open to discussing alternative backends to make this simpler.
Quote:
And I guess JMS together with Infinispan and distributed LuceneIndexesLocking has the same issue with possible orphan locks if the master node crashes?
Right.
When I mentioned Infinispan Query I was not referring to the JMS implementation though; for Infinispan Query I actually created a more reliable backend which has passed some quite tricky tests:
https://github.com/infinispan/infinispan/blob/4a37550e36201d7ace60f82b412ff06d3b043bfe/query/src/main/java/org/infinispan/query/indexmanager/ClusteredSwitchingBackend.java
We might want to port a similar approach back to Hibernate Search. I haven't done that yet because that model is timeout-based, which matches the state handling of Infinispan itself (so it is consistent with the other state it stores, like the entries), but when the reference data is stored in an RDBMS I believe the consistency expectations should be higher than what you can get from a timeout-based model. I'm referring to the limitations we get with a split brain of course: since it doesn't keep logs, it can't perform a reconciliation on merges.
For Hibernate Search (with an RDBMS) we could have this as an option, but I'd prefer to have a log-based backend as well, so backporting that model hasn't been high on my priority list.
Quote:
Workaround:
Could a workaround be to set up LuceneIndexesLocking as a local cache?
Disabling the lock is easy. The real question is whether it's correct and safe to disable locking in your environment.
If you do want to disable it, your workaround is valid, but you could also set:
Code:
hibernate.search.default.locking_strategy = none
[http://docs.jboss.org/hibernate/search/5.4/reference/en-US/html_single/#search-configuration-directory-lockfactories]
Quote:
Another solution : [...]
That's right, I like your idea. The solution within Infinispan Query is similar: it's slightly simpler, as for that one I simply decided that the master would be the first JGroups member. JGroups guarantees that the list of members (the View) is ordered the same for all nodes, and the first one (aka the Coordinator) is a good choice as it will always be the oldest member of the View. This way, if the coordinator fails and the master needs to be re-elected, picking the next one in the list is a very easy and safe election protocol, and it keeps the role of the master rather stable over time.
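To illustrate, a minimal sketch of that election using the JGroups API (this is not the actual Infinispan Query code, the class name is made up):
Code:
import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class CoordinatorElection extends ReceiverAdapter {

    private final JChannel channel;
    private volatile boolean master;

    public CoordinatorElection(JChannel channel) {
        this.channel = channel;
        channel.setReceiver(this);
    }

    @Override
    public void viewAccepted(View view) {
        // The View is ordered identically on all nodes; the first member is the coordinator,
        // i.e. the oldest member. Electing it as indexing master is stable across view changes:
        // the role only moves when the coordinator itself leaves or crashes.
        Address coordinator = view.getMembers().get(0);
        master = coordinator.equals(channel.getAddress());
    }

    public boolean isMaster() {
        return master;
    }
}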
The code in Hibernate Search attempts to do a hash-based election based on the index name; the benefit is that it doesn't pick the same master for all indexes, but it also means that the role of master needs to be migrated from a live node to a different live node on (probably) every view change. Stealing a lock from a crashed node is much simpler than having to coordinate an index writer flush on a live node and only then have it voluntarily release the lock to the new master node. The better protocol would be to apply some hashing, but only migrate the master on a crash, similarly to what you suggest.
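A rough sketch of what that "sticky" hash-based election could look like (again, not the current code, just the idea): spread the initial assignment by hashing the index name, but keep the current master as long as it is still in the view, so the role only migrates when its holder actually crashes or leaves.
Code:
import java.util.List;
import org.jgroups.Address;
import org.jgroups.View;

public class StickyHashElection {

    private volatile Address currentMaster;

    // Called on every view change; hashing the index name spreads different indexes over different nodes.
    public synchronized Address electMaster(String indexName, View view) {
        List<Address> members = view.getMembers();
        if (currentMaster != null && members.contains(currentMaster)) {
            // The previous master is still alive: keep it, so we never need to coordinate
            // a voluntary hand-over (index writer flush + lock release) between live nodes.
            return currentMaster;
        }
        // The previous master crashed or left: pick a new one by hashing the index name,
        // so different indexes tend to elect different masters.
        int slot = Math.floorMod(indexName.hashCode(), members.size());
        currentMaster = members.get(slot);
        return currentMaster;
    }
}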
Quote:
For the split brain issue I don’t have any good suggestions, but I do think using Kubernetes to solve the split brain seems overkill. I mean, if you're not using Kubernetes today, it's a rather big framework to bring into your current technology stack.
I meant Kubernetes as one example; I simply expect that almost anyone with more than one server to manage will have some script which starts/stops nodes, and if Hibernate Search could be notified (JMX?) about how many nodes are supposed to be in the group, it could use that to provide an option for strong consistency at the expense of availability.
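To make the idea concrete, a purely hypothetical sketch (none of these names exist in Hibernate Search today): the start/stop scripts register the expected group size through JMX, and the backend refuses index writes unless a majority of the declared nodes is visible.
Code:
// Hypothetical MBean: the start/stop scripts would call setExpectedClusterSize(n) over JMX.
public interface ClusterSizeMBean {
    int getExpectedClusterSize();
    void setExpectedClusterSize(int expectedClusterSize);
}

public class ClusterSize implements ClusterSizeMBean {

    private volatile int expectedClusterSize = 1;

    @Override
    public int getExpectedClusterSize() {
        return expectedClusterSize;
    }

    @Override
    public void setExpectedClusterSize(int expectedClusterSize) {
        this.expectedClusterSize = expectedClusterSize;
    }

    // Strong consistency at the expense of availability: only accept index writes
    // when this partition can see a majority of the declared nodes.
    public boolean hasQuorum(int visibleMembers) {
        return visibleMembers > expectedClusterSize / 2;
    }
}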
Quote:
The JGroups-RAFT looks interesting because it seems more lightweight and you are already piggybacking on the JGroups framework.
That's right. Funny you mention piggybacking as that's exactly what we've been working on:
https://github.com/belaban/JGroups/blob/master/doc/design/FORK.txt
FORK is going to be exposed within WildFly 10; the RAFT component is still experimental though. I should try to find some time to play with it (it won't evolve past experimental until people like you and me actually play with it), but it would also be nice to provide an easier solution in the meantime. Maybe I should just backport the improvements from Infinispan as an intermediate step, and find a JMS expert to share some configuration examples.
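For reference, FORK lets us piggyback a private protocol stack on an existing channel without touching the application's JGroups configuration. A minimal sketch, assuming the main channel's stack already includes the FORK protocol (the config file name and fork ids below are made up):
Code:
import org.jgroups.JChannel;
import org.jgroups.fork.ForkChannel;
import org.jgroups.protocols.COUNTER;

public class ForkExample {
    public static void main(String[] args) throws Exception {
        // The application's existing channel; the stack it uses must contain the FORK protocol.
        JChannel mainChannel = new JChannel("my-stack-with-fork.xml"); // illustrative config
        mainChannel.connect("app-cluster");

        // Piggyback a private stack on top of it: the fork channel's messages travel over
        // the same transport but are invisible to the main channel's application traffic.
        ForkChannel forkChannel = new ForkChannel(mainChannel, "hsearch-fork-stack", "hsearch-backend",
                new COUNTER());
        forkChannel.connect("ignored"); // the cluster name of a fork channel is ignored
    }
}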
Ultimately we would also like to make a "master package" which could work without having the entity classes deployed; that could become a default service of WildFly and make this really easy, but it requires several changes in the backend API and serialization formats.
Thanks a lot for all your thoughts! It's a big subject and it's motivating to know that people need this, and great to have some brainstorming about it.