Hi,
Quote:
Thanks for the reply, and I understand the complexity of dealing with split brain in distributed environments.
Great! It's not easy to find a brainstorming partner on this subject.
Quote:
The problem with the JMS solution in our case is that the master and slave are static, and that makes it impossible to promote a new master on a new running node.
I'm definitely not a JMS guru but I'm told that this is not true. It might not be part of the JMS standard API, but most implementations do offer configurations which guarantee single consumers, and for others it's possible to define the consumer as an HA singleton.
I would agree though that this is a pain to set up; one way of improving this could be for us to provide some configuration examples (and test them), but since I just heard from the WildFly developers that version 10 is expected to have a simplified setup for single consumers, I'm actually trying to find out more about that.
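For example, a minimal sketch of what such a configuration example could look like, assuming ActiveMQ's exclusive-consumer destination option (the broker URL and queue name are illustrative, and the option syntax is broker-specific, so treat this as an illustration rather than a recommended setup):
Code:
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ExclusiveIndexConsumer {
    public static void main(String[] args) throws Exception {
        // Broker URL and queue name are illustrative only
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // The ?consumer.exclusive=true option asks the broker to dispatch to a single
        // consumer at a time; other consumers act as hot standbys and take over on failure.
        Queue queue = session.createQueue("hsearch.index.backend?consumer.exclusive=true");
        MessageConsumer consumer = session.createConsumer(queue);
        connection.start();
        // consumer.receive() or setMessageListener(...) would then process the index work messages
    }
}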
Also happy to do some R&D around JGroups and similar alternatives, but my priority would be to offload this problem to a different library as it's not the core business for Hibernate Search. Apache Kafka also seems fitting; I'm very open to discussing alternative backends to make this simpler.
Quote:
And I guess JMS together with Infinispan and distributed LuceneIndexesLocking has the same issue with possible orphan locks if the master node crashes?
Right.
When I mentioned Infinispan Query I was not referring to the JMS implementation though; for Infinispan Query I actually created a more reliable backend which has passed some quite tricky tests:
https://github.com/infinispan/infinispan/blob/4a37550e36201d7ace60f82b412ff06d3b043bfe/query/src/main/java/org/infinispan/query/indexmanager/ClusteredSwitchingBackend.java
We might want to port a similar approach back to Hibernate Search. I haven't done that yet because that model is timeout-based, which matches the state handling of Infinispan itself (so it is consistent with the other state it stores, like the entries), but when the reference data is stored in an RDBMS I believe the consistency expectations should be higher than what you can get from a timeout-based model. I'm referring to the limitations we get with a split brain of course: since it doesn't keep logs, it can't perform a reconciliation on merges.
For Hibernate Search (with an RDBMS) we could have this as an option, but I'd prefer to have a log-based backend as well, so backporting that model hasn't been high on my priority list.
Quote:
Workaround:
Could a workaround be to set up LuceneIndexesLocking as a local cache?
Disabling the lock is easy. The real question is whether it's correct and safe to disable locking in your environment.
If you do want to disable it, your workaround is valid, but you could also set:
Code:
hibernate.search.default.locking_strategy = none
[http://docs.jboss.org/hibernate/search/5.4/reference/en-US/html_single/#search-configuration-directory-lockfactories]
Quote:
Another solution : [...]
That's right, I like your idea. The solution within Infinispan Query is similar: it's slightly simpler, as for that one I simply decided that the master would be the first JGroups member. JGroups guarantees that the list of members (the View) is ordered the same for all nodes, and the first one (aka the Coordinator) is a good choice as it will always be the oldest member of the View. This way, if the coordinator fails and the master needs to be re-elected, picking the next one in the list is a very easy and safe election protocol, and it keeps the role of the master rather stable over time.
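To illustrate, a minimal sketch of that election using the JGroups API (this is not the actual Infinispan Query code, the class name is made up):
Code:
import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class CoordinatorElection extends ReceiverAdapter {

    private final JChannel channel;
    private volatile boolean master;

    public CoordinatorElection(JChannel channel) {
        this.channel = channel;
        channel.setReceiver(this);
    }

    @Override
    public void viewAccepted(View view) {
        // The View is ordered identically on all nodes; the first member is the coordinator,
        // i.e. the oldest member. Electing it as indexing master is stable across view changes:
        // the role only moves when the coordinator itself leaves or crashes.
        Address coordinator = view.getMembers().get(0);
        master = coordinator.equals(channel.getAddress());
    }

    public boolean isMaster() {
        return master;
    }
}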
The code in Hibernate Search attempts to do a hash-based election based on the index name; the benefit is that it doesn't pick the same master for all indexes, but it also means that the role of master needs to be migrated from a live node to a different live node on (probably) every view change. Stealing a lock from a crashed node is much simpler than having to coordinate an index writer flush on a live node and only then have it voluntarily release the lock to the new master node. The better protocol would be to apply some hashing, but only migrate the master on a crash, similarly to what you suggest.
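A rough sketch of what that "sticky" hash-based election could look like (again, not the current code, just the idea): spread the initial assignment by hashing the index name, but keep the current master as long as it is still in the view, so the role only migrates when its holder actually crashes or leaves.
Code:
import java.util.List;
import org.jgroups.Address;
import org.jgroups.View;

public class StickyHashElection {

    private volatile Address currentMaster;

    // Called on every view change; hashing the index name spreads different indexes over different nodes.
    public synchronized Address electMaster(String indexName, View view) {
        List<Address> members = view.getMembers();
        if (currentMaster != null && members.contains(currentMaster)) {
            // The previous master is still alive: keep it, so we never need to coordinate
            // a voluntary hand-over (index writer flush + lock release) between live nodes.
            return currentMaster;
        }
        // The previous master crashed or left: pick a new one by hashing the index name,
        // so different indexes tend to elect different masters.
        int slot = Math.floorMod(indexName.hashCode(), members.size());
        currentMaster = members.get(slot);
        return currentMaster;
    }
}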
Quote:
For the split brain issue I don’t have any good suggestions, but I do think using Kubernetes to solve the split brain seems overkill. I mean, if you're not using Kubernetes today, it's a rather big framework to bring into your current technology stack.
I meant Kubernetes as one example; I simply expect that almost anyone with more than one server to manage will have some script which starts/stops nodes, and if Hibernate Search could be notified (JMX?) about how many nodes are supposed to be in the group, it could use that to provide an option for strong consistency at the expense of availability.
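To make the idea concrete, a purely hypothetical sketch (none of these names exist in Hibernate Search today): the start/stop scripts register the expected group size through JMX, and the backend refuses index writes unless a majority of the declared nodes is visible.
Code:
// Hypothetical MBean: the start/stop scripts would call setExpectedClusterSize(n) over JMX.
public interface ClusterSizeMBean {
    int getExpectedClusterSize();
    void setExpectedClusterSize(int expectedClusterSize);
}

public class ClusterSize implements ClusterSizeMBean {

    private volatile int expectedClusterSize = 1;

    @Override
    public int getExpectedClusterSize() {
        return expectedClusterSize;
    }

    @Override
    public void setExpectedClusterSize(int expectedClusterSize) {
        this.expectedClusterSize = expectedClusterSize;
    }

    // Strong consistency at the expense of availability: only accept index writes
    // when this partition can see a majority of the declared nodes.
    public boolean hasQuorum(int visibleMembers) {
        return visibleMembers > expectedClusterSize / 2;
    }
}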
Quote:
The JGroups-RAFT looks interesting because it seems more lightweight and you are already piggybacking on the JGroups framework.
That's right. Funny you mention piggybacking as that's exactly what we've been working on:
https://github.com/belaban/JGroups/blob/master/doc/design/FORK.txt
FORK is going to be exposed within WildFly 10; the RAFT component is still experimental though. I should try to find some time to play with it (it won't evolve past experimental until people like you and me actually play with it), but it would also be nice to provide an easier solution in the meantime. Maybe I should just backport the improvements from Infinispan as an intermediate step, and find a JMS expert to share some configuration examples.
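For reference, FORK lets us piggyback a private protocol stack on an existing channel without touching the application's JGroups configuration. A minimal sketch, assuming the main channel's stack already includes the FORK protocol (the config file name and fork ids below are made up):
Code:
import org.jgroups.JChannel;
import org.jgroups.fork.ForkChannel;
import org.jgroups.protocols.COUNTER;

public class ForkExample {
    public static void main(String[] args) throws Exception {
        // The application's existing channel; the stack it uses must contain the FORK protocol.
        JChannel mainChannel = new JChannel("my-stack-with-fork.xml"); // illustrative config
        mainChannel.connect("app-cluster");

        // Piggyback a private stack on top of it: the fork channel's messages travel over
        // the same transport but are invisible to the main channel's application traffic.
        ForkChannel forkChannel = new ForkChannel(mainChannel, "hsearch-fork-stack", "hsearch-backend",
                new COUNTER());
        forkChannel.connect("ignored"); // the cluster name of a fork channel is ignored
    }
}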
Ultimately we would also like to make a "master package" which could work without having the entity classes deployed; that could become a default service of WildFly and make this really easy, but it requires several changes in the backend API and serialization formats.
Thanks a lot for all your thoughts! It's a big subject and it's motivating to know that people need this, and great to have some brainstorming about it.