Hibernate Search w/Infinispan in distribution mode

astralbodies · **Posted:** Tue Nov 08, 2011 4:49 pm

We're still working on the performance issues that I've mentioned before in this forum. Part of that is an attempt to move from a replication model to a distribution model. We're now running 4 JBoss instances on each of our three servers, with max JVM heap set to 5GB. This is instead of running a single JBoss with a max of 20GB.

We have a guy in here from Red Hat helping us tune our JBoss instances and get distribution working. I wanted to post our configuration here since there is no example distribution config in the distro JAR. We think we're having problems with locking in Infinispan: when we mass index, the cluster blows up and then eventually merges back into the proper view. We haven't yet been able to get to a point where we can try JMS message consumption for throughput testing.

Here is our config:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="urn:infinispan:config:4.2 http://www.infinispan.org/schemas/infinispan-config-4.2.xsd"
   xmlns="urn:infinispan:config:4.2">

   <!-- *************************** -->
   <!-- System-wide global settings -->
   <!-- *************************** -->

   <global>

      <evictionScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory">
         <properties>
            <property name="threadNamePrefix" value="EvictionThread" />
         </properties>
      </evictionScheduledExecutor>
      <replicationQueueScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory">
         <properties>
            <property name="threadNamePrefix" value="ReplicationQueueThread" />
         </properties>
      </replicationQueueScheduledExecutor>
      <!-- Duplicate domains are allowed so that multiple deployments with default 
         configuration of Hibernate Search applications work - if possible it would 
         be better to use JNDI to share the CacheManager across applications -->
      <globalJmxStatistics enabled="true"
         cacheManagerName="HibernateSearch" allowDuplicateDomains="true"
         mBeanServerLookup="org.infinispan.jmx.JBossMBeanServerLookup" />

      <!-- If the transport is omitted, there is no way to create distributed 
         or clustered caches. There is no added cost to defining a transport but not 
         creating a cache that uses one, since the transport is created and initialized 
         lazily. -->
      <transport
         clusterName="${mpsearch.infinispan.cluster.prefix}-HibernateSearch-Infinispan-Cluster"
         distributedSyncTimeout="50000">
         <!-- Note that the JGroups transport uses sensible defaults if no configuration 
            property is defined. See the JGroupsTransport javadocs for more flags -->
         <properties>
            <!-- TODO: Change to udp.xml once streaming transfer requirement has 
               been removed. -->
            <property name="configurationFile" value="jgroups-udp.xml" />
         </properties>
      </transport>

      <!-- Used to register JVM shutdown hooks. hookBehavior: DEFAULT, REGISTER, 
         DONT_REGISTER. Hibernate Search takes care to stop the CacheManager so registering 
         is not needed -->
      <shutdown hookBehavior="DEFAULT" />

   </global>

   <!-- *************************** -->
   <!-- Default "template" settings -->
   <!-- *************************** -->

   <default>

      <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
         syncRollbackPhase="false" syncCommitPhase="false" useEagerLocking="false" />

      <locking lockAcquisitionTimeout="20000" writeSkewCheck="false"
         concurrencyLevel="500" useLockStriping="false" />

      <lazyDeserialization enabled="false" />

      <!-- Invocation batching is required for use with the Lucene Directory -->
      <invocationBatching enabled="true" />
      <indexing enabled="true" indexLocalOnly="true"/>

      <!-- This element specifies that the cache is clustered. modes supported: 
         distribution (d), replication (r) or invalidation (i). Don't use invalidation 
         to store Lucene indexes (as with Hibernate Search DirectoryProvider). Replication 
         is recommended for best performance of Lucene indexes, but make sure you 
         have enough memory to store the index in your heap. Also distribution scales 
         much better than replication on high number of nodes in the cluster. -->
      <clustering mode="distribution">
         <l1 enabled="true" lifespan="60000" />
         <hash numOwners="2" rehashRpcTimeout="120000" />
         <!-- Network calls are synchronous by default -->
         <sync />
      </clustering>

      <jmxStatistics enabled="true" />

      <eviction maxEntries="-1" strategy="NONE" />

      <expiration maxIdle="-1" />

   </default>

   <!-- ******************************************************************************* -->
   <!-- Individually configured "named" caches. -->
   <!-- -->
   <!-- While default configuration happens to be fine with similar settings 
      across the -->
   <!-- three caches, they should generally be different in a production environment. -->
   <!-- -->
   <!-- Current settings could easily lead to OutOfMemory exception as a CacheStore -->
   <!-- should be enabled, and maybe distribution is desired. -->
   <!-- ******************************************************************************* -->

   <!-- *************************************** -->
   <!-- Cache to store Lucene's file metadata -->
   <!-- *************************************** -->
   <namedCache name="LuceneIndexesMetadata">
      <clustering mode="distribution">
         <l1 enabled="true" lifespan="60000" />
         <hash numOwners="2" rehashRpcTimeout="120000" />
         <sync />
      </clustering>

      <loaders>
         <loader class="org.infinispan.loaders.file.FileCacheStore">
            <properties>
               <property name="location" value="${mpsearch.infinispan.search.passivation.dir}" />
            </properties>
         </loader>
      </loaders>
   </namedCache>

   <!-- **************************** -->
   <!-- Cache to store Lucene data -->
   <!-- **************************** -->
   <namedCache name="LuceneIndexesData">
      <eviction wakeUpInterval="5000" maxEntries="5000" strategy="LIRS" />

      <clustering mode="distribution">
         <l1 enabled="true" lifespan="60000" />
         <hash numOwners="2" rehashRpcTimeout="120000" />
         <sync />
      </clustering>

      <loaders>
         <loader class="org.infinispan.loaders.file.FileCacheStore">
            <properties>
               <property name="location"
                  value="${mpsearch.infinispan.search.passivation.dir}" />
               <property name="streamBufferSize" value="15728640" />
            </properties>
            <!-- write-behind configuration starts here -->
            <!-- <async enabled="true" threadPoolSize="1" /> -->
            <!-- write-behind configuration ends here -->
         </loader>
      </loaders>
   </namedCache>

   <!-- ***************************** -->
   <!-- Cache to store Lucene locks -->
   <!-- ***************************** -->
   <namedCache name="LuceneIndexesLocking">
      <clustering mode="distribution">
         <l1 enabled="true" lifespan="60000" />
         <hash numOwners="2" rehashRpcTimeout="120000" />
         <sync />
      </clustering>
   </namedCache>
</infinispan>

We had to stop using JTA in JBoss EAP 5.0 because it wasn't playing well with our Spring-based Hibernate/Hibernate Search app. We want to move to JBoss AS 7.0 (eventually EAP 6) so we can get back to using JTA. That's really the only thing in this config that I see as a glaring issue; the transactionManagerLookupClass may need to be the HibernateTransactionManagerLookup which will find the Spring local transaction.

Any obvious mistakes that you can see?

sanne.grinovero · **Posted:** Thu Nov 10, 2011 12:21 pm

Hello again astralbodies,
keep in mind that distribution might be more memory efficient, but when your workload is strongly read-mostly, as in the Lucene case, it's actually better to use replication: if you can keep it all in the memory of each node, then searches (and some write operations too) will be more efficient.
Also, using L1 introduces additional locking and invalidation messages.

For these reasons we designed the Lucene directory to use three different caches, each for a different purpose: LuceneIndexesMetadata and LuceneIndexesLocking hold very tiny values, so for them it's better to disable L1 and use REPL instead of DIST.
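As a minimal sketch of that suggestion, assuming the same Infinispan 4.2 schema as the config posted above, the two small caches could be switched to synchronous replication like this. L1 only applies to distribution, so it simply disappears here; the cache store from the original metadata cache is left out for brevity and would need to be carried over.

```xml
<!-- Sketch only: small, frequently read caches kept fully replicated on every node -->
<namedCache name="LuceneIndexesMetadata">
   <clustering mode="replication">
      <sync />
   </clustering>
</namedCache>

<namedCache name="LuceneIndexesLocking">
   <clustering mode="replication">
      <sync />
   </clustering>
</namedCache>
```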

The LuceneIndexesData cache could use either REPL or DIST, depending on your memory and network. Try both, but especially try DIST without L1, as L1 acquires locks on read operations, which is very bad for Lucene's data access patterns. If you have enough free memory that you're looking into L1, you might want to try DIST with a higher numOwners: the more owners, the higher the likelihood that a specific node already has the needed values in local memory.
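A sketch of the "DIST without L1, more owners" variant for the data cache, again assuming the 4.2 schema used in this thread. The value numOwners="3" is only an illustrative assumption; the right number depends on how much heap each node can spare. The eviction and loader elements from the original cache are omitted here.

```xml
<!-- Sketch only: distribution with L1 disabled and one extra owner per entry -->
<namedCache name="LuceneIndexesData">
   <clustering mode="distribution">
      <l1 enabled="false" />
      <!-- numOwners="3" is an assumed value to experiment with, not a recommendation -->
      <hash numOwners="3" rehashRpcTimeout="120000" />
      <sync />
   </clustering>
</namedCache>
```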

Quote:

<jmxStatistics enabled="true" />

This might slow it down a bit; if you don't strictly need it, disable it. Or let's start tuning without it, and reintroduce it later.

Quote:

<indexing enabled="true" indexLocalOnly="true"/>

This is wrong! I guess I should write this down in red in the guide. The <indexing> tag is needed to index the values you're storing in the grid, but it is not needed (and is harmful for performance!) when you're storing an index.
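In terms of the posted config, a sketch of the corrected part of the default section might look as follows (other elements of the section are omitted). Invocation batching stays on because the Lucene Directory requires it, while the indexing of cache values is switched off.

```xml
<default>
   <!-- Still required by the Lucene Directory -->
   <invocationBatching enabled="true" />
   <!-- Only needed for querying values stored in the grid, not for storing an index in it -->
   <indexing enabled="false" />
</default>
```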

Quote:

We had to stop using JTA in JBoss EAP 5.0 because it wasn't playing well with our Spring-based Hibernate/Hibernate Search app. We want to move to JBoss AS 7.0 (eventually EAP 6) so we can get back to using JTA. That's really the only thing in this config that I see as a glaring issue; the transactionManagerLookupClass may need to be the HibernateTransactionManagerLookup which will find the Spring local transaction.

Make sure Hibernate Search and Spring are using the same transaction manager (whether that's JTA, the local transaction, or any other TM) or you'll face weird issues.

Are you already using exclusive indexing, and have you tuned the Lucene parameters to not flush too often? Which version of Hibernate Search are you using?
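For readers unsure what those two settings refer to, here is a hedged sketch written as a hibernate.cfg.xml fragment. The property names are the ones I recall from the Hibernate Search 3.x reference and should be checked against the documentation for your version; the buffer sizes (in MB) are placeholder assumptions, and in a Spring-managed setup the same keys would go into the session factory's Hibernate properties instead.

```xml
<!-- Sketch only: verify property names against your Hibernate Search version -->
<session-factory>
   <!-- Keep the IndexWriter open between operations instead of releasing it after each one -->
   <property name="hibernate.search.default.exclusive_index_use">true</property>
   <!-- Larger RAM buffers mean fewer flushes to the Directory; 64 is an assumed starting point -->
   <property name="hibernate.search.default.indexwriter.transaction.ram_buffer_size">64</property>
   <property name="hibernate.search.default.indexwriter.batch.ram_buffer_size">64</property>
</session-factory>
```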

astralbodies · **Posted:** Tue Nov 15, 2011 5:55 pm

Didn't see the response - it appears the forum neglected to e-mail me!

I have turned off the JMX statistics gathering and also that indexing config. I told the consultant we had in here I believed that setting was specifically for Infinispan Query. I will try and give that a whirl.

We are using:
Hibernate Search 3.4.1
Hibernate 3.6.6
Lucene Core 3.1.0
Infinispan 4.2.1
JGroups 2.12.1.3

astralbodies · **Posted:** Tue Nov 15, 2011 6:04 pm

Sanne - I think I should post our latest config as our consultant had changed a few things since I posted originally. I am also putting our JGroups config in; this was pretty much never tuned as it came right out of Infinispan's test package. This config includes the changes you suggested.

Thanks!
Aaron

Code:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="urn:infinispan:config:4.2 http://www.infinispan.org/schemas/infinispan-config-4.2.xsd"
   xmlns="urn:infinispan:config:4.2">

   <!-- *************************** -->
   <!-- System-wide global settings -->
   <!-- *************************** -->

   <global>

      <evictionScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory">
         <properties>
            <property name="threadNamePrefix" value="EvictionThread" />
         </properties>
      </evictionScheduledExecutor>
      <replicationQueueScheduledExecutor factory="org.infinispan.executors.DefaultScheduledExecutorFactory">
         <properties>
            <property name="threadNamePrefix" value="ReplicationQueueThread" />
         </properties>
      </replicationQueueScheduledExecutor>
      <asyncTransportExecutor factory="org.infinispan.executors.DefaultExecutorFactory">
         <properties>
            <property name="threadNamePrefix" value="AsyncThread" />
            <property name="maxThreads" value="1" />
         </properties>
      </asyncTransportExecutor>
      <!-- Duplicate domains are allowed so that multiple deployments with default 
         configuration of Hibernate Search applications work - if possible it would 
         be better to use JNDI to share the CacheManager across applications -->
      <globalJmxStatistics enabled="false"
         cacheManagerName="HibernateSearch" allowDuplicateDomains="true"
         mBeanServerLookup="org.infinispan.jmx.JBossMBeanServerLookup" />

      <!-- If the transport is omitted, there is no way to create distributed 
         or clustered caches. There is no added cost to defining a transport but not 
         creating a cache that uses one, since the transport is created and initialized 
         lazily. -->
      <transport
         clusterName="${mpsearch.infinispan.cluster.prefix}-HibernateSearch-Infinispan-Cluster"
         distributedSyncTimeout="65000">
         <!-- Note that the JGroups transport uses sensible defaults if no configuration 
            property is defined. See the JGroupsTransport javadocs for more flags -->
         <properties>
            <!-- TODO: Change to udp.xml once streaming transfer requirement has 
               been removed. -->
            <property name="configurationFile" value="jgroups-udp.xml" />
         </properties>
      </transport>

      <!-- Used to register JVM shutdown hooks. hookBehavior: DEFAULT, REGISTER, 
         DONT_REGISTER. Hibernate Search takes care to stop the CacheManager so registering 
         is not needed -->
      <shutdown hookBehavior="DEFAULT" />

   </global>

   <!-- *************************** -->
   <!-- Default "template" settings -->
   <!-- *************************** -->

   <default>

      <deadlockDetection enabled="true" />

      <!-- <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup" /> -->
        
      <locking concurrencyLevel="800" useLockStriping="true"/>

      <!--  <lazyDeserialization enabled="false" />-->

      <!-- Invocation batching is required for use with the Lucene Directory -->
      <invocationBatching enabled="true" />
      <indexing enabled="false" indexLocalOnly="true"/>

      <!-- This element specifies that the cache is clustered. modes supported: 
         distribution (d), replication (r) or invalidation (i). Don't use invalidation 
         to store Lucene indexes (as with Hibernate Search DirectoryProvider). Replication 
         is recommended for best performance of Lucene indexes, but make sure you 
         have enough memory to store the index in your heap. Also distribution scales 
         much better than replication on high number of nodes in the cluster. -->
      <clustering mode="distribution">
         <l1 enabled="true" lifespan="100000" />
         <hash numOwners="2" rehashRpcTimeout="120000" />
         <!-- Network calls are synchronous by default; the <async /> element below overrides that -->
         <!-- <sync replTimeout="30000" /> -->
         <async />
      </clustering>

      <jmxStatistics enabled="false" />

      <eviction maxEntries="-1" strategy="NONE" />

      <expiration maxIdle="-1" />

   </default>

   <!-- ******************************************************************************* -->
   <!-- Individually configured "named" caches. -->
   <!-- -->
   <!-- While default configuration happens to be fine with similar settings 
      across the -->
   <!-- three caches, they should generally be different in a production environment. -->
   <!-- -->
   <!-- Current settings could easily lead to OutOfMemory exception as a CacheStore -->
   <!-- should be enabled, and maybe distribution is desired. -->
   <!-- ******************************************************************************* -->

   <!-- *************************************** -->
   <!-- Cache to store Lucene's file metadata -->
   <!-- *************************************** -->
   <namedCache name="LuceneIndexesMetadata">
      <clustering mode="replication">
         <stateRetrieval fetchInMemoryState="true"
            logFlushTimeout="30000" />
         <sync replTimeout="25000" />
      </clustering>

      <loaders passivation="false" shared="false" preload="false">
         <loader class="org.infinispan.loaders.file.FileCacheStore"
            fetchPersistentState="true" purgeOnStartup="false">

            <properties>
               <property name="location" value="${mpsearch.infinispan.search.passivation.dir}" />
            </properties>
         </loader>
      </loaders>
   </namedCache>

   <!-- **************************** -->
   <!-- Cache to store Lucene data -->
   <!-- **************************** -->
   <namedCache name="LuceneIndexesData">
      <eviction wakeUpInterval="5000" maxEntries="500" strategy="LIRS" />

      <clustering mode="distribution">
         <hash numOwners="2" rehashRpcTimeout="120000" />
         <!-- <sync replTimeout="30000" /> -->
         <async />
      </clustering>

      <loaders>
         <loader class="org.infinispan.loaders.file.FileCacheStore">
            <properties>
               <property name="location"
                  value="${mpsearch.infinispan.search.passivation.dir}" />
               <property name="streamBufferSize" value="15728640" />
            </properties>
            <!-- write-behind configuration starts here -->
            <async enabled="true" threadPoolSize="50" /> 
            <!-- write-behind configuration ends here -->
            
         </loader>
      </loaders>
   </namedCache>

   <!-- ***************************** -->
   <!-- Cache to store Lucene locks -->
   <!-- ***************************** -->
   <namedCache name="LuceneIndexesLocking">
      <clustering mode="replication">
         <stateRetrieval fetchInMemoryState="true"
            logFlushTimeout="30000" />
         <sync replTimeout="25000" />
      </clustering>
   </namedCache>

</infinispan>

JGroups -

Code:

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
   <UDP
         mcast_addr="${mpsearch.infinispan.search.jgroups.udp.mcast_addr:228.6.7.8}"
         mcast_port="${mpsearch.infinispan.search.jgroups.udp.mcast_port:46655}"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000"
         mcast_send_buf_size="640000"
         loopback="true"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:2}"
         enable_bundling="true"
         enable_diagnostics="false"

         thread_naming_pattern="pl"

         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="30"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="Discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="2"
         oob_thread_pool.max_threads="30"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="Discard"
         />

   <PING timeout="3000" num_initial_members="3"/>
   <MERGE2 max_interval="30000" min_interval="10000"/>
   <FD_SOCK/>
   <FD_ALL/>
   <BARRIER />
   <pbcast.NAKACK use_stats_for_retransmission="false"
                   exponential_backoff="0"
                   use_mcast_xmit="true" gc_lag="0"
                   retransmit_timeout="300,600,1200"
                   discard_delivered_msgs="true"/>
   <UNICAST timeout="300,600,1200"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/>
   <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
   <UFC max_credits="500000" min_threshold="0.20"/>
   <MFC max_credits="500000" min_threshold="0.20"/>
   <FRAG2 frag_size="60000"  />
   <pbcast.STREAMING_STATE_TRANSFER/>
   <pbcast.FLUSH timeout="0"/>
</config>

sanne.grinovero · **Posted:** Wed Nov 16, 2011 5:39 am

Quote:

<deadlockDetection enabled="true" />

You should only use that to spot deadlocks; when you see it's fine disable it or it will slow down all operations.

Quote:

<locking concurrencyLevel="800" useLockStriping="true"/>

Don't use lockStriping!

Quote:

<l1 enabled="true" lifespan="100000"/>

L1 can have strong effects on performance; depending on your load and network I can't predict if it would be better to enable it or to avoid it; make sure you try both cases.
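Put together, the three remarks above could translate into something like the following sketch of the default section, assuming the same 4.2 schema. The concurrencyLevel is carried over from the posted config rather than being a recommendation, and the L1 line is the one to flip between test runs.

```xml
<default>
   <!-- Turn on only while hunting for deadlocks; it slows down all operations -->
   <deadlockDetection enabled="false" />

   <!-- One lock per entry instead of a shared pool of striped locks -->
   <locking concurrencyLevel="800" useLockStriping="false" />

   <clustering mode="distribution">
      <!-- Benchmark with enabled="true" and enabled="false"; the better choice depends on load and network -->
      <l1 enabled="false" />
      <hash numOwners="2" rehashRpcTimeout="120000" />
      <async />
   </clustering>
</default>
```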

I'm travelling again, sorry; I won't be able to run tests this week to take a closer look, but please keep me posted.

The JGroups configuration seems fine, but I'm not an expert on that. Make sure the buffer sizes match your network capabilities and settings.

Were you able to profile it and identify a specific issue?

astralbodies · **Posted:** Tue Nov 29, 2011 11:21 am

Hi Sanne -

We have decided to scrap Infinispan 4.2 with Hibernate Search 4 being our ultimate goal. For the time being, we're using a FSMasterDirectoryProvider and FSSlaveDirectoryProvider with very good performance results. It's not ideal having an NFS share between our nodes since it's not failover tolerant and limits us to index on a single node.
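For anyone following along, a rough sketch of what such a master/slave setup can look like as hibernate.cfg.xml fragments, one per node type. The directory paths and the refresh period are hypothetical placeholders, and the property names are as I recall them from the Hibernate Search 3.x reference, so verify them against your version. The slaves would also need the JMS backend configured to forward index work to the master, which is not shown.

```xml
<!-- Master node (the single writer): indexes locally and periodically copies to the shared source -->
<session-factory>
   <property name="hibernate.search.default.directory_provider">org.hibernate.search.store.FSMasterDirectoryProvider</property>
   <!-- Local working index; path is a placeholder -->
   <property name="hibernate.search.default.indexBase">/var/lucene/master</property>
   <!-- Shared (for example NFS) copy that slaves read from; path is a placeholder -->
   <property name="hibernate.search.default.sourceBase">/mnt/nfs/lucene-source</property>
   <!-- Copy interval in seconds; assumed value -->
   <property name="hibernate.search.default.refresh">1800</property>
</session-factory>

<!-- Slave node (separate file): copies the shared source into a local index and searches that -->
<session-factory>
   <property name="hibernate.search.default.directory_provider">org.hibernate.search.store.FSSlaveDirectoryProvider</property>
   <property name="hibernate.search.default.indexBase">/var/lucene/slave</property>
   <property name="hibernate.search.default.sourceBase">/mnt/nfs/lucene-source</property>
   <property name="hibernate.search.default.refresh">1800</property>
</session-factory>
```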

We're going to have to wait on Hibernate 4 / Hibernate Search 4 for the Spring Framework 3.1 to be final. H4 support is being rewritten in Spring to separate itself from original Hibernate 3 logic. I've found it buggy and this prevents the upgrade. We're also upgrading to JBoss 7 in hopes of EAP 6 coming out soon.

Thanks for all of your help so far with tuning!

Aaron Douglas

sanne.grinovero · **Posted:** Tue Nov 29, 2011 2:43 pm

Hi,
thanks for the update; just some thoughts:

Quote:

We have decided to scrap Infinispan 4.2 with Hibernate Search 4 being our ultimate goal. For the time being, we're using a FSMasterDirectoryProvider and FSSlaveDirectoryProvider with very good performance results. It's not ideal having an NFS share between our nodes since it's not failover tolerant and limits us to index on a single node.

Why is it not failover tolerant? Depends on your JMS queues configuration. Anyway even with Infinispan you need a single writer.

JBoss 7 is highly recommended. I'm not sure what the Spring people are doing, as it shouldn't need a rewrite at all, but then I don't know which areas they integrate with.

astralbodies · **Posted:** Tue Nov 29, 2011 3:09 pm

It's not failover tolerant in the sense that if the master node goes down (OS and all), the shares on the slave nodes won't be valid. We'd have to remount the source folders to another place and flip on the master switch on one of the slave nodes. It's not automatic, but it could be done. Unfortunately we don't manage our data center, so it's a non-trivial task to do this.

Our architecture allowed us to "get away" with having all master nodes, since we guaranteed that only a single node would be writing to a particular index. As we move forward, we're combining several indexes back into one index, which will decrease the number of indexing nodes. In our setup Hibernate isn't doing much in the way of persisting data; our data transformation tool pulls the outside data into our database schema directly and then sends small reindexing messages to the appropriate queue/node. That message forces an eviction of the updated entity from Hibernate, and then Search reindexes it.

Spring is responsible for creating the SessionFactory instance, and it also delegates transaction management to itself. We hooked Spring into JTA, and Hibernate then called Spring for the current transaction. The problem lies with the SessionFactory factory bean that Spring provides: it doesn't flip the right switches when bringing up Hibernate. They're working on it, but Hibernate 4 support is still not a GA release.