Clustering Hibernate Search

frankr · **Joined:** Thu Aug 13, 2009 9:56 pm **Posts:** 1

I'm working on clustering Hibernate Search (HS) on shared storage (GFS) as follows:

1. Multiple read-only search readers which use the index for search queries.
2. A single indexer which asynchronously updates the index files.

To reduce distributed-locking delays, the reader nodes could mount the filesystem read-only and only the indexer read-write, but the documentation does not give me a conclusive answer on this:

Quote:

http://wiki.apache.org/jakarta-lucene/LuceneFAQ

Even though index searching is a read only operation, the IndexSearcher must momentarily lock the index when it is opened in order to get the list of files in the index. If locking is not configured properly it gets an incorrect list (because the list of files changes as the IndexWriter adds docs or optimizes the index). Remote filesystems (like NFS and Samba) rarely work, because they cannot make the transactional guarantees necessary to ensure that all clients get consistent views of the directory.

Quote:

http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed

Remote filesystems are typically quite a bit slower for searching. If the index must be remote, try to mount the remote filesystem as a "readonly" mount. In some cases this could improve performance.

Can somebody confirm that it is possible to have only one async indexer mounting read-write and multiple index readers mounting read-only?

sanne.grinovero · **Posted:** Fri Aug 14, 2009 1:19 pm

Yes, that's the general idea. Be careful, though: you must make sure each IndexReader can still read the segments that existed when it was opened, so the IndexWriter node must not delete files that are still in use by any reader.
I don't know how GFS handles that. On a local FS it usually works because Lucene relies on "delete on close" semantics, but this is known to be broken on NFS.
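To make the "delete on close" point concrete, here is a minimal sketch in plain Java (no Lucene involved). The temporary file is a hypothetical stand-in for a segment file: a "reader" holds it open while a "writer" deletes it. On a typical local POSIX filesystem the read still succeeds. Whether the same holds on GFS, or across nodes at all, is exactly the open question, so treat this as a probe to run on the target mount rather than as proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DeleteOnCloseDemo {

    /** Opens a file, deletes it while the stream is still open, then reads it. */
    static String readAfterDelete(String content) throws IOException {
        // Hypothetical stand-in for a Lucene segment file.
        Path segment = Files.createTempFile("segment", ".tmp");
        Files.write(segment, content.getBytes(StandardCharsets.UTF_8));

        try (InputStream reader = Files.newInputStream(segment)) {
            // The "writer" removes the file while the "reader" still holds it open.
            Files.delete(segment);
            // On a local POSIX filesystem the bytes stay readable until the stream closes.
            return new String(reader.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterDelete("segment data"));
    }
}
```

This only exercises a single process on a single node. In the clustered setup the delete comes from a different machine than the open handle, which is the case that fails on NFS.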
I'd suggest using the JMS backend that comes with Hibernate Search. On the dev list you'll see a lot of activity, as we are developing an Infinispan-backed Directory to store the index; that is IMHO the best solution for clustering, but it's not ready yet. Hibernate Search's JMS backend should work fine for now.
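As a rough sketch of what the JMS setup could look like, the snippet below builds the configuration properties for a search-only (slave) node and for the single indexing (master) node. The property names follow the Hibernate Search 3.x reference documentation as best recalled, so check them against the version in use. The paths, JNDI names, and the `SearchClusterConfig` class itself are placeholders made up for illustration.

```java
import java.util.Properties;

class SearchClusterConfig {

    /** Search-only node: queries a local index copy and sends index work to a JMS queue. */
    static Properties slave(String sharedCopy, String localIndex) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSSlaveDirectoryProvider");
        // Shared location the master publishes index copies to (placeholder path).
        p.setProperty("hibernate.search.default.sourceBase", sharedCopy);
        // Local directory the slave actually searches (placeholder path).
        p.setProperty("hibernate.search.default.indexBase", localIndex);
        // Seconds between refreshes of the local copy; value chosen arbitrarily.
        p.setProperty("hibernate.search.default.refresh", "1800");
        p.setProperty("hibernate.search.worker.backend", "jms");
        // JNDI names are assumptions; use whatever the JMS provider exposes.
        p.setProperty("hibernate.search.worker.jms.connection_factory", "/ConnectionFactory");
        p.setProperty("hibernate.search.worker.jms.queue", "queue/hibernatesearch");
        return p;
    }

    /** Indexing node: writes the index locally and periodically copies it to the shared location. */
    static Properties master(String sharedCopy, String localIndex) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSMasterDirectoryProvider");
        p.setProperty("hibernate.search.default.sourceBase", sharedCopy);
        p.setProperty("hibernate.search.default.indexBase", localIndex);
        p.setProperty("hibernate.search.default.refresh", "1800");
        // No worker.backend entry: the master keeps the default Lucene backend.
        return p;
    }
}
```

With this layout only the master ever writes to the index, and the slaves search their own local copies, so no reader holds files open on the shared mount while the writer deletes them. The master also needs a JMS listener that applies the queued work to the index; that part is not shown here.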