I'm working on clustering HS on shared storage (GFS), set up as follows:
1. Multiple read-only search readers which use the index for search queries.
2. A single indexer which asynchronously updates the index files.
To reduce distributed locking delays, the reader nodes could mount the shared filesystem read-only, with only the indexer mounting it read-write. However, the documentation does not give me a conclusive answer on whether this works:
Quote:
http://wiki.apache.org/jakarta-lucene/LuceneFAQ
Even though index searching is a read only operation, the IndexSearcher must momentarily lock the index when it is opened in order to get the list of files in the index. If locking is not configured properly it gets an incorrect list (because the list of files changes as the IndexWriter adds docs or optimizes the index). Remote filesystems (like NFS and Samba) rarely work, because they cannot make the transactional guarantees necessary to ensure that all clients get consistent views of the directory.
Quote:
http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed
Remote filesystems are typically quite a bit slower for searching. If the index must be remote, try to mount the remote filesystem as a "readonly" mount. In some cases this could improve performance.
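Can somebody confirm whether it is possible to have a single asynchronous indexer mount the filesystem read-write while multiple index readers mount it read-only, given that the IndexSearcher must momentarily lock the index when it is opened?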