Quote:
Each Lucene document consists of a diverse set of data, about 200+ fields/terms
Fields or terms? Those are quite different concepts in Lucene.
Quote:
The database that the index is mapped to is updated continuously by thousands of users
That's normal :)
Quote:
Most changes in the database occur outside the knowledge of search (we have a workaround for this already)
Can you detail this better? What do you mean by "outside the knowledge"? And in any case I'd like to hear about the workarounds. Please also have a look at the changes in 3.4.0.CR2, which provide some major performance boosts, especially around dirty-checking optimizations and collection updates now triggering only the minimal set of updates to be sent to the index.
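For changes made outside Hibernate's knowledge (direct SQL, another application writing to the same database), one common workaround is to periodically re-index the affected entities yourself via the FullTextSession API. A minimal sketch, assuming Hibernate Search 3.x; the `Product` entity and the `findEntitiesChangedSince` query are illustrative placeholders, not from the original post:

```java
// Sketch: manually re-indexing entities that were modified outside of
// Hibernate's knowledge. Assumes you can detect which rows changed
// (e.g. via a last-modified timestamp column).
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();

// Hypothetical helper: query for entities changed since the last sync.
List<Product> changed = findEntitiesChangedSince(lastSyncTimestamp);
for (Product p : changed) {
    // Rebuilds the Lucene document for this entity instance.
    fullTextSession.index(p);
}

// Index changes are flushed and applied when the transaction commits.
tx.commit();
```

For a full rebuild rather than an incremental sync, the MassIndexer introduced in recent versions is usually the better tool.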
Quote:
Most changes in the database occur outside the knowledge of search (we have a workaround for this already)
You can get fully real-time results using the Infinispan backend. Since your index is so small it will also provide a nice performance boost: it's usually better than the filesystem-based solutions, especially for frequently updated indexes (near-real-time updates being applied to the index).
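Switching to the Infinispan backend is mostly a matter of configuration. A sketch of the relevant properties, assuming a Hibernate Search version that ships the Infinispan directory provider (the file name `my-infinispan.xml` is an illustrative placeholder; check the reference documentation for your exact version):

```
# Store the index in an Infinispan data grid instead of on disk.
# "default" applies to all indexes; use the index name to override per index.
hibernate.search.default.directory_provider = infinispan

# Optional: point to a custom Infinispan configuration resource
# if the defaults don't fit your cluster setup.
hibernate.search.infinispan.configuration_resource = my-infinispan.xml
```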
Quote:
2) I know it is preferred for each slave node to have their own local copy of the index, but how problematic (if at all) is it to have all slave nodes reading from a single NFS location?
That's extremely problematic. I've found just a couple of people who asserted they had a new, improved NFS version which could run Lucene without problems, only to hear them complaining a month later that all their indexes had suddenly been corrupted by some bad luck, and blogging like crazy that you should never try it because of the aggressive optimizations done in Lucene. You've been warned :)
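If you do want to keep using a shared filesystem, the supported pattern is the master/slave directory providers: the shared location is only used as a copy target, while each node works against a local index. A configuration sketch (paths and the 1800-second refresh interval are illustrative assumptions):

```
# Master node: owns the index locally and periodically copies it
# to the shared source directory.
hibernate.search.default.directory_provider = filesystem-master
hibernate.search.default.indexBase = /var/lucene/indexes
hibernate.search.default.sourceBase = /mnt/nfs/lucene-source
hibernate.search.default.refresh = 1800

# Slave nodes: search against a local working copy, refreshed
# periodically from the shared source directory.
hibernate.search.default.directory_provider = filesystem-slave
hibernate.search.default.indexBase = /var/lucene/local-indexes
hibernate.search.default.sourceBase = /mnt/nfs/lucene-source
hibernate.search.default.refresh = 1800
```

This way Lucene never opens its working index directly over NFS, which is what causes the corruption stories above.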