Hi,
Quote:
Please forgive my ignorant question: Does Hibernate Search implicitly handle this case using a version field? If not, we might need to implement our own mass indexer that is aware of the version number on the records it's indexing.
That's not a dumb question at all ;)
Hibernate Search doesn't use versioning, but while a MassIndexer is running, any new changes triggered by a running transaction are enqueued in the same strictly ordered queue as the work generated by the MassIndexer.
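To make the ordering argument concrete, here is a toy model (not Hibernate Search code; the class and method names are made up for illustration) where a single-threaded executor stands in for that ordered queue on the indexer node. Because works are applied one at a time in the order they were enqueued, a change enqueued by a transaction after the mass indexer's work for the same entity is not overwritten by it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model, not Hibernate Search code: a single-threaded executor stands in
// for the strictly ordered work queue on the indexer node.
final class OrderedIndexQueue {
    // Simulated index: entity id -> indexed document content
    private final Map<Long, String> index = new ConcurrentHashMap<>();
    // One worker thread applies works one at a time, in submission order
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    void enqueue(long id, String content) {
        queue.execute(() -> index.put(id, content));
    }

    // Stops accepting work, waits until every queued work has been applied, then reads the index
    String drainAndGet(long id) {
        queue.shutdown();
        try {
            if (!queue.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queue did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index.get(id);
    }

    static String demo() {
        OrderedIndexQueue node = new OrderedIndexQueue();
        node.enqueue(42L, "v1"); // work produced by the mass indexer from the row it loaded
        node.enqueue(42L, "v2"); // work produced later by a committed application transaction
        return node.drainAndGet(42L);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints v2
    }
}
```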
This means you can definitely run "normal" transactions in parallel with the MassIndexer, but only as long as they run on the same indexer node. So if you use the JMS approach, you can disconnect the master from receiving web traffic to lighten its load, but still let it act as the master node for the other nodes. (If you enable two separate "masters", you will get into trouble.)
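A hypothetical illustration of why two "masters" cause trouble: each node may keep its own queue in order, but nothing orders one node's writes against the other's, and without a version check the last write wins. The sketch below (names made up, deterministic rather than threaded) replays one possible interleaving where the stale document from the mass indexer lands last.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model, not Hibernate Search code: two "masters" each drain their own
// ordered queue into one shared index, with nothing ordering them against each other.
final class TwoMasters {
    static final class Work {
        final long id;
        final String content;

        Work(long id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    static void drain(Queue<Work> queue, Map<Long, String> index) {
        Work w;
        while ((w = queue.poll()) != null) {
            index.put(w.id, w.content); // last write wins, no version check
        }
    }

    static String demo() {
        Map<Long, String> sharedIndex = new HashMap<>();
        Queue<Work> masterA = new ArrayDeque<>(); // node running the MassIndexer
        Queue<Work> masterB = new ArrayDeque<>(); // node handling application transactions

        masterA.add(new Work(42L, "v1")); // MassIndexer loaded row 42 while it was still v1
        masterB.add(new Work(42L, "v2")); // a transaction then updated row 42 to v2

        // Each queue is ordered, but B may flush before A; this is one possible interleaving
        drain(masterB, sharedIndex);
        drain(masterA, sharedIndex);
        return sharedIndex.get(42L);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints v1: the stale document overwrote the fresh one
    }
}
```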
Are you sure you want to reimplement all full-text queries using Hibernate Criteria to fill the gap? An alternative would be to temporarily stop re-syncing the index, or to avoid refreshing the IndexReaders, so that the client nodes keep using the outdated index until the replacement is ready. Of course this only works if you are OK with an index that may be out of date by several hours, but at least you won't be hammering your RDBMS with complex queries while it is also serving the MassIndexer.
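The "keep using the outdated index until the replacement is ready" idea boils down to readers holding an immutable snapshot that is swapped atomically once the rebuild finishes. The sketch below is a generic illustration of that pattern using an in-memory map as a stand-in index; it is not the Lucene or Hibernate Search API, and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Lucene or Hibernate Search API: readers see one
// immutable snapshot until a fully rebuilt one is published.
final class SnapshotSearcher {
    // term -> matching entity ids; replaced as a whole, never mutated in place
    private final AtomicReference<Map<String, List<Long>>> current;

    SnapshotSearcher(Map<String, List<Long>> initial) {
        current = new AtomicReference<>(Map.copyOf(initial));
    }

    // Client nodes query whatever snapshot is current, even if it is hours old
    List<Long> search(String term) {
        return current.get().getOrDefault(term, List.of());
    }

    // Called once the rebuilt index is complete: one atomic swap, no partial state visible
    void publish(Map<String, List<Long>> rebuilt) {
        current.set(Map.copyOf(rebuilt));
    }

    public static void main(String[] args) {
        SnapshotSearcher searcher = new SnapshotSearcher(Map.of("hibernate", List.of(1L)));
        // While the rebuild runs elsewhere, queries keep hitting the outdated snapshot
        System.out.println(searcher.search("hibernate")); // prints [1]
        searcher.publish(Map.of("hibernate", List.of(1L, 2L)));
        System.out.println(searcher.search("hibernate")); // prints [1, 2]
    }
}
```

The trade-off is the one described above: results can be stale for as long as the rebuild takes, but the database only has to serve the rebuild itself.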
Quote:
My other concern now is the latency that indexing will introduce to our hibernate transactions (particularly with hibernate.search.default.elasticsearch.refresh_after_write set to true).
I would not suggest that. Elasticsearch is really not designed to use refresh_after_write all the time; doing so would severely impact its performance. To be entirely fair, we primarily implemented support for this operation mode to make it easier to run integration tests, but it's highly recommended to treat the search engine as a service that may be slightly out of date.
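One way to follow that advice is to switch refresh_after_write on only in the integration-test configuration. The helper below is a hypothetical sketch: it only builds a java.util.Properties object with the property name quoted above, and assumes you then pass those properties to your Hibernate bootstrap (hibernate.properties, persistence unit properties, or similar) however your project already does it.

```java
import java.util.Properties;

// Hypothetical helper: enable refresh_after_write only for integration tests.
final class SearchSettings {
    static final String REFRESH_AFTER_WRITE =
            "hibernate.search.default.elasticsearch.refresh_after_write";

    static Properties forEnvironment(boolean integrationTests) {
        Properties props = new Properties();
        // true: the index is refreshed after writes, so a test can query right after committing
        // false: changes become searchable after Elasticsearch's periodic refresh (slightly stale)
        props.setProperty(REFRESH_AFTER_WRITE, Boolean.toString(integrationTests));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(forEnvironment(false).getProperty(REFRESH_AFTER_WRITE)); // prints false
    }
}
```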
Of course feel free to experiment - you might have specific requirements and you might have no choice - but I agree this would be a point of concern which requires significant testing with complete data sets.