Hi,
regarding performance, Hibernate Search's MassIndexer is generally expected to be faster than the simple ScrollableResults approach, but I guess you tried 4.0.0.Final, which is affected by excessive background commits that slow it down considerably:
https://hibernate.onjira.com/browse/HSEARCH-1019.
It's normal that it slows down a bit as the index grows, since the cost of segment merges is related to the index size.
Currently there's an "async" option, but all it does is run the work in the background. It will still wipe out the index, so your application can be used in the meantime but will be missing most results initially, gradually showing more as indexing approaches completion.
In Search 3.x it was expected that indexing was to happen "offline", so using the engine while the index was being rebuilt was not allowed; this mostly targeted use cases such as upgrading an application, initial deployments, and recovery after maintenance. Only since 4.0 is it allowed to use the application while a background MassIndexer is working, but the general pattern is still:
1) Wipe out the index (this step can optionally be skipped)
2) Add all entities from the database again, loading and processing them with multiple threads
So what I'd like to do is to add an option for phase 2) to use "Update" operations instead of "Add" operations, so that when one skips the initial wipe you don't end up with duplicates.
The limitation of this approach is that it will fail to delete entities from the index which are no longer in the database. Do you think this would still be acceptable for your case?
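To make the difference concrete, here is a minimal sketch in plain Java. It does not use the Hibernate Search or Lucene APIs: the index is modelled as a simple list of entity ids, and the names AddVersusUpdateSketch, add, update and reindex are made up for illustration. "Update" is assumed to mean "delete any document with the same id, then add", which is how Lucene-style updates generally behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model, NOT the Hibernate Search API.
final class AddVersusUpdateSketch {

    // "Add": blindly append a document for this entity id.
    static void add(List<Long> index, long id) {
        index.add(id);
    }

    // "Update": remove any existing document with the same id, then add it again.
    static void update(List<Long> index, long id) {
        index.removeIf(existing -> existing == id);
        index.add(id);
    }

    // Phase 2 with the initial wipe skipped: start from the old index contents
    // and process every entity currently in the database.
    static List<Long> reindex(List<Long> oldIndex, List<Long> databaseIds, boolean useUpdate) {
        List<Long> index = new ArrayList<>(oldIndex);
        for (long id : databaseIds) {
            if (useUpdate) {
                update(index, id);
            } else {
                add(index, id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Long> oldIndex = List.of(1L, 2L, 3L);
        List<Long> database = List.of(1L, 2L, 4L); // entity 3 was deleted, entity 4 is new

        // "Add" without a wipe leaves duplicates for 1 and 2.
        System.out.println(reindex(oldIndex, database, false)); // [1, 2, 3, 1, 2, 4]

        // "Update" avoids duplicates, but the stale entry 3 is still there.
        System.out.println(reindex(oldIndex, database, true));  // [3, 1, 2, 4]
    }
}
```

The second result shows the limitation mentioned above: entity 3 no longer exists in the database, yet it stays in the index.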
Otherwise we must add a third phase:
3) Verify the existence of all entities mentioned in the index, and delete those not found in the database.
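As a rough illustration of what this third phase would have to do, here is a small plain-Java sketch under the same simplifying assumption that the index is just a list of entity ids. StaleEntryPurgeSketch and purgeStale are hypothetical names, not existing MassIndexer methods, and a real implementation would have to check the database in batches rather than hold all ids in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model, NOT the Hibernate Search API.
final class StaleEntryPurgeSketch {

    // Keep only the index entries whose entity still exists in the database.
    static List<Long> purgeStale(List<Long> indexedIds, Set<Long> databaseIds) {
        List<Long> kept = new ArrayList<>();
        for (Long id : indexedIds) {
            if (databaseIds.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> indexAfterUpdates = List.of(3L, 1L, 2L, 4L); // 3 is stale
        Set<Long> database = Set.of(1L, 2L, 4L);

        System.out.println(purgeStale(indexAfterUpdates, database)); // [1, 2, 4]
    }
}
```

The extra cost is one existence check per indexed entity, which is why this phase needs more work than the "Update" variant alone.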
I think I could easily add the "Update" variant, but would need some more time for the third phase. What do you think of it?