Some final thoughts for those interested in the same subject, after doing my own implementation.
I assume we have a cloud application with potentially hundreds of tenants, and we want a separate index directory per tenant.
The second assumption is that we switch the tenant (and its FSDirectory) in our own DirectoryProvider implementation each time we want to use a concrete index. The simplest solution for this is to hold the current tenant id in a ThreadLocal, and my comments below assume this scenario.
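To make the ThreadLocal idea concrete, here is a minimal sketch of such a holder. Everything in it is my own assumption for illustration: the TenantContext class, its method names, and the <root>/<tenantId>/<indexName> directory layout are hypothetical and not part of Hibernate Search. A custom DirectoryProvider could call something like indexDirectory() when it has to pick the FSDirectory location.

```java
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper; class and method names are illustrative only.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    /** Returns the tenant bound to the calling thread, or fails loudly. */
    static String require() {
        String tenantId = CURRENT.get();
        if (tenantId == null) {
            throw new IllegalStateException("No tenant bound to thread "
                    + Thread.currentThread().getName());
        }
        return tenantId;
    }

    /** Binds tenantId while work runs, then restores the previous binding. */
    static <T> T callAs(String tenantId, Supplier<T> work) {
        String previous = CURRENT.get();
        CURRENT.set(tenantId);
        try {
            return work.get();
        } finally {
            if (previous == null) CURRENT.remove(); else CURRENT.set(previous);
        }
    }

    /** Assumed layout: <root>/<tenantId>/<indexName>. */
    static Path indexDirectory(Path root, String indexName) {
        return root.resolve(require()).resolve(indexName);
    }
}
```

Restoring the previous binding in the finally block matters in a servlet container, where request threads are pooled and a leftover tenant id would leak into the next request.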
First, doing this should be safe only with these (default) settings:
Code:
hibernate.search.worker.execution=sync
hibernate.search.default.exclusive_index_use=false
In this scenario the index writer works in synchronous mode and is closed after each use of the index, so it's safe to do the switch at the DirectoryProvider level. However:
- each execution is done in a separate thread, and there's a single thread per DirectoryProvider, so you need to write the DirectoryProvider so that it knows the tenant id in a non-request thread
- when the index decides to merge, the merge is done in yet another thread, so the DirectoryProvider needs to know the tenant id in that thread too
- these two points might be tricky, but you can switch the IndexWriter to SerialMergeScheduler, which runs merges in the thread that triggered them, so only the first problem remains to be solved
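The first problem (knowing the tenant id in a non-request thread) is usually handled by capturing the id on the submitting thread and rebinding it on the worker thread. Below is a rough sketch of that idea only. It assumes you control the point where work is handed to the other thread, which may not be the case for threads created inside Hibernate Search itself. The TenantPropagation class and its TENANT ThreadLocal are hypothetical stand-ins for whatever holder your application uses.

```java
import java.util.function.Supplier;

// Hypothetical sketch: carry the submitting thread's tenant id into a worker thread.
final class TenantPropagation {
    // Stand-in for the application's own "current tenant" holder.
    static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    private TenantPropagation() {}

    /** Wraps work so that it runs with the tenant id seen at wrapping time. */
    static <T> Supplier<T> capturing(Supplier<T> work) {
        final String tenantId = TENANT.get(); // read on the submitting thread
        return () -> {
            String previous = TENANT.get();   // whatever the worker had before
            TENANT.set(tenantId);
            try {
                return work.get();
            } finally {
                // Leave the pooled worker thread as we found it.
                if (previous == null) TENANT.remove(); else TENANT.set(previous);
            }
        };
    }
}
```

For example, CompletableFuture.supplyAsync(TenantPropagation.capturing(task), executor) would run task on the executor's thread with the caller's tenant id bound, whereas submitting task directly would see no tenant at all.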
I didn't use this scenario, though. It causes several problems in an architecture with a lot of tenants:
- the IndexWriter is created and closed each time, which can consume time
- the write operations are done sequentially per directory provider (in a single-threaded thread pool); that means that if 100 tenants are writing to the same index (even if each one has a different physical index dir), the writes are done one by one, and the 101st tenant needs to wait for the others before it executes its job
- in "sync" mode, even though we have a multithreaded architecture, all work is done sequentially anyway (with waiting), so in the scenario above the 101st tenant will wait in the request thread for all other work to be done
Better performance can be achieved with different settings. I use:
Code:
hibernate.search.worker.execution=async
hibernate.search.default.exclusive_index_use=true
because:
- all work is done asynchronously (without blocking the current request thread)
- the IndexWriter writing to a given physical directory is never closed, and is reused
However, the implementation for the above settings was a bit hard:
- I have a fixed thread pool whose threads (e.g. 20-50) are the only ones used for all operations
- I have a facility that always assigns the same tenant to the same thread executor (to ensure the consistency of write operations related to the same tenant), while new tenants are assigned to the least used thread executor
- these two points mean that I control the total number of threads (fixed) and distribute the tenants' write work equally among them, so I always have N concurrent writes to different index directories (not sequential)
- I need to create and cache Workspaces per tenant, with open IndexWriters and the thread executor assignment, and reuse this TenantWriterContext each time I write to the same tenant's index (with an LRU and timeout policy for dropping cache entries)
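The tenant-to-executor assignment described above could look roughly like the sketch below. It is not my actual code: the TenantExecutorRouter name is made up, each "thread executor" is modelled as a single-threaded ExecutorService, and "least used" is interpreted as "fewest tenants currently assigned" (queue length or recent load would be equally valid measures).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a fixed set of single-threaded lanes with sticky tenant assignment.
final class TenantExecutorRouter {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final int[] tenantsPerLane;
    private final Map<String, Integer> laneByTenant = new HashMap<>();

    TenantExecutorRouter(int laneCount) {
        tenantsPerLane = new int[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Known tenants keep their lane; new tenants get the lane with the fewest tenants. */
    synchronized int laneFor(String tenantId) {
        Integer known = laneByTenant.get(tenantId);
        if (known != null) {
            return known.intValue();
        }
        int lane = 0;
        for (int i = 1; i < tenantsPerLane.length; i++) {
            if (tenantsPerLane[i] < tenantsPerLane[lane]) lane = i;
        }
        tenantsPerLane[lane]++;
        laneByTenant.put(tenantId, lane);
        return lane;
    }

    /** Work for one tenant always runs on the same thread, so it stays in submission order. */
    CompletableFuture<Void> submit(String tenantId, Runnable work) {
        return CompletableFuture.runAsync(work, lanes.get(laneFor(tenantId)));
    }

    /** Forget a tenant, e.g. when its cached writer context is dropped. */
    synchronized void release(String tenantId) {
        Integer lane = laneByTenant.remove(tenantId);
        if (lane != null) tenantsPerLane[lane.intValue()]--;
    }

    void shutdown() {
        for (ExecutorService lane : lanes) lane.shutdown();
    }
}
```

With N lanes, writes for tenants on different lanes proceed in parallel, while writes for a single tenant are serialized on its lane, which is the consistency property the sticky assignment is meant to give.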
For the IndexReader I use a similar model, i.e. an LRU/timeout cache of open IndexReaders and a custom index reader strategy, because the "not-shared" strategy (which creates a separate IndexReader on each call) can be inefficient: the code itself notes that this is a time-consuming operation.
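The LRU/timeout cache mentioned for both writers and readers can be sketched as follows. This is a simplified illustration under my own assumptions, not the real implementation: TenantResourceCache is a made-up name, the cached resource is any AutoCloseable (standing in for an open IndexReader or a writer context), and the clock is injected only to make the behaviour easy to check.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: per-tenant cache with LRU eviction and an idle timeout.
final class TenantResourceCache<R extends AutoCloseable> {
    private static final class Entry<R> {
        final R resource;
        long lastUsed;
        Entry(R resource, long lastUsed) { this.resource = resource; this.lastUsed = lastUsed; }
    }

    private final int maxEntries;
    private final long idleTimeoutMillis;
    private final Function<String, R> opener;
    private final LongSupplier clock;
    // accessOrder=true: iteration goes from least to most recently used.
    private final LinkedHashMap<String, Entry<R>> entries = new LinkedHashMap<>(16, 0.75f, true);

    TenantResourceCache(int maxEntries, long idleTimeoutMillis,
                        Function<String, R> opener, LongSupplier clock) {
        this.maxEntries = maxEntries;
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.opener = opener;
        this.clock = clock;
    }

    /** Returns the cached resource for the tenant, opening it if necessary. */
    synchronized R acquire(String tenantId) {
        long now = clock.getAsLong();
        evictIdle(now);
        Entry<R> entry = entries.get(tenantId); // also marks it most recently used
        if (entry == null) {
            entry = new Entry<>(opener.apply(tenantId), now);
            entries.put(tenantId, entry);
            if (entries.size() > maxEntries) {
                Iterator<Entry<R>> it = entries.values().iterator();
                closeQuietly(it.next().resource); // first in iteration = least recently used
                it.remove();
            }
        }
        entry.lastUsed = now;
        return entry.resource;
    }

    private void evictIdle(long now) {
        Iterator<Entry<R>> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry<R> entry = it.next();
            if (now - entry.lastUsed < idleTimeoutMillis) break; // the rest were used later
            closeQuietly(entry.resource);
            it.remove();
        }
    }

    private static void closeQuietly(AutoCloseable resource) {
        try { resource.close(); } catch (Exception ignored) { /* best effort */ }
    }
}
```

One thing this sketch deliberately leaves out: a resource may be evicted and closed while another thread is still using it. A real cache of readers or writers would need reference counting (or eviction only on the tenant's own executor thread) to avoid that.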