Hibernate Search - Support for Multi-tenancy

saran_m15 · **Joined:** Thu Mar 15, 2012 12:13 pm **Posts:** 3

We are currently working on changes that enable our application to support multi-tenancy. We are using hibernate-search 3.4.x and the application is deployed on JBoss. Every hosted tenant has its own DB schema, and we switch to the right schema based on the logged-in user and the client (tenant) he belongs to. By default, one DirectoryProvider instance is created per indexed class. Because we switch between schemas, we need the index to be stored in a different directory for each client (tenant). For example:
for tenant 1 and Indexed class c1 the DP should use the structure - t1\c1\
for tenant 2 and Indexed class c1 the DP should use the structure - t2\c1\

Currently we have created a Custom FSDirectory provider that handles multiple clients by caching the directories by client and returning the appropriate directory in getDirectory() based on the client being served.

Questions:
a) In the above approach there is one DP instance per indexed class, servicing multiple clients. Can this cause any problems?
b) Is it possible to create a different directory provider instance per class per client at runtime?
c) What are the other options available?

Any help on this is appreciated.

sanne.grinovero · **Posted:** Fri Mar 16, 2012 11:18 am

Hi,
before I try answering, could you describe how your application interacts with the tenants? Do you have multiple Hibernate SessionFactory instances?

Multitenancy was introduced in Hibernate 4.0, but you seem to be using an older version, so I guess this is a bit tricky.

saran_m15 · **Joined:** Thu Mar 15, 2012 12:13 pm **Posts:** 3

Hi,

We are using only one session factory. After opening the session we run an "Alter session set current_schema" statement to switch the schema based on the tenant group the user belongs to.

Let me know if you need additional info.

One more piece of information: we use hibernate.search.indexing_strategy=manual.

sanne.grinovero · **Posted:** Mon Mar 19, 2012 12:24 pm

I see, thank you for clarifying.

a) Make sure you disable exclusive index use (exclusive_index_use=false), or the IndexWriter will reuse the previously opened index, caching the reference to the Directory.

b) Well, I think yes, but the complexity is going to reside in the code of your custom DP: when an FSDirectory is requested for a tenant that does not exist yet, you create one. Make sure you have some lock to protect you from multiple concurrent invocations.
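As an illustration of point b), here is a minimal JDK-only sketch of the "create on first request, guarded against concurrent invocations" part. It is not Hibernate Search code: `TenantIndexDirectories` and `directoryFor` are hypothetical names, and a real custom DP would open an FSDirectory on the returned path. It assumes tenant ids and index names contain no path separators.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: resolves and lazily creates <base>/<tenant>/<index> directories. */
class TenantIndexDirectories {
    private final Path baseDir;
    private final ConcurrentMap<String, Path> cache = new ConcurrentHashMap<>();

    TenantIndexDirectories(Path baseDir) {
        this.baseDir = baseDir;
    }

    /** Returns the directory for (tenant, index), creating it at most once. */
    Path directoryFor(String tenantId, String indexName) {
        String key = tenantId + "/" + indexName;
        // computeIfAbsent runs the mapping function atomically per key, so two
        // threads asking for a new tenant at the same time do not race.
        return cache.computeIfAbsent(key, k -> {
            try {
                return Files.createDirectories(baseDir.resolve(tenantId).resolve(indexName));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("indexes");
        TenantIndexDirectories dirs = new TenantIndexDirectories(base);
        System.out.println(dirs.directoryFor("t1", "c1")); // .../t1/c1
        System.out.println(dirs.directoryFor("t2", "c1")); // .../t2/c1
    }
}
```

The map plays the role of the lock here; an explicit lock would work just as well if opening the directory needs more steps.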

c) Did you consider filters? You could have all your tenants share the same index, and add a token in the indexed Document which identifies the tenant, then create a filter with a tenant_id parameter and make sure you apply this filter on all queries.

saran_m15 · **Joined:** Thu Mar 15, 2012 12:13 pm **Posts:** 3

Thanks for your input and suggestions.

As the requirement mandates data separation between tenants, we are trying to keep the index directories separate.

sanne.grinovero · **Posted:** Tue Mar 20, 2012 1:05 pm

I was suspecting so ;-)

Are your tenants dynamic? I mean is it acceptable to restart and reconfigure your system when adding a new tenant?

I'm asking as you could combine filters with sharding. Sharding implies a physical separation of indexes, but the number of shards can't change at runtime.
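To make the trade-off concrete, here is a small JDK-only sketch of the routing idea, assuming one shard per tenant with the tenant list fixed at startup. `TenantShardRouter` and its methods are hypothetical names, not the Hibernate Search sharding API; the tenant filter is modelled as a plain tenant id (or null when no filter is active).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical router: one shard per tenant, fixed at startup. */
class TenantShardRouter {
    // Position in this list is the shard number; it never changes at runtime.
    private final List<String> tenants;

    TenantShardRouter(List<String> configuredTenants) {
        this.tenants = new ArrayList<>(configuredTenants);
    }

    int shardCount() {
        return tenants.size();
    }

    /** Shard that receives writes for the given tenant. */
    int shardFor(String tenantId) {
        int shard = tenants.indexOf(tenantId);
        if (shard < 0) {
            throw new IllegalArgumentException("Unknown tenant " + tenantId
                    + ": adding a tenant needs reconfiguration and a restart");
        }
        return shard;
    }

    /** Shards a query has to read: one if a tenant filter is active, all otherwise. */
    List<Integer> shardsForQuery(String tenantFilter) {
        List<Integer> shards = new ArrayList<>();
        if (tenantFilter != null) {
            shards.add(shardFor(tenantFilter));
        } else {
            for (int i = 0; i < tenants.size(); i++) {
                shards.add(i);
            }
        }
        return shards;
    }
}
```

The exception for an unknown tenant is the point of the question above: with a fixed shard count, a new tenant cannot simply appear at runtime.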

l0co · **Joined:** Tue Apr 30, 2013 1:33 pm **Posts:** 2

Hello. This thread was created more than a year ago, but I'm currently doing the same thing as you and I'm really interested in whether the tenant switch at the DirectoryProvider level turned out to be safe or not. For me it is not...

But first - we might use different versions of Hibernate Search. Mine is 3.4.2.Final.

Now, here is why I currently think it's not safe to switch tenants at the DirectoryProvider level. Regarding "Currently we have created a Custom FSDirectory provider that handles multiple clients by caching the directories by client and returning the appropriate directory in getDirectory() based on the client being served. " - this was exactly my first approach as well.

If you look into the classes in the org.hibernate.search.backend.impl.lucene package, you'll see that there's a single Workspace created per DirectoryProvider. I assume you use these classes, and you can see that LuceneBackendQueueProcessorFactory caches all "default" DirectoryProviders (read from the config) for further usage. I assume that you have some DirectoryProviders that are aware of the tenantId in the current thread and do the tenant-directory switch dynamically, but the initialization of the "default" DirectoryProviders stays as is in LuceneBackendQueueProcessorFactory.

LuceneBackendQueueProcessorFactory keeps the configuration for each "default" DirectoryProvider, and a separate Workspace is created for each of them. This is the only (tenant-unaware) Workspace per given index that is used in further execution. This Workspace creates one, and only one, IndexWriter. When you write to this index, this IndexWriter is used, and it decides whether or not to do an index merge. Moreover, the merge is by default done in another thread, because ConcurrentMergeScheduler is the default scheduler for index merging.

The questions here are:
* does the merge thread know on which directory it should execute the merge process (does the merge thread know the tenant id)?
* what are the exact rules by which the IndexWriter decides to merge or not (when it works once on the t1 directory, once on t2, etc.)?
* is it safe to have one IndexWriter while it has multiple backing directories behind the DirectoryProvider? Can't it calculate some rules on the first tenant's directory and then apply them to another tenant's?

I believe it would not be safe to have a single IndexWriter per DirectoryProvider while the DirectoryProvider can dynamically switch the exact directory it works on. In my opinion, to achieve a cloud architecture for Hibernate Search you have to review all classes in the org.hibernate.search.backend.impl.lucene package and rewrite them with your own versions. They should:
* create and cache a Workspace per DirectoryProvider per tenant, each with its own opened IndexWriter
* the best choice for me is to put the writer execution in a separate thread, but always use SerialMergeScheduler in the writer itself (it's not necessary to create another thread to do the merge if the work is already in another thread)
* when you drop a Workspace from the cache (e.g. when the customer hasn't been active for some time), you need to run CloseIndexRunnable(workspace) to ensure that all merges for this IndexWriter are finished

There's another problem with the dynamic directory switch in the DirectoryProvider. I reviewed how things work by default, and by default you have one pool Executor with a single thread per "default" DirectoryProvider. This means that if you have index "X" and in your backend you dynamically switch the exact directory to point to the tenant location, the "X" writer work is always performed sequentially (and by default with a wait/locking policy). So if you have 1000 tenants and they all write to the "X" index, this work will be done sequentially in a single thread, with each request thread waiting for all of it to finish. When I say that you need to rewrite all the org.hibernate.search.backend.impl.lucene package classes, I also mean that this concurrency model should be rewritten. The best choice IMO is to use a thread pool (e.g. 20-50 threads max), where a given tenant is always assigned to the same thread, to ensure index consistency.

l0co · **Joined:** Tue Apr 30, 2013 1:33 pm **Posts:** 2

Some final thoughts for those interested in the same subject, after doing my own implementation.

I assume we have a cloud application with potentially hundreds of tenants and we want to have a separate index directory per tenant.

The second assumption is that we want to switch the tenant (and FSDirectory) in our own DirectoryProvider implementation, each time we want to use a concrete index. The simplest solution for this is to hold the current tenant id in a ThreadLocal, and my comments below assume this scenario.
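A minimal sketch of such a ThreadLocal holder, using only the JDK. `TenantContext` and its method names are hypothetical; the idea is that the request-handling code binds the tenant id and the custom DirectoryProvider reads it inside getDirectory().

```java
/** Hypothetical holder for the tenant id bound to the current thread. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {
    }

    /** Binds the tenant to the calling thread, e.g. at the start of a request. */
    static void set(String tenantId) {
        CURRENT.set(tenantId);
    }

    /** Returns the bound tenant, failing loudly rather than guessing a directory. */
    static String require() {
        String tenantId = CURRENT.get();
        if (tenantId == null) {
            throw new IllegalStateException(
                    "No tenant bound to thread " + Thread.currentThread().getName());
        }
        return tenantId;
    }

    /** Unbinds the tenant; call in a finally block so pooled threads do not leak it. */
    static void clear() {
        CURRENT.remove();
    }

    public static void main(String[] args) {
        TenantContext.set("t1");
        try {
            System.out.println("indexing for tenant " + TenantContext.require());
        } finally {
            TenantContext.clear();
        }
    }
}
```

Note that the value is visible only in the thread that set it, which is exactly why the worker and merge threads discussed below need extra care.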

First, it should be safe to do it only with these (default) settings:

Code:

hibernate.search.worker.execution=sync
hibernate.search.default.exclusive_index_use=false

In this scenario the index writer works in synchronous mode and is closed each time the index is used, so it's safe to do the switch on DirectoryProvider level, however:

* each execution is done in a separate thread, and there's a single thread per DirectoryProvider, so you need to write the DirectoryProvider so that it knows the tenant id in a non-request thread
* when the index decides to merge, the merge is done in a separate thread, so you need to write the DirectoryProvider so that it knows the tenant id in this thread too
* the two points above might be tricky, but you can switch to SerialMergeScheduler in the IndexWriter, so that you have only the first problem to solve
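One common way to make a ThreadLocal value visible in a non-request thread is to capture it when the work is created and re-bind it around the work's execution. The sketch below shows only that technique with JDK classes; `TenantPropagation` is a hypothetical name, and it assumes you control how work is handed to the worker thread, which with the stock backend classes may not be the case.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper: carries the submitting thread's tenant id into worker threads. */
final class TenantPropagation {
    static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    private TenantPropagation() {
    }

    /** Captures the caller's tenant now and binds it around the task wherever it runs. */
    static <T> Supplier<T> withCurrentTenant(Supplier<T> task) {
        final String captured = TENANT.get(); // read in the submitting (request) thread
        return () -> {
            String previous = TENANT.get();
            TENANT.set(captured);
            try {
                return task.get();
            } finally {
                // Restore the old value so a pooled thread does not leak
                // one tenant's id into the next task it runs.
                if (previous == null) {
                    TENANT.remove();
                } else {
                    TENANT.set(previous);
                }
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService indexExecutor = Executors.newSingleThreadExecutor();
        try {
            TENANT.set("t1");
            Supplier<String> work = withCurrentTenant(() -> TENANT.get());
            String seen = CompletableFuture.supplyAsync(work, indexExecutor).join();
            System.out.println("worker thread saw tenant " + seen);
        } finally {
            indexExecutor.shutdown();
        }
    }
}
```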

I didn't use this scenario, though. It causes several problems in an architecture with a lot of tenants:

* the IndexWriter is created and closed each time, which can consume time
* the write operations are done sequentially per directory provider (in a single-threaded thread pool); that means that if you have 100 tenants writing to the same index (even if each one has a different physical index dir), it's done one-by-one and the 101st tenant needs to wait for the others before it executes its job
* in "sync" mode, even though we have a multithreaded architecture, all work is done sequentially anyway (with waiting), so in the scenario above the 101st tenant will wait in the request thread for all the other work to be done

Better performance can be achieved with different settings. I use:

Code:

hibernate.search.worker.execution=async
hibernate.search.default.exclusive_index_use=true

because:

* all work is done asynchronously (without locking the current request thread)
* the IndexWriter writing to the same physical directory is never closed, and is reused

However, the implementation for the above settings was a bit hard:

* I have a fixed thread pool (e.g. 20-50 threads), and only these threads are used for all operations
* I have a facility that always assigns the same tenant to the same thread executor (to ensure the consistency of write operations related to the same tenant), while new tenants are assigned to the least used thread executor
* the two points above mean that I control the total number of threads (fixed) and distribute the tenants' write processes equally among these threads, so I always have N concurrent writes to different index directories (not sequential)
* I need to create and cache Workspaces per tenant, with opened IndexWriters and the thread executor assignment, and reuse this TenantWriterContext each time I write to the same tenant's index (with some LRU and timeout cache dropping policy)
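The tenant-to-executor assignment described in the list above could look roughly like the following JDK-only sketch. It is not the actual implementation: `TenantExecutorPool` is a hypothetical name, and "least used" is assumed here to mean "has the fewest tenants assigned".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: a fixed set of single-thread executors with tenants pinned to them. */
class TenantExecutorPool {
    private final List<ExecutorService> executors = new ArrayList<>();
    private final int[] tenantsPerSlot;
    private final Map<String, Integer> assignment = new HashMap<>();

    TenantExecutorPool(int threads) {
        for (int i = 0; i < threads; i++) {
            executors.add(Executors.newSingleThreadExecutor());
        }
        tenantsPerSlot = new int[threads];
    }

    /** Returns the tenant's slot; a new tenant goes to the slot with the fewest tenants. */
    synchronized int slotFor(String tenantId) {
        Integer slot = assignment.get(tenantId);
        if (slot == null) {
            int best = 0;
            for (int i = 1; i < tenantsPerSlot.length; i++) {
                if (tenantsPerSlot[i] < tenantsPerSlot[best]) {
                    best = i;
                }
            }
            tenantsPerSlot[best]++;
            assignment.put(tenantId, best);
            slot = best;
        }
        return slot;
    }

    /** Queues work on the tenant's own thread, so one tenant's writes stay ordered. */
    void submit(String tenantId, Runnable indexWork) {
        executors.get(slotFor(tenantId)).execute(indexWork);
    }

    /** Forgets a tenant, e.g. when its cached writer context is dropped. */
    synchronized void release(String tenantId) {
        Integer slot = assignment.remove(tenantId);
        if (slot != null) {
            tenantsPerSlot[slot]--;
        }
    }

    void shutdown() {
        for (ExecutorService executor : executors) {
            executor.shutdown();
        }
    }
}
```

Because every slot is a single-thread executor, work for one tenant never runs concurrently with itself, while different slots proceed in parallel.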

For the IndexReader I use a similar model, i.e. an LRU/timeout cache with opened IndexReaders and a custom index reader strategy, because the "not-shared" strategy (which creates a separate IndexReader on each call) can be inefficient - the code comments note that this is a time-consuming operation.
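The LRU/timeout cache used for both writers and readers can be sketched with an access-ordered LinkedHashMap. This is a simplified JDK-only illustration, not the actual implementation: `TenantResourceCache` is a hypothetical name, the cached resource is any AutoCloseable (standing in for an opened reader or writer context), and the current time is passed in explicitly to keep the sketch deterministic.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU + idle-timeout cache of opened per-tenant resources. */
class TenantResourceCache<R extends AutoCloseable> {

    private static final class Slot<R> {
        final R resource;
        long lastUsedMillis;

        Slot(R resource, long lastUsedMillis) {
            this.resource = resource;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final int maxEntries;
    private final long idleTimeoutMillis;
    private final Function<String, R> opener;
    // accessOrder=true keeps the least recently used entry first in iteration order.
    private final LinkedHashMap<String, Slot<R>> slots = new LinkedHashMap<>(16, 0.75f, true);

    TenantResourceCache(int maxEntries, long idleTimeoutMillis, Function<String, R> opener) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.opener = opener;
    }

    /** Returns the tenant's resource, opening it on a miss and evicting stale entries. */
    synchronized R get(String tenantId, long nowMillis) {
        evictIdle(nowMillis);
        Slot<R> slot = slots.get(tenantId); // a hit moves the entry to the tail
        if (slot == null) {
            slot = new Slot<>(opener.apply(tenantId), nowMillis);
            slots.put(tenantId, slot);
            if (slots.size() > maxEntries) {
                // Over capacity: close and drop the least recently used entry.
                Iterator<Map.Entry<String, Slot<R>>> it = slots.entrySet().iterator();
                close(it.next().getValue().resource);
                it.remove();
            }
        }
        slot.lastUsedMillis = nowMillis;
        return slot.resource;
    }

    synchronized int size() {
        return slots.size();
    }

    private void evictIdle(long nowMillis) {
        Iterator<Map.Entry<String, Slot<R>>> it = slots.entrySet().iterator();
        while (it.hasNext()) {
            Slot<R> slot = it.next().getValue();
            if (nowMillis - slot.lastUsedMillis > idleTimeoutMillis) {
                close(slot.resource);
                it.remove();
            }
        }
    }

    private static void close(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // Sketch only: real code should at least log the failure.
        }
    }
}
```

One thing this sketch deliberately ignores is that an evicted resource may still be in use by another thread; real code would need reference counting or a similar hand-off before closing.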