HS: How to have multiple indexes on the same table.

johnbyng · **Joined:** Fri Feb 13, 2009 4:54 am **Posts:** 10

Hi,

We are using the latest version of Hibernate Search. Our MySQL table has over 1 billion records and we are finding that Lucene is going to take about a month to index it all if it does it as one huge index - even using all the optimisations given in "HS in Action".

We don't need that anyway. The table has a field in it called marcTable which can contain values from 1 to 999 (only about half of the values are actually valid anyway). We would like to have 999 indexes for this one table with a discriminator using marcTable. Our use-case only requires searching to be carried out within each marcTable type, not across all of them.

I have already investigated sharding on the values of marcTable, but this is really not a solution because Lucene will still index across all rows for the table (as far as I understand sharding; it's splitting the index into different files, not different, siloed, indexes).

We could have 999 tables (marcTable1, marcTable2, etc) which would give us 999 separate indexes, but that is a horrible solution - not to mention having to define 999 classes and the enormous startup time for hibernate with so many classes.

Is there any way to set this up with Hibernate Search?

Thanks.

sanne.grinovero · **Posted:** Sat Feb 14, 2009 8:09 am

Quote:

as far as I understand sharding; it's splitting the index into different files, not different, siloed, indexes

No, the purpose of sharding is precisely to give you different indexes, "siloed" in different directories. Each index can then be managed independently, and Lucene stores each one in its own set of files, as usual for indexes.
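To make the "one silo per marcTable value" idea concrete, here is a minimal sketch of the routing rule only. It is plain Java, not Hibernate Search API: the class `MarcShardRouter`, its methods, and the `marc.<n>` directory naming are all hypothetical. In Hibernate Search itself the same decision should live in a custom sharding strategy (the `IndexShardingStrategy` interface), which would read the marcTable value from the document being indexed.

```java
/**
 * Illustrative routing rule: one shard (and so one index directory) per
 * marcTable value. Hypothetical helper, not part of Hibernate Search.
 */
final class MarcShardRouter {

    // Value range taken from the original post: marcTable is 1..999.
    static final int MIN_MARC_TABLE = 1;
    static final int MAX_MARC_TABLE = 999;

    private MarcShardRouter() {
    }

    /** Maps a marcTable value to a zero-based shard number. */
    static int shardFor(int marcTable) {
        if (marcTable < MIN_MARC_TABLE || marcTable > MAX_MARC_TABLE) {
            throw new IllegalArgumentException(
                    "marcTable out of range: " + marcTable);
        }
        return marcTable - MIN_MARC_TABLE;
    }

    /** Assumed directory naming scheme, chosen only for this example. */
    static String directoryNameFor(int marcTable) {
        return "marc." + shardFor(marcTable);
    }
}
```

With a rule like this, a search restricted to marcTable 42 only ever needs to open the single directory `marc.41`, and the unused marcTable values simply leave their shards empty.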

So it should really be the feature you're looking for. Perhaps we need to fix some documentation? Could you point us to the reference that gave you the idea of "different files, coupled index"?

The indexing speed is another issue: the index structure could be the performance bottleneck, but I doubt it. First you should check the way your entities get loaded: try enabling Hibernate's query log and verify how your data is being fetched. Most of the time the source of the problem is collections being loaded by subsequent lazy queries: one new query for each collection of each root entity adds up to a lot of DB queries and delays.
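As a rough sketch of how to switch the query log on, the helper below adds Hibernate's `hibernate.show_sql` and `hibernate.format_sql` settings to an existing set of properties. The class and method names are hypothetical, and it assumes your configuration is built from a `java.util.Properties` object; the same two keys can equally go straight into `hibernate.properties` or `hibernate.cfg.xml`.

```java
import java.util.Properties;

/** Hypothetical helper that turns on Hibernate's SQL logging settings. */
final class QueryLogSettings {

    private QueryLogSettings() {
    }

    /** Returns a copy of the given settings with SQL logging enabled. */
    static Properties withSqlLogging(Properties base) {
        Properties copy = new Properties();
        copy.putAll(base);
        // Print every SQL statement Hibernate issues.
        copy.setProperty("hibernate.show_sql", "true");
        // Pretty-print the statements so repeated lazy loads are easy to spot.
        copy.setProperty("hibernate.format_sql", "true");
        return copy;
    }
}
```

If the log then shows the same collection-loading `select` repeated once per root entity while indexing, that is the lazy-loading pattern described above.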

I am working on "indexing acceleration", but releasing it is badly delayed because I am trying to make it general purpose and not only tailored to my test cases. If you could provide me with the relevant model classes, I am willing to take a look at them.