sanne.grinovero wrote:
Hello,
This is unexpected and I don't have a specific idea, but I can think of some possibilities. One complication is that version 3.4 is extremely old: Hibernate Search started at 3.0.x, so that was a very early version, and I wouldn't rule out that you were affected by some issue which has since been resolved. For example, one explanation would be that it wasn't indexing non-initialized collections.
I have some memories of introducing explicit initialization of relations at some point. The code was working fine in almost all cases by simply relying on lazy initialization, but the explicit initialization was introduced to work around a rare bug in ORM regarding its interaction with relations which are also proxies and stored in the 2nd level cache.
So now the code does explicit initialization of these lazy loaded relations; if this is the change affecting you, would that imply that some necessary data was not actually being indexed before?
In this case it might be worth reviewing your annotations to try to restrict the loading graph to the fields and elements which you need. Compared to version 3.4 there are many new options to narrow down the indexed fields further, such as using
Code:
@IndexedEmbedded(includePaths=..)
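To make the suggestion concrete, here is a minimal mapping sketch. The entities Invoice and Customer and their fields are hypothetical and not taken from this thread; it assumes the Hibernate Search 5.x annotations and JPA are on the classpath, and it only illustrates how includePaths limits what gets embedded, not whether it resolves the slowdown described here.

```java
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

import org.hibernate.search.annotations.ContainedIn;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

// Hypothetical entities, for illustration only.
@Entity
@Indexed
class Invoice {
    @Id @GeneratedValue
    Long id;

    @Field
    String reference;

    // Only the listed path is embedded into the Invoice index:
    // customer.name is included, customer.notes is not.
    @ManyToOne
    @IndexedEmbedded(includePaths = { "name" })
    Customer customer;
}

@Entity
class Customer {
    @Id @GeneratedValue
    Long id;

    @Field
    String name;

    @Field
    String notes; // not listed in includePaths above

    // Back-reference so that a change to a Customer triggers
    // reindexing of the Invoices that embed it.
    @ContainedIn
    @OneToMany(mappedBy = "customer")
    Set<Invoice> invoices = new HashSet<>();
}
```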
If that's not helping, we might need to explore what is being loaded exactly differently.
I assume that this implies you're also upgrading Hibernate ORM with a significant leap, so the cause might lie in some ORM mapping or loading strategy changes too.
A way to reproduce this issue would be great! Glad to help if I can.
So as it turns out, the reason for the performance hit seems to be that all the objects containing fields that we wanted included in the index were also annotated with @Indexed. That is, we have a ClassA which has a many-to-one relationship to ClassB, which in turn has a one-to-many relationship with ClassC, which in turn has a many-to-one relationship with ClassD. (The real model is much more complex; I have simplified a bit.) There are fields on ClassB, ClassC and ClassD that need to be included in the index of ClassA, and we also need ClassA's index to be updated when one of the classes other than ClassA is updated. To ensure that this happens, we had decorated ALL the classes with an @Indexed annotation, which ensured that even when ClassB, ClassC or ClassD was updated independently of ClassA, the changes would be reflected in ClassA's index too.
It seems to me that the new functionality that was added to the @ContainedIn annotation is what is now causing the performance hit. In the Hibernate Search documentation for version 5.5.2, section 4.1.4, it says:
Quote:
While @ContainedIn is often seen as the counterpart of @IndexedEmbedded, it can also be used on its own to build an indexing dependency graph.
When an entity is reindexed, all the entities pointed by @ContainedIn are also going to be reindexed.
As a result of this new functionality, when the reindexing occurs, even if the only object altered or added was ClassA, the nature of the references, together with the fact that all the classes are marked as indexed, causes Hibernate Search to fetch ALL the connected collections of objects. We are an enterprise application, and the data is sometimes rather large, hence the performance overhead.
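To illustrate how the quoted rule can fan out, here is a toy model in plain Java. It is not Hibernate Search code: the class ContainedInGraphSketch, its methods and the entity names (A1, B1, C1, D1, ...) are all made up for this sketch. It only follows "contained in" edges transitively, as the quote describes, and ignores how or when collections are actually loaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a @ContainedIn dependency graph; hypothetical, not library code. */
class ContainedInGraphSketch {
    // entity name -> names of the entities its "contained in" references point to
    private final Map<String, List<String>> containedIn = new LinkedHashMap<>();

    void addContainedIn(String entity, String container) {
        containedIn.computeIfAbsent(entity, k -> new ArrayList<>()).add(container);
    }

    /** Every entity that gets reindexed when 'start' is reindexed, 'start' included. */
    Set<String> reindexClosure(String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue; // already handled, also guards against cycles
            }
            for (String container : containedIn.getOrDefault(current, Collections.<String>emptyList())) {
                pending.push(container);
            }
        }
        return visited;
    }

    /** One D used by two Cs, both in one B, which is referenced by 'asPerB' As. */
    static ContainedInGraphSketch sample(int asPerB) {
        ContainedInGraphSketch g = new ContainedInGraphSketch();
        g.addContainedIn("D1", "C1");
        g.addContainedIn("D1", "C2");
        g.addContainedIn("C1", "B1");
        g.addContainedIn("C2", "B1");
        for (int i = 1; i <= asPerB; i++) {
            g.addContainedIn("B1", "A" + i);
        }
        return g;
    }

    public static void main(String[] args) {
        ContainedInGraphSketch g = sample(1000);
        System.out.println(g.reindexClosure("D1").size()); // prints 1004
    }
}
```

In this model, reindexing A1 alone touches only A1. Adding a back-reference such as addContainedIn("A1", "B1") makes the closure from A1 include B1 and every other A, which is one way a change to a single ClassA could pull in the connected graph; whether that matches the real mapping is a guess.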
I am curious whether anyone has a suggestion for how to have updates to downstream objects reflected in the upstream object's index without needing to index all the objects along the way?