Performance issues upon transaction completion

eli.julian · **Joined:** Tue Mar 22, 2016 11:30 am **Posts:** 3

We recently updated from Hibernate Search 3.4 to 5.5.2.
In the new version, we see a serious performance hit when adding a new indexed entity that is connected by a foreign key to many other entities, something that did not happen in HS 3.4. Profiling with VisualVM shows that the bulk of the time is spent in org.hibernate.search.backend.impl.WorkQueue.prepareWorkPlan(), which is called during the commit of the transaction that adds the new entity. Deep in the call stack, I can see that enqueueing the Lucene work for the new entity triggers many calls to initialize lazy collections on the new entity, which is expensive because of the many objects connected to it via FK. Again, this behavior is new in version 5.5.2 and was not present in 3.4.

Any suggestions? Is this by any chance configurable and I am just missing the configuration? Is there any information that I can add that would help you understand the context?

sanne.grinovero · **Posted:** Thu Mar 24, 2016 10:19 am

Hello,
this is unexpected and I don't have a specific idea, but I can think of some possibilities. One complication is that version 3.4 is extremely old: Hibernate Search started at 3.0.x, so that was a very early version, and I wouldn't rule out that you were affected by some issue which has since been resolved. For example, one explanation would be that 3.4 did not index non-initialized collections.

I have some memory of introducing explicit initialization of relations at some point: the code was working fine in almost all cases by simply relying on lazy initialization, but the explicit initialization was introduced to work around a rare bug in ORM regarding its interaction with relations which are also proxies and stored in the 2nd level cache.

So now the code explicitly initializes these lazily loaded relations; if this is the change affecting you, does that imply that some necessary data was not actually being indexed before?
In that case it might be worth reviewing your annotations to try to restrict the loading graph to the fields and elements which you need. Compared to version 3.4 there are many new options to narrow down the indexed fields, such as using

Code:

@IndexedEmbedded(includePaths=..)

If that doesn't help, we might need to explore exactly what is being loaded differently.
I assume that this implies you're also upgrading Hibernate ORM with a significant leap, so the cause might lie in some ORM mapping or loading strategy changes too.
A way to reproduce this issue would be great! Glad to help if I can.
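
For readers unfamiliar with the attribute mentioned above, here is a minimal sketch of how `includePaths` might be used with the Hibernate Search 5.x annotations. The `Order` and `Customer` entities and all their fields are hypothetical, invented for illustration only; they are not the poster's model.

```java
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

import org.hibernate.search.annotations.ContainedIn;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

// Hypothetical root entity: the only one with its own index.
@Entity
@Indexed
class Order {
    @Id Long id;

    @Field String reference;

    // Embed only the "name" field of the associated Customer
    // instead of every @Field that Customer declares.
    @ManyToOne
    @IndexedEmbedded(includePaths = { "name" })
    Customer customer;
}

// Hypothetical associated entity, not indexed on its own.
@Entity
class Customer {
    @Id Long id;

    @Field String name;

    // Declared as a @Field but not listed in includePaths above,
    // so it is not embedded in Order's index.
    @Field String notes;

    // Inverse side: tells Hibernate Search which Orders embed this Customer.
    @OneToMany(mappedBy = "customer")
    @ContainedIn
    Set<Order> orders = new HashSet<>();
}
```

Paths are written relative to the annotated property. Whether narrowing the paths also reduces the lazy loading seen at commit time depends on the mapping, so this is a starting point rather than a guaranteed fix.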

eli.julian · **Joined:** Tue Mar 22, 2016 11:30 am **Posts:** 3

So as it turns out, the reason for the performance hit seems to be that all the classes containing fields that we wanted in the index were also annotated with @Indexed. That is, we have a ClassA which has a many-to-one relationship to ClassB, which in turn has a one-to-many relationship with ClassC, which in turn has a many-to-one relationship with ClassD. (The real model is much more complex; I have simplified a bit.) There are fields on ClassB, ClassC and ClassD that need to be included in the index of ClassA, and we also need ClassA's index to be updated when one of the other classes is updated. To ensure that, we had annotated ALL the classes with @Indexed, so that even when ClassB, ClassC or ClassD was updated independently of ClassA, the changes would be reflected in ClassA's index too.
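
To make the shape of that model concrete, here is a rough sketch of how such a mapping might look. Only the class names and relationship directions come from the description above; the field names, the placement of @IndexedEmbedded and @ContainedIn, and the use of field access are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

import org.hibernate.search.annotations.ContainedIn;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

@Entity
@Indexed
class ClassA {
    @Id Long id;
    @Field String title;                    // hypothetical field

    @ManyToOne
    @IndexedEmbedded                        // ClassB's fields become part of ClassA's index
    ClassB classB;
}

@Entity
@Indexed                                    // one of the "extra" @Indexed annotations
class ClassB {
    @Id Long id;
    @Field String name;                     // hypothetical field

    @OneToMany(mappedBy = "classB")
    @ContainedIn                            // inverse side of ClassA.classB
    Set<ClassA> classAs = new HashSet<>();

    @OneToMany(mappedBy = "classB")
    @IndexedEmbedded
    Set<ClassC> classCs = new HashSet<>();
}

@Entity
@Indexed
class ClassC {
    @Id Long id;
    @Field String label;                    // hypothetical field

    @ManyToOne
    @ContainedIn                            // inverse side of ClassB.classCs
    ClassB classB;

    @ManyToOne
    @IndexedEmbedded
    ClassD classD;
}

@Entity
@Indexed
class ClassD {
    @Id Long id;
    @Field String code;                     // hypothetical field

    @OneToMany(mappedBy = "classD")
    @ContainedIn                            // inverse side of ClassC.classD
    Set<ClassC> classCs = new HashSet<>();
}
```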

The issue, it seems to me, is that the new functionality added to the @ContainedIn annotation is now causing the performance hit. Section 4.1.4 of the Hibernate Search documentation for version 5.5.2 says:

Quote:

While @ContainedIn is often seen as the counterpart of @IndexedEmbedded, it can also be used on its own to build an indexing dependency graph.

When an entity is reindexed, all the entities pointed by @ContainedIn are also going to be reindexed.

As a result of this new functionality, when reindexing occurs, even if the only object altered or added was ClassA, the nature of the references, together with the fact that all the classes are marked as indexed, causes Hibernate Search to fetch ALL the collections of connected objects. We are an enterprise application and the data is sometimes rather large, hence the performance overhead.

I am curious whether anyone has a suggestion for how to get updates to downstream objects reflected in the upstream object's index without needing to index all the objects along the way.

sanne.grinovero · **Posted:** Sat Apr 23, 2016 3:13 pm

Hi,
the documentation change around @ContainedIn was meant as a clarification but we didn't change the semantics.

There have been bugfixes which possibly cause Hibernate Search 5.0 and beyond to index more entities than previously; this is expected. If that is the case, you might want to use the "includePaths" attribute of @IndexedEmbedded.

This attribute should allow you to control exactly which "paths" to the indexed relations need to be followed, and you can control it for each type independently.

See also : http://docs.jboss.org/hibernate/search/ ... ncludePath

HTH!

eli.julian · **Joined:** Tue Mar 22, 2016 11:30 am **Posts:** 3

Hi Sanne, thank you for writing back.

The "includePaths" attribute unfortunately does not help with the issue I mentioned about updates to non-indexed objects. Even with "includePaths", when an object that is not itself directly indexed is updated, the change is still not reflected in the index of its related objects. I may be mistaken, but my impression is that this is not supposed to be the case: an update to ANY object that is included in the index through @IndexedEmbedded is supposed to be caught by Hibernate Search, and the index updated accordingly. The performance issue was solved by removing the extra @Indexed annotations, but with that we lost the updates to the root index.

Do you have suggestions for how we can ensure that the index is updated when the other objects are updated?

sanne.grinovero · **Posted:** Mon Apr 25, 2016 12:00 pm

Hi, I have to admit I'm a bit confused about what you're expecting. I'm aware that indexing a complex graph can be a performance problem, but I don't understand why the latest version should perform worse: it shouldn't.

Do you think you could formulate this case as a runnable project for me to inspect?

Ideally you could make a unit test, but I'm also OK with inspecting a standalone project if you can make it very simple (please remove any code which is not essential to show the problem).

Here you can see an example of one of our tests from this area:
- https://github.com/hibernate/hibernate- ... sTest.java

Please, feel free to take that as a template and make a test to show your issue.

busitech · **Posted:** Mon Apr 17, 2017 7:58 pm

We are having the same problem during a process that should not involve any index changes; at least, it is not easily discernible why reindexing is taking place.

One thing I would expect is that an updated entity is not reindexed if none of the changed properties is a @Field. Reindexing should take place only if members of the index have been altered.

Another thing I am expecting is that the indexing logic has a fast way to access the entirety of the index areas affected by an update to a @Field.

It could be that we just need some tools which can reveal the reason for reindexing, so that performance problems in this area can be addressed easily. Is there a logging category which might be enabled to reveal detail of this nature?

busitech · **Posted:** Wed Apr 19, 2017 6:05 pm

I have eliminated our performance issue, and I'd like to share our findings. Our problem was caused by adding a @ContainedIn annotation to a property within an entity that also had at least one other property annotated with both @Transient and @Field.

HSEARCH-1096 tried to address the issue of @Transient fields never being marked dirty, but the change caused another issue, namely that @Transient fields are always marked dirty. This causes terrible performance, especially if the entity being updated is the target of a @ManyToOne relationship. One record being reindexed without good cause can trigger the subsequent reindexing of an entire table full of records, if the original entity is a popular one.
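
As an illustration of the combination described above, here is a hypothetical sketch (not the actual entities from our application) of a mapping that could run into it: an entity that is the target of a @ManyToOne, carries a @ContainedIn back-reference, and also declares a @Transient @Field property.

```java
import java.util.HashSet;
import java.util.Set;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import javax.persistence.Transient;

import org.hibernate.search.annotations.ContainedIn;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;

// Hypothetical indexed entity; there may be many Orders per Customer.
@Entity
@Indexed
class Order {
    @Id Long id;

    @Field String reference;

    @ManyToOne
    @IndexedEmbedded
    Customer customer;
}

// Hypothetical "popular" entity referenced by many Orders.
@Entity
class Customer {
    @Id Long id;

    String firstName;
    String lastName;
    String internalComment;   // not a @Field; one might expect changes here to be ignored

    // Not persisted, but indexed. Hibernate Search cannot tell from ORM's
    // dirty checking whether this value changed, which is the situation
    // HSEARCH-1096 and HSEARCH-1093 are about.
    @Transient
    @Field
    String displayName;

    // Because of this back-reference, reindexing a Customer also
    // reindexes every Order that embeds it.
    @OneToMany(mappedBy = "customer")
    @ContainedIn
    Set<Order> orders = new HashSet<>();
}
```

With a mapping like this, an update touching only `internalComment` would, per the behavior described above, still be treated as an index-relevant change and fan out to all of that customer's orders.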

HSEARCH-1093 is the acknowledgement that dirty checking with @Transient fields needs to be a lot smarter. I would posit that in the meantime, at least making this determination user defined, before it gains the logic to become genuinely smarter, would be far superior to the current situation by several orders of magnitude.

As a workaround, we have removed all of our @Transient @Field properties in favor of alternative implementations.