
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
 Post subject: Reindex performance regression on Hibernate Search 4.0
Posted: Tue Dec 20, 2011 6:56 am
Newbie

Joined: Tue Dec 20, 2011 6:54 am
Posts: 4
We have encountered a performance regression in mass reindexing with Hibernate Search 4.0. Since migrating to this new version, a mass reindex takes more than 20 minutes, with high iowait on the host (with Hibernate 3.6.8 it took approximately 5 minutes, on a database of ~100k entities).

After some investigation, we found that downgrading to Lucene 3.3.0 restores the original performance. I implemented a trivial ProgressMonitor and observed the following throughput:

Hibernate 4.0 - Lucene 3.3.0: 3000 entities / 15 sec.
Hibernate 4.0 - Lucene 3.4.0 or Lucene 3.5.0: 150 entities / 15 sec.
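Both figures are per 15-second window, so converting them to a per-second rate makes the gap explicit. The sketch below is a hypothetical stand-in for a "trivial ProgressMonitor": the class and method names are illustrative only and are not part of Hibernate Search. In a real setup such a counter would typically be fed from a `MassIndexerProgressMonitor` implementation passed to the MassIndexer.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical throughput counter in the spirit of a "trivial ProgressMonitor".
 * Not a Hibernate Search class; names are illustrative only.
 */
public class ThroughputCounter {
    // Thread-safe, since mass indexing typically reports progress from several threads
    private final AtomicLong documents = new AtomicLong();

    /** Record that `increment` more documents were written to the index. */
    public void documentsAdded(long increment) {
        documents.addAndGet(increment);
    }

    public long total() {
        return documents.get();
    }

    /** Average rate over a window: entities divided by elapsed seconds. */
    public static double perSecond(long entities, double seconds) {
        return entities / seconds;
    }

    public static void main(String[] args) {
        // Figures reported in the post, both measured over 15-second windows
        double lucene33 = perSecond(3000, 15); // Lucene 3.3.0
        double lucene34 = perSecond(150, 15);  // Lucene 3.4.0 / 3.5.0
        System.out.printf("Lucene 3.3.0: %.0f entities/s%n", lucene33);
        System.out.printf("Lucene 3.4+ : %.0f entities/s%n", lucene34);
        System.out.printf("Slowdown    : %.0fx%n", lucene33 / lucene34);
    }
}
```

With the reported numbers this prints 200 and 10 entities/s, a 20x slowdown.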

Since the regression seems to be linked to the Lucene dependency, we think it is related to the fsync bug fix introduced in Lucene 3.4.0 (which would explain the higher iowait). Counting fsync hits in FSDirectory with a breakpoint confirms that fsync is called frequently (~25 times/s).
If we use RAM index storage instead, there is no performance regression either.
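Why frequent fsync calls show up as iowait can be seen outside Lucene with a small experiment. The sketch below is a hypothetical micro-benchmark, not Lucene's FSDirectory code. It writes a number of small files and optionally calls `FileChannel.force(true)` on each one, which asks the OS to flush the file's content and metadata to the storage device (an fsync). Absolute timings depend heavily on the disk, filesystem and OS, so any numbers it prints are indicative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical micro-benchmark: cost of forcing each small file to disk. */
public class FsyncCost {

    /** Writes `files` 4 KiB files into `dir`; returns how many force() calls were made. */
    public static int writeFiles(Path dir, int files, boolean sync) {
        byte[] payload = new byte[4096];
        int forced = 0;
        try {
            for (int i = 0; i < files; i++) {
                Path p = dir.resolve((sync ? "synced-" : "plain-") + i + ".bin");
                try (FileChannel ch = FileChannel.open(p, StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                    ByteBuffer buf = ByteBuffer.wrap(payload);
                    while (buf.hasRemaining()) {
                        ch.write(buf);
                    }
                    if (sync) {
                        // fsync: block until content and metadata reach the device
                        ch.force(true);
                        forced++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return forced;
    }

    public static void main(String[] args) throws IOException {
        // Files are left in the temp directory for inspection
        Path dir = Files.createTempDirectory("fsync-cost");
        int n = 200;
        long t0 = System.nanoTime();
        writeFiles(dir, n, false);
        long t1 = System.nanoTime();
        writeFiles(dir, n, true);
        long t2 = System.nanoTime();
        System.out.printf("without fsync: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("with fsync   : %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```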

Has anyone else encountered this performance regression? Is this a known problem? Is there any configuration tweak to disable fsync during a mass reindex, or any plan to improve mass reindex performance?
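For reference, the index storage mentioned in the post (filesystem vs RAM) is selected through Hibernate Search configuration properties, which would normally go into `hibernate.properties` or `persistence.xml`. A minimal sketch, assuming the property names `hibernate.search.default.directory_provider` and `hibernate.search.default.indexBase` with the short provider names `filesystem` and `ram`. The helper class is hypothetical and the index path is a made-up placeholder. Note that a RAM directory avoids disk I/O but keeps the index only in memory, so it is lost when the JVM stops.

```java
import java.util.Properties;

/** Hypothetical helper building Hibernate Search directory settings. */
public class IndexStorageConfig {

    /** Filesystem-backed index: persistent, and fsync'ed on commit with Lucene 3.4+. */
    public static Properties filesystem(String indexBase) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider", "filesystem");
        p.setProperty("hibernate.search.default.indexBase", indexBase);
        return p;
    }

    /** In-memory index: no disk I/O, but contents are lost when the JVM stops. */
    public static Properties ram() {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider", "ram");
        return p;
    }

    public static void main(String[] args) {
        // "/var/lucene/indexes" is a placeholder path, not taken from the thread
        filesystem("/var/lucene/indexes").forEach((k, v) -> System.out.println(k + " = " + v));
        ram().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```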


Thanks


 Post subject: Re: Reindex performance regression on Hibernate Search 4.0
Posted: Wed Dec 21, 2011 12:09 pm
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

thanks for this thorough report. I was not aware of any performance regression. Which Lucene issue are you referring to?

--Hardy


 Post subject: Re: Reindex performance regression on Hibernate Search 4.0
Posted: Wed Dec 21, 2011 12:40 pm
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

After some investigation, I assume you mean this issue: https://issues.apache.org/jira/browse/LUCENE-3418.
It really is a bug fix: before the fix, data could be lost. The unfortunate side effect is a loss in performance. One thing we can do on the mass indexer side is to reduce the number of index commits to alleviate the performance drop. It remains to be seen how far this is possible and how much it will help. I created https://hibernate.onjira.com/browse/HSEARCH-1019 to track the issue.
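The idea of reducing index commits can be illustrated with a toy model: if each commit triggers a round of fsync calls, the number of fsync rounds scales with the number of commits rather than with the number of documents. The sketch below is hypothetical and does not reflect how the MassIndexer is actually implemented; it simply buffers documents and commits every `batchSize` of them, plus once at the end for any remainder.

```java
/** Toy model: count commits when documents are committed in batches. Hypothetical. */
public class BatchedCommits {
    private final int batchSize;
    private int pending = 0;
    private int commits = 0;

    public BatchedCommits(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Add one document; commit once the batch is full. */
    public void add() {
        pending++;
        if (pending == batchSize) {
            commit();
        }
    }

    /** Commit whatever is still buffered (e.g. at the end of the run). */
    public void flush() {
        if (pending > 0) {
            commit();
        }
    }

    private void commit() {
        // In a real index, this is where the new segment files would be fsync'ed
        commits++;
        pending = 0;
    }

    public int commits() {
        return commits;
    }

    /** Number of commits needed to index `documents` documents in batches of `batchSize`. */
    public static int commitsFor(int documents, int batchSize) {
        BatchedCommits b = new BatchedCommits(batchSize);
        for (int i = 0; i < documents; i++) {
            b.add();
        }
        b.flush();
        return b.commits();
    }

    public static void main(String[] args) {
        int docs = 100_000; // roughly the database size from the first post
        for (int batch : new int[] {1, 100, 1000}) {
            System.out.printf("batch %4d -> %6d commits%n", batch, commitsFor(docs, batch));
        }
    }
}
```

For ~100k documents, committing after every document means 100,000 commits, while committing every 1,000 documents means 100.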

--Hardy


 Post subject: Re: Reindex performance regression on Hibernate Search 4.0
Posted: Fri Dec 30, 2011 5:08 am
Newbie

Joined: Wed Sep 21, 2011 2:20 pm
Posts: 16
lalmeras wrote:
Hibernate 4.0 - Lucene 3.3.0: 3000 entities / 15 sec.
Hibernate 4.0 - Lucene 3.4.0 or Lucene 3.5.0: 150 entities / 15 sec.

Is the combination of HS4 and Lucene 3.4+ always that slow, or is it only in your particular use case?

I would rather risk some data loss (it seems to occur rather seldom), which can be corrected by rebuilding the index, than have such slow indexing performance.

Kind regards


 Post subject: Re: Reindex performance regression on Hibernate Search 4.0
Posted: Fri Dec 30, 2011 9:16 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Is the combination of HS4 and Lucene 3.4+ always that slow, or is it only in your particular use case?

I would rather risk some data loss (it seems to occur rather seldom), which can be corrected by rebuilding the index, than have such slow indexing performance.

Kind regards

These figures relate to a bug specific to the MassIndexer. The overhead of indexing during "normal" event-driven operations can be significantly different, and depends on your data values, your general schema and your indexing options. Throughput can also be much higher: a simple setup (the most trivial schema and data) reaches up to 12 million entities in 3 minutes on my dual-core laptop during a MassIndexer run (before this bug). Most of the work performed by the MassIndexer is usually loading data from the database, a step that is skipped with event-triggered indexing, since the data is usually already largely available in the persistence context. Event-triggered indexing can be slower, however, when the indexing operation needs to load additional entities from the database.
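The split described above, loading entities from the database and then turning them into index documents, can be pictured as a two-stage producer/consumer pipeline. The sketch below is a toy model under that assumption, not the MassIndexer's actual implementation. Several hypothetical "loader" threads stand in for database loading (usually the dominant cost, and the stage event-triggered indexing mostly skips), and a single "indexer" thread stands in for building documents and writing them to the index.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy load-then-index pipeline; hypothetical, not Hibernate Search code. */
public class LoadThenIndexPipeline {
    // Sentinel telling the indexer that all loaders are done (ids are never negative)
    private static final int POISON = -1;

    /** Pushes ids 0..count-1 through the pipeline and returns how many were "indexed". */
    public static int run(int count, int loaderThreads) {
        BlockingQueue<Integer> ids = new LinkedBlockingQueue<>();
        for (int i = 0; i < count; i++) {
            ids.add(i);
        }
        // Bounded hand-off queue: loaders block when the indexer falls behind
        BlockingQueue<Integer> loaded = new ArrayBlockingQueue<>(100);
        int[] indexed = {0}; // only touched by the single indexer thread

        Thread indexer = new Thread(() -> {
            try {
                while (loaded.take() != POISON) {
                    indexed[0]++; // stand-in for: build document, add it to the index writer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.start();

        ExecutorService loaders = Executors.newFixedThreadPool(loaderThreads);
        for (int t = 0; t < loaderThreads; t++) {
            loaders.submit(() -> {
                Integer id;
                while ((id = ids.poll()) != null) {
                    loaded.put(id); // stand-in for: load entity and associations from the DB
                }
                return null;
            });
        }
        loaders.shutdown();
        try {
            loaders.awaitTermination(1, TimeUnit.MINUTES);
            loaded.put(POISON);
            indexer.join(); // join() also makes indexed[0] safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed[0];
    }

    public static void main(String[] args) {
        System.out.println(run(10_000, 4) + " entities indexed");
    }
}
```

In this model, adding loader threads only helps while loading is the bottleneck; once the single indexer thread is saturated, the bounded queue fills up and the loaders wait.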

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.