
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 9 posts ] 
 Post subject: HS - Optimize job only committed on sessionFactory close
Posted: Mon Sep 22, 2014 8:37 am 
Newbie

Joined: Fri Apr 04, 2014 4:37 am
Posts: 9
Location: Orleans - France
Hello,

I came across the following problem.

The result of the optimize work seems to be committed to the index only when the SessionFactory is closed: see ExclusiveIndexWorkspaceImpl.afterTransactionApplied with streaming work below.

Code:
public class ExclusiveIndexWorkspaceImpl extends AbstractWorkspaceImpl {

   public ExclusiveIndexWorkspaceImpl(DirectoryBasedIndexManager indexManager, WorkerBuildContext context, Properties cfg) {
      super( indexManager, context, cfg );
   }

   @Override
   public void afterTransactionApplied(boolean someFailureHappened, boolean streaming) {
      if ( someFailureHappened ) {
         writerHolder.forceLockRelease();
      }
      else {
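         if ( ! streaming ) {
            // only non-streaming work is committed here; streaming work
            // (the optimize case discussed in this thread) is not
            writerHolder.commitIndexWriter();
         }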
      }
   }
...


In my context, all my batch jobs (technical and business) are Spring Batch jobs deployed on JBoss EAP.

To stay consistent across all my applications, I wrap tasks such as mass indexing and index optimizing in Spring Batch tasklets deployed on JBoss EAP. But because of this commit issue, the result of the optimize only becomes "visible" after my web app is disabled and re-enabled.

Is there a way to get the behaviour I expect?

Thanks


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Tue Sep 23, 2014 5:42 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
interesting observation!
What version of Hibernate Search are you testing?

Keep in mind that the index files you can "see" on disk don't necessarily match the state of the index; in particular, deletions might be postponed at the filesystem level.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Wed Sep 24, 2014 3:38 am 
Newbie

Joined: Fri Apr 04, 2014 4:37 am
Posts: 9
Location: Orleans - France
Hi Sanne

The HS version I use is 4.4.2.Final. I'm not sure my problem is due to a difference between what I see on the filesystem and what is in the index. I built a small test case that does the following:

- begin transaction
- populate an empty database and the index with 100K entities
- commit transaction
=> here my index on the filesystem consists of 119 files and its size is 11 MB

- begin transaction
- optimize
- commit transaction
=> here my index on the filesystem consists of 127 files and its size is 22 MB; commit is not called on the IndexWriter here

- close session factory
=> the index on the filesystem consists of 10 files and its size is 11 MB (it's optimized); the IndexWriter was committed during close

What do you think?

Regards
Yoann GENDRE


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Thu Sep 25, 2014 1:40 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Yoann,
I think I would expect that, as the IndexWriter and the IndexReader have separate file handles. When you call optimize(), the IW generates a new, optimized copy of the index; this is essentially a duplicate, so your index size doubles. While that is happening, and also after it has finished, the currently open IndexReaders keep buffers open on the previous segment files until each IndexReader is either refreshed or closed. An IndexReader is only refreshed "on demand", so you might need to trigger that by running a Query. Of course, if you shut down the SessionFactory, the IndexReaders get closed too, so the old file handles are released.
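
The "index size doubles" point can be checked against the byte counts in the OptimizeTest log posted later in this thread. A minimal sketch, using only those logged values: the growth during the uncommitted optimize (11267970 bytes) is within 297 bytes of the final optimized index size, which is consistent with optimize() writing a full merged copy next to the old segments. Attributing the small remainder to metadata files such as segments_N is an assumption, not something the log shows.

```java
public class OptimizeDiskUsage {

    // Index sizes in bytes, copied from the OptimizeTest log in this thread.
    static final long BEFORE_OPTIMIZE = 12_412_131L;
    static final long AFTER_OPTIMIZE_UNCOMMITTED = 23_680_101L;
    static final long AFTER_SESSION_FACTORY_CLOSE = 11_268_267L;

    /** Bytes added to the directory by optimize while the old segments still exist. */
    static long bytesAddedByOptimize() {
        return AFTER_OPTIMIZE_UNCOMMITTED - BEFORE_OPTIMIZE;
    }

    /** Difference between the final optimized size and the bytes optimize added. */
    static long gapToFinalSize() {
        return AFTER_SESSION_FACTORY_CLOSE - bytesAddedByOptimize();
    }

    public static void main(String[] args) {
        System.out.println("added by optimize:    " + bytesAddedByOptimize());
        System.out.println("final optimized size: " + AFTER_SESSION_FACTORY_CLOSE);
        System.out.println("gap:                  " + gapToFinalSize());
        System.out.printf("peak / original size: %.2f%n",
                (double) AFTER_OPTIMIZE_UNCOMMITTED / BEFORE_OPTIMIZE);
    }
}
```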

We can think of improving this, but is this causing any problem?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Thu Sep 25, 2014 1:45 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
P.S. If this is causing a problem, don't hesitate: open a JIRA issue and we'll see what can be done.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Fri Sep 26, 2014 5:38 am 
Newbie

Joined: Fri Apr 04, 2014 4:37 am
Posts: 9
Location: Orleans - France
Hi Sanne,

I understand the behaviour of IR and IW with separate file handles. But I think that as long as the IW has not committed the optimize changes, the IR can't see them, even after a refresh.
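
This point can be illustrated with a toy model. The class and method names below are hypothetical and are not Lucene's API. The model assumes, as for the directory-based readers in the log (ReadOnlyDirectoryReader), that opening or refreshing a reader starts from the last commit point. A merge that has been written but not committed therefore stays invisible, however many queries refresh the reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model, not Lucene's real API: writer changes become visible to new readers only after commit. */
public class CommitVisibilityModel {

    // Segments recorded in the last commit point; this is all a newly opened reader can see.
    private List<String> committedSegments = new ArrayList<>();
    // Writer-side view, which may include work that has not been committed yet.
    private List<String> writerSegments = new ArrayList<>();
    private int nextSegmentId = 0;

    /** Simulates the writer flushing one new segment. */
    void addSegment() {
        writerSegments.add("_" + nextSegmentId++);
    }

    /** Simulates optimize(): merge all segments into a single new one, without committing. */
    void optimize() {
        List<String> merged = new ArrayList<>();
        merged.add("_" + nextSegmentId++);
        writerSegments = merged;
    }

    /** Publishes the writer's current segments as the new commit point. */
    void commit() {
        committedSegments = new ArrayList<>(writerSegments);
    }

    /** Opening (or refreshing) a reader always starts from the last commit point. */
    List<String> openReader() {
        return Collections.unmodifiableList(new ArrayList<>(committedSegments));
    }

    public static void main(String[] args) {
        CommitVisibilityModel index = new CommitVisibilityModel();
        for (int i = 0; i < 20; i++) { // arbitrary segment count for the illustration
            index.addSegment();
        }
        index.commit();
        System.out.println("after indexing + commit:   " + index.openReader().size() + " segments");
        index.optimize();
        System.out.println("after optimize, no commit: " + index.openReader().size() + " segments");
        index.commit();
        System.out.println("after commit:              " + index.openReader().size() + " segments");
    }
}
```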

To illustrate that, I changed my test case to add a query, as you suggested. Here is what my test does on an index of 100K documents (about 10 MB):

- open sessionfactory
- query index in a new session
- optimize index in new session
- query index in a new session
- close sessionfactory
- open sessionfactory
- query index in a new session

And below is an extract of my log. As you can see, the IR doesn't use the optimized segments until the SessionFactory is closed and the IW committed.

OptimizeTest -> idx size : 12412131
OptimizeTest -> query index
OptimizeTest -> before query
SharingBufferReaderProvider -> Opening IndexReader for directoryProvider fr.dsirc.testcase.model.Individu
SharingBufferReaderProvider -> Closing IndexReader: ReadOnlyDirectoryReader(segments_5l _k(3.6.2):C9500 _l(3.6.2):c500 _m(3.6.2):c500 _n(3.6.2):c500 _o(3.6.2):c500 _p(3.6.2):c500 _q(3.6.2):c500 _r(3.6.2):c500 _s(3.6.2):c500 _t(3.6.2):c500 _1e(3.6.2):C9500 _1f(3.6.2):c500 _1g(3.6.2):c500 _1h(3.6.2):c500 _1i(3.6.2):c500 _1j(3.6.2):c500 _1k(3.6.2):c500 _1l(3.6.2):c500 _1m(3.6.2):c500 _1n(3.6.2):c500 _28(3.6.2):C9500 _29(3.6.2):c500....
OptimizeTest -> after query
OptimizeTest -> idx size : 12412131
OptimizeTest -> optimize
OptimizeTest -> idx size : 23680101
OptimizeTest -> query index
OptimizeTest -> before query
SharingBufferReaderProvider -> Opening IndexReader for directoryProvider fr.dsirc.testcase.model.Individu
SharingBufferReaderProvider -> Closing IndexReader: ReadOnlyDirectoryReader(segments_5l _k(3.6.2):C9500 _l(3.6.2):c500 _m(3.6.2):c500 _n(3.6.2):c500 _o(3.6.2):c500 _p(3.6.2):c500 _q(3.6.2):c500 _r(3.6.2):c500 _s(3.6.2):c500 _t(3.6.2):c500 _1e(3.6.2):C9500 _1f(3.6.2):c500 _1g(3.6.2):c500 _1h(3.6.2):c500 _1i(3.6.2):c500 _1j(3.6.2):c500 _1k(3.6.2):c500 _1l(3.6.2):c500 _1m(3.6.2):c500 _1n(3.6.2):c500 _28(3.6.2):C9500 _29(3.6.2):c500....
OptimizeTest -> after query
OptimizeTest -> idx size : 23680101
OptimizeTest -> commit optimize transaction and close session
OptimizeTest -> query index
OptimizeTest -> before query
SharingBufferReaderProvider -> Opening IndexReader for directoryProvider fr.dsirc.testcase.model.Individu
SharingBufferReaderProvider -> Closing IndexReader: ReadOnlyDirectoryReader(segments_5l _k(3.6.2):C9500 _l(3.6.2):c500 _m(3.6.2):c500 _n(3.6.2):c500 _o(3.6.2):c500 _p(3.6.2):c500 _q(3.6.2):c500 _r(3.6.2):c500 _s(3.6.2):c500 _t(3.6.2):c500 _1e(3.6.2):C9500 _1f(3.6.2):c500 _1g(3.6.2):c500 _1h(3.6.2):c500 _1i(3.6.2):c500 _1j(3.6.2):c500 _1k(3.6.2):c500 _1l(3.6.2):c500 _1m(3.6.2):c500 _1n(3.6.2):c500 _28(3.6.2):C9500 _29(3.6.2):c500....
OptimizeTest -> after query
OptimizeTest -> idx size : 23680101
OptimizeTest -> closing session factory
//session factory is closed. IW is committed on close. Idx size in FS is 11 MB and the next IR will work with the new segment.
OptimizeTest -> idx size : 11268267
OptimizeTest -> query index
OptimizeTest -> before query
SharingBufferReaderProvider -> Opening IndexReader for directoryProvider fr.dsirc.testcase.model.Individu
SharingBufferReaderProvider -> Closing IndexReader: ReadOnlyDirectoryReader(segments_5m _66(3.6.2):C100000)
OptimizeTest -> after query
OptimizeTest -> idx size : 11268267
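
The ReadOnlyDirectoryReader lines are the quickest way to tell whether a query ran against the optimized index. Below is a small helper that counts segments and documents in such a line. It assumes entries shaped like the ones printed above, name(version):C<count> or c<count>, and it does not try to interpret the c/C letter. SegmentLogParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts segment names and document counts from a ReadOnlyDirectoryReader log line. */
public class SegmentLogParser {

    static final class Segment {
        final String name;
        final int docCount;

        Segment(String name, int docCount) {
            this.name = name;
            this.docCount = docCount;
        }
    }

    // Assumes entries like "_66(3.6.2):C100000": name, (version), ':', c or C, document count.
    private static final Pattern ENTRY = Pattern.compile("(_\\w+)\\([^)]*\\):[cC](\\d+)");

    static List<Segment> parse(String readerDescription) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = ENTRY.matcher(readerDescription);
        while (m.find()) {
            segments.add(new Segment(m.group(1), Integer.parseInt(m.group(2))));
        }
        return segments;
    }

    static long totalDocs(List<Segment> segments) {
        long total = 0;
        for (Segment s : segments) {
            total += s.docCount;
        }
        return total;
    }

    public static void main(String[] args) {
        // First entries of a reader opened before the IndexWriter commit (the log line is truncated).
        String before = "ReadOnlyDirectoryReader(segments_5l _k(3.6.2):C9500 _l(3.6.2):c500 _m(3.6.2):c500";
        // Reader opened after the SessionFactory was closed and reopened.
        String after = "ReadOnlyDirectoryReader(segments_5m _66(3.6.2):C100000)";

        System.out.println("before (excerpt): " + parse(before).size() + " segments shown");
        List<Segment> optimized = parse(after);
        System.out.println("after: " + optimized.size() + " segment(s), " + totalDocs(optimized) + " docs");
    }
}
```

On the last reader line, it reports a single segment, _66, holding 100000 documents, which matches the 100K indexed entities. The earlier lines are truncated in the log, so only their visible prefix can be counted.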



Am I missing something?
Regards
Yoann


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Fri Sep 26, 2014 8:24 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
You're right, a commit would be needed too. Since the IndexWriter flushes regularly, and especially when it runs out of buffer space, I was expecting such a large rewrite to trigger that. But an explicit commit would be better: even in cases where it's unnecessary, its cost would be negligible compared to the optimisation and previous commits.

Thanks, I opened https://hibernate.atlassian.net/browse/HSEARCH-1681
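
Below is one possible shape for the change discussed for HSEARCH-1681, written as a standalone class. It is not the actual Hibernate Search patch. WriterHolder is a stand-in interface, and the optimizeApplied parameter is hypothetical. The sketch keeps the decision logic of the quoted ExclusiveIndexWorkspaceImpl.afterTransactionApplied and adds one case: a commit when the streaming work contained an optimize, which Sanne estimates would cost little.

```java
/** Hypothetical sketch of the HSEARCH-1681 idea; not the real Hibernate Search code. */
public class CommitAfterOptimizeSketch {

    /** Stand-in for the workspace's writer holder (hypothetical interface). */
    interface WriterHolder {
        void commitIndexWriter();
        void forceLockRelease();
    }

    private final WriterHolder writerHolder;

    CommitAfterOptimizeSketch(WriterHolder writerHolder) {
        this.writerHolder = writerHolder;
    }

    /**
     * Same branches as the quoted method, plus the hypothetical optimizeApplied flag:
     * plain streaming work still skips the commit, but an optimize is committed so
     * that newly opened IndexReaders can see the merged segment.
     */
    void afterTransactionApplied(boolean someFailureHappened, boolean streaming, boolean optimizeApplied) {
        if (someFailureHappened) {
            writerHolder.forceLockRelease();
        } else if (!streaming || optimizeApplied) {
            writerHolder.commitIndexWriter();
        }
    }

    public static void main(String[] args) {
        final int[] commits = {0};
        // Recording stub: counts commit calls instead of touching a real index.
        WriterHolder stub = new WriterHolder() {
            public void commitIndexWriter() { commits[0]++; }
            public void forceLockRelease() { }
        };
        CommitAfterOptimizeSketch workspace = new CommitAfterOptimizeSketch(stub);
        workspace.afterTransactionApplied(false, true, false); // plain streaming work: no commit
        workspace.afterTransactionApplied(false, true, true);  // streaming optimize: commit
        System.out.println("commits: " + commits[0]);
    }
}
```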

Could you confirm that the only problem is that queries don't take advantage of the optimised index until the first write happens?
I'm asking because I initially thought you were investigating a disk space problem; that's why I was explaining the separate file handles.

Keep in mind that since Hibernate Search 5 (Lucene 4), optimising is no longer recommended, as the performance benefit is almost zero, while it also has to invalidate a number of caches (but the functionality is still available for those cases in which a read-only index is hammered all day without further changes).

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Fri Sep 26, 2014 10:05 am 
Newbie

Joined: Fri Apr 04, 2014 4:37 am
Posts: 9
Location: Orleans - France
I confirm. I updated the index after optimizing (a purge of a non-existent document), and subsequent queries take advantage of the optimized index. The old segment files are not deleted yet, but that doesn't matter in my case.
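
The workaround confirmed here can be written as a small helper that a Spring Batch tasklet might call. IndexMaintenance and NON_EXISTING_ID are hypothetical names. In a real job, the two calls would presumably map to Hibernate Search's optimize and purge operations, each run in its own transaction. The sketch only shows the ordering: the purge is ordinary (non-streaming) work, and according to the behaviour reported in this thread its commit also makes the optimized segment visible.

```java
/** Sketch of the "optimize, then perform a harmless write" workaround (hypothetical API). */
public class OptimizeThenTouch {

    /** Hypothetical abstraction over the index operations used by the batch job. */
    interface IndexMaintenance {
        void optimize();

        /** Purges one document by id; changes nothing if no document has that id. */
        void purge(long id);
    }

    // Hypothetical id assumed never to be used by a real entity.
    static final long NON_EXISTING_ID = -1L;

    static void optimizeAndMakeVisible(IndexMaintenance index) {
        index.optimize();             // streaming work: not committed on its own in HS 4.4
        index.purge(NON_EXISTING_ID); // non-streaming work: its commit publishes the merged segment too
    }

    public static void main(String[] args) {
        final StringBuilder calls = new StringBuilder();
        // Recording fake: logs the calls instead of touching a real index.
        IndexMaintenance recorder = new IndexMaintenance() {
            public void optimize() { calls.append("optimize;"); }
            public void purge(long id) { calls.append("purge(").append(id).append(");"); }
        };
        optimizeAndMakeVisible(recorder);
        System.out.println(calls);
    }
}
```

The dummy purge is only a stopgap while the missing commit tracked in HSEARCH-1681 remains unaddressed.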

I'm surprised by what you said about optimize: "performance benefit is almost zero".

I thought that searching an optimized index was faster (mine has 80M docs, and the queries are quite complex and executed massively with Spring Batch).

So I assumed I would have to optimize it, not often, but occasionally after a lot of updates (also executed massively with Spring Batch).


Regards
Yoann


 Post subject: Re: HS - Optimize job only committed on sessionFactory close
Posted: Fri Sep 26, 2014 10:08 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
I'm surprised by what you said about optimize: "performance benefit is almost zero".

That's only true since Lucene 4; your version of Hibernate Search uses Lucene 3, so optimising is still a good idea in your case.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.