
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: Hibernate Search multi-threaded performance
PostPosted: Fri Oct 19, 2012 10:51 am 
Newbie

Joined: Tue Sep 28, 2010 9:20 am
Posts: 6
Hi there,

I have a long-running process where I run multi-threaded searches on the index.

I noticed by profiling my code using visualvm that I spend almost 90% of my time in that stack:

Code:
org.hibernate.search.jpa.impl.FullTextQueryImpl.getResultList()
   org.hibernate.search.query.hibernate.impl.FullTextQueryImpl.list()
      org.hibernate.search.query.engine.impl.HSQueryImpl.queryEntityInfos()
         org.hibernate.search.query.engine.impl.HSQueryImpl.buildSearcher()
            org.hibernate.search.query.engine.impl.HSQueryImpl.buildSearcher()
               org.hibernate.search.reader.impl.MultiReaderFactory.openReader()
                  org.hibernate.search.indexes.impl.SharingBufferReaderProvider.openIndexReader()
                     org.hibernate.search.indexes.impl.SharingBufferReaderProvider$PerDirectoryLatestReader.refreshAndGet()
                        java.util.concurrent.locks.ReentrantLock.lock()


Am I doing something wrong or is there a way to optimize that?

I'm using HSearch 4.1.1.Final with Lucene 3.5.0.

Thanks!


 Post subject: Re: Hibernate Search multi-threaded performance
PostPosted: Fri Oct 19, 2012 1:44 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
what do you mean by multi-threaded searches?

That lock protects your threads from initializing the same IndexReader instance multiple times, as that requires a lot of disk IO. So the lock makes sure all of your threads (except one) wait for that one to load/refresh the index segments, so that they can all benefit from the same load operation. We could remove the lock, and then the profiler would tell you that Hibernate Search is super light and doing nothing, because your whole VM would be waiting ages for disk activity ;-)

If you see this often, I can think of these possible causes:
- a very very large index is being opened and you're optimizing too often; consider disabling optimization altogether
- your index is extremely small, for example you have just a couple of documents in your test; this could make all search operations very fast and highlight this contention point
- you're not only searching but also writing a lot to the index: every time a write is performed the IndexReader cache is invalidated, so the instance needs to hit the disk again to refresh very often, preventing other threads from acquiring the lock. Did you try the NRT IndexManager?

I just created https://hibernate.onjira.com/browse/HSEARCH-1223 as there's a new trick we can apply with Lucene 3.6 but I still think your case is so extreme that it wouldn't solve it. If you are able to reproduce this in a simplified example of your app (possibly using Maven and github) I would be glad to look into your case.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search multi-threaded performance
PostPosted: Fri Oct 19, 2012 3:57 pm 
Newbie

Joined: Tue Sep 28, 2010 9:20 am
Posts: 6
Hi Sanne, again, thanks for your prompt reply.

Maybe some more background info would be useful.

By multi-threaded search, I mean that I launch around 8 threads which will execute around 30k searches in total.

As I have an abstraction around the underlying search engine, I execute the following code about 30k times:

Code:
// Wrap the JPA EntityManager with the Hibernate Search API
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
FullTextQuery persistenceQuery = fullTextEntityManager.createFullTextQuery(query);
List results = persistenceQuery.getResultList();
// Total number of hits for the query, regardless of pagination
int totalCount = persistenceQuery.getResultSize();
return results;


I'm assuming doing that will not cause an open/close operation on the underlying reader but I might be wrong.
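
For readers who want to reproduce this kind of load, here is a minimal sketch of the setup described above (a fixed pool of about 8 threads sharing roughly 30k searches). `ParallelSearchRunner` and the `search` function are hypothetical names, not part of Hibernate Search; the function stands in for the abstraction wrapping the FullTextQuery code. Since a JPA EntityManager is not thread-safe, the sketch assumes each invocation obtains its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs many searches on a fixed thread pool. */
final class ParallelSearchRunner {

    /**
     * Runs every query on the pool and returns the result counts in query order.
     * The search function is an assumed stand-in for the code that wraps
     * FullTextQuery; it must be safe to call from several threads at once.
     */
    static List<Integer> runAll(List<String> queries, int threads,
                                Function<String, List<?>> search) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String q : queries) {
                tasks.add(() -> search.apply(q).size());
            }
            List<Integer> counts = new ArrayList<>();
            // invokeAll blocks until all tasks have completed
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                counts.add(f.get());
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a search task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, a call such as `ParallelSearchRunner.runAll(queries, 8, mySearch)` would approximate the workload in the post, where `mySearch` is whatever wraps the query code.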

To your points:

If you see this often, I can think of these possible causes:

- a very very large index is being opened and you're optimizing too often, consider disabling optimization altogether

My index consists of 43 entities, but only 3 or 4 are searched on. There are around 100k entities for the largest one. The total disk size is around 400M, so I don't think I qualify as very very large, but most of that 400M is in the entity I'm querying.
I don't do anything in particular with regard to optimization, but just in case, how can I disable it?

- your index is extremely small, for example you have just a couple of documents in your test; this could make all search operations very fast and highlight this contention point

See above.

- you're not only searching but also writing a lot to the index: every time a write is performed the IndexReader cache is invalidated, so the instance needs to hit the disk again to refresh very often, preventing other threads from acquiring the lock. Did you try the NRT IndexManager?

I do write on the index, but much less frequently than I search. Furthermore, the write operations are on another entity than the ones I search on. I assume this should not trigger a close operation on the other readers.

I will try to make a simplified sample and I did not try NRT.

On another note, I had a question about the async operation. Part of my index is geo data (Think hierarchy of country/state/county/cities). As a result I have a lot of @ContainedIn back references.
When I add a data point to a country, it triggers the re-indexing of related objects (cities, states, counties), which is expected. In the case of a country with a large number of cities, it can take a while.

I switched on the async operation mode, hoping not to lock the UI in the process:

Code:
hibernate.search.default.worker.execution = async


However, it seems that the async mode only makes the execution of the work asynchronous, not the prepare phase. See the profiler stack below:

Code:
org.hibernate.search.backend.impl.PostTransactionWorkQueueSynchronization.beforeCompletion()
   org.hibernate.search.backend.impl.BatchedQueueingProcessor.prepareWorks()
      org.hibernate.search.backend.impl.WorkQueue.prepareWorkPlan()
         org.hibernate.search.engine.impl.WorkPlan.getPlannedLuceneWork()
            org.hibernate.search.engine.impl.WorkPlan$PerClassWork.enqueueLuceneWork()
               org.hibernate.search.engine.impl.WorkPlan$PerEntityWork.enqueueLuceneWork()
                  org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.addWorkToQueue()
                     org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.createUpdateWork()
                        org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.getDocument()
                           org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.buildDocumentFields() <== 145s
                              org.hibernate.collection.internal.PersistentList.iterator() <== 92s
                              org.hibernate.collection.internal.PersistentBag.iterator() <== 36s
                              org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.buildDocumentFields() <== 16s


The prepare phase takes around 145s, blocking the UI (updating a data point on the country USA, containing 15k cities...) while loading all the lazy collections.
I assume the prepare phase cannot be done in the background?


 Post subject: Re: Hibernate Search multi-threaded performance
PostPosted: Fri Oct 19, 2012 6:15 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
[code...]
I'm assuming doing that will not cause an open/close operation on the underlying reader but I might be wrong.

Let me clarify what that triggers: a single IndexReader refresh, whose result is used for both the .list and .size operations. The refresh checks whether the most recently opened IndexReader is still up to date; if it is, the same instance is reused and returned immediately, releasing the lock. The check itself is an IO operation: a very cheap one, but still IO, so it is guarded by the lock you see in your profiler. If the IndexReader is not up to date, it is refreshed, which means the lock is held a little longer, for the time needed to perform the refresh.
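
The check-then-refresh behaviour described here can be modelled in a few lines. This is a simplified sketch, not the actual SharingBufferReaderProvider code: `ReaderSnapshot`, `LatestReaderHolder`, `commitWrite()` and the version counter are all hypothetical stand-ins for a real IndexReader and the on-disk index state.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for an opened index reader; it only remembers the index version it saw. */
final class ReaderSnapshot {
    final long version;

    ReaderSnapshot(long version) {
        this.version = version;
    }
}

/** Simplified model of a per-directory "latest reader" holder (an assumption, not library code). */
final class LatestReaderHolder {
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicLong indexVersion = new AtomicLong(1); // bumped by every simulated write
    private final AtomicInteger reopenCount = new AtomicInteger();
    private ReaderSnapshot current; // guarded by lock

    /** Simulates a committed index write, which makes the cached reader stale. */
    void commitWrite() {
        indexVersion.incrementAndGet();
    }

    /** Returns an up-to-date reader, reopening it only when the index has changed. */
    ReaderSnapshot refreshAndGet() {
        lock.lock(); // every searching thread passes through here
        try {
            long latest = indexVersion.get(); // the cheap "still current?" check
            if (current == null || current.version != latest) {
                current = new ReaderSnapshot(latest); // the expensive reopen, done by one thread
                reopenCount.incrementAndGet();
            }
            return current;
        } finally {
            lock.unlock();
        }
    }

    /** Number of times the reader was actually reopened. */
    int reopenCount() {
        return reopenCount.get();
    }
}
```

In this model, searches between writes all share one snapshot and hold the lock only for the version check; each write forces the next caller to pay for a reopen while the others wait.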

The cost of a refresh operation depends on the number of index segments which need to be reloaded: each index is made of a set of immutable segments, and when you write, either a new small segment is added to the list or some segments are merged into a new one (discarding the old segments). Merging needs to happen regularly (controlled and tunable via the MergePolicy), and is forced by the index optimization process (which compacts all segments into one).

Consequently, if you merge or optimize too frequently, the number of segments which need to be reloaded while that lock is held is larger, and readers are blocked. You can tune the frequency of segment compaction (link below).

Quote:
My index consists of 43 entities, but only 3 or 4 are searched on. There are around 100k entities for the largest one. The total disk size is around 400M, so I don't think I qualify as very very large, but most of that 400M is in the entity I'm querying.
I don't do anything in particular with regard to optimization, but just in case, how can I disable it?


No, you're right: that index is very reasonable.
On optimization, just make sure you disable this (it was critically important to do on older Lucene versions, not so much these days, and likely not advisable at all in your case):
http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html_single/#search-optimize
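
As a hedged illustration of what "disabling" means in practice: to my knowledge, automatic optimization only runs when one of the optimizer limits below is configured, so the sketch shows the two properties to look for and remove (or leave commented out). The values are illustrative, and the `default` scope is an assumption; a specific index name may appear in its place in your configuration.

```properties
# Automatic optimization triggers: leave these unset (commented out) to disable it.
# hibernate.search.default.optimizer.operation_limit.max = 1000
# hibernate.search.default.optimizer.transaction_limit.max = 100
```

Any explicit calls to `SearchFactory.optimize()` in application code would also have to be removed.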

Quote:
I will try to make a simplified sample and I did not try NRT.

Definitely try NRT. No locks at all, and no need to reload those segments as it just reuses the same buffers from the IndexWriter. Please let me know about the difference in performance.
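
A minimal configuration sketch for switching to the NRT IndexManager, assuming it should apply to all indexes through the `default` scope (an index name can be used instead of `default` to limit it to one index):

```properties
# Use the near-real-time IndexManager for all indexes
hibernate.search.default.indexmanager = near-real-time
```

As far as I know, this mode expects the index to be used by a single, non-clustered application instance, so it is worth checking the reference documentation before enabling it in a shared setup.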

Whether you keep the current IndexReader or use NRT, it might be useful to tune the parameters to allow for more and smaller segments:
http://docs.jboss.org/hibernate/search/4.2/reference/en-US/html_single/#lucene-segment-size
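
A sketch of what such tuning could look like, assuming the IndexWriter property names as I recall them from the linked reference section (please verify them there). The values are in MB and purely illustrative, not a recommendation for this workload:

```properties
# Flush the in-memory buffer to a new segment once it reaches about 10 MB
hibernate.search.default.indexwriter.ram_buffer_size = 10
# Do not merge segments larger than about 7 MB during normal operation
hibernate.search.default.indexwriter.merge_max_size = 7
# Apply the same size cap when an optimize is requested
hibernate.search.default.indexwriter.merge_max_optimize_size = 7
```

Smaller caps keep each refresh cheap, since fewer large segments get rewritten, at the price of more segments to search.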

Quote:
However, it seems that the async part is only async to execute the work, not in the prepare phase.

That's correct. We only decouple the writing to the index: the IO, which is usually the slowest part.

Quote:
The prepare phase takes around 145s, blocking the UI (updating a data point on the country USA, containing 15k cities...) while loading all the lazy collections.
I assume the prepare phase cannot be done in the background?

No, that's not possible, as it wouldn't be able to load the other required information in the scope of the same transaction. Still, this usually doesn't take that much time: you should be able to stay in the "10 milliseconds" range by tuning the fetching strategies of Hibernate, and possibly by enabling a second-level cache so that you don't need to reload frequently used data at all. Correctly configured caches like Ehcache or Infinispan can deal with some million transactions per second... and the more immutable your cached data is, the more effective they are. I guess countries and states don't change shape/names too much nowadays ;-)
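
A hedged configuration sketch of the two suggestions (second-level cache and fetch tuning), assuming Hibernate ORM 4.x with the hibernate-ehcache module on the classpath. The batch size of 50 is an illustrative value, not a measured optimum:

```properties
# Enable the second-level cache, backed by Ehcache (assumes hibernate-ehcache is available)
hibernate.cache.use_second_level_cache = true
hibernate.cache.region.factory_class = org.hibernate.cache.ehcache.EhCacheRegionFactory
# Load lazy associations and collections in batches instead of one select each
hibernate.default_batch_fetch_size = 50
```

The entities and collections involved (countries, states, counties, cities) would additionally need to be marked as cacheable in their mappings for the cache to be used.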

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.