All times are UTC - 5 hours [ DST ]

 Post subject: Hibernate Search: Query performance problem with iSeries
PostPosted: Fri Apr 20, 2012 12:12 pm 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
Hi everyone,
I'm using Hibernate Search with Java 7 on iSeries: I've indexed text documents (that can reach 20 MB) and the index is on the iSeries file system. I've stored the documents in the index because I want to provide highlights with the Lucene Highlighter.

Searches appear to be very slow (25-30 sec.) if the result is a large document; I know that Java file system access can be slow on iSeries and I don't want to keep the index in memory: is there something I can configure in Hibernate Search to speed up the access to the index?
I've checked the manual (section "3.10 Tuning Lucene indexing performance"), but none of the listed settings seems to help in my case.

Thanks in advance,
Andrea


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Mon Apr 23, 2012 6:08 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Hi Andrea,
I'm not familiar with the iSeries filesystem; is it a networked file system? That will be very slow indeed.

One thing you could do is use the Infinispan Directory. It was originally developed to provide clustering capabilities, but it also works very well on a single node: you can cache the index in memory and have it write through to an Infinispan CacheLoader. Such a CacheLoader can be configured to store its data in many different stores, for example (but not only) a JDBC datasource or a filesystem.

So your storage might be slow, but at least your queries will be very fast as it's fully cached in memory.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Mon Apr 23, 2012 11:47 am 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
Hi Sanne,
the Java class that uses Hibernate Search to perform queries and the index are on the same iSeries server, so this is not a networking issue. I moved my search application to a Windows server and rebuilt the index from the same data:
  • queries on large documents (about 20MB) execute in less than a second on Windows (25-30 sec on iSeries)
  • queries on smaller documents have comparable execution times on Windows and iSeries

I've checked file system access performance with a simple class that reads one of the index files into a byte array: reading a file of about 190 MB takes comparable time on Windows and iSeries (about 0.5 sec). So the problem seems to be in the query execution itself: I would say that keeping the index in memory would not help, what do you think?
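
For readers who want to repeat this check, here is a minimal sketch of such a read test. It is not the class used for the numbers above: the class name `IndexFileReadTimer` and its `Result` type are hypothetical, and the file path is supplied on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not the original test class): times how long it
// takes to load one file completely into a byte array.
class IndexFileReadTimer {

    /** Outcome of one timed read: bytes loaded and elapsed wall-clock milliseconds. */
    static final class Result {
        final long bytes;
        final long millis;

        Result(long bytes, long millis) {
            this.bytes = bytes;
            this.millis = millis;
        }
    }

    static Result timeRead(Path file) throws IOException {
        long start = System.nanoTime();
        byte[] content = Files.readAllBytes(file); // whole file into memory
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        return new Result(content.length, elapsedMillis);
    }

    public static void main(String[] args) throws IOException {
        // First argument: path of one index file to read.
        Result r = timeRead(Paths.get(args[0]));
        System.out.println(r.bytes + " bytes read in " + r.millis + " ms");
    }
}
```

A second run on the same file may be served from the operating system's file cache, so the first run after a restart is probably the more honest number.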

Thank you,
Andrea


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Mon Apr 23, 2012 12:06 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Quote:
I've checked file system access performance with a simple class that reads one of the index files into a byte array: reading a file of about 190 MB takes comparable time on Windows and iSeries (about 0.5 sec). So the problem seems to be in the query execution itself: I would say that keeping the index in memory would not help, what do you think?


Agreed, I was assuming it was some sort of network mount.

Could you try different filesystem_access_type options for the Directory Provider as documented in http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-configuration-directory ?
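
As a rough illustration, the settings for each run could be assembled like this. The property names are the ones from the linked directory configuration section; the class name `DirectoryAccessConfig` and the index path are placeholders, and how the properties reach Hibernate (hibernate.cfg.xml, persistence.xml or programmatically) depends on your setup.

```java
import java.util.Properties;
import java.util.TreeSet;

// Sketch only: builds the settings for one filesystem_access_type run.
// The index path used in main is a placeholder, not taken from this thread.
class DirectoryAccessConfig {

    static final String[] ACCESS_TYPES = {"auto", "simple", "nio", "mmap"};

    static Properties forAccessType(String indexBase, String accessType) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider", "filesystem");
        p.setProperty("hibernate.search.default.indexBase", indexBase);
        p.setProperty("hibernate.search.default.filesystem_access_type", accessType);
        return p;
    }

    public static void main(String[] args) {
        for (String type : ACCESS_TYPES) {
            Properties p = forAccessType("/path/to/indexes", type);
            System.out.println("# run with access type " + type);
            // Sorted keys so the output is stable between runs.
            for (String key : new TreeSet<String>(p.stringPropertyNames())) {
                System.out.println(key + "=" + p.getProperty(key));
            }
        }
    }
}
```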

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Tue Apr 24, 2012 10:52 am 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
sanne.grinovero wrote:
...

Could you try different filesystem_access_type options for the Directory Provider as documented in http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-configuration-directory ?


I tried simple, mmap and nio for filesystem_access_type without significant differences (after every property update I restarted my application and rebuilt the index).
Sorry, I forgot to mention that I'm using Hibernate Search 3.4. I tried to update to 4.1, but indexing is slower and never finishes (I probably have to tune some settings). I'll keep trying with HS 4.1: if it solves my performance problems I'll post the results; otherwise I will look for a different solution "outside" Hibernate Search, e.g. not storing the text files' content in the index (giving up highlights) or putting this content in a different store (keeping the index as small as possible).

Sanne, thank you for your suggestions.

Andrea


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Tue Apr 24, 2012 11:31 am 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
Jova wrote:
... I'll keep trying with HS 4.1 ...


I've solved my problems with HS 4.1 migration and completed the test: no performance improvements. Now I will remove large content from the index.

All the best,
Andrea


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Wed Apr 25, 2012 11:19 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Hi Andrea,
maybe it's worth using a profiler to understand why it is slow?

You could also try the blackhole backend or the RAMDirectory. Please post a performance comparison between blackhole, RAM and filesystem so I can suggest which area to look at.
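
One way to make such a comparison repeatable is a small timing harness that is run once per configuration. This is only a sketch: `QueryTimer` is a hypothetical name, and the task in `main` is a placeholder that sleeps where the real Hibernate Search query would be executed.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

// Hypothetical harness: runs the same task several times and reports the
// median wall-clock time, so a single slow outlier does not skew the result.
class QueryTimer {

    static long medianMillis(Callable<?> task, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            samples[i] = (System.nanoTime() - start) / 1000000L;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // upper median when runs is even
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task: replace the sleep with the actual query call.
        Callable<Object> query = new Callable<Object>() {
            public Object call() throws Exception {
                Thread.sleep(20);
                return null;
            }
        };
        System.out.println("median: " + medianMillis(query, 5) + " ms");
    }
}
```

Running it with the same query under each configuration gives three numbers that can be posted side by side.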

On a different subject: yes you should always try to keep the index small. But this doesn't explain why it's fast on Windows but not on the iSeries.

Also, there is no need to store the text in the index; you could retrieve the text from the database (so without using projections).

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Thu Apr 26, 2012 11:03 am 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
Hi Sanne,
thank you for your suggestions: I tried with the RAMDirectory and with the blackhole option (with FSDirectory) and the performance results didn't change from the original settings.

I've tried to do some profiling: the only tool I've been able to use (I had problems with hprof and instrumentation tools) is an iSeries command (ANZJVM) that takes snapshots of the JVM, with information about all the loaded classes at each snapshot. The output is very long, so I published it as a Google document at

https://docs.google.com/document/d/19Es ... dCmjk/edit

(there's some Italian, but I know that's not a problem for you)
I've seen that java/lang/OutOfMemoryError class is loaded but in the log I can't see any exception/error thrown: do you think there can be a memory problem?

I've also run the search with TRACE logging enabled for the org.hibernate classes, and the relevant output is
Code:
...
26 Apr 2012 10:27:22,483 -- [LoggerFactory] -- Opening IndexReader for directoryProviders: 1
26 Apr 2012 10:27:22,483 -- [LoggerFactory] -- Opening IndexReader from org.apache.lucene.store.NIOFSDirectory@/home/diap_d60/spool/control/dfind_idx/dfindv lockFactory=org.apache.lucene.store.SimpleFSLockFactory@673b6db8
26 Apr 2012 10:27:22,488 -- [LoggerFactory] -- Closing MultiReader: CacheableMultiReader(ReadOnlyDirectoryReader(segments_nr _h1(3.3):C550/279 _gy(3.3):c1 _gz(3.3):c1 _h0(3.3):C1 _h2(3.3):c1 _h3(3.3):c1 _h4(3.3):c1 _h5(3.3):c1 _h6(3.3):C1 _ib(3.3):c37 _q4(3.3):c253 _q5(3.3):c1 _q6(3.3):c1 _q7(3.3):c1 _q8(3.3):c1 _q9(3.3):c1 _qa(3.3):c1))
26 Apr 2012 10:27:22,488 -- [LoggerFactory] -- IndexReader closed.
26 Apr 2012 10:27:22,488 -- [LoggerFactory] -- Opening IndexReader for directoryProviders: 1
26 Apr 2012 10:27:22,488 -- [LoggerFactory] -- Opening IndexReader from org.apache.lucene.store.NIOFSDirectory@/home/diap_d60/spool/control/dfind_idx/dfindv lockFactory=org.apache.lucene.store.SimpleFSLockFactory@673b6db8
26 Apr 2012 10:27:49,713 -- [LoggerFactory] -- Closing MultiReader: CacheableMultiReader(ReadOnlyDirectoryReader(segments_nr _h1(3.3):C550/279 _gy(3.3):c1 _gz(3.3):c1 _h0(3.3):C1 _h2(3.3):c1 _h3(3.3):c1 _h4(3.3):c1 _h5(3.3):c1 _h6(3.3):C1 _ib(3.3):c37 _q4(3.3):c253 _q5(3.3):c1 _q6(3.3):c1 _q7(3.3):c1 _q8(3.3):c1 _q9(3.3):c1 _qa(3.3):c1))
26 Apr 2012 10:27:49,713 -- [LoggerFactory] -- IndexReader closed.
...


About the need to store the text in the index in order to provide highlights: reading the files from the database after the HS query, before returning the results with the highlights, would be too slow, so at the moment we are ruling this option out.

Thanks
Andrea


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Mon Apr 30, 2012 7:37 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Hi Andrea,
thanks for the document.

Quote:
I tried with the RAMDirectory and with the blackhole option (with FSDirectory) and the performance results didn't change from the original settings.

With blackhole I expect a strong performance improvement in any environment. Could you verify via debugging/logging/profiling that it is actually enabled? I've just realized that in certain conditions the "blackhole" option is ignored and no warning is given; not sure why, I have to look into it.

Quote:
(there's some Italian, but I know that's not a problem for you)

Lol ! you're correct.

Quote:
I've seen that java/lang/OutOfMemoryError class is loaded but in the log I can't see any exception/error thrown: do you think there can be a memory problem?

If one happens in the backend workers, since they are asynchronous, we would not be able to "throw it back" to the application but it should at least be logged. You could also configure a custom error handler to make sure your application deals with these exceptional cases: http://docs.jboss.org/hibernate/search/ ... e/#d0e2443

I suspect the class is loaded as Lucene catches the error to cleanup some critical resources (it attempts to avoid corrupting the index if it happens).

What JVM are you using? Any chance you could try a different one? As far as I remember IBM provides several JVMs, some free and some commercial, so it might be worth trying a different one.
Can you check used memory? The numbers from your document look reasonable, but I don't know how much memory is used/free in the JVM or whether the operating system is swapping too much.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Wed May 02, 2012 9:34 am 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
Hi Sanne,

sanne.grinovero wrote:
...
With blackhole I expect a strong performance improvement in any environment. Could you verify via debugging/logging/profiling that it is actually enabled? I've just realized that in certain conditions the "blackhole" option is ignored and no warning is given; not sure why, I have to look into it.
...

Sorry, I'm probably missing something: should I rebuild the index with "blackhole" enabled? I suppose not, since I've read that it discards index changes. In any case, in the log I see
...
02 May 2012 14:36:53,723 -- [LoggerFactory] -- initialized "blackhole" backend. Index changes will be prepared but discarded!
...

During search there aren't concurrent modifications to the index.

sanne.grinovero wrote:
Quote:
I've seen that java/lang/OutOfMemoryError class is loaded but in the log I can't see any exception/error thrown: do you think there can be a memory problem?

If one happens in the backend workers, since they are asynchronous, we would not be able to "throw it back" to the application but it should at least be logged. You could also configure a custom error handler to make sure your application deals with these exceptional cases: http://docs.jboss.org/hibernate/search/ ... e/#d0e2443

My problem is during the search phase, not while indexing is in progress: can I use a custom error handler also for the search phase?

sanne.grinovero wrote:
What JVM are you using? Any chance you could try a different one? As far as I remember IBM provides several JVMs, some free and some commercial, so it might be worth trying a different one.

As far as we know the only available JVM for iSeries is the one shipped with the operating system.

sanne.grinovero wrote:
Can you check used memory? The numbers from your document look reasonable, but I don't know how much memory is used/free in the JVM or whether the operating system is swapping too much.

I did some simple memory profiling using Runtime.getRuntime().totalMemory() before and after the query: before the query totalMemory() is about 157 MB and after the query it is about 350 MB, with a maximum heap size of 1 GB. Increasing the available heap size to 2 GB didn't change anything.
We've now asked IBM support: we'll take a closer look at the JVM behavior and check whether a different JVM is available. I'll let you know if we find a problem in the JVM settings.
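
A note on the measurement: totalMemory() reports the heap the JVM has currently reserved, not the part actually in use, so subtracting freeMemory() gives a closer picture. A minimal sketch of that, using only standard Runtime calls (the class name `HeapSnapshot` is hypothetical):

```java
// Hypothetical helper: captures used, committed and maximum heap at one
// point in time using only java.lang.Runtime.
class HeapSnapshot {
    final long usedBytes;       // committed heap minus free space inside it
    final long committedBytes;  // heap currently reserved by the JVM
    final long maxBytes;        // upper limit the heap may grow to

    private HeapSnapshot(long used, long committed, long max) {
        this.usedBytes = used;
        this.committedBytes = committed;
        this.maxBytes = max;
    }

    static HeapSnapshot take() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        return new HeapSnapshot(used, committed, rt.maxMemory());
    }

    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        HeapSnapshot s = take();
        System.out.println("used=" + toMb(s.usedBytes) + " MB, committed="
                + toMb(s.committedBytes) + " MB, max=" + toMb(s.maxBytes) + " MB");
    }
}
```

Taking one snapshot before and one after the query shows how much of the growth is live data. The figures are still approximate, because garbage that has not been collected yet counts as used.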

Thank you,
Andrea


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Mon May 07, 2012 4:34 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Hi Andrea,
we found a critical issue which I think could have been the cause for your problem as well: https://hibernate.onjira.com/browse/HSEARCH-1090

Stay tuned for the 4.1.1 release this week or, if you prefer, build the master branch from the repository: the fix has already been committed.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Sun May 13, 2012 7:57 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Hi Andrea,
any better luck with 4.1.1.Final ?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Query performance problem with iSeries
PostPosted: Mon May 14, 2012 3:32 am 
Newbie

Joined: Fri May 13, 2011 1:01 pm
Posts: 16
sanne.grinovero wrote:
Hi Andrea,
any better luck with 4.1.1.Final ?


Hi Sanne,
I'm sorry, I haven't checked the forum in the last few days. You were right to suggest trying a different JVM: we were using the "classic" JVM shipped with iSeries, and IBM support told us that there is another JVM ("IBM Technology for Java Virtual Machine"). We switched to this JVM and got a big boost in query performance (the same query performance as with the Windows JVM).
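
For anyone who needs to confirm which JVM an application is actually running on, the standard system properties can be printed at startup. A small sketch follows; the class name `JvmInfo` is hypothetical, and the exact strings each iSeries JVM reports are not shown here because they were not checked in this thread.

```java
// Hypothetical helper: reports the identity of the running JVM using
// standard system properties that every JVM is expected to define.
class JvmInfo {

    static String describe() {
        return System.getProperty("java.vm.name")
                + " | vendor: " + System.getProperty("java.vm.vendor")
                + " | VM version: " + System.getProperty("java.vm.version")
                + " | Java version: " + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line once when the application starts makes it easy to tell later which JVM produced a given set of timings.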

My problem is now solved; however, I'll try 4.1.1.Final asap (with both JVMs) and let you know the results.

Thank you !

Andrea


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.