All times are UTC - 5 hours [ DST ]



 Post subject: Search performance
PostPosted: Fri May 23, 2008 4:07 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
Hi all,

I have the following Lucene query:

Code:

+((tt:"a java class"^20.0 ((tt:java* tt:class*)^18.0)) (dp:"a java class"^2.5 ((dp:java* dp:class*)^2.0)) (dD:"a java class"^1.8 ((dD:java* dD:class*)^1.4)) (tg:"a java class"^1.2 ((tg:java* tg:class*)^1.05))) +st:act +cT.tg.nd:0



The size of the index is 85MB. I use projection and retrieve only the "id" field for matching records. The number of matching records can exceed 30k in some cases, but I retrieve only 100 records at a time.
The response time for such a query is around 30 seconds.
My machine configuration is as follows:
P4 2.66GHZ, 1.99GB RAM
The application runs on JBoss 4.2 with max heap size of 1024MB.

When 100 simultaneous threads are executing searches, the response time goes up to 30-40 seconds. CPU utilization is also at 100%, and JBoss uses the entire heap available to it.

I want to know:
Is my configuration the right one for such a scenario? What can be a good configuration for such a scenario?
Are there any known ways in which Hibernate Search can help me in such a case?

One option I am working on is optimizing the Lucene index.

Any pointers will also be helpful.

Thanks,
Rakesh S



Hibernate version:Core-33.2.6 Hibernate Search-3.0.1 Lucene:2.3.0



 Post subject:
PostPosted: Fri May 23, 2008 4:31 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
I retrieve only the 100 records at a time.

But do you retrieve them all within those 30 seconds? How many are there?
Are you making any changes to these entities during the loop?

Quote:
One option I am working on is optimizing the Lucene index.

Do you mean calling .optimize() or changing the internal structure?
You should definitely call optimize() before trying other solutions.

Quote:
tt:"a java class"

What do you mean by "java class"? Are you relying on the toString() methods to generate your queries?
Are you sure the 30 seconds are spent executing the query, not preparing it?

Quote:
Is my configuration the right one for such a scenario?

It looks good, and others are seeing very fast queries; however, many details of your configuration still need to be checked.

You should try printing out the query as a string (before execution) and pasting it into Luke to see how fast it runs there.
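
To tell preparation time apart from execution time, each step can be timed on its own. The following is only a minimal sketch, assuming Java 8 or later; `PhaseTimer` and the phase names are hypothetical, and the lambdas in `main` are stand-ins for the real steps (building the Lucene query, `list()`, `getResultSize()`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates wall-clock time per named phase. */
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the step, adds its elapsed time to the phase, and returns its result. */
    <T> T time(String phase, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total milliseconds recorded for the phase, or 0 if it never ran. */
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000L;
    }

    boolean recorded(String phase) {
        return nanosByPhase.containsKey(phase);
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-in for building or parsing the query
        String query = timer.time("prepare", () -> "+st:act");
        // Stand-in for executing the query
        Integer hits = timer.time("execute", () -> query.length());
        System.out.println("prepare=" + timer.millis("prepare") + "ms, execute="
                + timer.millis("execute") + "ms, hits=" + hits);
    }
}
```

If the "prepare" phase dominates, the time is going into building the query rather than into the search itself.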

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Mon May 26, 2008 5:21 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
s.grinovero wrote:
But do you retrieve them all within those 30 seconds? How many are there?
Are you making any changes to these entities during the loop?


I use projection and hence get the top 100 ids (the primary keys, which are stored during indexing) as results.
I use
Code:
fullTextQuery.setMaxResults(noOfItems);



Quote:
Do you mean calling .optimize() or changing the internal structure?
You should definitely call optimize() before trying other solutions.


After optimizing the index I did see a performance gain, but only a couple of seconds, so not much.

Quote:
What do you mean by "java class"? Are you relying on the toString() methods to generate your queries?
Are you sure the 30 seconds are spent executing the query, not preparing it?


Yes, the 30 seconds are spent only on executing the query:

Code:
// Open a Hibernate session and wrap it in a full-text session
SessionFactory sessionFactory = getSessionFactory();
session = sessionFactory.openSession();
FullTextSession fullTextSession = Search.createFullTextSession(session);

// Build the full-text query; only the stored "id" field is projected
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(luceneQuery, clazz);
fullTextQuery.setProjection("id");

// Set pagination-related values (paginationIndex is one-based)
fullTextQuery.setMaxResults(noOfItems);
fullTextQuery.setFirstResult((paginationIndex - 1) * noOfItems);

// Log the start time before the search query is executed
long startTime = System.currentTimeMillis();
// Execute the search
List resultList = fullTextQuery.list();

// Get the total result count
int resultCount = fullTextQuery.getResultSize();

// Log the total time; note that it covers both list() and getResultSize()
long totalTime = System.currentTimeMillis() - startTime;


Note that I create a Hibernate Search session for every request. Will this hamper performance in a multi-threaded environment?
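
For reference, the pagination arithmetic used in the snippet above can be written out on its own. This is a small sketch, not part of the application code; the class and method names are hypothetical, and the page index is assumed to be one-based as in `(paginationIndex - 1) * noOfItems`.

```java
/** Hypothetical helper for the one-based paging arithmetic. */
final class Paging {

    /** Zero-based offset of the first row shown on a one-based page. */
    static int firstResult(int pageIndex, int pageSize) {
        if (pageIndex < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageIndex and pageSize must be >= 1");
        }
        return Math.multiplyExact(pageIndex - 1, pageSize);
    }

    /** Number of pages needed to show totalHits rows. */
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(firstResult(1, 100));   // 0
        System.out.println(firstResult(3, 100));   // 200
        System.out.println(pageCount(19851, 100)); // 199
    }
}
```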

Quote:
It looks good, and others are seeing very fast queries; however, many details of your configuration still need to be checked.

What are the other details that I should check?

I am executing 100 threads for 10 different queries (each query is run in 10 threads) using JMeter.

I have observed that as the no. of different queries increases, the response time of the Lucene query increases.

Any clues/hints/suggestions about this?


 Post subject:
PostPosted: Mon May 26, 2008 10:57 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Could you please have your code print out the query and then execute it in Luke? That way you can confirm whether the problem is only Lucene-related and does not involve Hibernate Search.

Quote:
I am executing 100 threads for 10 different queries (each query is run in 10 threads) using JMeter.
I have observed that as the no. of different queries increases, the response time of the Lucene query increases.
Any clues/hints/suggestions about this?

Did you configure Hibernate Search to use "shared readers"?
Shared readers scale much better, since you don't need to open all the files again for each transaction; also, each IndexReader needs to briefly lock your index when it is opened.
BTW, do you use transactions in each of your threads? It is recommended to always use them.

Your query looks quite complicated; if you find out Hibernate is not involved, I think you should ask on the Lucene forum about it.

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Tue May 27, 2008 1:31 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
I executed one query from Luke that was part of the test plan. The time required to execute the query using Luke is 328ms as compared to 4593ms from my application which uses Hibernate Search. The no. of matching records in this case is 19851.

Also Luke response time will be much better as only one request can be fired at a time.

I am trying to improve performance in a multi-threaded environment.
100 simultaneous threads for 10 different queries are executed from a JMeter script.
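
A setup like this can also be approximated outside JMeter with a small harness that releases all worker threads at the same moment and records each thread's latency. The sketch below is an assumption-laden illustration, not the JMeter plan itself: `LoadHarness` is a hypothetical name, it assumes Java 8 or later, and the task passed in `main` is a placeholder for the real search call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness: starts all workers together and times each one. */
final class LoadHarness {

    /** Runs the task once per thread; returns each thread's latency in milliseconds. */
    static List<Long> run(int threads, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads); // every worker has started
        CountDownLatch go = new CountDownLatch(1);          // start signal
        Callable<Long> timed = () -> {
            ready.countDown();
            go.await();                                     // wait for the common start
            long start = System.nanoTime();
            task.run();
            return (System.nanoTime() - start) / 1_000_000L;
        };
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(timed));
            }
            ready.await();
            go.countDown();                                 // release all workers at once
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get());
            }
            return latencies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real search call
        List<Long> latencies = run(100, () -> Math.sqrt(12345.678));
        System.out.println("threads=" + latencies.size()
                + ", max latency=" + Collections.max(latencies) + "ms");
    }
}
```

Running the same task first with one thread and then with 100 shows how much of the response time comes from contention rather than from the query itself.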


Quote:
Did you configure Hibernate Search to use "shared readers"?
Shared readers scale much better, since you don't need to open all the files again for each transaction; also, each IndexReader needs to briefly lock your index when it is opened.


As per the Hibernate Search documentation, "shared" is the default reader strategy, so I have not added any entry for it in hibernate.cfg.xml. I would also like to know: when readers are said to be shared across queries, does that mean the same shared index reader is used even if a new Hibernate Search session is created? Is the index reader shared per session or per SessionFactory?

Quote:
BTW, do you use transactions in each of your threads? It is recommended to always use them.

Yes all my queries are executed inside transactions.


 Post subject:
PostPosted: Tue May 27, 2008 4:10 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Also I want to know that when we mean readers are shared across queries, does it mean that even if a new Hibernate Search session is created the same shared index reader is used? Is the shared index reader for a given session or for a given SessionFactory?

An IndexReader performs better when reused, as it can cache some things (such as open files) and is faster when "warmed up"; it is also thread-safe. The downside is that an IndexReader doesn't "see" changes to the index, so we have to reopen it when a change occurs.
So Hibernate Search uses the same IndexReader for all operations in the current transaction, and also shares it with new transactions unless the index has changed, in which case the new transaction has to open a new reader.
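
The reuse-until-changed idea described above can be illustrated with a small generic holder. This is a hedged sketch, not Hibernate Search's actual implementation: `SharedReaderHolder`, `indexChanged()` and the version counter are hypothetical, and the sketch leaves out closing the stale reader once its last user is done with it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical sketch: hands out one shared reader until the index changes. */
final class SharedReaderHolder<R> {
    private final Supplier<R> opener;                          // opens a fresh reader (expensive)
    private final AtomicLong indexVersion = new AtomicLong();  // bumped on every index change
    private R current;
    private long currentVersion = -1L;

    SharedReaderHolder(Supplier<R> opener) {
        this.opener = opener;
    }

    /** Called by the writing side after it has changed the index. */
    void indexChanged() {
        indexVersion.incrementAndGet();
    }

    /** Returns the cached reader; reopens only if the index changed since it was opened. */
    synchronized R get() {
        long version = indexVersion.get();
        if (current == null || version != currentVersion) {
            current = opener.get();
            currentVersion = version;
        }
        return current;
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        SharedReaderHolder<String> holder =
                new SharedReaderHolder<>(() -> "reader-" + opens.incrementAndGet());
        System.out.println(holder.get()); // reader-1
        System.out.println(holder.get()); // reader-1 (reused)
        holder.indexChanged();
        System.out.println(holder.get()); // reader-2 (reopened after the change)
    }
}
```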

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Tue May 27, 2008 4:17 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
BTW, if you could provide a testcase or some other code I would be glad to inspect the problem.
Could you try reproducing the problem on a simpler query?
Do you think this could be the same as http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-194 ?
If you could attach a test case to that bug it would be great.

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Tue May 27, 2008 4:26 am 
Newbie

Joined: Wed Apr 23, 2008 11:11 pm
Posts: 19
s.grinovero wrote:
BTW, if you could provide a testcase or some other code I would be glad to inspect the problem.
Could you try reproducing the problem on a simpler query?
Do you think this could be the same as http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-194 ?
If you could attach a test case to that bug it would be great.


I am the reporter of this bug and as far as I can see it matches my scenario perfectly.


 Post subject:
PostPosted: Wed May 28, 2008 6:35 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
Using the "not-shared" strategy did not work as it increased the response time.

What I observe is that when the 100 simultaneous requests are fired from JMeter, CPU utilization stays between 80% and 100% while the queries execute. I am using JBoss 4.2 with a max heap size of 1024MB.

I have also optimized the Lucene indexes.

Here is the code for querying the Lucene index directly and getting the list of records. The results of running it follow the code. From this it is clear that plain Lucene is much faster. How can I find out if there is a deadlock on SegmentReader?

Code:
/**
* May 24, 2008, 1:41:55 PM
*
* TestSearchPerformance.java
*/
package lucene.performance;

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.LockObtainFailedException;


/**
* Class to test search performance
* @author rakesh_shete
*
*/
public class TestSearchPerformance {
   public IndexSearcher indexSearcher = null;
   private static String queryString1 = null;
   private static String indexPath = null;
   private static String optimizedIndexPath = null;
   static {
      // Plain string literals; wrapping them in new String(...) is unnecessary
      indexPath = "D:/QA_Build_20080515/LuceneIndexes/indexes";
      optimizedIndexPath = "D:/QA_Build_20080515_optimized/LuceneIndexes/indexes";
      queryString1 = "+((t:\"java boss\"^20.0 ((t:java* t:boss*)^18.0)) (d:\"java boss\"^2.5 ((d:java* d:boss*)^2.0)) (dd:\"java boss\"^1.8 ((dd:java* dd:boss*)^1.4)) (tg:\"java boss\"^1.2 ((tg:java* tg:boss*)^1.05))) +st:act +ntid:0";
   }
   
   public static void main(String[] args) throws CorruptIndexException, IOException {
      //optimizeIndexes();
      //testSearchPerformance(indexPath);
      testSearchPerformance(optimizedIndexPath);
      
   }
   
   /**
    * Function to test search performance
    */
   private static void testSearchPerformance(String indexpath) throws CorruptIndexException, IOException{
      TestSearchPerformance ts = new TestSearchPerformance();

      // Create an instance of IndexSearcher and use it in every search
      ts.indexSearcher = new IndexSearcher(indexpath);
      for (int i = 1; i <= 100; i++) {
         // Create a thread and invoke it
         // queryString1 is static, so it is referenced directly rather than via ts
         SearcherThread searcherThread = ts.new SearcherThread(i, queryString1, ts.indexSearcher);
         searcherThread.start();
      }
   }
   
   
   /**
    * Thread to execute search
    */
   private class SearcherThread extends Thread {
      
      private int threadId;
      private String queryString;
      private IndexSearcher indexsearcher;

      
      /**
       * Initialize with thread-id, querystring, indexsearcher
       */
      public SearcherThread(int threadId, String queryString, IndexSearcher indexSearcher) {
         this.threadId = threadId;
         this.queryString = queryString;
         this.indexsearcher = indexSearcher;
      }
      
      /**
       * @see java.lang.Runnable#run()
       */
      public void run() {
         
         try {
            QueryParser qp = new QueryParser("t",
                  new StandardAnalyzer());
            qp.setLowercaseExpandedTerms(true);

            // Parse the query
            Query q = qp.parse(queryString);
            // maxClauseCount is a static, JVM-wide setting; raise it so the
            // wildcard terms (java*, boss*) can expand freely
            BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);
            
            long start = System.currentTimeMillis();
            // Search
            Hits hits = indexsearcher.search(q);
            long totalTime = System.currentTimeMillis() - start;
            System.out.println("[ Thread-id : " + threadId + " ] Total time taken for search is : " + totalTime + "ms with total no. of matching records : " + hits.length());
         }
         catch (ParseException e) {
            // TODO Auto-generated catch block
            System.out.println("[ Thread-id : " + threadId + " ] Parse Exception for queryString : " + queryString);
            e.printStackTrace();
         }
         catch (IOException e) {
            System.out.println("[ Thread-id : " + threadId + " ] IO Exception for queryString : " + queryString);
            e.printStackTrace();
         }
      }
   }
   
   /**
    * Optimize the index
    */
   private static void optimizeIndexes(){
      try {
         IndexWriter writer = new IndexWriter(optimizedIndexPath, new StandardAnalyzer(), false);
         long startTime = System.currentTimeMillis();
         writer.optimize();
         long totalTime = System.currentTimeMillis() - startTime;
         System.out.println("Time taken for optimization: " + totalTime);
         writer.close();
      }
      catch (CorruptIndexException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
      }
      catch (LockObtainFailedException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
      }
      catch (IOException e) {
         // TODO Auto-generated catch block
         e.printStackTrace();
      }
   }

}



The results for 100 threads are as follows:
-------------------------------------------------

[ Thread-id : 95 ] Total time taken for search is : 1015ms with total no. of matching records : 428
[ Thread-id : 29 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 31 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 88 ] Total time taken for search is : 1047ms with total no. of matching records : 428
[ Thread-id : 89 ] Total time taken for search is : 1047ms with total no. of matching records : 428
[ Thread-id : 33 ] Total time taken for search is : 1156ms with total no. of matching records : 428
[ Thread-id : 12 ] Total time taken for search is : 1297ms with total no. of matching records : 428
[ Thread-id : 10 ] Total time taken for search is : 1297ms with total no. of matching records : 428
[ Thread-id : 53 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 91 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 54 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 18 ] Total time taken for search is : 1266ms with total no. of matching records : 428
[ Thread-id : 48 ] Total time taken for search is : 1125ms with total no. of matching records : 428
[ Thread-id : 47 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 87 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 55 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 40 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 51 ] Total time taken for search is : 1156ms with total no. of matching records : 428
[ Thread-id : 52 ] Total time taken for search is : 1156ms with total no. of matching records : 428
[ Thread-id : 34 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 74 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 27 ] Total time taken for search is : 1203ms with total no. of matching records : 428
[ Thread-id : 59 ] Total time taken for search is : 1125ms with total no. of matching records : 428
[ Thread-id : 75 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 90 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 80 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 92 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 32 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 37 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 97 ] Total time taken for search is : 1062ms with total no. of matching records : 428
[ Thread-id : 45 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 15 ] Total time taken for search is : 1266ms with total no. of matching records : 428
[ Thread-id : 20 ] Total time taken for search is : 1250ms with total no. of matching records : 428
[ Thread-id : 44 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 16 ] Total time taken for search is : 1266ms with total no. of matching records : 428
[ Thread-id : 38 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 39 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 73 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 66 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 3 ] Total time taken for search is : 1297ms with total no. of matching records : 428
[ Thread-id : 2 ] Total time taken for search is : 1312ms with total no. of matching records : 428
[ Thread-id : 43 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 86 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 19 ] Total time taken for search is : 1250ms with total no. of matching records : 428
[ Thread-id : 83 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 23 ] Total time taken for search is : 1250ms with total no. of matching records : 428
[ Thread-id : 35 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 46 ] Total time taken for search is : 1156ms with total no. of matching records : 428
[ Thread-id : 14 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 56 ] Total time taken for search is : 1125ms with total no. of matching records : 428
[ Thread-id : 22 ] Total time taken for search is : 1250ms with total no. of matching records : 428
[ Thread-id : 17 ] Total time taken for search is : 1266ms with total no. of matching records : 428
[ Thread-id : 49 ] Total time taken for search is : 1156ms with total no. of matching records : 428
[ Thread-id : 85 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 13 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 41 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 42 ] Total time taken for search is : 1172ms with total no. of matching records : 428
[ Thread-id : 82 ] Total time taken for search is : 1094ms with total no. of matching records : 428
[ Thread-id : 99 ] Total time taken for search is : 1062ms with total no. of matching records : 428
[ Thread-id : 96 ] Total time taken for search is : 1062ms with total no. of matching records : 428
[ Thread-id : 94 ] Total time taken for search is : 1062ms with total no. of matching records : 428
[ Thread-id : 93 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 72 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 58 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 4 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 6 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 64 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 11 ] Total time taken for search is : 1265ms with total no. of matching records : 428
[ Thread-id : 9 ] Total time taken for search is : 1265ms with total no. of matching records : 428
[ Thread-id : 62 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 63 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 7 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 65 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 5 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 1 ] Total time taken for search is : 1281ms with total no. of matching records : 428
[ Thread-id : 69 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 57 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 60 ] Total time taken for search is : 1109ms with total no. of matching records : 428
[ Thread-id : 8 ] Total time taken for search is : 1265ms with total no. of matching records : 428
[ Thread-id : 61 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 68 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 67 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 70 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 25 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 30 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 100 ] Total time taken for search is : 1046ms with total no. of matching records : 428
[ Thread-id : 71 ] Total time taken for search is : 1093ms with total no. of matching records : 428
[ Thread-id : 24 ] Total time taken for search is : 1203ms with total no. of matching records : 428
[ Thread-id : 28 ] Total time taken for search is : 1187ms with total no. of matching records : 428
[ Thread-id : 78 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 79 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 76 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 21 ] Total time taken for search is : 1234ms with total no. of matching records : 428
[ Thread-id : 81 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 84 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 98 ] Total time taken for search is : 1046ms with total no. of matching records : 428
[ Thread-id : 77 ] Total time taken for search is : 1078ms with total no. of matching records : 428
[ Thread-id : 50 ] Total time taken for search is : 1140ms with total no. of matching records : 428
[ Thread-id : 36 ] Total time taken for search is : 1171ms with total no. of matching records : 428
[ Thread-id : 26 ] Total time taken for search is : 1172ms with total no. of matching records : 428
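
On the question above about checking for a deadlock on SegmentReader: the JVM can report deadlocked and blocked threads through the standard `java.lang.management` API (Java 6 or later). The sketch below is only a starting point; the class and method names are hypothetical, and it has to run inside the JVM under test, for example from a diagnostic servlet or a scheduled task. A true deadlock leaves threads stuck indefinitely, so slow but completing requests point more towards lock contention, which the second method helps to spot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper built on the standard ThreadMXBean. */
final class DeadlockCheck {

    /** One description per deadlocked thread; empty if the JVM reports none. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) {
                out.add(info.toString());
            }
        }
        return out;
    }

    /** Threads currently BLOCKED on a monitor, with the lock they are waiting for. */
    static List<String> describeBlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                out.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + describeDeadlocks().size());
        for (String line : describeBlockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If many threads show up as blocked on the same lock name while the load test runs, that lock is a likely bottleneck.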


 Post subject:
PostPosted: Wed May 28, 2008 6:38 am 
Newbie

Joined: Wed Apr 23, 2008 11:11 pm
Posts: 19
Thanks for putting a test case for this. I actually did the same thing and got the same result. Using lucene directly is much much much faster.

Right now we are using hibernate search to populate the indexes but we're using lucene directly for the search.


 Post subject:
PostPosted: Thu May 29, 2008 10:49 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
Thanks for the confirmation.

I guess I too have to go by a similar approach.

--Rakesh S


 Post subject:
PostPosted: Fri May 30, 2008 5:20 pm 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
As discussed on the JIRA issue, the unit test hits a limit in Lucene's performance.
You can implement a pooled IndexReader strategy to work around the problem. You need to write a ReaderProvider implementation (see SharedReaderProvider or NotSharedReaderProvider for examples) and set hibernate.search.reader.strategy to the class name of your ReaderProvider.
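
As a rough illustration of the configuration side, the property can be set like any other Hibernate property. The sketch below uses a plain `java.util.Properties` object so that it runs on its own; in a real application the same key would go into hibernate.cfg.xml or the Hibernate configuration. The class name `com.example.search.PooledSharedReaderProvider` is hypothetical, and "shared" and "not-shared" are the built-in strategy names mentioned earlier in this thread.

```java
import java.util.Properties;

/** Hypothetical helper showing the reader strategy property. */
final class ReaderStrategyConfig {
    static final String KEY = "hibernate.search.reader.strategy";

    /** Returns properties selecting the given strategy name or provider class name. */
    static Properties withReaderStrategy(String strategy) {
        Properties props = new Properties();
        props.setProperty(KEY, strategy);
        return props;
    }

    public static void main(String[] args) {
        // Built-in strategies named in this thread
        System.out.println(withReaderStrategy("shared"));
        System.out.println(withReaderStrategy("not-shared"));
        // Custom ReaderProvider, referenced by its (hypothetical) class name
        System.out.println(withReaderStrategy("com.example.search.PooledSharedReaderProvider"));
    }
}
```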

_________________
Emmanuel


 Post subject:
PostPosted: Wed Jun 11, 2008 9:33 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
Hi Emmanuel,

I have been going through the SharedReaderProvider implementation. It looks like it already pools index readers using the Map<DirectoryProvider, IndexReader> activeSearchIndexReaders, and openReader() does use it. I will enable trace-level logging for my testing and check whether the same index reader is being used.

Could you give hints as to what my custom reader provider should implement differently?

--Rakesh S


 Post subject: Sample pooledsharedreaderprovider
PostPosted: Wed Jun 25, 2008 7:56 am 
Newbie

Joined: Wed May 25, 2005 10:53 pm
Posts: 13
Hi,

Here is a sample implementation of a pooled shared reader provider. Can anyone let me know whether this will suffice, or are there some things I have missed?
The basic idea is to create a pool of index readers up front at initialization time and distribute them among simultaneous threads.
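
Stripped of the Hibernate Search and Lucene types, the pooling idea can be sketched as a fixed set of readers with a usage count per reader, where each request gets the least-used one. This is a simplified stand-alone illustration with hypothetical names, not the provider posted below; it ignores reopening readers after index changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: fixed pool that hands out the least-used reader. */
final class LeastLoadedPool<R> {
    private final List<R> readers = new ArrayList<>();
    private final int[] inUse; // number of current users per reader

    LeastLoadedPool(int size, Supplier<R> opener) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        for (int i = 0; i < size; i++) {
            readers.add(opener.get()); // open all readers up front
        }
        inUse = new int[size];
    }

    /** Returns the reader with the fewest current users (lowest index wins ties). */
    synchronized R acquire() {
        int best = 0;
        for (int i = 1; i < inUse.length; i++) {
            if (inUse[i] < inUse[best]) {
                best = i;
            }
        }
        inUse[best]++;
        return readers.get(best);
    }

    /** Gives a reader back; it must have come from acquire() on this pool. */
    synchronized void release(R reader) {
        for (int i = 0; i < inUse.length; i++) {
            if (readers.get(i) == reader) { // identity comparison on purpose
                if (inUse[i] == 0) {
                    throw new IllegalStateException("released more often than acquired");
                }
                inUse[i]--;
                return;
            }
        }
        throw new IllegalArgumentException("reader does not belong to this pool");
    }

    synchronized int users(int index) {
        return inUse[index];
    }

    public static void main(String[] args) {
        LeastLoadedPool<Object> pool = new LeastLoadedPool<>(2, Object::new);
        Object first = pool.acquire();
        Object second = pool.acquire();
        System.out.println("distinct readers: " + (first != second));
        pool.release(first);
        pool.release(second);
    }
}
```

A single monitor guards the whole pool here for simplicity; the posted provider uses one lock per DirectoryProvider instead.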

Code:


import static org.hibernate.search.reader.ReaderProviderHelper.buildMultiReader;

import java.io.IOException;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.hibernate.annotations.common.AssertionFailure;
import org.hibernate.search.SearchException;
import org.hibernate.search.engine.SearchFactoryImplementor;
import org.hibernate.search.reader.ReaderProvider;
import org.hibernate.search.store.DirectoryProvider;

/**
* TODO Documentation
*
* @author rakesh_shete
*
*/
public class PooledSharedReaderProvider implements ReaderProvider {

   private static Field subReadersField;

   private static Log logger = LogFactory
         .getLog(PooledSharedReaderProvider.class);

   /**
    * Non-fair locks, one per DirectoryProvider. A lock has to be acquired
    * at least for IndexReader retrieval and switch, i.e. for all
    * manipulation of searchIndexReaders. This map is read-only after
    * initialization, so it needs no synchronization.
    */
   private Map<DirectoryProvider, Lock> perDirectoryProviderManipulationLocks;

   private Map<DirectoryProvider, List<ReaderData>> searchIndexReaders = new HashMap<DirectoryProvider, List<ReaderData>>();

   private Map<IndexReader, DirectoryProvider> readerDirProvider = new HashMap<IndexReader, DirectoryProvider>();

   public IndexReader openReader(DirectoryProvider[] directoryProviders) {
      boolean trace = logger.isTraceEnabled();
      int length = directoryProviders.length;
      IndexReader[] readers = new IndexReader[length];
      if (trace)
         logger.trace("Opening IndexReader for directoryProviders: "
               + length);

      for (int index = 0; index < length; index++) {
         DirectoryProvider directoryProvider = directoryProviders[index];
         IndexReader reader;
         Lock directoryProviderLock = perDirectoryProviderManipulationLocks
               .get(directoryProvider);
         if (trace)
            logger.trace("Opening IndexReader from "
                  + directoryProvider.getDirectory().toString());
         // The lock is needed for the same reason as in double-checked locking
         directoryProviderLock.lock();
         try {
            // Get an appropriate index reader
            reader = getIndexReader(directoryProvider);
         }
         finally {
            directoryProviderLock.unlock();
         }
         if (reader == null) {
            if (trace)
               logger.trace("No IndexReader "
                     + directoryProvider.getDirectory().toString());
         }
         readers[index] = reader;
      }

      // Build a multi-reader out of the given readers
      return buildMultiReader(length, readers);
   }

   public void closeReader(IndexReader reader) {
      boolean trace = logger.isTraceEnabled();
      if (reader == null)
         return;
      IndexReader[] readers;
      // TODO should it be CacheableMultiReader? Probably no
      if (reader instanceof MultiReader) {
         try {
            // Get the index readers from the multi-reader
            readers = (IndexReader[]) subReadersField.get(reader);
         }
         catch (IllegalAccessException e) {
            throw new SearchException(
                  "Incompatible version of Lucene: MultiReader.subReaders not accessible",
                  e);
         }
         if (trace)
            logger.debug("Closing MultiReader: " + reader);
      }
      else {
         throw new AssertionFailure(
               "Everything should be wrapped in a MultiReader");
      }

      for (IndexReader subReader : readers) {
         // Get the directory-provider for the given index reader
         DirectoryProvider dirProvider = readerDirProvider.get(subReader);

         if (dirProvider != null) {
            Lock directoryProviderLock = perDirectoryProviderManipulationLocks
                  .get(dirProvider);
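            // Acquire the lock before the try block so that unlock() in the
            // finally clause only runs when the lock is actually held
            directoryProviderLock.lock();
            try {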
               // Now decrement the thread-count for the given reader
               decrementIndexReaderCount(subReader, dirProvider);
            }
            finally {
               directoryProviderLock.unlock();
            }
         }

      }
   }

   public void initialize(Properties props,
         SearchFactoryImplementor searchFactoryImplementor) {
      if (subReadersField == null) {
         try {
            subReadersField = MultiReader.class
                  .getDeclaredField("subReaders");
            if (!subReadersField.isAccessible())
               subReadersField.setAccessible(true);
         }
         catch (NoSuchFieldException e) {
            throw new SearchException(
                  "Incompatible version of Lucene: MultiReader.subReaders not accessible",
                  e);
         }
      }
      Set<DirectoryProvider> providers = searchFactoryImplementor
            .getLockableDirectoryProviders().keySet();
      perDirectoryProviderManipulationLocks = new HashMap<DirectoryProvider, Lock>(
            providers.size());

      for (DirectoryProvider dp : providers) {
         perDirectoryProviderManipulationLocks.put(dp, new ReentrantLock());

         // Create a list of reader data objects. Each reader data has an
         // index reader, thread-count indicating the no. of threads using
         // the index reader and directory provider of the index reader
         List<ReaderData> readerDataList = getReaderData(dp);

         // Add the list to the map of directoryProvider-->readerDataList.
         // This will be used to get an appropriate index reader instance for
         // a given directoryProvider
         searchIndexReaders.put(dp, readerDataList);
      }
      perDirectoryProviderManipulationLocks = Collections
            .unmodifiableMap(perDirectoryProviderManipulationLocks);
   }

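   /**
    * Couples a pooled index reader with the number of threads currently using
    * it. threadCount must only be changed while holding the lock of the
    * reader's directory-provider.
    */
   private static class ReaderData {

      public ReaderData(int threadCount, DirectoryProvider provider,
            IndexReader indexReader) {
         this.threadCount = threadCount;
         this.provider = provider;
         this.indexReader = indexReader;
      }

      /**
       * The number of threads using the index reader instance
       */
      public int threadCount;

      /**
       * The directory provider used by the index reader
       */
      public final DirectoryProvider provider;

      /**
       * The pooled index reader
       */
      public final IndexReader indexReader;
   }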

   /**
    * Creates and returns a list of readerData objects each having a newly
    * opened index reader instance with thread-count as zero
    *
    * @param provider The directory-provider
    * @return The list of readerData
    */
   private List<ReaderData> getReaderData(DirectoryProvider provider) {
      List<ReaderData> readerDataList = new ArrayList<ReaderData>();
      try {
         // The pool holds a fixed number of 4 readers per directory-provider
         for (int i = 0; i < 4; i++) {
            // Create an index reader instance for the given
            // directory-provider
            IndexReader reader = IndexReader.open(provider.getDirectory());
            // Create a reader data instance
            ReaderData readerData = new ReaderData(0, provider, reader);
            readerDataList.add(readerData);
            // Add the indexReader and directory-provider to the map of
            // indexReader-->directory-provider. This is required to get the
            // directory-provider for a given indexReader instance, using
            // which the readerDataList can be retrieved
            readerDirProvider.put(reader, provider);
         }

         return readerDataList;
      }
      catch (IOException e) {
         // CorruptIndexException is a subclass of IOException. Fail fast
         // here: returning null would only surface later as a
         // NullPointerException on the first search
         throw new SearchException(
               "Problems encountered while getting a reader instance", e);
      }
   }

   /**
    * Returns an index-reader instance from the readerDataList for the given
    * directory-provider such that the index-reader instance is being shared by
    * minimum no. of threads amongst other index-reader instances and is also
    * active
    *
    * @param provider The directory-provider
    * @return The index-reader instance
    */
   private IndexReader getIndexReader(DirectoryProvider provider) {
      List<ReaderData> readerDataList = searchIndexReaders.get(provider);
      // Least-shared up-to-date reader found so far
      ReaderData best = null;
      // Least-shared stale reader; used only if every reader is stale and
      // still in use
      ReaderData staleFallback = null;
      try {
         for (int i = 0; i < readerDataList.size(); i++) {
            ReaderData candidate = readerDataList.get(i);

            if (!candidate.indexReader.isCurrent()) {
               if (candidate.threadCount > 0) {
                  // Stale but still in use: it cannot be replaced yet, so
                  // avoid starting new searches on it if possible
                  if (staleFallback == null
                        || candidate.threadCount < staleFallback.threadCount) {
                     staleFallback = candidate;
                  }
                  continue;
               }
               // Stale and unused: replace it in place with a newly opened
               // reader so that the other list positions stay valid
               IndexReader staleReader = candidate.indexReader;
               IndexReader reader = IndexReader.open(provider
                     .getDirectory());
               candidate = new ReaderData(0, provider, reader);
               readerDataList.set(i, candidate);
               // Replace the index-reader-->directoryProvider entry too
               readerDirProvider.remove(staleReader);
               readerDirProvider.put(reader, provider);
               staleReader.close();
               logger.debug("Added a new reader instance : " + reader
                     + " for a stale index at position : " + i);
            }

            // The candidate is up-to-date here; keep the one shared by the
            // minimum no. of threads
            if (best == null || candidate.threadCount < best.threadCount) {
               best = candidate;
            }
            if (best.threadCount == 0) {
               // No thread is using this reader, no need to look further
               break;
            }
         }
      }
      catch (IOException e) {
         throw new SearchException(
               "Exception while getting the index reader instance", e);
      }

      // Update the thread count of the readerData whose index-reader
      // instance is to be returned
      ReaderData readerData = best != null ? best : staleFallback;
      readerData.threadCount++;
      logger.debug("Returning reader instance : "
            + readerData.indexReader + " with threadCount value : "
            + readerData.threadCount);
      return readerData.indexReader;
   }

   /**
    * Decrements the threadCount for the given indexReader instance
    *
    * @param reader The indexReader instance
    * @param dirProvider The directory-provider
    */
   private void decrementIndexReaderCount(IndexReader reader,
         DirectoryProvider dirProvider) {
      List<ReaderData> readerDataList = searchIndexReaders.get(dirProvider);
      for (ReaderData readerData : readerDataList) {
         if (readerData.indexReader.equals(reader)) {
            readerData.threadCount--;
            logger
                  .debug("Decremented threadCount for reader instance : "
                        + reader
                        + " New threadCount is : "
                        + readerData.threadCount);
            break;
         }
      }
   }
}
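
For anyone who wants to try the selection logic without a Lucene index, here is a minimal stand-alone sketch of the same idea: a fixed pool, a use count per entry, and "hand out the entry with the fewest users". The class LeastSharedPool and its methods are hypothetical names made up for this illustration, not part of Hibernate Search or Lucene, and the staleness check (isCurrent()) is left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Lucene-free sketch of a "least-shared" resource pool. */
final class LeastSharedPool<T> {

    /** One pooled resource plus the number of callers currently using it. */
    private static final class Slot<T> {
        final T resource;
        int useCount;

        Slot(T resource) {
            this.resource = resource;
        }
    }

    private final List<Slot<T>> slots = new ArrayList<Slot<T>>();

    LeastSharedPool(List<T> resources) {
        if (resources.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        for (T resource : resources) {
            slots.add(new Slot<T>(resource));
        }
    }

    /** Returns the resource with the fewest users; ties go to the earliest slot. */
    synchronized T acquire() {
        Slot<T> best = slots.get(0);
        for (Slot<T> slot : slots) {
            if (slot.useCount < best.useCount) {
                best = slot;
            }
        }
        best.useCount++;
        return best.resource;
    }

    /** Gives a resource back; resources are matched by identity. */
    synchronized void release(T resource) {
        for (Slot<T> slot : slots) {
            if (slot.resource == resource) {
                if (slot.useCount == 0) {
                    throw new IllegalStateException("released more often than acquired");
                }
                slot.useCount--;
                return;
            }
        }
        throw new IllegalArgumentException("resource does not belong to this pool");
    }
}
```

As with openReader()/closeReader() above, every acquire() should be paired with a release() in a finally block, otherwise the use counts drift upwards and the pool stops balancing.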




--Regards,
Rakesh S


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.