
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
 Post subject: enabling async execution does not index all documents
PostPosted: Thu Mar 25, 2010 9:36 am 
Newbie

Joined: Tue May 06, 2008 3:35 pm
Posts: 7
Here is my current setup:
One thread holds an open FullTextSession and does something like this:
Code:
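import org.hibernate.CacheMode;
import org.hibernate.FlushMode;
import org.hibernate.LockMode;
import org.hibernate.Transaction;

// Single indexer thread; fts is an open FullTextSession, queue is the shared ConcurrentLinkedQueue
Transaction transaction = fts.beginTransaction();
fts.setCacheMode(CacheMode.IGNORE);  // don't interact with the second-level cache
fts.setFlushMode(FlushMode.MANUAL);  // the session is flushed only when flush() is called explicitly
int count = 0;
while (true) {
  Object item = queue.poll();        // returns null when the queue is empty
  if (item != null) {
    fts.lock(item, LockMode.NONE);   // reattach the detached entity without a version check
    fts.index(item);
    count++;
    if (count % BATCH_SIZE == 0) {
      fts.flushToIndexes();          // push this batch's pending index work
      fts.clear();                   // evict the indexed entities from the session
    }
    //some code to break out of the loop
  }
}
fts.flushToIndexes();
transaction.commit();
fts.close();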


I have a threadpool (ExecutorService) that is comprised of producer classes that do nothing other than retrieve objects and stick them into the queue (ConcurrentLinkedQueue) for the Indexer.
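The producer/consumer setup described above can be sketched without Hibernate. This is a minimal, self-contained sketch, not the poster's code: it swaps the busy-spinning `peek()`/`poll()` loop for a `LinkedBlockingQueue` whose `take()` blocks, and it ends the consumer loop with one hypothetical `POISON` marker per producer (a "poison pill" shutdown) in place of the elided break condition. `BATCH_SIZE = 3` is an arbitrary illustrative value, and the `fts.index(item)` / `flushToIndexes()` calls are stood in for by collecting items into in-memory batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueIndexerSketch {
    // Hypothetical end-of-stream marker; each producer enqueues exactly one.
    private static final Object POISON = new Object();
    // Hypothetical batch size; the post does not give its BATCH_SIZE value.
    private static final int BATCH_SIZE = 3;

    /** Runs producers on a pool and a single consumer on the calling thread; returns the "flushed" batches. */
    public static List<List<Object>> run(int producers, int itemsPerProducer) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            final int id = p;
            pool.submit(() -> {
                try {
                    for (int i = 0; i < itemsPerProducer; i++) {
                        queue.put("item-" + id + "-" + i); // stands in for a loaded entity
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    queue.add(POISON); // always signal completion, even if loading failed
                }
            });
        }

        List<List<Object>> flushed = new ArrayList<>();
        List<Object> batch = new ArrayList<>();
        int finished = 0;
        while (finished < producers) {
            Object item = queue.take(); // blocks instead of spinning on peek()
            if (item == POISON) {
                finished++;
                continue;
            }
            batch.add(item); // where the post calls fts.index(item)
            if (batch.size() == BATCH_SIZE) { // where the post calls flushToIndexes() and clear()
                flushed.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            flushed.add(batch); // final flush for the last partial batch
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return flushed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<Object>> batches = run(4, 10);
        int total = 0;
        for (List<Object> b : batches) total += b.size();
        System.out.println(total + " items in " + batches.size() + " batches");
    }
}
```

With 4 producers of 10 items each, all 40 items end up in 14 batches (13 full batches of 3 plus a final partial one), which is why the flush after the loop matters.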

When I have async disabled, I get the correct document count when I open the index in Luke. With the same code and only the execution mode changed to async, the index ends up with roughly half the documents it should have. I print the count, so I know the indexer has passed every object to fts.index(); where are the rest getting dropped?

Any help would be appreciated! Even though it's only doing half the work with async turned on, it's easily 4-5x faster than with it off at the moment, so I would very much like to use it.


 Post subject: Re: enabling async execution does not index all documents
PostPosted: Thu Mar 25, 2010 7:23 pm 
Newbie

Joined: Tue May 06, 2008 3:35 pm
Posts: 7
I figured it out.
Code:
//add
fts.setFlushMode(FlushMode.AUTO);
//remove
fts.flushToIndexes();


That fixed async not writing everything out to the index. Apparently you cannot control the flushing directly once the work has been handed off to other threads?

Now I am trying to make things faster. Right now I'm doing ~850 docs/s per index, which is still only about 1/10th of the minimum speed I need. I watch my fts thread submit the docs into the queue at about 10,000 docs/s; once it is done, that thread dies and I see all the async threads running until completion. What I'm stuck on now is that no matter which settings I change, I cannot increase the index rate at all.
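To see where the gap between a ~10,000 docs/s submission rate and a ~850 docs/s index rate comes from, it can help to measure throughput at more than one point in your own pipeline. The sketch below is a hypothetical helper (`RateMeter` is not part of Hibernate Search or Hibernate): any thread calls `mark()` after finishing one document, and `perSecond()` divides the count by the elapsed time. Timestamps are passed in explicitly (normally from `System.nanoTime()`) so the arithmetic is easy to check.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical thread-safe documents-per-second counter. */
public class RateMeter {
    private final AtomicLong count = new AtomicLong();
    private final long startNanos;

    /** startNanos is normally System.nanoTime() at the start of the run. */
    public RateMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Call from any thread after one document has passed this point. */
    public void mark() {
        count.incrementAndGet();
    }

    /** Documents per second between construction and nowNanos; 0.0 if no time has elapsed. */
    public double perSecond(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1e9;
        return seconds <= 0 ? 0.0 : count.get() / seconds;
    }
}
```

For example, create a meter with `new RateMeter(System.nanoTime())`, call `mark()` right after each `fts.index(item)` in the consumer loop, and print `perSecond(System.nanoTime())` periodically. With 850 marks over exactly one second, `perSecond` returns 850.0.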

12 GB VM on an 8-CPU box.

Code:
batch.ram_buffer_size 720
batch.merge_factor 10
worker.execution async
worker.thread_pool.size 128
worker.batch_size 1000


Once all the objects have been pushed through fts.index(), CPU usage drops to 8-10%, RAM usage swings between 4 and 6 GB, and the disks are yawning.

Any ideas??


 Post subject: Re: enabling async execution does not index all documents
PostPosted: Thu Mar 25, 2010 8:01 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, I'm not sure what your problem is; you should really use a profiler to find out why it isn't using the full CPU or I/O capacity.

Just wondering, did you see my blog post about the MassIndexer? The design of the new MassIndexer looks quite similar to what you're implementing, so maybe you can use it or look into its sources for some hints.

http://in.relation.to/13387.lace

It was built on experience and the need for very fast reindexing. I've had speeds of around 10,000 docs/second on a simple MySQL database running on lower-spec hardware; of course, it highly depends on the complexity of your entities.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: enabling async execution does not index all documents
PostPosted: Fri Mar 26, 2010 10:58 am 
Newbie

Joined: Tue May 06, 2008 3:35 pm
Posts: 7
s.grinovero, thanks for the reply!

We aren't able to move to 3.2 just yet, but I did model my indexing on the new MassIndexer. I put JProfiler on top of my app this morning and watched everything fly through my code into Hibernate/Lucene, but I cannot find any reason why the Hibernate code won't go any faster.

I had another idea: do you have a sample set of code, with a known indexing speed, that I can run? Something simple that I can baseline against? Maybe if I approach my speed issue from a working set I can finally figure this thing out! :D

Thanks again.


 Post subject: Re: enabling async execution does not index all documents
PostPosted: Fri Mar 26, 2010 11:30 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
So you are reading data from Hibernate and writing it to the index? And you know it's the reads from Hibernate that are slow?
In that case it is definitely the latency of each read: each time you ask for a new entity or have to initialize some lazily loaded collection, the application waits for an answer from the database, which explains why your application is basically idle. That's why the most parallelized task in the MassIndexer is data fetching. Index writing is quite fast; in fact, the current version contributed to 3.2 doesn't enable more than a single thread for the index-writing stage, while I suggest dozens of threads for data reading.
Referring to the diagram: entity-loader and document-creator need many threads, while extra threads for document-indexer are almost useless.

[Image: MassIndexer pipeline diagram (entity-loader, document-creator and document-indexer stages)]
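The point about latency-bound reads can be illustrated with a small, Hibernate-free sketch. It is a toy built on stated assumptions, not the MassIndexer's implementation: `Thread.sleep(latencyMillis)` stands in for a database round trip, the entity-loader and document-creator stages are merged into one pool of loader threads, and a single writer (the calling thread) plays the document-indexer. The names `StagedIndexingSketch`, `run` and `DONE` are hypothetical. Because each loader spends most of its time waiting rather than computing, adding loader threads should shorten wall-clock time roughly in proportion, while one writer keeps up as long as writing a document is cheap compared with the simulated latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StagedIndexingSketch {
    // Hypothetical end marker; each loader thread enqueues exactly one.
    private static final String DONE = "__DONE__";

    /** Loads ids 0..count-1 with many loader threads and "writes" them with a single writer thread. */
    public static List<String> run(int count, int loaderThreads, long latencyMillis) throws InterruptedException {
        BlockingQueue<Integer> ids = new LinkedBlockingQueue<>();
        for (int i = 0; i < count; i++) ids.add(i);
        BlockingQueue<String> documents = new LinkedBlockingQueue<>();

        // Loading stage: many threads, each mostly waiting on simulated database latency.
        ExecutorService loaders = Executors.newFixedThreadPool(loaderThreads);
        for (int t = 0; t < loaderThreads; t++) {
            loaders.submit(() -> {
                try {
                    Integer id;
                    while ((id = ids.poll()) != null) {
                        Thread.sleep(latencyMillis); // waiting for the database; CPU is idle
                        documents.put("doc-" + id);  // the "document" built from the entity
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    documents.add(DONE);
                }
            });
        }

        // Writing stage: a single thread drains the documents.
        List<String> written = new ArrayList<>();
        int done = 0;
        while (done < loaderThreads) {
            String doc = documents.take();
            if (DONE.equals(doc)) {
                done++;
            } else {
                written.add(doc);
            }
        }
        loaders.shutdown();
        loaders.awaitTermination(10, TimeUnit.SECONDS);
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {1, 8}) {
            long start = System.nanoTime();
            int n = run(200, threads, 5).size();
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " loader thread(s): " + n + " docs in " + ms + " ms");
        }
    }
}
```

Running `main` prints the elapsed time for 1 and 8 loader threads; the exact numbers depend on the machine and the scheduler, so none are claimed here.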

Quote:
I had another idea, do you have a sample set of code which you have a known indexing speed for that I can run? Something simple that I can baseline off of? Maybe if I approach my speed issue from a working set I can finally figure this thing out! :D

No, unfortunately I couldn't open-source the real-world applications (not even the model) on which I designed and tuned the MassIndexer.
It was also a problem because, until recently, the Search sources weren't modular, so contributing a huge performance test was nasty; we just switched to modules so that we can include a performance test suite.
Having such a reference would be very welcome; if you could contribute something like that, I could help you.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.