
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
Author Message
 Post subject: MassIndexer threadsForIndexWriter
PostPosted: Tue May 11, 2010 5:09 pm 
Newbie

Joined: Wed Nov 18, 2009 7:00 pm
Posts: 12
This question is for Sanne,

In your MassIndexer you commented out the method threadsForIndexWriter(int); however, I see that when you initialize the LuceneBatchBackend you hardcode the number of threads for "concurrent_writers" to 2. You also state that you've seen a performance gain in unusual setups. Can you elaborate on what you meant by "unusual"?

Also, when running a profiler I see the entityloader threads sometimes spend the bulk of their time waiting. What can I do to improve this?

Thanks,

newsosa


 Post subject: Re: MassIndexer threadsForIndexWriter
PostPosted: Wed May 12, 2010 3:02 am 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
In your MassIndexer you commented out the method threadsForIndexWriter(int); however, I see that when you initialize the LuceneBatchBackend you hardcode the number of threads for "concurrent_writers" to 2. You also state that you've seen a performance gain in unusual setups. Can you elaborate on what you meant by "unusual"?

Yes, this parameter is not currently configurable, but we could enable it if we find a good use case.
By "unusual" I mean a situation in which the index writing and analysis phases are much slower than loading the data for all entities and preparing the work (bridges, text extraction). As the index writing phase is very fast, that would be a very unusual setup: I can hardly imagine anybody spending a shitload of money on a super-grid database and then having no resources left for a fast disk. It might happen with very big PDF attachments, where the CPU cost of analyzing the text could be high, but even then I would expect extracting the text from the PDF to be more expensive.
So in the end, the final stage is usually not the bottleneck; to be fair, I could have hardcoded it to one thread, but two seems fine.
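
To make the reasoning above concrete, here is a small back-of-the-envelope sketch in plain Java. It is not part of Hibernate Search: the class name, the stage names, the thread counts and the per-item timings are all hypothetical figures chosen for illustration. It only shows the arithmetic: a pipeline runs as fast as its slowest stage, so adding index-writer threads helps only when index writing is that slowest stage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Hibernate Search class.
final class PipelineBottleneck {

    /** Items per second a stage can sustain, given its thread count and per-item cost. */
    static double throughput(int threads, double millisPerItem) {
        return threads * 1000.0 / millisPerItem;
    }

    /** Returns the name of the stage with the lowest throughput. */
    static String slowestStage(Map<String, Double> throughputByStage) {
        String slowest = null;
        double min = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : throughputByStage.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        // Assumed numbers, for illustration only; measure your own setup.
        Map<String, Double> stages = new LinkedHashMap<>();
        stages.put("entity loading", throughput(4, 20.0));  // 200 items/s
        stages.put("text extraction", throughput(4, 40.0)); // 100 items/s
        stages.put("index writing", throughput(2, 2.0));    // 1000 items/s
        System.out.println("Bottleneck: " + slowestStage(stages));
    }
}
```

With these assumed figures the two index-writer threads have ten times the capacity of the text extraction stage, so a third writer thread would change nothing.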

If your mileage varies much, please send feedback.

Quote:
Also, when running a profiler I see the entityloader threads sometimes spend the bulk of their time waiting. What can I do to improve this?

It might block for several reasons:
  • It is waiting for the database to return the entity during a query -> you might need to add threads, but only if the database can handle them.
  • It has finished and is waiting for the other threads to end.
  • It is faster than the next element in the pipeline, and the pipeline is full -> you might need to add more threads to the next phase (threadsForSubsequentFetching(X)), or reduce the number of entityloader threads (threadsToLoadObjects(Y)).
  • In the first seconds it is likely blocked waiting for the primary keys to be loaded.
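
The "pipeline is full" case can be reproduced with nothing but the JDK. The sketch below is not how the MassIndexer is implemented; it assumes a bounded java.util.concurrent.ArrayBlockingQueue stands in for the hand-off between two phases, and the class and method names are made up for the example. A producer that outruns its consumer parks inside put(), and a profiler reports that thread as WAITING, much like the entityloader threads in the question.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class, not part of Hibernate Search.
final class FullPipelineDemo {

    /**
     * Starts a producer that tries to put one item more than the queue can hold
     * while nothing consumes, and returns the producer's thread state once it parks.
     */
    static Thread.State stateOfBlockedProducer(int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i <= capacity; i++) {
                    queue.put(i); // the last put blocks: the queue is already full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");
        producer.start();

        try {
            // Poll until the producer parks (give up after roughly two seconds).
            Thread.State state = producer.getState();
            for (int i = 0; i < 2000 && state != Thread.State.WAITING; i++) {
                Thread.sleep(1);
                state = producer.getState();
            }
            // Play the slow next stage: take one item so the producer can finish.
            queue.take();
            producer.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing the producer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Producer state while the queue is full: "
                + stateOfBlockedProducer(2));
    }
}
```

The waiting here is back-pressure doing its job, not a fault: the producer resumes as soon as the next stage takes an item.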

Generally, it's OK that it blocks frequently, as it serves as an active buffer between the other phases: while the model is fixed, the actual data fetched varies, so a single phase might be a bottleneck at some moments and too fast (blocked waiting for the others) at others.
Threads are cheap; just make sure you have enough of them, but not so many that most sit blocked and useless, and not so many that you kill 1) your database with too many concurrent requests or 2) your memory with far too many threads.
If some threads block frequently, that's OK.

_________________
Sanne
http://in.relation.to/

