-->
These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 11 posts ] 
Author Message
 Post subject: Side-effects of large batch-size (say 100).
PostPosted: Mon Nov 28, 2005 6:10 pm 
Newbie

Joined: Wed Nov 23, 2005 6:25 pm
Posts: 10
Location: Austin, Texas
Hi all,

Any reason why setting a large batch-size, say batch-size = 100, is particularly bad?

Everything I've read about batch-size on Entities or collections say to keep this number small, preferably under 10.

I have a worst-case scenario that returns hundreds (possibly 2 or 3 thousand) of Items. The average case is most likely under a hundred.

As you can see from the sample mappings below, each Item has two collections that are not lazy loaded. This results in an additional 2 queries per Item returned.

If there are M Items to return, and the batch-size is set to N on both collections, the number of queries is roughly 2*M/N. My current configuration confirms this.

What are the side-effects of specifying a large batch-size?? E.g. I'd like to set the batch-size = 100, so for our large users this would result in only a dozen queries at most. Any reason why this would be bad, besides memory considerations?

My gut tells me that this is a bad thing, but I can't think of a concrete reason (except for memory considerations and large generated SQL), why this would be bad.

Even if batch-size = 1, the heap still has to handle 2 thousand items.

The only other thing is that the generated SQL underneath could potentially be very big with all the added (alias0.id=?) OR (alias0.id=?) OR ....

Thanks!

Hibernate version: Hibernate 2.1.7

Mapping documents:
These are sample mappings.
Code:
<class name="Item" table="items" discriminator-value="4" polymorphism="explicit">
  <id name="id" column="id" access="field" unsaved-value="-2147483648">
    <generator class="identity"/>
  </id>
  <discriminator column="discriminator" type="integer"/>
  <property name="userName"/>
  <set name="dogs" table="mapItemToDogs"
       cascade="none" access="field" batch-size="10">
    <key column="itemID"/>
    <many-to-many class="Dog" column="dogId"/>
  </set>
  <set name="cats"
       cascade="all-delete-orphan" access="field" batch-size="10">
    <key column="itemID"/>
    <one-to-many class="Cat"/>
  </set>
</class>


Sample HQL query:
Code:
from Item i where i.userName = :userName


Database: SQLServer 2000


Top
 Profile  
 
 Post subject:
PostPosted: Mon Nov 28, 2005 6:14 pm 
Hibernate Team
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
The SQL will be ugly, but otherwise there is probably no big problem.


But you really *should* explore other options like subselect fetching or join fetching.


Top
 Profile  
 
 Post subject: Thanks!!
PostPosted: Mon Nov 28, 2005 6:27 pm 
Newbie

Joined: Wed Nov 23, 2005 6:25 pm
Posts: 10
Location: Austin, Texas
gavin wrote:
The SQL will be ugly, but otherwise there is probably no big problem.


But you really *should* explore other options like subselect fetching or join fetching.


Thanks!! We're probably going to migrate to Hibernate3 in the next release and consider either subselect fetching or projection. For now, though in production, we have to live with Hibernate 2.1. We could left join fetch one collection, but two collections would be unwieldly, due to the cartesian product problem.

Also, for our particular case, the row being retrieved in "Item" has lots of columns (the above mappings are only sample mappings). Performing a left join fetch, even on one collection, would return a denormalized result set with Item being duplicated for each row in the eagerly fetched collection association.

If Hibernate does not complain much with a pretty large batch-size, I'd think I'd prefer that for now.

Initial testing confirms that batch-size="100" does not barf.

Thanks!


Top
 Profile  
 
 Post subject:
PostPosted: Tue Nov 29, 2005 9:10 am 
Regular
Regular

Joined: Sat Nov 05, 2005 5:33 am
Posts: 70
Location: Linz, Austria
By setting the log level to DEBUG, I found out that for a batch size of 10, 10 select statements are generated. The first one with 1 key condition. The second one with 2 or-ed key conditions. The last one with 10 or-ed key conditions.

Therefore, I expect that a batch size of 100 would result in 100 pregenerated selects.

_________________
hth,
Heinz
Don't forget to rate if this helped


Top
 Profile  
 
 Post subject: large-batch size worked!
PostPosted: Wed Nov 30, 2005 6:54 pm 
Newbie

Joined: Wed Nov 23, 2005 6:25 pm
Posts: 10
Location: Austin, Texas
hhuber wrote:
By setting the log level to DEBUG, I found out that for a batch size of 10, 10 select statements are generated. The first one with 1 key condition. The second one with 2 or-ed key conditions. The last one with 10 or-ed key conditions.

Therefore, I expect that a batch size of 100 would result in 100 pregenerated selects.


Turns out that setting the batch-size did work.

Our sample case of 1006 items (and 2012 collections), resulted in a total of only 53 queries. Not that far off from the theoretical expectation of 43!!
One query for the main load leaves 26 queries for each collection. That is 5 more than the expected 21, but I imagine that its just that the batch-size is not fully populated in some cases.

Bottom line is that 53 queries IS MUCH LESS THAN 2012 queries.

BTW, my numbers were not generated by DEBUG, but by setting show_sql = true. Could it be in your case that only the last one is used and the other nine are discarded?


Top
 Profile  
 
 Post subject: Re: large-batch size worked!
PostPosted: Thu Dec 01, 2005 3:13 am 
Regular
Regular

Joined: Sat Nov 05, 2005 5:33 am
Posts: 70
Location: Linz, Austria
MustangCoder wrote:
hhuber wrote:
By setting the log level to DEBUG, I found out that for a batch size of 10, 10 select statements are generated. The first one with 1 key condition. The second one with 2 or-ed key conditions. The last one with 10 or-ed key conditions.

Therefore, I expect that a batch size of 100 would result in 100 pregenerated selects.


Turns out that setting the batch-size did work.

Our sample case of 1006 items (and 2012 collections), resulted in a total of only 53 queries. Not that far off from the theoretical expectation of 43!!
One query for the main load leaves 26 queries for each collection. That is 5 more than the expected 21, but I imagine that its just that the batch-size is not fully populated in some cases.

Bottom line is that 53 queries IS MUCH LESS THAN 2012 queries.

BTW, my numbers were not generated by DEBUG, but by setting show_sql = true. Could it be in your case that only the last one is used and the other nine are discarded?


I never wanted to say that it wouldn't work. In the contrary.
The statements I talked about were not executed but were held in memory for future execution.
I just wanted to point out that with a batch size of 100, you have 100 pregenerated select statements. This can become a memory bottleneck (perhaps).

Regards,
Heinz


Top
 Profile  
 
 Post subject: Re: large-batch size worked!
PostPosted: Thu Dec 01, 2005 12:09 pm 
Newbie

Joined: Wed Nov 23, 2005 6:25 pm
Posts: 10
Location: Austin, Texas
hhuber wrote:
I never wanted to say that it wouldn't work. In the contrary.
The statements I talked about were not executed but were held in memory for future execution.
I just wanted to point out that with a batch size of 100, you have 100 pregenerated select statements. This can become a memory bottleneck (perhaps).


Thanks for the clarification. I guess this isn't too bad since if batch-size=1 it has to pregenerate those select statements anyway (e.g. similar memory foot-print).

Thanks for the info!


Top
 Profile  
 
 Post subject: Re: large-batch size worked!
PostPosted: Thu Dec 01, 2005 12:16 pm 
Hibernate Team
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
hhuber wrote:
I never wanted to say that it wouldn't work. In the contrary.
The statements I talked about were not executed but were held in memory for future execution.
I just wanted to point out that with a batch size of 100, you have 100 pregenerated select statements. This can become a memory bottleneck (perhaps).


Actually it is less than 100: the batch sizes are 100, 50, 25, 12, 6, 3,2,1, so only 8 SQL statements are required. (This is done in anticipation of the problem you describe.)

MustangCoder wrote:
Thanks for the clarification. I guess this isn't too bad since if batch-size=1 it has to pregenerate those select statements anyway (e.g. similar memory foot-print).


Negative. Hibernate generates and caches just one SQL statement.


Top
 Profile  
 
 Post subject: Re: large-batch size worked!
PostPosted: Thu Dec 01, 2005 12:25 pm 
Regular
Regular

Joined: Sat Nov 05, 2005 5:33 am
Posts: 70
Location: Linz, Austria
MustangCoder wrote:
hhuber wrote:
I never wanted to say that it wouldn't work. In the contrary.
The statements I talked about were not executed but were held in memory for future execution.
I just wanted to point out that with a batch size of 100, you have 100 pregenerated select statements. This can become a memory bottleneck (perhaps).


Thanks for the clarification. I guess this isn't too bad since if batch-size=1 it has to pregenerate those select statements anyway (e.g. similar memory foot-print).

Thanks for the info!


I see, we're not on terms yet.

For batch-size=1, Hibernate pregenerates (and holds in memory) exactly one select statement with one key condition for this association. I have not investigated further whether this statement is held as a prepared statement or as a string.

For batch-size=10, it pregenerates (and holds in memory) 10 select statements. The first statement with one key condition, the second with two or-ed conditions, the third one with three or-ed conditions, ....
This way a pregenerated statement exists no matter how many instances (up to batch-size) have to be loaded.
At further run time no statements are generated. If 2 instances have to be selected, the second statement is used. If 10 instances have to be selected, the tenth statement is used.
Therefore, the memory foot-print does depend on your batch-size.

For batch-size=100, 100 statements are pregenerated for this association.
Therefore you need more memory if you increase the batch-size.
And if prepared statements are used, you also need the corresponding database resources.

Regards,
Heinz


Top
 Profile  
 
 Post subject:
PostPosted: Thu Dec 01, 2005 12:32 pm 
Hibernate Team
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Hibernate holds it as a string, the database connection pool will hold it as a PreparedStatement.

But again, the situation is not quite so bad as you say, because it is really only 8 statements, instead of 100.


Top
 Profile  
 
 Post subject:
PostPosted: Mon Dec 05, 2005 7:55 am 
Regular
Regular

Joined: Sat Nov 05, 2005 5:33 am
Posts: 70
Location: Linz, Austria
gavin wrote:
Hibernate holds it as a string, the database connection pool will hold it as a PreparedStatement.

But again, the situation is not quite so bad as you say, because it is really only 8 statements, instead of 100.


Nice piece of information.
I read over your post the first time.

Keep up the good work (and support!),
Heinz


Top
 Profile  
 
Display posts from previous:  Sort by  
Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 11 posts ] 

All times are UTC - 5 hours [ DST ]


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.