
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 13 posts ] 
 Post subject: Batch insert very slow
PostPosted: Fri Jan 18, 2008 3:42 pm 
Newbie

Joined: Fri Jan 18, 2008 2:33 pm
Posts: 1
Hibernate version: 3.2.5

Mapping documents:

Code:

  <hibernate-mapping>
 
      <class name="database.models.Category" table="CATEGORIES" lazy="true">
          <id name="id" type="java.lang.Integer" column="id">
                <generator class="assigned"/>
          </id>
          <property name="name" type="java.lang.String" column="name" length="256" not-null="true"/>
          <property name="userDefined" type="java.lang.Boolean" column="userDefined"/>
          <property name="active" type="java.lang.Boolean" column="active"/>
          <set name="items" lazy="true" cascade="all" inverse="true">
             <key column="category" not-null="true"/>
             <one-to-many class="database.models.Item"/>
          </set>
      </class>
 
  </hibernate-mapping>


Code:
<hibernate-mapping>
   <class name="database.models.Item" table="ITEMS" lazy="true">
      <id name="id" type="java.lang.Integer" column="id">
            <generator class="assigned"/>
      </id>
      <property name="name" type="java.lang.String" not-null="true"/>
      <property name="modified" type="timestamp"/>
      <many-to-one name="category" column="category" class="database.models.Category" not-null="true"/>
   </class>
</hibernate-mapping>



Name and version of the database you are using:
Have tried the following databases:
Apache Derby 10.3.2.1
H2 1.0.65
HSQLDB 1.8.0.8

The generated SQL (show_sql=true):

18-Jan-2008 18:25:57 STDOUT: Hibernate: insert into ITEMS (name, modified, category, id) values (?, ?, ?, ?)

18-Jan-2008 18:25:57 STDOUT: Hibernate: insert into ITEMS (name, modified, category, id) values (?, ?, ?, ?)

etc., etc.,...

snip...


Algorithm:

Code:
1,000,000 XML ITEMS

Start Loop 1
   Get 1,000 XML ITEMS
   transaction = session.beginTransaction();
   Start Loop 2
      SAX Parse, creating ITEM objects
      if ( loopCount % 500 == 0 ) { //500, same as the JDBC batch size
          //flush a batch of inserts and release memory:
         session.flush();
         session.clear();
      }
   End Loop 2
   transaction.commit();
   session.clear();
End Loop 1


Hi,

I'm new to Hibernate and I'm currently experiencing a problem doing mass batch inserts into the database.

My application needs to insert around 1,000,000 records at a time; however, at around 330,000 inserts the insert rate drops dramatically.

I'm using the batching / transaction method detailed in the Hibernate reference manual, Chapter 13 (Batch processing), but this is not helping :(

I've also tried the batching fix described in "Optimize Hibernate for the bulk insertion of related entities" (incorporated from Hibernate 3.4), but specifying the hibernate.order_inserts property has not helped either.

In previous work with PHP / MySQL I performed mass inserts by building long multi-row SQL INSERT statements, e.g.

INSERT INTO table (id, name) VALUES (1, 'name1'), (2, 'name2')... etc.

The multi-row INSERT method above improves insert performance by an order of magnitude! Does anybody know of a way I can get Hibernate to batch the inserts into a single SQL statement as opposed to multiple statements, say 500 at a time, not the full 1,000,000! :) ?
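To make the idea concrete, here is a rough sketch (plain Java, no Hibernate; the table and column names are just illustrative) of the kind of multi-row statement builder I used on the PHP side:

```java
import java.util.ArrayList;
import java.util.List;

public class MultiRowInsertBuilder {
    /**
     * Builds multi-row INSERT statements, at most chunkSize rows per statement.
     * Each entry of rowValues is the comma-separated value list for one row,
     * already rendered as SQL literals.
     */
    public static List<String> build(String table, String columns,
                                     List<String> rowValues, int chunkSize) {
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < rowValues.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rowValues.size());
            StringBuilder sql = new StringBuilder("INSERT INTO ")
                    .append(table).append(" (").append(columns).append(") VALUES ");
            for (int i = start; i < end; i++) {
                if (i > start) sql.append(", ");
                sql.append("(").append(rowValues.get(i)).append(")");
            }
            statements.add(sql.toString());
        }
        return statements;
    }
}
```

Obviously string concatenation like this needs properly escaped values (parameter binding would be better with real data), but it shows the shape of the statement I'm after: 500 rows per INSERT instead of 500 INSERTs.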

Also, ahem..., another problem :)
After getting 300,000 of the records into the database, I restart my application to try to insert the records that have not yet been inserted, but I then get a 'java.lang.OutOfMemoryError: Java heap space'. I'm sure this has something to do with the recursive nature of Hibernate (i.e. pulling in all the Item records when loading the Category class), but I'm not sure how to fix it.

Any help appreciated!

Many Thanks,
Mark


 Post subject: How to configure Hibernate so that batch insert will be on?
PostPosted: Tue Jan 22, 2008 3:47 pm 
Newbie

Joined: Tue Jan 22, 2008 3:30 pm
Posts: 1
Hi Mark and others,

I have set the JDBC batch size to 20 (tried 30, 50, etc.) and disabled the second level cache in the Hibernate configuration file. But I don't see any time saved when saving 10,000 simple or complicated objects compared to not using batching.

<!-- Enable JDBC batch updates -->

<!-- Set the optimistic locking strategy; the default, and strongly recommended, setting is "version".
However, if version-based optimistic locking is used, JDBC batch updates will be
transparently disabled -->
<entry key="optimistic-lock">none</entry>

<entry key="hibernate.jdbc.batch_versioned_data">true</entry>

<entry key="hibernate.order_updates">true</entry>

<!-- Enable JDBC batch processing with reasonable size -->
<entry key="hibernate.jdbc.batch_size">30</entry>

<!-- Disable second level cache when using batch update -->
<entry key="hibernate.cache.use_second_level_cache">false</entry>

I also tried setting hibernate.jdbc.batch_versioned_data = false.

But I always see in my Hibernate log:

JDBC batch updates for versioned data: disabled
Order SQL inserts for batching: disabled

Multiple things (such as the use of optimistic locking, or the identifier generator) can transparently disable batching. Are there other things that would transparently disable JDBC batching?

Could you please share how you configure Hibernate so that JDBC batch insert is really enabled?

Thank you a lot for your help!

Guilian


 Post subject:
PostPosted: Wed Feb 13, 2008 1:10 pm 
Newbie

Joined: Wed Feb 13, 2008 12:19 pm
Posts: 2
I also experimented with your algorithm pattern, and one thing that can speed up the batch loading is to avoid adding to memory-based collections.

Example:

if you batch insert tons of individuals that are related to a "master" through a OneToMany association
class Master {
List<Individual> itsIndividuals;
}
don't call "itsIndividuals.add(aNewIndividual)" during the batch insert if you don't need to… that may speed up the process a lot (about 30x in my case…
and much more if the collection is a "Set<Individual>"…)

To handle large collections of records, I use my own "paginated-list" collection subclass that evicts individuals when they are flushed out. (This class needs access to the DAO layer…) It's good for sequential access… much less so for random access…

You could also use "filters" as explained in the Hibernate reference doc (page 181 of PDF Hibernate Ref. Doc. Version 3.2 cr1)


 Post subject:
PostPosted: Wed Feb 13, 2008 1:34 pm 
Expert
Expert

Joined: Sat Jan 17, 2004 2:57 pm
Posts: 329
Location: In the basement in my underwear
For the out of memory issue, turn off query cache if you're using it. Otherwise, the ActionQueue never empties even after doing a transaction flush/clear. It will also cause any of your entities to remain in memory since the ActionQueue holds references to them.

I can't remember if that has a drastic performance impact as well but it might.

_________________
Some people are like Slinkies - not really good for anything, but you still can't help but smile when you see one tumble down the stairs.


 Post subject: Yes this was really helpful. Thank you!
PostPosted: Mon Mar 24, 2008 9:35 am 
Newbie

Joined: Mon Mar 24, 2008 9:30 am
Posts: 1
[quote="VampBoy"]For the out of memory issue, turn off query cache if you're using it. Otherwise, the ActionQueue never empties even after doing a transaction flush/clear. It will also cause any of your entities to remain in memory since the ActionQueue holds references to them.

I can't remember if that has a drastic performance impact as well but it might.[/quote]


Thanks a lot VampBoy!!


 Post subject:
PostPosted: Fri Jul 18, 2008 3:02 pm 
Newbie

Joined: Fri Jul 18, 2008 1:47 pm
Posts: 4
hello *!

i'm fairly new to hibernate and (go figure..) i have a problem, namely the same one the first poster had:

i'm inserting a large number (several million) of relatively simple records into a table, and hibernate insists on dispatching each record in its own insert sql statement, like

[java] Hibernate: insert into Node () values ()
[java] Hibernate: insert into Node () values ()
[java] Hibernate: insert into Node () values ()
[...]

instead of batching several records into one sql-statement, something like:

[java] Hibernate: insert into Node () values (), (), ()

is hibernate even able to do this, or am i trying something that's not possible anyway? i have read about batching operations in hibernate in the manual, on the internet, etc., but nothing i did helped.

my hibernate.cfg.xml looks like this:

Code:
   
    <property name="hibernate.connection.autocommit">false</property>

    <property name="hibernate.jdbc.batch_size">5</property>
    <property name="hibernate.jdbc.fetch_size">5</property>
    <property name="hibernate.jdbc.batch_versioned_data">true</property>

    <property name="hibernate.order_inserts">true</property>
    <property name="hibernate.order_updates">true</property>

    <property name="hibernate.cache.use_query_cache">false</property>
    <property name="hibernate.cache.use_second_level_cache">false</property>


as you can see, i've set the jdbc batch size, asked for ordered inserts, and disabled the level-two cache. according to the manual, this should do the trick. it doesn't. i also read that using generated identifiers may transparently disable batching, so i set my own ids. that doesn't work either. the properties have been set correctly - i can see this in the log.

my code looks like this:

Code:
    for (int i = 1; i <= max; i++)
    {
      Node n = new Node();
      n.setVId(id++);
      n.setGraph(g);
      session.save(n);
      if (id % 5 == 0)
        session.flush();
    }


the separate inserts _are_ being dispatched together (upon flush), just not in _one_ sql statement.

the node class is annotated like this:

Code:
@Entity
public class Node
{
  @Id
  private long vId;

  @OneToMany(mappedBy = "source")
  private Collection<Edge> isSourceOf;

  @OneToMany(mappedBy = "target")
  private Collection<Edge> isTargetOf;

  @ManyToOne(fetch = FetchType.LAZY)
  private Graph graph;


does anyone have any suggestions on what i could try? i'm using hibernate v3.2 with a mysql database and j2ee 1.5. could it be the mysql jdbc driver that's causing the problem? who generates the sql code anyway - hibernate or mysql?

i suppose someone must have managed to get batching to run correctly. could he/she, please, share his/her wisdom?

thank you!

best regards,
raul


 Post subject:
PostPosted: Sun Jul 20, 2008 4:18 am 
Beginner
Beginner

Joined: Wed Sep 21, 2005 8:18 am
Posts: 31
Hi rfechete! Try a batch size of 50; it gives maximum performance for batch insertion. If you still have problems, kindly look at this blog:

http://javainnovations.blogspot.com/2008/07/batch-insertion-in-hibernate.html

_________________
amer sohail


 Post subject:
PostPosted: Sun Jul 20, 2008 1:57 pm 
Newbie

Joined: Fri Jul 18, 2008 1:47 pm
Posts: 4
hi, amer,

and thank you for your answer!

amersohail794 wrote:
Try batch size of 50. It gives maximum performance for batch insertion. If you still have some problems, kindly look at this blog:[..]


i read your blog entry and i am doing exactly that -- i'm flushing the session every hibernate.jdbc.batch_size items. the problem, however, is that instead of executing one (1) sql statement like this:

Code:
insert into table values (value1), (value2), (value3), ..., (valueN)


hibernate executes N insert calls, one after the other, like this:

Code:
insert into table values (value1)
insert into table values (value2)
...
insert into table values (valueN)


i was hoping someone might have an idea why hibernate is executing single-value inserts instead of "compressing" all the values in a batch into one single insert statement.

i hope i haven't completely misunderstood what batching is all about.

thanks!

best regards,
raul


 Post subject:
PostPosted: Mon Jul 21, 2008 5:09 am 
Beginner
Beginner

Joined: Wed Sep 21, 2005 8:18 am
Posts: 31
Well Hibernate does not support that type of batch insertion.

_________________
amer sohail


 Post subject:
PostPosted: Mon Jul 21, 2008 5:57 am 
Newbie

Joined: Fri Jul 18, 2008 1:47 pm
Posts: 4
amersohail794 wrote:
Well Hibernate does not support that type of batch insertion.


umm.. are you sure? if hibernate cannot construct queries like "insert ... values (), (), ()", then this whole batch-insertion business doesn't really make much sense (!?).. you still have N queries for N entries, so i don't see where the extra performance should come from.. i'm getting somewhere around 4-5k insert queries per minute, and that's awful considering i have a couple of million entries to process..

thank you,
raul


 Post subject:
PostPosted: Mon Jul 21, 2008 6:33 am 
Beginner
Beginner

Joined: Wed Sep 21, 2005 8:18 am
Posts: 31
Well, I am not 100% sure, but at least 90%. The reason is that bulk data is not recommended to be loaded through INSERT statements, especially if you need to do it often. Each DBMS provides special techniques to handle it; e.g. in the case of MySQL, there is the "LOAD DATA INFILE" command, which is about 20 times faster than INSERT statements. That's why it is always recommended to do it through the DBMS directly. My recommendation is to create a stored procedure and call that procedure through Hibernate. Inside the stored procedure you can do the bulk insertion with DBMS-specific techniques, so you will get the performance boost.
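For illustration (the file path and column list here are hypothetical), a MySQL bulk load looks roughly like this:

```sql
-- load pre-exported CSV data straight into the table,
-- bypassing per-row INSERT statements entirely
LOAD DATA INFILE '/tmp/items.csv'
INTO TABLE ITEMS
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, name, modified, category);
```

You would first export your parsed records to a CSV file, then issue this one statement from the stored procedure or a native query.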

_________________
amer sohail


 Post subject:
PostPosted: Mon Jul 21, 2008 1:41 pm 
Newbie

Joined: Fri Jul 18, 2008 1:47 pm
Posts: 4
oh well.. i couldn't get hibernate to do bulk inserts, so i decided to get down and dirty and put my own bulk sql queries together. result:

hibernate w/o bulk: 7,500 entries / 1.5 min
hand-coded sql w/ bulk: 1,000,000 entries / 1.5 min

[edit]

ok, i found something interesting:
flushing the session alone does not improve performance; clearing the session after the flush, however, gives the process an incredible speed boost:

hibernate w/ batching (50) and session clearing, w/o bulk inserts: 428,000 entries / 1.5 min

on the same machine.

[/edit]

i would much rather use hibernate, but the numbers are somewhat against it..

best regards,
raul


 Post subject:
PostPosted: Mon Aug 04, 2008 5:00 pm 
Newbie

Joined: Fri Nov 17, 2006 5:01 pm
Posts: 1
Raul,

Have you tried using the "rewriteBatchedStatements" MySQL JDBC driver setting? It causes the driver to internally take batches of INSERT statements and "collapse" them into a single one.

I did some tests using Hibernate and MySQL Proxy to view the traffic and it seems to do exactly what you want.
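For example, assuming a hibernate.cfg.xml setup (host and database name below are placeholders), it's just a matter of appending the option to the JDBC URL:

```xml
<!-- Connector/J will rewrite JDBC-batched INSERTs into multi-row statements -->
<property name="hibernate.connection.url">jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true</property>
```

Note that it only collapses statements the driver receives as a JDBC batch, so hibernate.jdbc.batch_size still needs to be set for it to kick in.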

-- Matt


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.