
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 17 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: Optimizing bulk inserts
PostPosted: Fri Nov 07, 2003 2:38 pm 
Newbie

Joined: Tue Nov 04, 2003 12:41 pm
Posts: 6
Hi,

I have a single-threaded application that reads ASCII data from a flat file, does some processing, and inserts records into the DB using Hibernate. Each record has close to 200 fields, stored as columns in a single table. Hibernate generates a unique PK for each record from an Oracle sequence; the mapping is defined in the hbm.xml.

Inserting 100K records takes 40 minutes. I commit after every 100 save() calls and use only the default Hibernate properties. How can I improve the performance?

I ran the app with the profiler turned on, and the output shows that the bulk of the time is spent in Object.wait(). Where is this happening? No stack trace is available for the call, and the profiler complained about methods not being available at the top of the stack. Is this because the bytecode is modified by cglib? Profiler output:

Code:
TRACE 8889:
        java.lang.Object.wait(Object.java:Unknown line)
        java.lang.Object.wait(Object.java:Unknown line)
TRACE 8888:
        java.lang.ref.ReferenceQueue.enqueue(ReferenceQueue.java:Unknown line)
TRACE 8890:
        java.lang.Object.wait(Object.java:Unknown line)
        java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:Unknown line)
        java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:Unknown line)
CPU TIME (ms) BEGIN (total = 947706) Fri Nov  7 11:45:14 2003
rank   self  accum   count trace method
   1 23.63% 23.63%       3  8889 java.lang.Object.wait
   2 23.63% 47.26%       3  8890 java.lang.Object.wait
   3  1.98% 49.24% 2461438  1007 net.sf.hibernate.impl.SessionImpl.updateReachable
   4  1.86% 51.10% 2461438  2162 net.sf.hibernate.impl.SessionImpl.wrap
   5  1.11% 52.21% 2461438  7338 net.sf.hibernate.type.AbstractType.isDirty
   6  1.06% 53.27% 2403820  8593 net.sf.hibernate.type.StringType.equals
   7  1.02% 54.29%    4226  7689 java.net.SocketInputStream.socketRead0
   8  0.85% 55.14%   11597  2181 net.sf.hibernate.impl.SessionImpl.updateReachables
   9  0.78% 55.93%  192000  6142 java.lang.Class.getMethod0
  10  0.71% 56.64%    1000  3480 com.sbc.orderprocessor.asiadapter.ASIFeedParser.createInstance
  11  0.64% 57.27%   11597  3353 net.sf.hibernate.impl.SessionImpl.wrap
  12  0.62% 57.89% 2265938  5521 net.sf.hibernate.type.AbstractType.isComponentType
  13  0.62% 58.51% 2265938  8629 net.sf.hibernate.type.AbstractType.isPersistentCollectionType
  14  0.61% 59.12% 2265938  7339 net.sf.hibernate.type.AbstractType.isPersistentCollectionType
  15  0.61% 59.73% 2403820  4524 org.apache.commons.lang.ObjectUtils.equals
  16  0.61% 60.34% 2265938  4654 net.sf.hibernate.type.AbstractType.isComponentType
  17  0.59% 60.93% 1003278  3252 java.lang.String.indexOf
  18  0.58% 61.51%  272606  4914 java.util.StringTokenizer.scanToken


Regards,
Vaishali


 Post subject: Personal Experience
PostPosted: Fri Nov 07, 2003 5:20 pm 
Beginner

Joined: Wed Nov 05, 2003 4:38 pm
Posts: 29
Hibernate is not meant for this type of work. If you need speed, the only guaranteed way to import records as fast as possible is to use the GUI or command-line tools that come with the database. If the file needs a bit of processing beforehand, you might have to read it in, write out a more database-friendly file, and import that file instead. If you don't want to do that, hand-coded JDBC is probably more efficient than Hibernate for this kind of work.


 Post subject:
PostPosted: Fri Nov 07, 2003 5:22 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
The wait() call is probably the JDBC driver waiting for a response from the database back end.

(1) Do you discard the session after each commit()?

(2) Are you *sure* that this is a problem best solved by writing Java code, instead of by using your database's import/export tools (which are MUCH faster than the JDBC driver)?


 Post subject:
PostPosted: Fri Nov 07, 2003 8:39 pm 
Newbie

Joined: Tue Nov 04, 2003 12:41 pm
Posts: 6
Gavin,

(1) I do not discard the session after each commit; it's the same session. However, I do not explicitly flush() it either; I figured that commit() flushes it.

(2) Long explanation follows:

I simplified the problem when I stated it. I am developing an order processor application for a company that receives over 100K orders each day from external trading partners in the form of batch files. The format of the data in the file is not known ahead of time. It is a '|' delimited file with variable sections of data. For example, list items are specified as a number indicating the number of list items, followed by that many '|' delimited list items. Each list item may consist of one or more fields. The list items need to go in a subsidiary table of the order (a one-to-many relationship). The orders are newline delimited. An order in the file has no unique identifier.
The file contains a header record with the format number in it. I have architected the application so that the supported header format numbers are defined as XML documents. I wrote a code generator that generates the classes and the Hibernate mappings for the supported formats. When I process a file, I first retrieve the format number and get the format object, then create and populate the order objects for each record dynamically using reflection. Then I persist the objects and the sub-objects (lists of items within orders) using the Hibernate mapping for that class. Sometimes orders with different format numbers go in the same set of tables. Am I making sense? Do you still think I should have used import?

Thank you!
Vaishali
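
The count-prefixed list layout described above can be sketched roughly as follows. This is only an illustration under assumptions: the class `PipeRecordParser`, its method names, and the sample record layout are hypothetical and are not the poster's actual code. It assumes every list item has the same, known number of fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses the count-prefixed list section of a '|' delimited record.
final class PipeRecordParser {

    // Splits a record on '|'; the -1 limit keeps empty trailing fields.
    static String[] splitFields(String record) {
        return record.split("\\|", -1);
    }

    // Reads the item count at fields[countIndex], then groups the following
    // count * fieldsPerItem values into one list per item.
    static List<List<String>> parseListItems(String[] fields, int countIndex, int fieldsPerItem) {
        int count = Integer.parseInt(fields[countIndex].trim());
        int start = countIndex + 1;
        if (count < 0 || start + (long) count * fieldsPerItem > fields.length) {
            throw new IllegalArgumentException("Record too short for " + count + " list items");
        }
        List<List<String>> items = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            List<String> item = new ArrayList<>();
            for (int j = 0; j < fieldsPerItem; j++) {
                item.add(fields[start + i * fieldsPerItem + j]);
            }
            items.add(item);
        }
        return items;
    }
}
```

For a made-up record such as `ORD1|ACME|2|widget|3|gadget|5|tail`, calling `parseListItems(fields, 2, 2)` yields two items of two fields each, which would map to two rows in the subsidiary table.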


 Post subject:
PostPosted: Fri Nov 07, 2003 9:37 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Quote:
1) I do not discard the session after each commit; it's the same session.


Bad! This will affect performance. You should either close() it or clear() it between transactions.


Oh, P.S. make sure you enable JDBC batch updates.
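
The advice above can be sketched as a loop that commits and then clears the session after each chunk, so the session never tracks more than one chunk of objects. To keep the example self-contained, `BatchSession` is a hypothetical stand-in interface and `BulkInserter` a hypothetical helper; neither is Hibernate's API. With Hibernate 2.1 the corresponding calls would be `Session.save()`, a transaction commit, and `Session.clear()`, with JDBC batching enabled through the `hibernate.jdbc.batch_size` property.

```java
// Hypothetical stand-in for the parts of a persistence session used here.
interface BatchSession<T> {
    void save(T entity);   // schedule an insert and start tracking the entity
    void commit();         // flush pending inserts and commit the transaction
    void clear();          // stop tracking every entity held by the session
}

final class BulkInserter {

    // Saves all records, committing and clearing after every chunkSize saves.
    // Returns the number of commits performed.
    static <T> int insertAll(BatchSession<T> session, Iterable<T> records, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (T record : records) {
            session.save(record);
            pending++;
            if (pending == chunkSize) {
                session.commit();
                session.clear();   // keeps the tracked set bounded by chunkSize
                pending = 0;
                commits++;
            }
        }
        if (pending > 0) {         // commit the final partial chunk
            session.commit();
            session.clear();
            commits++;
        }
        return commits;
    }
}
```

The profile earlier in the thread shows millions of `isDirty` calls, which is consistent with a session that keeps re-checking every object it has ever saved; bounding the tracked set is the point of the `clear()` call.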


 Post subject:
PostPosted: Tue Nov 11, 2003 3:23 pm 
Newbie

Joined: Tue Nov 04, 2003 12:41 pm
Posts: 6
How do I clear() a session? That method is not supported by Session.
The evict() works for a single object only.

When I close() and open() sessions at each commit, the performance becomes worse; it takes 5 times as long as before. Establishing a new DB connection for each commit is very expensive.

I enabled JDBC batch updates, and that made only a very small improvement.

Is there a way to bypass the caching of objects so I don't have to clear or close the session? Can I make Hibernate write through to the DB and not save any state for the objects? Thank you for all your suggestions.

Vaishali


 Post subject:
PostPosted: Tue Nov 11, 2003 10:20 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Quote:
How do I clear() a session? That method is not supported by Session.


clear() is available in Hibernate 2.1.

Quote:
Is there a way to bypass the caching of objects so I don't have to clear or close the session? Can I make Hibernate write through to the DB and not save any state for the objects?



No. This would expose you to stack overflows, etc, when dealing with object graphs with circularities.


 Post subject:
PostPosted: Wed Nov 12, 2003 3:01 am 
CGLIB Developer

Joined: Thu Aug 28, 2003 1:44 pm
Posts: 1217
Location: Vilnius, Lithuania
http://www.orafaq.com/faqloadr.htm


 Post subject:
PostPosted: Fri Nov 14, 2003 3:40 pm 
Newbie

Joined: Tue Nov 04, 2003 12:41 pm
Posts: 6
Gavin,

I downloaded Hibernate 2.1 beta 6 and then called session.clear() following each commit of 50 inserts. It made no difference in performance. Am I missing something? I would really like this to work for me, I would hate to go back and change the design. My whole design is based on dynamic parsing of records of any format into objects and loading those objects into tables whose mapping is dynamically generated.

Baliukas,
Thanks for the link, but if you see my explanation of the file formats above, you will see why I chose a design using Hibernate over the Oracle loader.

Vaishali


 Post subject:
PostPosted: Thu Dec 18, 2003 9:29 am 
Newbie

Joined: Fri Sep 12, 2003 1:44 pm
Posts: 15
Even with batch updates enabled, your code has to go to the database for every record to fetch the next sequence value to use as the id.
So, just for test purposes, don't use a sequence generator and increase the batch size to about 500 records; you'll see a big performance increase.

For the sequence, you can do the following:
1) Select the sequence's next value with select for update and increase it by the number of records you'll insert later. This way you'll reserve those ids.
2) Manually assign the id of each record to insert.

This way, you'll be able to take advantage of the JDBC 2 batch insert functionality.

Good luck,
Bulent Erdemir
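
The reservation idea above can be sketched as an allocator that makes one database round trip per block of ids instead of one per record. This is a sketch under assumptions: `BlockIdAllocator` and the `blockStartFetcher` callback are hypothetical names, not part of Hibernate, and the callback is assumed to return the first id of a freshly reserved block (for example, the next value of a sequence whose increment equals the block size).

```java
import java.util.function.LongSupplier;

// Hypothetical allocator: hands out ids from a reserved block and fetches
// a new block only when the current one is used up.
final class BlockIdAllocator {
    private final LongSupplier blockStartFetcher; // assumed: one DB round trip per call
    private final int blockSize;
    private long next;          // next id to hand out
    private long endExclusive;  // first id beyond the current block

    BlockIdAllocator(LongSupplier blockStartFetcher, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockStartFetcher = blockStartFetcher;
        this.blockSize = blockSize;
        // next == endExclusive == 0 here, so the first nextId() reserves a block.
    }

    synchronized long nextId() {
        if (next == endExclusive) {
            next = blockStartFetcher.getAsLong();
            endExclusive = next + blockSize;
        }
        return next++;
    }
}
```

Each record would then get its id assigned manually from `nextId()` before being saved, so no per-record sequence query stands in the way of JDBC batching.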


 Post subject:
PostPosted: Mon Jan 05, 2004 4:10 pm 
Newbie

Joined: Mon Nov 17, 2003 7:53 pm
Posts: 15
buler wrote:
For the sequence, you can do the following;
1) select the sequence next value with select for update and increase it by the number of records you'll insert later. This way you'll reserve those id's.
2) manually assign the id of the record to insert.


I would like to get the next sequence number and increment it by about 100 (basically reserving one hundred IDs). I'm not quite sure how to do this and I


 Post subject:
PostPosted: Mon Jan 05, 2004 4:15 pm 
Hibernate Team

Joined: Tue Sep 09, 2003 2:10 pm
Posts: 3246
Location: Passau, Germany
Have you tried using seqhilo generator?


 Post subject:
PostPosted: Mon Jan 05, 2004 4:25 pm 
Newbie

Joined: Mon Nov 17, 2003 7:53 pm
Posts: 15
I'm using Oracle, so I assumed there is no need for me to use it. Am I correct in assuming this?


 Post subject:
PostPosted: Mon Jan 05, 2004 4:29 pm 
Hibernate Team

Joined: Tue Sep 09, 2003 2:10 pm
Posts: 3246
Location: Passau, Germany
Well, you said:

Quote:
I would like to get the next sequence number and increment it by about 100 (basically reserving one hundred IDs). I'm not quite sure how to do this and I


 Post subject:
PostPosted: Mon Jan 05, 2004 5:03 pm 
Newbie

Joined: Mon Nov 17, 2003 7:53 pm
Posts: 15
I've read up a bit on seqhilo, and it seems that the ids it generates skip ranges of ids (I assume because of how the generator is implemented). This is not desirable, since we have a ton of data and it will only get bigger.

Thanks for the input gloeglm.

Anyone else have an idea of whether my original question is doable?
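
The skipped ranges mentioned above follow from how hi/lo schemes work in general. The following is a simplified sketch of the general technique, assuming a hypothetical `HiLoSketch` class; it is not Hibernate's exact seqhilo implementation. Each sequence value owns a block of `maxLo + 1` ids, so ids left unused in a block when a generator is discarded are never handed out.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified hi/lo generator for illustration only.
final class HiLoSketch {
    private final LongSupplier sequence; // assumed: returns the next database sequence value
    private final int maxLo;
    private long hi;
    private int lo;

    HiLoSketch(LongSupplier sequence, int maxLo) {
        this.sequence = sequence;
        this.maxLo = maxLo;
        this.lo = maxLo + 1; // forces a sequence fetch on the first nextId()
    }

    synchronized long nextId() {
        if (lo > maxLo) {
            // One sequence fetch reserves the ids hi .. hi + maxLo.
            hi = sequence.getAsLong() * (maxLo + 1);
            lo = 0;
        }
        return hi + lo++;
    }
}
```

With `maxLo = 99` and a sequence that starts at 1, a first generator hands out 100, 101, 102 and so on. If it is discarded after three ids, a new generator on the same sequence starts at 200, leaving 103 to 199 unused.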


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.