 Post subject: Problem with importing a huge amount of data using Hibernate
PostPosted: Wed Jun 18, 2008 8:20 am 
Newbie

Joined: Wed Jun 18, 2008 5:46 am
Posts: 7
Hibernate version:
3.2.6

Name and version of the database you are using:
PostgreSQL 8.3

Hi,

I've got a problem importing data from a file into the database.
I'm using a standard CSV file, but the file has over 1,000,000 rows.
One row represents one object, so I wrote a basic import function with this scenario:

Code:
-load csv file

-start loop
   -get row
   -create object and set data from row
   -save object using hibernate (I'm using HibernateCallback())
-end loop

-close file
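
In Java the loop looks roughly like this (the Record entity, its setters and the CSV parsing are simplified placeholders; in my real code the save happens inside Spring's HibernateCallback):

Code:
// simplified sketch of the current row-by-row import; "Record" and the
// CSV parsing are placeholders, "sessionFactory" is an already-built SessionFactory
BufferedReader reader = new BufferedReader(new FileReader("data.csv"));
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] cols = line.split(",");
        Record record = new Record();      // one object per row
        record.setName(cols[0]);           // set data from the row
        record.setValue(cols[1]);
        session.save(record);              // every single row is saved through Hibernate
    }
    tx.commit();
} finally {
    session.close();
    reader.close();
}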


The problem is performance: the import takes a very, very long time to execute. I searched
for a better way to import a huge amount of data into the database using Hibernate, but found nothing useful.
I think this is a common problem for most applications, so if anybody knows a solution it would really help me.

Thank you for any answers.

Best regards
Martin


 Post subject:
PostPosted: Wed Jun 18, 2008 10:03 am 
Beginner

Joined: Tue Dec 12, 2006 6:43 am
Posts: 32
Location: London
Martin,

Maybe Hibernate is not suitable for such a job (bulk data copy). Have you considered using

bcp (bulk copy from CSV into the database)? I'm not sure whether an equivalent exists in PostgreSQL; I often use bcp with a Sybase database from a Unix script such as Perl or Korn shell.
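
If an equivalent does exist, as far as I know it is PostgreSQL's COPY command, and the PostgreSQL JDBC driver can expose it through a CopyManager class. A rough sketch (the table name and file path are placeholders):

Code:
// rough sketch: bulk-load the CSV with PostgreSQL's COPY through the JDBC driver;
// "conn" must be the underlying PostgreSQL connection (not a pool wrapper),
// table name and file path are placeholders
org.postgresql.copy.CopyManager copyManager =
        ((org.postgresql.PGConnection) conn).getCopyAPI();
long rows = copyManager.copyIn(
        "COPY my_table FROM STDIN WITH CSV",
        new java.io.FileReader("/path/to/data.csv"));
System.out.println("loaded " + rows + " rows");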




Regards

_________________
Alan Mehio
London
UK


 Post subject:
PostPosted: Wed Jun 18, 2008 10:31 am 
Newbie

Joined: Wed Jun 18, 2008 5:46 am
Posts: 7
alanmehio wrote:
Martin,

Maybe Hibernate is not suitable for such a job (bulk data copy). Have you considered using bcp (bulk copy from CSV into the database)? I'm not sure whether an equivalent exists in PostgreSQL; I often use bcp with a Sybase database from a Unix script such as Perl or Korn shell.

Regards


Hi Alan,

First of all, thank you for your answer, but this solution is not acceptable for me. I need to work with the object before it is saved into the database. When I create a new object and set its data from a row in the file, I need to set up some of the object's attributes and then save it.

Best regards
Martin


 Post subject:
PostPosted: Wed Jun 18, 2008 11:11 am 
Beginner

Joined: Sun Aug 22, 2004 11:00 am
Posts: 21
Use the StatelessSession API if your objects are just simple value objects without associations or collections.
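
Something roughly like this (the entity and the CSV parsing are just placeholders, and it assumes your SessionFactory is already built):

Code:
// rough sketch of a StatelessSession bulk insert; "MyEntity" and the CSV
// parsing are placeholders. No first-level cache, no dirty checking.
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
BufferedReader reader = new BufferedReader(new FileReader("data.csv"));
try {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] cols = line.split(",");
        MyEntity entity = new MyEntity();   // simple value object, no associations
        entity.setName(cols[0]);
        entity.setValue(cols[1]);
        session.insert(entity);             // inserted without being cached in the session
    }
    tx.commit();
} finally {
    reader.close();
    session.close();
}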

Make sure JDBC batching is on (this is described in the manual); try a batch size of 80 or so. There is a rough sketch of the flush/clear pattern at the end of this post.

If you're using MySQL, make sure you turn server-side prepared statements on and use InnoDB. There are lots of DB parameters to tinker with when bulk loading that can make a big time difference.

Use a machine with a fast CPU, lots of RAM and a fast hard drive.

I have an app that uses Hibernate to initially load over 100 million records and then later perform normal CRUD operations on those objects. The initial load on a good machine takes about 7 hours.
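
Here is the flush/clear pattern I mentioned above; it is basically what the batch processing chapter of the manual shows (the entity and the CSV parsing are placeholders):

Code:
// rough sketch of batched inserts with the regular Session; "MyEntity" and the
// CSV parsing are placeholders, and hibernate.jdbc.batch_size should be set (e.g. 80)
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
BufferedReader reader = new BufferedReader(new FileReader("data.csv"));
String line;
int count = 0;
while ((line = reader.readLine()) != null) {
    String[] cols = line.split(",");
    MyEntity entity = new MyEntity();
    entity.setName(cols[0]);
    session.save(entity);
    if (++count % 80 == 0) {   // same order of magnitude as the JDBC batch size
        session.flush();       // push the batched INSERTs to the database
        session.clear();       // detach the objects so the session stays small
    }
}
tx.commit();
reader.close();
session.close();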

_________________
Ive got a Tomcat that Struts then Springs then Hibernates


 Post subject:
PostPosted: Wed Jun 18, 2008 11:55 am 
Newbie

Joined: Wed Jun 18, 2008 10:20 am
Posts: 1
bennini wrote:
I have an app that uses Hibernate to initially load over 100 million records and then later perform normal CRUD operations on those objects. The initial load on a good machine takes about 7 hours.


I think 7 hours is a lot of time. The import uses a lot of CPU and memory, and the application will react slowly during that time. How do you solve this problem?

Suppose we need to import such an amount of data once per week, so we need to import it in a way that doesn't affect our application. Is that possible in general? Thanks for your reply.

TOM


 Post subject:
PostPosted: Wed Jun 18, 2008 5:49 pm 
Beginner

Joined: Sun Aug 22, 2004 11:00 am
Posts: 21
7 hours for 100 million records is slow? I don't know, man. That was with the old Session API: second-level cache disabled, JDBC batch updates, the lot, plus tons of optimizations in the DB config. I haven't tried it yet with the new StatelessSession API, which I'm struggling with (see my other thread).

I think direct JDBC (or hopefully this StatelessSession API) may cut the time down a bit, but there's simply no way to get around the fact that 100 million records is 100 million records. And I'm talking about over 100 tables, with some tables having multiple hundreds of columns.
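
By direct JDBC I just mean plain PreparedStatement batching, roughly like this (the table, the columns and the CSV parsing are placeholders, and it assumes an open java.sql.Connection):

Code:
// rough sketch of "direct JDBC": PreparedStatement batching; table, columns and
// the CSV parsing are placeholders, "conn" is an open java.sql.Connection
conn.setAutoCommit(false);
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO my_table (name, value) VALUES (?, ?)");
BufferedReader reader = new BufferedReader(new FileReader("data.csv"));
String line;
int count = 0;
while ((line = reader.readLine()) != null) {
    String[] cols = line.split(",");
    ps.setString(1, cols[0]);
    ps.setString(2, cols[1]);
    ps.addBatch();
    if (++count % 1000 == 0) {
        ps.executeBatch();     // send a chunk of inserts in one round trip
    }
}
ps.executeBatch();             // flush whatever is left
conn.commit();
ps.close();
reader.close();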

I have no clue what you're doing exactly, but I'm guessing you are bulk loading some 60+ gigabytes of data (that's about what my 100 million records equates to) into a database each week? Is that correct? If so, you should invest in some serious hardware. For starters, you're going to see a HUGE benefit by going with 10000+ RPM hard drives; go SCSI while you're at it. Next, if you're parsing that data (which I'm also doing: going from XML to a relational database; it only happens about once every 3-6 months, with small intermittent changes happening multiple times daily), then I would also recommend a good CPU, upwards of 2.4 GHz.

If you really want to scream, partition your DB tables (I'm guessing you're loading into more than one table) onto different hard drives, then use a multi-core machine to divide and conquer the bulk load.

Striped RAID arrays could also help, although I've never tried it.

_________________
Ive got a Tomcat that Struts then Springs then Hibernates


 Post subject:
PostPosted: Wed Jun 18, 2008 5:57 pm 
Beginner

Joined: Sun Aug 22, 2004 11:00 am
Posts: 21
BTW, just wondering:

How did Tompra's question in reply to my answer "help solve the problem"?

_________________
Ive got a Tomcat that Struts then Springs then Hibernates


 Post subject:
PostPosted: Thu Jun 19, 2008 4:50 am 
Newbie

Joined: Wed Jun 18, 2008 5:46 am
Posts: 7
Hi bennini,

Thanks for the reply. I need associations in my objects, so the StatelessSession API is not usable for me, but I tried batch processing and got 50 thousand rows in 6-7 minutes. That's not optimal, but as you wrote, maybe it's a hardware problem. Anyway, does anybody have a better idea how to solve this?

Thank you.

Regards
Martin


 Post subject:
PostPosted: Thu Jun 19, 2008 4:52 am 
Newbie

Joined: Wed Jun 18, 2008 5:46 am
Posts: 7
bennini wrote:
BTW, just wondering:

How did Tompra's question in reply to my answer "help solve the problem"?


Sorry, I'm new on this forum. I gave you credit too.


 Post subject: Re: Problem with importing a huge amount of data using Hibernate
PostPosted: Fri Sep 25, 2009 11:26 am 
Regular

Joined: Fri May 22, 2009 4:50 am
Posts: 59
Hi bennini,

As you said, it's better to use a stateless session with Hibernate when the object is simple and has no collections.

But that's pretty useless to me. What faster options does one have when the object contains collection maps?

Thanks

