All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 6 posts ] 
 Post subject: Loading a large volume of data.
PostPosted: Mon May 17, 2004 11:48 am 
Beginner

Joined: Thu Mar 04, 2004 11:51 am
Posts: 34
I have the following requirement:
Load a large amount of data from a comma-delimited csv file.
I read the file, creating a POJO with a hibernate mapping, for each line.
These objects are complete except for ID which is not available in the csv.
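
A minimal sketch of that first step, turning one CSV line into an id-less POJO. The class name `SaleRecord` and its three fields are assumptions made for illustration only, since the post does not show the real mapping, and the parser is deliberately naive (it does not handle quoted fields or embedded commas).

```java
import java.math.BigDecimal;

/** Hypothetical POJO; class and field names are illustrative, not from the original post. */
final class SaleRecord {
    Long id;                    // stays null until the object is persisted
    final String location;
    final String product;
    final BigDecimal amount;

    SaleRecord(String location, String product, BigDecimal amount) {
        this.location = location;
        this.product = product;
        this.amount = amount;
    }

    /** Naive parser: splits on commas only, with no support for quoted fields. */
    static SaleRecord fromCsvLine(String line) {
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 fields but got " + parts.length + ": " + line);
        }
        return new SaleRecord(
            parts[0].trim(), parts[1].trim(), new BigDecimal(parts[2].trim()));
    }

    public static void main(String[] args) {
        SaleRecord r = fromCsvLine("Boston, widget, 19.99");
        System.out.println(r.location + " / " + r.product + " / " + r.amount + " / id=" + r.id);
    }
}
```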

For each object, I look up a possibly existing counterpart in the database by criteria other than the ID. If a match is found and differs from the data in the CSV file, the CSV data is copied into the found object and the found object is updated. If a match is found that does not differ from the CSV data, nothing is done in the database. If no match is found, the ID-less POJO created from the CSV line is saved, persisting it for the first time.

I want to do all this under a transaction. However, it's way too slow. What slows it down is caching. The more records have been processed, the larger the cache, and the more work hibernate must do when looking for existing records and the slower the processing proceeds.

When the table is empty and all data will be inserted, I can therefore achieve a speed increase by evicting each item from the cache after it is persisted. But this fails when there is some preexisting data in the database. I get these errors:

6530 ERROR [main] hibernate.AssertionFailure: an assertion failure occured (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session)
net.sf.hibernate.AssertionFailure: possible nonthreadsafe access to session
at net.sf.hibernate.impl.SessionImpl.postInsert(SessionImpl.java:2350)
at net.sf.hibernate.impl.ScheduledInsertion.execute(ScheduledInsertion.java:30)
at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2382)
at net.sf.hibernate.impl.SessionImpl.execute(SessionImpl.java:2335)
at net.sf.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:1775)
at net.sf.hibernate.impl.SessionImpl.getQueries(SessionImpl.java:1536)
at net.sf.hibernate.impl.SessionImpl.find(SessionImpl.java:1501)
at net.sf.hibernate.impl.SessionImpl.find(SessionImpl.java:1491)
at com.fubar.persist.BaseDAO.retrieveObjs(BaseDAO.java:301)
at com.fubar.SalesDAO.getMatchingSalesByLocation(SalesDAO.java:206)

This stack trace is strange. Why is postInsert() called in the middle of a query? It appears that this is caused by autoFlushIfRequired. How can this be defeated? Or is there another way to defeat the performance-killing effects of caching?


 Post subject:
PostPosted: Mon May 17, 2004 12:03 pm 
Hibernate Team

Joined: Tue Sep 09, 2003 2:10 pm
Posts: 3246
Location: Passau, Germany
Hibernate flushes the session before a query to ensure consistent data. You can disable that with session.setFlushMode(FlushMode.NEVER).

You can also manually clear the session cache by using clear() or evict().

I have to say, however, that what you are doing is really not what Hibernate was originally intended to do: mass inserts and updates are not what an O/R mapper is designed to handle.


 Post subject:
PostPosted: Mon May 17, 2004 12:30 pm 
Beginner

Joined: Thu Mar 04, 2004 11:51 am
Posts: 34
Setting the flush mode to FlushMode.NEVER does remove the errors. But what I now discover is that evicting an object from the cache in the middle of a transaction seems to remove that object's pending update from the transaction. So the update I thought was happening actually isn't. Is my impression correct?
Is there a way to evict an object from the cache without evicting it from the transaction?

If you don't actually recommend Hibernate for tasks like these, what strategy do you recommend? Raw JDBC?


 Post subject:
PostPosted: Mon May 17, 2004 12:31 pm 
Hibernate Team

Joined: Mon Aug 25, 2003 9:11 pm
Posts: 4592
Location: Switzerland
There is no way to evict an object and still have it be transactional. Use straight JDBC for your task.
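
As a rough illustration of the straight-JDBC route, the sketch below sends rows through a `PreparedStatement` in batches inside a single transaction. The table name `sales`, its columns, the `String[]` row shape, and the batch size of 500 are all assumptions made for this example. It covers only the insert path, and how much batching helps depends on the JDBC driver.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcBatchSketch {
    // Hypothetical table and column names.
    static final String INSERT_SQL = "INSERT INTO sales (location, amount) VALUES (?, ?)";
    // Arbitrary batch size chosen for illustration.
    static final int BATCH_SIZE = 500;

    /**
     * Inserts every row in one transaction, flushing the batch every BATCH_SIZE rows.
     * Each row is {location, amount}. Returns the number of rows queued.
     */
    static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setBigDecimal(2, new BigDecimal(row[1]));
                ps.addBatch();
                if (++queued % BATCH_SIZE == 0) {
                    ps.executeBatch();   // send a full batch to the database
                }
            }
            ps.executeBatch();           // send whatever is left over
            conn.commit();
            return queued;
        } catch (SQLException | RuntimeException e) {
            conn.rollback();             // undo the whole load on any failure
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

A caller would obtain the `Connection` from its own driver or data source and pass in the parsed CSV rows; the update and skip cases would need their own lookup and `UPDATE` statement.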



 Post subject:
PostPosted: Mon May 17, 2004 12:40 pm 
Beginner

Joined: Thu Mar 04, 2004 11:51 am
Posts: 34
Rather than switching to straight JDBC, which would have taken longer given the time I have already invested in this approach, I adopted the following triage strategy, which gives acceptable performance:

Iterate through my original collection of POJOs. For each one, look up whether there is a preexisting counterpart in the database. If one exists and needs to be changed, change that object and add it to a new List called updates. If no changes are required, forget about the object. If the object doesn't exist, add it to a new Collection called inserts. Then clear the session, update everything in updates, and save everything in inserts.
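
The triage described above can be sketched in plain Java as follows. This is not the poster's code: the `Sale` class, its fields, and the in-memory `Map` standing in for the database lookup by non-id criteria are all assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class TriageSketch {

    /** Hypothetical entity; 'location' plays the role of the non-id lookup criterion. */
    static final class Sale {
        Long id;
        final String location;
        String amount;

        Sale(Long id, String location, String amount) {
            this.id = id;
            this.location = location;
            this.amount = amount;
        }
    }

    /** Outcome of the read-only pass: objects to update and objects to insert. */
    static final class Plan {
        final List<Sale> updates = new ArrayList<>();
        final List<Sale> inserts = new ArrayList<>();
    }

    /** existingByKey is a stand-in for querying the database by non-id criteria. */
    static Plan triage(List<Sale> fromCsv, Map<String, Sale> existingByKey) {
        Plan plan = new Plan();
        for (Sale incoming : fromCsv) {
            Sale found = existingByKey.get(incoming.location);
            if (found == null) {
                plan.inserts.add(incoming);        // never persisted before
            } else if (!Objects.equals(found.amount, incoming.amount)) {
                found.amount = incoming.amount;    // copy the CSV data onto the found object
                plan.updates.add(found);
            }
            // A found object with identical data is simply dropped.
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Sale> db = new HashMap<>();
        db.put("Boston", new Sale(1L, "Boston", "10.00"));
        db.put("Denver", new Sale(2L, "Denver", "20.00"));

        List<Sale> csv = new ArrayList<>();
        csv.add(new Sale(null, "Boston", "12.50")); // differs from the stored row
        csv.add(new Sale(null, "Denver", "20.00")); // identical to the stored row
        csv.add(new Sale(null, "Austin", "5.00"));  // not stored yet

        Plan plan = triage(csv, db);
        System.out.println(plan.updates.size() + " update(s), " + plan.inserts.size() + " insert(s)");
    }
}
```

Running `main` prints `1 update(s), 1 insert(s)`. In the setup described in this thread, the lookup would be a Hibernate query, and the two lists would be written only after the session has been cleared, so the write phase starts with an empty session cache.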

However, in the future I will bear in mind what you say about such operations.


 Post subject:
PostPosted: Mon May 17, 2004 1:00 pm 
CGLIB Developer

Joined: Thu Aug 28, 2003 1:44 pm
Posts: 1217
Location: Vilnius, Lithuania
AWK plus a command-line client is a good way to do imports; you probably do not need any programming if it is a plain CSV.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.