I have the following requirement:
Load a large amount of data from a comma-delimited CSV file.
I read the file and create a Hibernate-mapped POJO for each line.
These objects are complete except for the ID, which is not available in the CSV.
For each object I look up a possible existing record in the database by criteria other than ID. If a record is found and differs from the CSV data, the CSV data is copied into the found object and the found object is updated. If a record is found that does not differ from the CSV data, nothing is done in the database. If no record is found, the ID-less POJO created from the CSV is saved, persisting it for the first time.
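The per-record logic described above can be sketched roughly as follows. This is an illustration only, not the actual DAO code: an in-memory map stands in for the Hibernate session and the database, and the names `Sale`, `FakeStore`, `Outcome`, and `upsert` are hypothetical, as is the assumption that the lookup criterion is a single location field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CsvUpsertSketch {

    /** Hypothetical POJO for one CSV line; id stays null until it is persisted. */
    static final class Sale {
        Long id;
        final String location; // assumed lookup criterion (not the id)
        String amount;         // assumed payload compared against the CSV

        Sale(String location, String amount) {
            this.location = location;
            this.amount = amount;
        }
    }

    enum Outcome { INSERTED, UPDATED, UNCHANGED }

    /** In-memory stand-in for the database, keyed by the lookup criterion. */
    static final class FakeStore {
        private final Map<String, Sale> byLocation = new HashMap<>();
        private long nextId = 1;

        Sale findByLocation(String location) {
            return byLocation.get(location);
        }

        void save(Sale sale) {
            sale.id = nextId++; // the id is assigned on first persist
            byLocation.put(sale.location, sale);
        }

        int size() {
            return byLocation.size();
        }
    }

    /** Insert, update, or skip one CSV-derived object. */
    static Outcome upsert(FakeStore store, Sale fromCsv) {
        Sale existing = store.findByLocation(fromCsv.location);
        if (existing == null) {
            store.save(fromCsv);          // no match: persist for the first time
            return Outcome.INSERTED;
        }
        if (Objects.equals(existing.amount, fromCsv.amount)) {
            return Outcome.UNCHANGED;     // match with identical data: do nothing
        }
        existing.amount = fromCsv.amount; // match with differences: copy CSV data over
        return Outcome.UPDATED;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        System.out.println(upsert(store, new Sale("NYC", "10.00"))); // INSERTED
        System.out.println(upsert(store, new Sale("NYC", "10.00"))); // UNCHANGED
        System.out.println(upsert(store, new Sale("NYC", "12.50"))); // UPDATED
    }
}
```

In the real code the lookup is a Hibernate query and the save goes through the session, which is where the caching cost described below comes in.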
I want to do all of this in a single transaction. However, it is far too slow, and what slows it down is caching. The more records have been processed, the larger the cache grows, the more work Hibernate must do when looking for existing records, and the slower the processing becomes.
When the table is empty and all data will be inserted, I can therefore achieve a speed increase by evicting each item from the cache after it is persisted. But this fails when there is some preexisting data in the database. I get these errors:
6530 ERROR [main] hibernate.AssertionFailure: an assertion failure occured (this may indicate a bug in Hibernate, but is more likely due to unsafe use of the session)
net.sf.hibernate.AssertionFailure: possible nonthreadsafe access to session
at net.sf.hibernate.impl.SessionImpl.postInsert(SessionImpl.java:2350)
at net.sf.hibernate.impl.ScheduledInsertion.execute(ScheduledInsertion.java:30)
at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2382)
at net.sf.hibernate.impl.SessionImpl.execute(SessionImpl.java:2335)
at net.sf.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:1775)
at net.sf.hibernate.impl.SessionImpl.getQueries(SessionImpl.java:1536)
at net.sf.hibernate.impl.SessionImpl.find(SessionImpl.java:1501)
at net.sf.hibernate.impl.SessionImpl.find(SessionImpl.java:1491)
at com.fubar.persist.BaseDAO.retrieveObjs(BaseDAO.java:301)
at com.fubar.SalesDAO.getMatchingSalesByLocation(SalesDAO.java:206)
This stack trace is strange. Why is postInsert() called in the middle of a query? It appears to be triggered by autoFlushIfRequired(). How can this be prevented? Or is there another way to avoid the performance-killing effects of caching?