delpouve wrote:
I think the problem may not be Hibernate.
Well, the problem *is* the ORM (not Hibernate specifically - other ORMs suffer from the same kind of problem during large updates).
As described earlier in the thread, Hibernate has to keep an image of your objects in the current Session - which doubles the space required in memory.
It has to do so because that is its only way to detect changes you may have made to your persistent entities. When the session is flushed, Hibernate compares the internal images (snapshots) with the current state of your entities. Any differences trigger the corresponding updates/inserts/deletes.
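To make that concrete, here is a minimal sketch of the dirty checking this snapshot mechanism enables (the Customer entity, its name property, and the sessionFactory/customerId variables are all hypothetical - substitute your own mapped class):

Code:
import org.hibernate.Session;
import org.hibernate.Transaction;

// Assumes a Hibernate 3-style API and an already-mapped Customer entity.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

// Loading the entity also stores a snapshot of its state in the Session.
Customer customer = (Customer) session.get(Customer.class, customerId);

// No save()/update() call is needed: at flush time Hibernate compares the
// entity with its snapshot, detects the changed property, and issues the UPDATE.
customer.setName("new name");

tx.commit(); // commit flushes the session, triggering the comparison
session.close();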
That's actually why Gvan (or Christian) told you to clear() the session at some points in your batch - freeing some memory.
I think you could take the following approach during your batch:
1/ create new entity
2/ tell Hibernate to persist it (save()/saveOrUpdate())
3/ repeat until a certain number of objects have been created
4/ when you hit your threshold, flush() the session and then clear() it. Then restart at point 1/ for the remaining entities.
This should work just fine (I believe). Just be careful if, after the clear(), you keep references to entities that were loaded by the session: they will no longer be associated with it, so changes made to them will not be detected anymore - unless you explicitly re-associate them with the cleared session by calling saveOrUpdate().
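Here is a minimal sketch of that loop (the Item entity, itemCount, and the batch size of 50 are hypothetical - tune the threshold to your memory budget):

Code:
import org.hibernate.Session;
import org.hibernate.Transaction;

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

final int BATCH_SIZE = 50; // the threshold from step 4/
for (int i = 0; i < itemCount; i++) {
    Item item = new Item();          // 1/ create a new entity
    // ... populate item ...
    session.save(item);              // 2/ tell Hibernate to persist it

    if ((i + 1) % BATCH_SIZE == 0) { // 4/ threshold hit:
        session.flush();             //    push pending inserts to the DB
        session.clear();             //    evict all entities (and their
                                     //    snapshots) from the session
    }
}

tx.commit(); // flushes whatever remains
session.close();

If you go this route, it may also be worth setting hibernate.jdbc.batch_size in your configuration so the flushed inserts are sent to the database as JDBC batches.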
Another hint:
By default, Hibernate flushes the session automatically whenever it believes it is required. It also does so before executing (some) queries: it must persist all changes still in memory so that the query (executed against the database) returns accurate results.
This behavior is fine and is actually one of the features that make Hibernate transparent and so powerful...
Unfortunately, the flush process may consume a fairly large amount of time. Remember that during the flush, Hibernate has to compare its snapshots with the current state of all entities associated with the session - the time required for this process is directly proportional to the total number of properties to compare.
Think about the following scenario:
- you have 10,000 entities in the session;
- each entity has 10 properties;
- this gives a total of 100,000 properties to compare.
Here is your hypothetical process:
1. create a new entity and add it to the session - nothing special happens;
2. before creating the next one, you have to look up some references in the database. For this, you issue 3 HQL queries:
2.1 first query - before execution, Hibernate needs to flush the session - 1st flush - the changes made in step 1 will be transferred to the DB;
2.2 second query - second flush - for 'nothing';
2.3 third query - third flush - for 'nothing'
As you can see, the last two flushes were not required, because you haven't made any changes since step 1. Unfortunately, Hibernate is not aware of this - so it has no alternative but to perform these flushes anyway...
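In code, the scenario looks something like this (the Ref1/Ref2/Ref3 entities and their keys are hypothetical, and an open session in auto-flush mode is assumed; the point is the flush triggered before each list() call):

Code:
import java.util.List;

Item item = new Item();
session.save(item);                                // 1. new entity in the session

// 2. look up some references with three HQL queries
List r1 = session.createQuery("from Ref1 r where r.key = :k")
                 .setParameter("k", key1).list();  // 2.1 - 1st flush: the INSERT is sent
List r2 = session.createQuery("from Ref2 r where r.key = :k")
                 .setParameter("k", key2).list();  // 2.2 - 2nd flush: nothing to send, but
                                                   //       every snapshot is compared again
List r3 = session.createQuery("from Ref3 r where r.key = :k")
                 .setParameter("k", key3).list();  // 2.3 - 3rd flush: same wasted comparison

With 10,000 entities in the session, each of those 'empty' flushes still walks the full 100,000 properties.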
When doing large updates as you are doing, these extra flushes can have a serious impact on performance - I'm pretty sure you are observing exactly this in your batch: the creation rate drops as the number of objects in the session grows.
Fortunately, Hibernate provides a solution - as usual...
If the queries you make during the creation of an object do not depend on changes made during the creation of previous ones, then you can turn off the auto-flush (session.setFlushMode(FlushMode.NEVER)). This way, Hibernate will no longer flush the session until explicitly instructed to do so. But remember that queries against the DB will not take your latest changes into account (they are not flushed!).
At the end of the process, flush() the session - everything will be sent to the database - and finally re-enable the auto-flush.
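A minimal sketch of this strategy, assuming the Hibernate 3 API (in later versions, FlushMode.MANUAL replaces FlushMode.NEVER):

Code:
import org.hibernate.FlushMode;
import org.hibernate.Session;
import org.hibernate.Transaction;

Session session = sessionFactory.openSession();
session.setFlushMode(FlushMode.NEVER);    // no automatic flushes from now on
Transaction tx = session.beginTransaction();
try {
    // ... run the whole batch here: saves plus lookup queries ...
    // NOTE: queries will NOT see changes that are still unflushed.

    session.flush();                      // one single flush at the very end
    tx.commit();
} finally {
    session.setFlushMode(FlushMode.AUTO); // re-enable the auto-flush
    session.close();
}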
This strategy gave me a 20x performance boost with some large batches...