I'm working on a project with a huge domain model. By that I mean that the business processing usually involves one or two objects but relies on a lot of what we call reference data: all the associated objects and their configuration.
For example, we mainly handle trades, which are associated with a financial instrument, a trading account, a stock exchange and so on.
To have good performance we use two versions of the reference data. One for administration (creating / modifying them) and one for associations with the real business objects processed by the system.
This second version is considered read-only (it is rarely modified), so its classes are mapped with mutable="false" + cache="read-only" (we are using level 2 caching extensively).
Our problem is that there are many of them, which makes the flush pretty slow even if they are all immutable. And so we are currently looking for solutions. We have problems not only for batches (clearing the session once in a while during the batch solved a huge part of the problem) but also for real-time processing.
So the first thing we did was to remove the auto flush functionality (FlushMode.NEVER). We now call flush() explicitly, either right after an update / insert that a later "select" might depend on, or just before a "select" that needs to be synchronised with the DB. Whether we put it with the update / insert or with the select depends on which one is called the most. That got us around a 40% improvement.
But it is still too slow. So I did some debugging in the Hibernate code and noticed that all the pain was in the flushEverything() method. I then thought about how it could be improved and came up with the ideas below, which I'm now submitting for your evaluation.
BTW, we are currently using Hibernate 2.1.4 (but I checked the Hibernate2 HEAD file and nothing changed, so I guess all my comments are still valid).
First, autoFlushIfRequired() is wiser than flush() since it flushes only if needed for a given query. So I guess I should do something like:
Code:
session.setFlushMode(FlushMode.AUTO);
try {
    // with AUTO, the query flushes first only if it needs to
    session.find("from stuff_that_might_need_flush_first");
} finally {
    // back to explicit flushing
    session.setFlushMode(FlushMode.NEVER);
}
instead of
Code:
session.flush();
session.find("from stuff_that_might_need_flush_first");
(I just thought of that in my shower this morning; I will call it the "flipping mode trick" in the rest of this post.)
But then, some deeper thoughts.
1- isDirty()
isDirty() could have a quick exit. Currently it always performs a complete flushEverything(), whereas it could exit as soon as a dirty field is found. However, even with a quick exit, it would not pay off to do things like:
Code:
if(session.isDirty()) {
session.flush();
}
because:
- If the session is clean, isDirty() will have to perform a complete flushEverything()
- If the session is dirty, a partial isDirty() check will be done, followed by a complete one in flush()
So either way, you have (at least) one complete flushEverything() performed. The "flipping mode trick" is still the best solution.
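To make the counting argument concrete, here is a small sketch. It does not use any Hibernate class: the session is reduced to an array of "dirty" flags, one per entity, and DirtyScanCost and its methods are names I made up for the illustration.

Code:
```java
// Hypothetical cost model, not Hibernate code: one "inspection" = one entity dirty-checked.
final class DirtyScanCost {

    // Cost of a plain flush(): every entity is inspected once.
    static int plainFlushCost(boolean[] dirtyFlags) {
        return dirtyFlags.length;
    }

    // Cost of "if (isDirty()) flush();" assuming isDirty() had a quick exit.
    static int guardedFlushCost(boolean[] dirtyFlags) {
        int inspected = 0;
        boolean dirty = false;
        for (boolean flag : dirtyFlags) {
            inspected++;
            if (flag) {          // quick exit on the first dirty entity
                dirty = true;
                break;
            }
        }
        // A dirty session still pays for the complete scan inside flush().
        return dirty ? inspected + dirtyFlags.length : inspected;
    }
}
```

With five clean entities the guarded version inspects 5 entities, the same as a plain flush. With the third entity dirty it inspects 3 + 5 = 8, so the guard never comes out cheaper than a plain flush.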
2- flushEverything()
The issue with flushEverything() is the point at which entities' attributes are checked for dirtiness.
The problem is that an entity's attributes are checked BEFORE checking whether the entity is mutable. I suggest moving the isMutable check well above the attribute comparison.
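A minimal sketch of the ordering I have in mind. TrackedEntity and MutableFirstFlush are made-up stand-ins, not Hibernate's internal classes, and the comparisons counter is only there to show the saving.

Code:
```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for what the session keeps per entity.
final class TrackedEntity {
    final boolean mutable;
    final Object[] loadedState;   // snapshot taken when the entity was loaded
    final Object[] currentState;  // attribute values at flush time

    TrackedEntity(boolean mutable, Object[] loadedState, Object[] currentState) {
        this.mutable = mutable;
        this.loadedState = loadedState;
        this.currentState = currentState;
    }
}

final class MutableFirstFlush {
    int comparisons = 0; // number of attribute comparisons performed

    // Attribute-by-attribute dirty check: the expensive part.
    boolean isDirty(TrackedEntity e) {
        for (int i = 0; i < e.loadedState.length; i++) {
            comparisons++;
            if (!Objects.equals(e.loadedState[i], e.currentState[i])) {
                return true;
            }
        }
        return false;
    }

    // Counts dirty entities, skipping immutable ones before any comparison.
    int countDirty(List<TrackedEntity> entities) {
        int dirty = 0;
        for (TrackedEntity e : entities) {
            if (!e.mutable) {
                continue; // immutable: nothing to compare
            }
            if (isDirty(e)) {
                dirty++;
            }
        }
        return dirty;
    }
}
```

With thousands of immutable reference objects in the session and only a couple of trades, the comparison count drops to the attributes of the trades alone.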
3- autoFlushIfRequired()
The problem is that it first calls flushEverything() and only then checks whether tables are to be updated ( areTablesToBeUpdated() ). Considering that it's the dirty value checks that are costly, I think we should turn this algorithm upside down and only check the dirtiness of entities that are in the query space. I think that should improve the processing speed a lot.
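Roughly, the inverted algorithm could look like the sketch below. PendingEntity and QuerySpaceFlush are hypothetical names, each entity is assumed to map to a single table, and collections are ignored completely, so take it as an illustration of the idea rather than a patch.

Code:
```java
import java.util.Collection;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in: an entity in the session plus the table it is mapped to.
final class PendingEntity {
    final String table;
    final Object[] loadedState;
    final Object[] currentState;

    PendingEntity(String table, Object[] loadedState, Object[] currentState) {
        this.table = table;
        this.loadedState = loadedState;
        this.currentState = currentState;
    }
}

final class QuerySpaceFlush {
    int comparisons = 0; // number of attribute comparisons performed

    // Decides whether a flush is needed before running a query on querySpaces.
    boolean flushNeededFor(Set<String> querySpaces, Collection<PendingEntity> entities) {
        for (PendingEntity e : entities) {
            if (!querySpaces.contains(e.table)) {
                continue; // outside the query space: never dirty-checked
            }
            if (isDirty(e)) {
                return true; // one dirty entity in the query space is enough
            }
        }
        return false;
    }

    private boolean isDirty(PendingEntity e) {
        for (int i = 0; i < e.loadedState.length; i++) {
            comparisons++;
            if (!Objects.equals(e.loadedState[i], e.currentState[i])) {
                return true;
            }
        }
        return false;
    }
}
```

The table filter is cheap (a set lookup), so the costly attribute comparisons only happen for entities that the query could actually see.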
4- areTablesToBeUpdated(Set tables)
Maybe the solution is simply to extend the Session API. For example, a public areTablesToBeUpdated method could let the developer provide the tables that he knows need to be flushed. This method would then check the dirtiness of all entities mapped to these tables. This is just an example, and I think implementing the algorithm described in 3 would remove this need.
----------------------------------------------------
I think I'm through. I hope some of these ideas are good / feasible. I know they represent a lot of code changes. I also know that there might be some complexities due to collections.
But anyway, tell me what you think about it.
Cheers,
Henri
P.S.: Just a final question. Does Hibernate 3 still need to check every attribute for dirtiness? I ask because I know every mapped class can be overloaded to allow lazy attributes, so I was wondering whether some behavior was also added to the setters to flag dirty entities. That would be great, I think.