I haven't gone far down the design path yet, but I'd like to put this out there, because I'm wondering (a) whether anyone is doing something similar and (b) what you guys think is the best way to handle it.
This is really about doing caching intelligently. A while back, I spent a long time experimenting with how NH handles caching and invalidation, because I never found much documentation about it. I meant to write it up and then didn't, so I'll note a few findings here (maybe I got some of this wrong; I'm not an NH insider, and I'm hoping someone on the inside can enlighten me) before actually asking my question.
General Lessons Learned
1. The hard thing about caching of any kind is usually invalidating the cache -- making sure you do it often enough, but not unnecessarily.
2. Entity Caching is fairly straightforward / simple. Collection Caching is not. (By "Collections" I mean both mapped Persistent Collections and HQL / Criteria queries.) NH caches entities separately from collections, meaning that cached collections are usually just a list of IDs, which NH uses to create proxies that will eventually be initialized. That means that when a collection is loaded from the cache, NH essentially calls Session.Load() for every item in the collection. This is very different from NH's approach to retrieving an uncached collection from the DB, which is to load both the collection and the associated entities at the same time. The risk here is that caching collections can actually hurt performance if you don't think it through. Two inferences follow (a mapping sketch follows this list):
- You should usually enable Entity Caching on any entities that are contained in cached collections.
- Batch loading is very helpful when caching collections / queries.
3. Although people say that Queries are a viable replacement for Collections, Persistent Collections and Queries are apples and oranges.
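To make item 2's inferences concrete, here's roughly what I mean in mapping terms. This is just a sketch: Foo, Bar, and the columns are made up, and the cache usage / regions obviously depend on your setup.

    <!-- Hypothetical Foo / Bar mapping. The point: when a collection (or query)
         cache hit hands back only IDs, the entity-level cache plus batch-size
         are what keep hydration from turning into one SELECT per row. -->
    <class name="Foo" table="Foo" batch-size="20">
      <cache usage="read-write"/>
      <id name="Id" column="Id">
        <generator class="native"/>
      </id>
      <set name="Bars" inverse="true" batch-size="20">
        <cache usage="read-write"/>  <!-- the collection cache stores only Bar IDs -->
        <key column="FooId"/>
        <one-to-many class="Bar"/>
      </set>
    </class>

    <class name="Bar" table="Bar" batch-size="20">
      <cache usage="read-write"/>    <!-- entity cache for the Bars referenced above -->
      <id name="Id" column="Id">
        <generator class="native"/>
      </id>
      <many-to-one name="Foo" column="FooId" class="Foo"/>
    </class>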
Batch Loading
I haven't seen many people talk about batch loading, which is a shame, because it's great. It works like this: the NH session keeps track of every proxy it generates, and when one of them is initialized, it loads a batch of them at the same time, using a load query like "select * from Foo where id in (1,2,3,4,5,...)".
So if:
1. I've set the Foo class to have a batch size of 20
2. I've got a collection that will contain 100 Foo instances when loaded
Then:
1. Loading my collection from the DB will (depending on fetch strategy) probably issue an outer join query that loads 100 Foos in 1 SQL statement
2. If I load the collection from the cache and the Foos themselves are not in the cache (and that does happen), it will load 100 Foos using 5 SQL statements, whereas without batch loading, NH would issue 100 SQL statements, loading the entities one at a time.
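Here's roughly what that looks like from code, assuming a mapping like the sketch above (Foo / Bar, fooId, and Name are made up, and a sessionFactory is in scope):

    using (var session = sessionFactory.OpenSession())
    {
        var myFoo = session.Get<Foo>(fooId);

        // Suppose myFoo.Bars (100 items) comes out of the second-level cache as a
        // list of IDs: each element starts life as an uninitialized proxy.
        foreach (var bar in myFoo.Bars)
        {
            // Initializing the first proxy pulls a batch of 20 via
            // "select ... from Bar where Id in (...)", so 100 Bars cost 5 selects
            // instead of 100. Without batch-size it's one select per Bar.
            Console.WriteLine(bar.Name);
        }
    }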
FYI - I noticed a bug in NH a while back in that the NH Query Cache was not doing any batch loading after retrieving cached query results. This is a huge deal, IMO. I logged a bug and a patch for it, but it appears to have stalled and was not addressed in the new NH 2.0GA release, so if you care about this issue, you may want to take a look at:
http://jira.nhibernate.org/browse/NH-1247

IQuery / IFilter / ICriteria / ISQLQuery Caching
Let's just call this "query" caching.
NH's support for query caching is great. It can cache paged queries (very important for us) and lets you specify custom cache regions.
NH is fairly sophisticated about invalidating cached queries. It keeps track of which "spaces" a query got data from (e.g. the Foo space), and whenever changes are made in that space, all related query caches are invalidated. For example, if I have a query that loads Foo instances, its cache will be invalidated every time anyone saves or updates a Foo. That means there is a very low chance of me getting stale data, but it also means a lower probability of my cache being hit.
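For reference, this is the shape of query caching I'm talking about. Foo and the region name are made up, and it assumes the query cache is enabled in config (cache.use_query_cache) with a cache provider set up:

    // Cached, paged query with a custom region; the query cache stores the
    // matching IDs, not the entities themselves.
    IList<Foo> page = session.CreateQuery("from Foo f order by f.Id")
        .SetFirstResult(40)            // paging and caching work together
        .SetMaxResults(20)
        .SetCacheable(true)
        .SetCacheRegion("Foo.Pages")   // optional custom region
        .List<Foo>();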
Unfortunately there are some drawbacks to query caching:
1. Cache invalidation happens very often, usually more often than desired
2. If I want to explicitly manage invalidating the cache myself, there appears to be no way to tell NH to NOT invalidate all my Foo queries every time I modify anything related to Foo.
3. There is no way to explicitly manage results, i.e. adding / removing items from a specific set of cached query results. All you can do is invalidate the cache and requery the DB (a small eviction sketch follows this list). This is a fairly minor complaint most of the time, but it's important to understand when comparing against the benefits of persistent collections.
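The only explicit handle I know of is evicting a whole region (or the default query cache) yourself, which is still all-or-nothing. Region name here is the made-up one from the earlier snippet:

    // Drops every cached result in that region; the next query that uses the
    // region goes back to the DB. There's no per-result control.
    sessionFactory.EvictQueries("Foo.Pages");

    // Or, with no argument, the default query cache region:
    sessionFactory.EvictQueries();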
Persistent Collection Caching
NHibernate's Persistent Collections (i.e. Bag, List, Set) are the most efficient way to cache a query and manage changes to it, because you can explicitly add/remove items in entity1.Collection without invalidating the cached entity2.Collection. This makes for some really effective caching.
The drawbacks of persistent collection caching are:
1. Cached Persistent Collections (including inverse ones) must be explicitly updated or they will get stale. This isn't that big a drawback, and is arguably a strength, but it's important to understand the implications. Some people think they don't need to update the inverse side of a relationship, and in a short-session app like a web app they can often get away with it; that just won't work if the collections are cached. Similarly, NH persistent collections ignore changes in their "entity space". In other words, if myFoo.Bars is a cached, inverse collection and I create and save a new myBar = new Bar { Foo = myFoo }, that will not result in my collection being updated. I have to explicitly call myFoo.Bars.Add(myBar) in order to update the collection (see the sketch after this list). Usually this is not a big deal, but if I make a change that affects an unknown number of Bar collections, I should probably just tell the SessionFactory to evict the entire cache for that collection definition.
2. Cached Persistent Collections cannot currently retrieve data in pages (by specifying startRow, maxResults). Yes, I can create an IFilter around a collection and get pages that way, but it basically turns into query caching at that point, rather than Persistent Collection caching. For that reason I say that true Persistent Collection Caching does not support paging data.
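To spell out drawback 1 (and the IFilter workaround from drawback 2) in code. Foo / Bar and the role name are placeholders, and I'm going from memory on the role-name format:

    // Keeping a cached, inverse collection honest: update BOTH sides.
    var myBar = new Bar { Foo = myFoo };   // satisfies the FK / inverse mapping
    myFoo.Bars.Add(myBar);                 // keeps the cached myFoo.Bars current
    session.Save(myBar);

    // If a change touches an unknown number of Bar collections, punt and evict the
    // whole collection region (role name = full class name + "." + property):
    sessionFactory.EvictCollection("MyApp.Domain.Foo.Bars");

    // The IFilter route to paging a collection -- it works, but at that point it's
    // really query caching, not collection caching:
    var firstTwenty = session.CreateFilter(myFoo.Bars, "")
        .SetFirstResult(0)
        .SetMaxResults(20)
        .List<Bar>();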
Wish List
In a nutshell, I want / need a cached collection where I can:
- Get (cached) paged (or you could call it "batched") results, AND
- Explicitly add/remove items directly in the cached collection without requiring a DB query to reload it, AND
- Do a configurable "contains" query that checks the cached result pages to see if the entity is there and, if it still isn't found, issues a DB query in the form of either (1) a select that returns a single row in order to confirm containment, or (2) a paged query that incrementally loads pages of the remaining items until it's found.
Tell me if that sounds crazy. I think it's a legitimate need, and it sounds feasible to me. Also, most of those features exist in some form distributed throughout NH, but there's no one place where I get all of them, unless I'm missing something (am I??).
The big question is how best to implement it. I could either do it inside of NH or outside of it.
1. Outside of NH, I could just write a little CachingCollectionHelper that uses IQuery, ICriteria, or IFilters to load paged ID sets / datasets and then takes care of the caching / hydrating itself, rather than relying on NH (a rough skeleton follows this list). Whenever I update inverse collections, I would need to explicitly add/remove items through my CachingCollectionHelper. The drawbacks are that I'd have to explicitly manage all my helpers at the application layer, and I might have to write an IInterceptor in order to really honor the Session lifecycle with transactions, flushing, etc. But it might be easier than the alternative:
2. Write something that runs inside of NH. The NH team seems to recommend not using Persistent Collections when performance is important (http://www.hibernate.org/117.html#A10), but when it comes to intelligent caching I think there's more of an argument for something that's plugged into the whole NH ecosphere than for a helper class that invokes a bunch of queries. Inside of NH, I could try to write my own CollectionPersister, which might be the right way to go, but it sounds incredibly daunting. I'm wondering if anyone has ever done this, or if there's a good sample / starting point for me to look at. Obviously it would be difficult (or impossible?) to provide mapping support for it, so I'd probably end up with a lot of custom classes extending it.
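To make option 1 less hand-wavy, here's the rough skeleton I have in mind. It's purely a sketch: the in-memory dictionary stands in for a real cache provider, the names are made up, and I haven't proven any of it out.

    using System.Collections.Generic;
    using NHibernate;

    // Sketch of option 1: cache pages of IDs outside NH and hydrate them through
    // the session, so the entity cache and class-level batch-size still apply.
    public class CachingCollectionHelper<T> where T : class
    {
        // Stand-in for whatever cache provider would really be used (not thread-safe).
        private static readonly IDictionary<string, IList<object>> Cache =
            new Dictionary<string, IList<object>>();

        public IList<T> GetPage(ISession session, IQuery idQuery,
                                string cacheKey, int startRow, int pageSize)
        {
            string pageKey = cacheKey + ":" + startRow + ":" + pageSize;

            IList<object> ids;
            if (!Cache.TryGetValue(pageKey, out ids))
            {
                // Cache only identifiers -- the same trick NH's collection cache uses.
                // idQuery is expected to select just IDs, e.g. "select f.Id from Foo f".
                ids = idQuery.SetFirstResult(startRow)
                             .SetMaxResults(pageSize)
                             .List<object>();
                Cache[pageKey] = ids;
            }

            // Load() hands back proxies, so hydration still benefits from batch-size
            // and from the entity-level second-level cache.
            var page = new List<T>(ids.Count);
            foreach (object id in ids)
                page.Add(session.Load<T>(id));
            return page;
        }

        // The "explicitly manage the cached collection" part: callers add/remove IDs
        // instead of evicting and re-querying. Stubs here; the real versions would
        // need to respect the Session lifecycle (probably via an IInterceptor).
        public void Add(string cacheKey, object id) { /* append id to the cached pages */ }
        public void Remove(string cacheKey, object id) { /* drop id from the cached pages */ }
    }

The "contains" piece from the wish list would sit on top of this: scan the cached ID pages first, then fall back to either a single-row existence select or paging through the remaining rows.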
Thoughts anyone?