Hi,
I have a web-based product that has been using Hibernate for many years. In a clustered environment, the product can be configured with either in-process Ehcache or external memcached for Hibernate's second-level cache. As part of its functionality, the product can import large amounts of metadata from external sources and store them in the database. Periodically this information needs to be synchronized between the external sources and the database, so a large volume of writes (adds, modifications, and deletes) can take place. All interactions with the database pass through the Hibernate layer, meaning that the code does not use the JDBC layer directly, and both reads and writes go through the second-level cache.
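For context, the switch between the two cache providers is just a matter of the region factory we configure. A minimal sketch of the relevant Hibernate properties (the memcached region factory class and its server property are from a third-party integration and shown here only as an illustrative assumption; the exact names depend on which integration library is used):

```properties
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true

# In-process Ehcache configuration:
hibernate.cache.region.factory_class=org.hibernate.cache.ehcache.EhCacheRegionFactory

# External memcached configuration (third-party region factory; names are illustrative):
#hibernate.cache.region.factory_class=com.googlecode.hibernate.memcached.MemcachedRegionFactory
#hibernate.memcached.servers=cache-host1:11211 cache-host2:11211
```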
One test we ran on a small two-node cluster revealed that the latency between the app server (and the Hibernate session factory in it) and the external memcached, resulting from second-level cache interaction, was rather significant. The exact same import/sync process took twice as long with the memcached configuration as with the Ehcache configuration, due to the latency of the round trips to the external memcached during that process.
For comparison purposes, I played with the cache mode associated with Hibernate sessions to alter their behavior toward the second-level cache. I tried NORMAL, IGNORE, GET, PUT, and REFRESH, and re-ran the same sync process. Among these, NORMAL (i.e., the default) was still the fastest. All the other options ran slower due to the increased load on the database resulting from reduced or no utilization of the second-level cache.
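Concretely, the experiment amounted to setting the cache mode on the session before running the sync. A minimal sketch (the class and method names are mine; the CacheMode semantics in the comments are from the Hibernate API):

```java
import org.hibernate.CacheMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class MetadataSyncJob {

    // Illustrative wrapper around the sync work; only the setCacheMode call matters here.
    public void runSync(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        try {
            // CacheMode.NORMAL is the default: read from and write to the L2 cache.
            // The alternatives I tried all reduce cache utilization:
            //   IGNORE  - never interact with the cache, except to invalidate on update
            //   GET     - read from the cache, but never add items to it
            //   PUT     - add items to the cache, but never read from it
            //   REFRESH - like PUT, but also bypasses hibernate.cache.use_minimal_puts
            session.setCacheMode(CacheMode.NORMAL);
            // ... perform the import/sync work in this session ...
        } finally {
            session.close();
        }
    }
}
```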
At this point, rewriting the import/sync code directly on top of JDBC or rolling our own cache facility is not a viable option due to time constraints, and we would much prefer to address this issue via system configuration.
Are there any best practices for use cases like this? Should I ditch memcached and go with Ehcache for the second-level cache? That has its own share of problems: when our product is configured with 10 or more nodes, the Ehcache-based configuration doesn't fly, due to the high network bandwidth consumed in keeping those 10 or more copies of the cache in sync. Also, since each cache is merely a replica, adding nodes doesn't add to the overall cache space. So we would prefer memcached in a large deployment, but its base performance seems to suffer in exchange for scalability. Are there any best practices or tuning options whereby I can achieve both performance AND scalability, instead of trading one for the other?
Thanks in advance for any insights. /Jong