-->

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 1 post ] 
 Post subject: Hash Caching Value Equality Testing in Mutable Objects
Posted: Sat Dec 04, 2010 11:00 am
Beginner

Joined: Tue Nov 02, 2010 4:29 am
Posts: 21
We are writing a content management system that stores objects of class Content or one of its subclasses.
Some of the application's actions have the potential to produce new content that has an identical VALUE to pre-existing content.
We want to catch this, flag the new content as a duplicate and deal with it accordingly.

Note that I am not saying identical object identity - this is always guaranteed to be unique.
I'm talking about value, e.g. if I were able to clone myself, my clone and I would have identical value if not identical identity.
I don't want to make the object Id a function of the object's values, e.g. I don't want my identity to change just because I cut my hair or buy new shoes - I remain myself!
The Id is currently a UUID which is set in the constructor prior to the first persistence (creation).

Checking a new piece of content to see if it is a duplicate has the potential to be a time-consuming task, e.g. if I already have a thousand content objects and create one hundred more, I don't want to perform 100,000 field-by-field equality tests.

To this end I've added a valueHashCode field to the class along with a private setter and public getter. When the (immutable) object is first created (using a builder pattern) the valueHashCode is calculated and stored in the new object. This is then persisted with create actions and recovered on read. The value hash is simply a repeatable MD5 generated from the concatenation of the values returned by the object's value getter methods (ignoring non-embedded relationships and persistence fields, e.g. id, creationDate, etc). This allows a new object to be quickly checked for duplication using a readContentWithValueHashCode method in the ContentDao. Only if an object with the same hash already exists is a further field-by-field check performed (just in case there is a collision - you can't be too careful).
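To make the above concrete, here is a minimal sketch of the two-step check (hash lookup first, field-by-field comparison only on a hash match). SimpleContent, ValueHashes and InMemoryContentIndex are made-up names for illustration; the in-memory map only stands in for the readContentWithValueHashCode lookup in the ContentDao, and the length prefix on each value is my own addition so that ("ab", "c") and ("a", "bc") do not concatenate to the same input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: repeatable MD5 over an ordered list of value fields.
final class ValueHashes {
    static String md5Hex(String... values) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String v : values) {
                if (v == null) {
                    // Distinct marker so null and "" do not hash alike.
                    md.update("-1:".getBytes(StandardCharsets.UTF_8));
                    continue;
                }
                byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
                // Length prefix keeps field boundaries unambiguous.
                md.update((bytes.length + ":").getBytes(StandardCharsets.UTF_8));
                md.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}

// Hypothetical stand-in for an immutable Content object.
final class SimpleContent {
    final String id;            // identity (UUID in the real class), excluded from the hash
    final String title;         // value field
    final String body;          // value field
    final String valueHashCode; // cached once at construction

    SimpleContent(String id, String title, String body) {
        this.id = id;
        this.title = title;
        this.body = body;
        this.valueHashCode = ValueHashes.md5Hex(title, body);
    }

    // Full field-by-field value comparison, used only after a hash match.
    boolean sameValueAs(SimpleContent other) {
        return Objects.equals(title, other.title) && Objects.equals(body, other.body);
    }
}

// Stand-in for the DAO lookup by value hash code.
final class InMemoryContentIndex {
    private final Map<String, List<SimpleContent>> byHash = new HashMap<>();

    void add(SimpleContent c) {
        byHash.computeIfAbsent(c.valueHashCode, k -> new ArrayList<>()).add(c);
    }

    boolean isDuplicate(SimpleContent candidate) {
        List<SimpleContent> sameHash =
                byHash.getOrDefault(candidate.valueHashCode, Collections.<SimpleContent>emptyList());
        for (SimpleContent existing : sameHash) {
            if (existing.sameValueAs(candidate)) {
                return true; // hash matched and every value field matched
            }
        }
        return false;
    }
}
```

With this shape each new object costs one hash lookup, and the field-by-field comparison only runs against the few objects that share its hash.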

This approach works fairly well - drastically improving the speed, but there are problems:

1. If the schema changes (along with data migration) during application start up then all the value hash codes have to be recalculated and recached - not a big problem to be honest.
2. I can't see how to make it work with mutable objects:

If the object is mutable, the value hash code has to be recalculated every time a client calls a setter on the object. That's fine, but the problem is that this will also occur when Hibernate reads an instance out of the database and populates the object with data by calling each of its setters. This would make reading very slow and lose the advantage of having the cached value. Would configuring Hibernate to use field access instead of the getters and setters help with this? Would it set the fields directly without going through the setters? This would allow me to add the cache update functionality to the setters.
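As far as I understand it, with field access Hibernate writes to the fields reflectively and does not call the setters (in hbm.xml this is access="field" on a property or default-access="field" on the mapping; with annotations it follows from putting @Id on the field rather than the getter). The sketch below is plain Java and does not use Hibernate at all; it only imitates that loading style with java.lang.reflect to show that the setter logic is skipped. MutableContent, FieldAccessDemo and the recalculations counter are made up for the illustration, and the hash is a placeholder rather than the MD5.

```java
import java.lang.reflect.Field;

// Hypothetical mutable content class whose setter refreshes the cached hash.
final class MutableContent {
    private String title;
    private String valueHashCode;
    int recalculations = 0; // instrumentation for this sketch only

    public String getTitle() {
        return title;
    }

    public String getValueHashCode() {
        return valueHashCode;
    }

    // Client-facing setter: changing a value field refreshes the cache.
    public void setTitle(String title) {
        this.title = title;
        recalculate();
    }

    private void recalculate() {
        recalculations++;
        // Placeholder for the real MD5 over all value fields.
        valueHashCode = Integer.toHexString(String.valueOf(title).hashCode());
    }
}

// Imitates a loader that uses field access: it writes private fields
// directly via reflection, so no setter (and no recalculation) runs.
final class FieldAccessDemo {
    static void populateViaField(Object target, String fieldName, Object value) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not set field " + fieldName, e);
        }
    }
}
```

Populating title and valueHashCode through populateViaField leaves recalculations at 0, while a later setTitle call from client code bumps it to 1 and replaces the cached hash.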

This raises another question. If I read a mutable object into memory and change one of its fields, and thus its value hash code, I surely won't be able to use an HQL query to find it by its hash code (as the query will only see the old persisted hash code?). I guess only by comparing against objects that have been read into memory can I guarantee thread safety and that I am comparing against the most up-to-date instance of other objects. I'm then back to doing 100,000 reads, which can't even be optimised by leaving the thousands of original objects in a session because of the memory overhead.

If I give up and only do this with immutable objects, I would like advice on how to do it in a nice way for a class hierarchy that must contain some mutable objects. Should I create an ImmutableContent interface that defines a getValueHash method? The weird thing is that an interface won't be able to specify a private setValueHash for use by the builder pattern.
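One shape I can imagine for this, sketched under assumptions: the interface only exposes the getter, and the immutable class computes the hash in a private constructor that its nested builder calls, so no setValueHash is needed at all. ValueHashed, Article and Article.Builder are invented names, the hash is a placeholder for the MD5, and the Hibernate mapping (no-arg constructor, how the fields get populated on read) is deliberately left out.

```java
import java.util.Objects;

// Hypothetical interface: only the read side is part of the contract.
interface ValueHashed {
    String getValueHash();
}

// Hypothetical immutable content type in the hierarchy.
final class Article implements ValueHashed {
    private final String title;
    private final String body;
    private final String valueHash;

    // Private: only the nested Builder can create instances.
    private Article(Builder b) {
        this.title = b.title;
        this.body = b.body;
        // Computed once here, so no setter for the hash is required.
        this.valueHash = computeValueHash(title, body);
    }

    public String getTitle() {
        return title;
    }

    public String getBody() {
        return body;
    }

    @Override
    public String getValueHash() {
        return valueHash;
    }

    // Placeholder for the real MD5 over the value fields.
    private static String computeValueHash(String title, String body) {
        return Integer.toHexString(Objects.hash(title, body));
    }

    static final class Builder {
        private String title = "";
        private String body = "";

        Builder title(String title) {
            this.title = title;
            return this;
        }

        Builder body(String body) {
            this.body = body;
            return this;
        }

        Article build() {
            return new Article(this);
        }
    }
}
```

Mutable classes in the hierarchy would simply not implement ValueHashed, so the duplicate check can be restricted to types that do.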

Any ideas? Is there a standard pattern for this? I'm not currently using a second-level cache (e.g. EHCache). Would that be my best solution?

Thanks,

Blank Reg.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.