automatic "dirty" detection BUT no per-attribute c

eepstein · **Posted:** Wed Dec 31, 2003 5:30 am

I'm curious about the fact that Hibernate does automatic "dirty" (change) detection but does not support per-column concurrency (aka "optimistic locking"). I find that strange.

I find it VERY helpful to be able to specify columns to "lock" on. The idea of using a simple update counter (or, much worse, a timestamp) bugs me. Is this "fixed"? Or going to be?

Thanks,

Ezra E.

gavin · **Posted:** Wed Dec 31, 2003 6:21 am

Hibernate 2.1 supports this. But note that it doesn't (and can't) work with detached objects. Version numbers or timestamps are better anyway.

eepstein · **Posted:** Wed Dec 31, 2003 5:19 pm

When you say "can't" work with detached objects, you mean b/c of the way Hibernate is implemented, yes?

In theory there is no reason this cannot work, even with detached objects. All the information needed is in the object itself. What could be done is to hold on to the fetched value if/when an attribute changes. The updates are then very efficient:
UPDATE x SET foo='bar' WHERE foo='bas' OR foo='bar';
This assumes the record (object) from x had a value of 'bas' for its foo column (associated attribute). Note the OR: if you are setting the column to the same value someone else already set, then you are not trampling their work (avoiding that is the whole point of the concurrency check), so the update should succeed.
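The WHERE clause above can be mirrored as a plain Java predicate, which makes the intended semantics easy to check. This is a sketch only: `SameValueTolerantCheck` and `wouldApply` are hypothetical names, and the SQL adds a primary-key column `id`, which the one-line example omits but a real update would need.

```java
import java.util.Objects;

// Hypothetical helper mirroring the column-check UPDATE, with an assumed
// primary key added:
//   UPDATE x SET foo = ? WHERE id = ? AND (foo = ? OR foo = ?)
// bound as (newValue, id, fetchedValue, newValue).
final class SameValueTolerantCheck {
    static final String SQL =
        "UPDATE x SET foo = ? WHERE id = ? AND (foo = ? OR foo = ?)";

    private SameValueTolerantCheck() {}

    // True if the row (already matched by id) would be updated: either foo
    // still holds the value we fetched, or it already holds the value we are
    // about to write. Objects.equals treats two nulls as equal, whereas
    // SQL "foo = ?" never matches NULL.
    static boolean wouldApply(String currentInDb, String fetched, String newValue) {
        return Objects.equals(currentInDb, fetched)
            || Objects.equals(currentInDb, newValue);
    }
}
```

With a fetched value of 'bas' and a new value of 'bar', the update applies when foo currently holds 'bas' or 'bar' and is rejected for any third value. For nullable columns the SQL would also need `IS NULL` handling, which this predicate glosses over.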

== Ezra E.

baliukas · **Posted:** Thu Jan 01, 2004 6:43 am

UPDATE x SET foo='bar' WHERE foo='bas' OR foo='bar';
Why do you think this update should succeed? Is it a new theorem in concurrency-control (CC) theory?
As I understand it, an update must fail on conflict.
The "foo" column can hold any value at the time the detached object is updated.
It can be 'bas' or 'bar', but it can be 'unknown' too; I see this as nothing better than a counter.
There is a special non-locking scheduler in concurrency control theory known as TO (Timestamp Ordering). It is almost the same as optimistic locking in Hibernate (like a distributed scheduler). The TO rule enforces serializability. I am not sure about the Hibernate implementation, but in CC theory it must be possible to implement a strict TO scheduler too, to ensure recoverability.
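For readers unfamiliar with TO, the basic textbook rule can be sketched as follows. This is plain timestamp ordering for individual data items (no Thomas write rule, no multiversioning, no strictness), kept in memory; `TimestampOrdering` and its methods are illustrative names, not part of Hibernate.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative basic timestamp-ordering (TO) scheduler. Each transaction
// carries a start timestamp; an operation that arrives "too late" (it
// conflicts with a younger transaction's access) is rejected, and the
// caller must abort and restart that transaction.
final class TimestampOrdering {
    private static final class Item {
        long readTs = 0;   // largest timestamp that has read the item
        long writeTs = 0;  // largest timestamp that has written the item
    }

    private final Map<String, Item> items = new HashMap<>();

    private Item item(String key) {
        return items.computeIfAbsent(key, k -> new Item());
    }

    // A read is allowed unless a younger transaction already wrote the item.
    boolean read(String key, long ts) {
        Item it = item(key);
        if (ts < it.writeTs) {
            return false; // too late: abort
        }
        it.readTs = Math.max(it.readTs, ts);
        return true;
    }

    // A write is allowed unless a younger transaction already read or wrote it.
    boolean write(String key, long ts) {
        Item it = item(key);
        if (ts < it.readTs || ts < it.writeTs) {
            return false; // too late: abort
        }
        it.writeTs = ts;
        return true;
    }
}
```

A rejected operation plays the same role as a failed optimistic check: the late transaction loses and must be restarted with a new timestamp. Strict TO, mentioned above for recoverability, would additionally delay reads and writes of an item until the transaction that last wrote it commits or aborts; that is omitted here.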

eepstein · **Posted:** Fri Jan 02, 2004 7:44 am

It should succeed if the "foo" field (column/attribute) has not changed. That is the point of optimistic concurrency, isn't it? My understanding of optimistic concurrency is that the default on conflict is that the last updater (you) loses. So the whole point of the WHERE clause is to state under what conditions the update should fail, not under what conditions it succeeds.

There is never any guarantee in OC that an update will succeed. For that you need pessimistic locking (or true locking).
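In JDBC terms, a failed optimistic update usually shows up as a conditional UPDATE that matched zero rows (`Statement.executeUpdate` returns the affected-row count). The sketch below imitates that with an in-memory stand-in for the table; `Table`, `updateOrFail` and `ConcurrencyConflict` are hypothetical names, not Hibernate or JDBC API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical exception raised when an optimistic update matches no row.
final class ConcurrencyConflict extends RuntimeException {
    ConcurrencyConflict(String msg) { super(msg); }
}

// Hypothetical in-memory stand-in for a table with columns (id, foo).
final class Table {
    private final Map<Long, String> foo = new HashMap<>();

    void insert(long id, String value) { foo.put(id, value); }

    String select(long id) { return foo.get(id); }

    // Mimics: UPDATE x SET foo = ? WHERE id = ? AND foo = ?
    // Returns the number of rows affected, like executeUpdate().
    int conditionalUpdate(long id, String expected, String newValue) {
        if (foo.containsKey(id) && Objects.equals(foo.get(id), expected)) {
            foo.put(id, newValue);
            return 1;
        }
        return 0;
    }

    // Optimistic update: zero rows affected means someone changed the row
    // since we read it, so this (later) updater loses.
    void updateOrFail(long id, String expected, String newValue) {
        if (conditionalUpdate(id, expected, newValue) != 1) {
            throw new ConcurrencyConflict("row " + id + " changed since it was read");
        }
    }
}
```

If two readers both fetch 'bas', the first write wins and the second, whose expected value is now stale, gets a `ConcurrencyConflict`: the last updater loses.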

== Ezra E.

eepstein · **Posted:** Fri Jan 02, 2004 8:11 am

gavin wrote:

... Version numbers or timestamps are better anyway.

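I'm not sure about the definition of "better" in this context. Here, however, is a brief overview of the options, numbered (0) no check, (1) a version number, (2) a timestamp, (3) checking all columns and (4) checking user-specified columns, and of their implications.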

0. Is the same as #4 where the user has specified no columns. This is also the same as no concurrency control. I won't consider it further.

1 & 2. These can be seen as #4 with one specific column specified for locking and with possible system (either DB or middle-ware) support for incrementing the value or setting a new timestamp. Let's consider this.

A. Does it work with arbitrary existing schema?
NO. Existing schemas will generally need to be modified to include the special locking column: either an Integer or a Timestamp. (Of course there is a way of doing this with in-memory values only, but then there is no way for other access methods of the DB, e.g., any other systems using the DB, to notify Hibernate of an update! See B.)

B. Is it a "good DB client citizen"?
NO. If any other access methods want to avoid having their changes trampled, they must adopt the same mechanism. This is really bad. It means all other access methods become vulnerable to corruption (overwriting) because Hibernate has a limited implementation. It means that if I want to use Hibernate against a database with existing access modes, then I have to modify each and every one of them just to add code that, in effect, notifies Hibernate of the change! The reverse is not true: these systems, by doing proper per-column value checks, will never overwrite another update no matter where that update was made.

C. Does it provide fine-grained control?
NO. This is row-based concurrency, not column-based. Even if two users modify different columns, the second update will fail. Sometimes you want this, sometimes you do not. The user has no way to choose, because the user cannot specify lock columns. (Similarly, although on a smaller scale, with EJBs vs. Hibernate it is again clear: granularity is better.)
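Point C can be made concrete with two users editing different columns of the same row. The in-memory sketch below (hypothetical `Row` and `Granularity` classes, with made-up `name` and `email` columns) compares a row-level version check with a check on only the column being changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory row: column name -> value, plus a version counter.
final class Row {
    final Map<String, String> cols = new HashMap<>();
    long version = 0;

    Row copy() {
        Row r = new Row();
        r.cols.putAll(cols);
        r.version = version;
        return r;
    }
}

final class Granularity {
    // Row-level: WHERE version = :fetchedVersion, then bump the version.
    static boolean updateByVersion(Row db, Row fetched, String col, String value) {
        if (db.version != fetched.version) return false;
        db.cols.put(col, value);
        db.version++;
        return true;
    }

    // Column-level: WHERE col = :fetchedValue, for the changed column only.
    static boolean updateByColumn(Row db, Row fetched, String col, String value) {
        if (!Objects.equals(db.cols.get(col), fetched.cols.get(col))) return false;
        db.cols.put(col, value);
        return true;
    }
}
```

Both users start from the same copy. Under the version check, whoever writes second fails even though they touched a different column; under the column check, both updates go through. Which behaviour is wanted depends on whether the columns are semantically independent, which is the choice the post wants to leave to the user.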

I could keep going.

The thing I want to point out is that option #4 gives a "YES" answer to all of these, and option #3 provides all of this except C. I can see punting on giving the user fine-grained control, but I cannot see leaving out #3 as an option. Point B, above, makes Hibernate a non-starter in many existing systems!

Second, #4 is the root approach: all others are special cases of #4.
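The claim that the other options are special cases of #4 can be illustrated with a single WHERE-clause builder in which only the list of check columns varies. This is a hypothetical sketch (`UpdateSql.build` and the table and column names are made up), not Hibernate's SQL generator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: "UPDATE <table> SET ... WHERE <id> = ? [AND <col> = ? ...]".
// The locking strategy is determined entirely by which columns are checked:
//   #0 none        -> no check columns
//   #1/#2          -> a single version or timestamp column
//   #3 all columns -> every mapped column
//   #4 user-chosen -> any subset
final class UpdateSql {
    private UpdateSql() {}

    static String build(String table, String idCol,
                        List<String> setCols, List<String> checkCols) {
        List<String> sets = new ArrayList<>();
        for (String c : setCols) {
            sets.add(c + " = ?");
        }
        StringBuilder sql = new StringBuilder("UPDATE ")
            .append(table).append(" SET ").append(String.join(", ", sets))
            .append(" WHERE ").append(idCol).append(" = ?");
        for (String c : checkCols) {
            sql.append(" AND ").append(c).append(" = ?");
        }
        return sql.toString();
    }
}
```

For #1 and #2 the caller would also put the version or timestamp column in `setCols` so that it is advanced on each update; nullable check columns would need `IS NULL` rather than `= ?`.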

In terms of features it is hard for me to see how version counting or timestamping by themselves are better. The only feature they provide is a count of the number of updates or the time of the last update, respectively. Neither has anything to do with concurrency control, and both are trivially and more accurately provided via one-line DB triggers. Indeed, the history of version counting and timestamping goes back to a simple DBA trick, since it can be implemented directly in the DB itself: all that is required is that the control column's (old) value be included in every update call. In JDBC (and ODBC, for that matter) these tricks continue(d) and are basically a quick hack for when there is no time (or need) for column-based checks. I don't think a service/framework can take this route.

So the only "better" I see is: easier for the developer of the framework in question -- not for its users.

I'd like to take up the issue of how to implement this -- it is a real issue. For now, however, this post is long enough.

== Ezra E.

emmanuel · **Posted:** Fri Jan 02, 2004 10:20 am

Does it perform quickly and lightly with minimal DB round-trips?
4. NO

max · **Posted:** Fri Jan 02, 2004 12:02 pm

eepstein wrote:

C. Does it provide fine-grained control?
NO. This is row-based concurrency, not column-based. Even if two users modify different columns, the second update will fail. Sometimes you want this, sometimes you do not. The user has no way to choose, because the user cannot specify lock columns. (Similarly, although on a smaller scale, with EJBs vs. Hibernate it is again clear: granularity is better.)

If I understand your C correctly you want several options:

0. Allow the user to specify a subset of columns that will be used as part of the where when doing an update.

but to support the "fine-grained control" you would also need:

1. "column"-lock mode that can be specified on each update - not just globally per entity/class - right?

Do you really use this generally for optimistic locking? I have a hard time believing that this is a "very much used" optimistic locking scheme...

I follow the advantages - but geez, there are also a lot of "whoops - forgot to include the right columns" and "whoops - needs this unindexed column for proper locking" risks. I see the version field as much easier to understand/use, safer, faster and way more widely used ;)

...but hey, you are more than welcome to add a JIRA issue for it and let people vote for it... or even submit a patch for it... that all gets more attention ;)

baliukas · **Posted:** Fri Jan 02, 2004 5:00 pm

eepstein wrote:

There is never any guarantee in OC that an update will succeed. For that you need pessimistic locking (or true locking).

== Ezra E.

"Pessimistic locking" such as two-phase locking (2PL) cannot guarantee it either (deadlocks).
As far as I know, in theory there is no way to guarantee successful updates in concurrent transactions. It looks like multiversion concurrency control with a combination of TO and 2PL schedulers ensures the best concurrency at this time (some operations can be "too late" and some can cause deadlock, but there are no cascading aborts and no read/write conflicts).

eepstein · **Posted:** Fri Jan 02, 2004 5:25 pm

Actually that is not true. (And I was not posting to start a flame war, so I am a bit surprised by the post.)

All of these approaches require the same number of DB round-trips, and all of them require the same number of table/index scans in the DB. The difference is in the WHERE clause of the update: for #1 and #2 it is fixed, for #3 it is "all columns", and for #4 it is the locking columns.

There is no performance penalty.

== Ezra Epstein.

eepstein · **Posted:** Fri Jan 02, 2004 5:27 pm

eepstein wrote:

Actually that is not true. (And I was not posting to start a flame war, so I am a bit surprised by the post.)

Was quoting:

epbernard wrote:

Does it perform quickly and lightly with minimal DB round-trips?
4. NO

baliukas · **Posted:** Fri Jan 02, 2004 5:46 pm

max wrote:

Do you really use this generally for optimistic locking? I have a hard time believing that this is a "very much used" optimistic locking scheme...

It is a "good" way. I agree that adding versions to the model and schema is not "transparent", but I think the best way to handle concurrency problems is through schema design (which is not a very friendly approach for "transparent persistence").
The proposed way to implement the "Certifier" can cause performance problems on "large" tables. Indexing all fields is not a very good solution, is it?
But it looks like it is almost the same as "optimistic locking" in Hibernate;
I was sure Hibernate used "versions" for more "clever" things.

eepstein · **Posted:** Fri Jan 02, 2004 5:55 pm

max wrote:

If I understand your C correctly you want several options:

0. Allow the user to specify a subset of columns that will be used as part of the where when doing an update.

That is right. That is what I meant by "fine-grained" in this context. The granularity is about which columns I check to ensure I'm not violating concurrency.

max wrote:

but to support the "fine-grained control" you would also need:

1. "column"-lock mode that can be specified on each update - not just globally per entity/class - right?

I see less of a need for this, but, more to the point, that is a question of making concurrency checks optional altogether for some rows. Since that is not a per-entity decision, it is not helped by the logic of entity (O/R) mapping. Instead, frameworks (or "services") like Hibernate offer such features in the same way that Hibernate supports arbitrary PKs, e.g., the method for checking whether an ID is set so that either an INSERT or an UPDATE can be performed as appropriate. That is per-instance, not per-class, logic. The same would be needed here. You could tell Hibernate that the concurrency logic is handled by the class (or some controller class) and give it a method to invoke. That method would decide whether that row needs a concurrency check or not. Anyway, that is not what I mean by "fine-grained" in this case.
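A per-instance hook of the kind described could look roughly like this: a callback, similar in spirit to the per-instance check that decides between INSERT and UPDATE, that returns the columns to check for one particular object. `OptimisticCheckPolicy`, `Document` and `DocumentPolicy` are hypothetical; nothing here is an existing Hibernate interface, and the draft/published rule is an invented example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical per-instance hook: returns the columns to include in the
// WHERE clause for this particular object (empty list = no check).
interface OptimisticCheckPolicy<T> {
    List<String> checkColumns(T entity);
}

// Hypothetical entity used only for illustration.
final class Document {
    final long id;
    final boolean draft;

    Document(long id, boolean draft) {
        this.id = id;
        this.draft = draft;
    }
}

// Invented rule: drafts are edited by one user only, so skip the check;
// published documents are checked on their content columns.
final class DocumentPolicy implements OptimisticCheckPolicy<Document> {
    @Override
    public List<String> checkColumns(Document d) {
        return d.draft ? Collections.<String>emptyList()
                       : Arrays.asList("title", "body");
    }
}
```

The persistence layer would call `checkColumns` at flush time and feed the result into the WHERE clause, so the decision is made per row rather than in the class mapping.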

max wrote:

Do you really use this generally for optimistic locking? I have a hard time believing that this is a "very much used" optimistic locking scheme...

It is essential, at least the #3 version of it: check all columns. There is no other way to ensure that a Hibernate-enabled app isn't trampling an existing system's updates. That is bad. It means I cannot rely on Hibernate to do updates in those scenarios (a common one!).

max wrote:

I follow the advantages - but geez, there are also a lot of "whoops - forgot to include the right columns" and "whoops - needs this unindexed column for proper locking" risks. I see the version field as much easier to understand/use, safer, faster and way more widely used ;)

The first part of that is an entirely spurious argument. By that logic (i.e., developers are dumb), no one has the right to program their computer, let alone do so in Java. For the second part, see A and B for the problems with it. (Version counting is OK when you have control over all access methods to the DB. Otherwise it is a quick fix: an often-used one, perhaps, but it can be dangerous.)

max wrote:

...but hey, you are more than welcome to add a JIRA issue for it and let people vote for it... or even submit a patch for it... that all gets more attention ;)

Now we get into implementation details. For this to work, Hibernate needs to keep information about the original state of an object when it was fetched (aka "snapshots") and have that available at commit time. (My guess is this is already being done for one column: the timestamp or update count column.) If Hibernate is already storing fetch timestamps or update counts anyway, then it could store more info.

There are different places this can be done: in the session on read, in the object instance itself on the first non-DB write to a field, or somewhere else (e.g., a hashmap keyed by object class + PK). The problem with keeping it in the session is that this probably won't work with detached objects. The problem with doing it in each class is that either (a) Hibernate's no-base-class-required scheme gets violated (not recommended) or (b) each class has to do the work itself with a copy-on-write for its attributes, possibly with a little help from a built-in helper class; it could implement an interface to indicate that it has the info to support column-level concurrency. (c) The problem with the global (hashmap) approach is that it is per JVM, so if two different users modify a single object (they see the same values in their browsers, but one updates after the other), the global snapshot will agree with the DB and the second update will succeed even though it may be overwriting values set via the very same app by a different user...
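Option (b) might look like the sketch below: a small helper that records a property's original value the first time it is overwritten, so the entity carries its own snapshot even while detached. `OriginalValues` and `Account` are hypothetical names, and every setter has to call the helper, which is the per-class work described above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: remembers the value each property had when loaded,
// captured lazily on the first write ("copy on write"). Properties that are
// never written are not stored.
final class OriginalValues {
    private final Map<String, Object> originals = new HashMap<>();

    // Record the pre-write value only once; containsKey (not putIfAbsent)
    // so that an original null is not replaced by a later intermediate value.
    void beforeWrite(String property, Object currentValue) {
        if (!originals.containsKey(property)) {
            originals.put(property, currentValue);
        }
    }

    boolean isDirty(String property, Object currentValue) {
        return originals.containsKey(property)
            && !Objects.equals(originals.get(property), currentValue);
    }

    Map<String, Object> snapshot() {
        return Collections.unmodifiableMap(originals);
    }
}

// Hypothetical entity using the helper in its setter.
final class Account {
    private String owner;
    private final OriginalValues originals = new OriginalValues();

    Account(String owner) { this.owner = owner; } // state as loaded from the DB

    String getOwner() { return owner; }

    void setOwner(String newOwner) {
        originals.beforeWrite("owner", owner);
        owner = newOwner;
    }

    // Values to bind in e.g. "WHERE owner = ?" when the row is flushed.
    Map<String, Object> originalValues() { return originals.snapshot(); }
}
```

Because the snapshot travels with the object, it does not depend on a session being open. For the object to survive a round trip to a remote client, the helper would also need to be `Serializable`, which this sketch does not do.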

So the first question is: where to keep the original values? I do not know the architecture of Hibernate well enough to say off-hand.

As for granularity, that's less needed. It would be easy to add via a per-property attribute once the original values were available. Anyway, for my needs it would be acceptable if the "check all columns" approach were implemented. While coarse-grained, it is at least friendly to other (existing) clients of the DB.

== Ezra Epstein

eepstein · **Posted:** Fri Jan 02, 2004 6:02 pm

eepstein wrote:

There is no performance penalty.

Thanks, max, for pointing out that a "where" clause on non-indexed columns will require a table scan if any of those columns is not indexed (almost always the case), whereas with a single column you can index it and avoid a table scan on update, needing only an index scan (faster). I stand corrected. There is, however, no difference in DB round-trips (which was the quoted post's point).

Looking at this from an architecture standpoint for a moment (it does not pay to optimize early!), we see that version count and timestamp columns are a special case of being able to choose which columns are used for update checks. (In addition, one needs either a one-line DB trigger or a trivial amount of logic for these to be fully automatic.) So column-level "locking" is not an alternative to the existing approach but a generalization of it. Once you have column-level checks, all the other approaches fall out gratis.

So, back to performance: with column-level locking, IF you have a *giant* table AND you do not care about concurrency with other applications that access the DB without using an update count (i.e., no such apps exist in your case), THEN go ahead and use a version count column; it would be fully supported. (That's what "special case" means in this sense.)

== Ezra Epstein

eepstein · **Posted:** Fri Jan 02, 2004 6:07 pm

baliukas wrote:

max wrote:

Do you really use this generally for optimistic locking? I have a hard time believing that this is a "very much used" optimistic locking scheme...

It is a "good" way. I agree that adding versions to the model and schema is not "transparent", but I think the best way to handle concurrency problems is through schema design (which is not a very friendly approach for "transparent persistence").

Could you elaborate on what you mean by handling concurrency problems through "schema design"? Maybe give an example.

baliukas wrote:

The proposed way to implement the "Certifier" can cause performance problems on "large" tables. Indexing all fields is not a very good solution, is it?

Thanks for that. (And sorry for attributing it to max -- too fast in the read/reply.)

== Ezra Epstein