I'd like some advice on the best identifier strategy when federating multiple databases that have legacy data and applications.
Here's my use case:
Take two legacy databases. Each DB contains a table called 'users', each with pre-existing records. Each table has been using a DB sequence to generate surrogate keys. Since these databases have been existing without knowledge of each other, the same primary key can often be found in both tables.
Database 1, table "users": [PK, Username] (1, 'John') (2, 'Mike')
Database 2, table "users": [PK, Username] (1, 'Roger') (2, 'Richard')
The goal is to federate both tables into a single view of 'users', and use Hibernate to build applications on this federated set. Assume Hibernate will perform full CRUD operations on these tables. Also assume that legacy apps will continue to access these tables outside of Hibernate, and will perform full CRUD.
In order to create a federated view of both tables, we'll use MetaMatrix/Teiid -- a JBoss tool for data virtualization. MetaMatrix will sit between the two databases, and the Hibernate layer. MetaMatrix performs a UNION of the two tables, and provides single a JDBC interface for Hibernate.
The problem with the approach as described thus far is that the composite view does not contain unique keys anymore: View "Users": (1, 'John') (1, 'Roger') <---non-unique key (2, 'Mike') (2, 'Richard') <---non-unique key
However, we can use the MetaMatrix layer to embellish the data to provide uniqueness. For example, we could append a source system identifier to each primary key, e.g.: (DatabaseA-1, 'John') (DatabaseB-1, 'Roger') (DatabaseA-2, 'Mike') (DatabaseB-2, 'Richard')
This approach isn't ideal because we've added business meaning to our primary key (namely, the source system identifier). But, it allows us to guarantee uniqueness, even when the actual PKs aren't globally unique. It also allows MetaMatrix to route updates/deletes to the appropriate DB (though there are other ways to achieve this without manipulating the primary key). As an alternate approach, we can define a composite key comprised of the original PK, plus a source system identifier, e.g. ([1, DatabaseA], 'John').
However, there's still a problem -- what generator should Hibernate use to create new IDs?
The options that seem worth consideration are: 1. UUIDHexGenerator 2. org.hibernate.id.enhanced.TableGenerator (hilo strategy) 3. "select" generator (FYI, using the 'native' strategy would result in hilo, because MetaMatrix does not support sequences.)
I see potential issues with each approach:
UUID would require changes to the legacy databases to switch from the old, integer-based method to a new, UUID-based method. The PK would be converted from integer to byte array, and old values would need to be converted as well. The database sequence would need to be modified to generate UUIDs. And it could impact legacy apps that were previously expecting integer values back.
The TableGenerator hilo strategy has two problems: (a) the Hibernate-generated sequence could eventually collide with the underlying sequence used by legacy apps, and (b) the ID that Hibernate generates will not match the ID we get back from MetaMatrix during lookups, since MMX will append a source-system identifier to fetched records, in order to satisfy the uniqueness requirement, but the Hibernate-generated ID will not have this identifier value. (a) might be avoided by picking a sufficiently high starting point for the Hibernate-managed sequences. As a relatively new Hibernate user, I'm not sure if (b) actually causes a problem for Hibernate or not. Perhaps it could be solved by using a composite key of SK + SourceSystem, but in that case, I'm no longer sure if you can then use the standard Hibernate generators anymore, since you're working with a composite key.
The select generator requires the existence of a unique key outside of the PK, in order to work. In our case, that just doesn't exist (or, it exists as a composite key, but that's not supported).
Does anyone have any thoughts on the best approach, or see issues with my assumptions? I'd appreciate your advice, thanks.
|