-->
These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
Author Message
 Post subject: Using identifiers with federated, legacy databases
PostPosted: Fri Jun 11, 2010 3:16 pm 
Beginner
Beginner

Joined: Fri Mar 26, 2010 11:55 am
Posts: 20
I'd like some advice on the best identifier strategy when federating multiple databases that have legacy data and applications.

Here's my use case:

Take two legacy databases. Each DB contains a table called 'users', each with pre-existing records. Each table has been using a DB sequence to generate surrogate keys. Since these databases have been existing without knowledge of each other, the same primary key can often be found in both tables.

Database 1, table "users":
[PK, Username]
(1, 'John')
(2, 'Mike')

Database 2, table "users":
[PK, Username]
(1, 'Roger')
(2, 'Richard')

The goal is to federate both tables into a single view of 'users', and use Hibernate to build applications on this federated set. Assume Hibernate will perform full CRUD operations on these tables. Also assume that legacy apps will continue to access these tables outside of Hibernate, and will perform full CRUD.

In order to create a federated view of both tables, we'll use MetaMatrix/Teiid -- a JBoss tool for data virtualization. MetaMatrix will sit between the two databases, and the Hibernate layer. MetaMatrix performs a UNION of the two tables, and provides single a JDBC interface for Hibernate.

The problem with the approach as described thus far is that the composite view does not contain unique keys anymore:
View "Users":
(1, 'John')
(1, 'Roger') <---non-unique key
(2, 'Mike')
(2, 'Richard') <---non-unique key

However, we can use the MetaMatrix layer to embellish the data to provide uniqueness. For example, we could append a source system identifier to each primary key, e.g.:
(DatabaseA-1, 'John')
(DatabaseB-1, 'Roger')
(DatabaseA-2, 'Mike')
(DatabaseB-2, 'Richard')

This approach isn't ideal because we've added business meaning to our primary key (namely, the source system identifier). But, it allows us to guarantee uniqueness, even when the actual PKs aren't globally unique. It also allows MetaMatrix to route updates/deletes to the appropriate DB (though there are other ways to achieve this without manipulating the primary key). As an alternate approach, we can define a composite key comprised of the original PK, plus a source system identifier, e.g. ([1, DatabaseA], 'John').

However, there's still a problem -- what generator should Hibernate use to create new IDs?

The options that seem worth consideration are:
1. UUIDHexGenerator
2. org.hibernate.id.enhanced.TableGenerator (hilo strategy)
3. "select" generator
(FYI, using the 'native' strategy would result in hilo, because MetaMatrix does not support sequences.)

I see potential issues with each approach:

UUID would require changes to the legacy databases to switch from the old, integer-based method to a new, UUID-based method. The PK would be converted from integer to byte array, and old values would need to be converted as well. The database sequence would need to be modified to generate UUIDs. And it could impact legacy apps that were previously expecting integer values back.

The TableGenerator hilo strategy has two problems: (a) the Hibernate-generated sequence could eventually collide with the underlying sequence used by legacy apps, and (b) the ID that Hibernate generates will not match the ID we get back from MetaMatrix during lookups, since MMX will append a source-system identifier to fetched records, in order to satisfy the uniqueness requirement, but the Hibernate-generated ID will not have this identifier value. (a) might be avoided by picking a sufficiently high starting point for the Hibernate-managed sequences. As a relatively new Hibernate user, I'm not sure if (b) actually causes a problem for Hibernate or not. Perhaps it could be solved by using a composite key of SK + SourceSystem, but in that case, I'm no longer sure if you can then use the standard Hibernate generators anymore, since you're working with a composite key.

The select generator requires the existence of a unique key outside of the PK, in order to work. In our case, that just doesn't exist (or, it exists as a composite key, but that's not supported).

Does anyone have any thoughts on the best approach, or see issues with my assumptions? I'd appreciate your advice, thanks.


Top
 Profile  
 
 Post subject: Re: Using identifiers with federated, legacy databases
PostPosted: Mon Jun 14, 2010 1:23 pm 
Beginner
Beginner

Joined: Fri Mar 26, 2010 11:55 am
Posts: 20
I'm beginning to think that the ideal approach would be to extend the 'select' strategy to support composite keys used to lookup the actual ID.

Has this ever been done before? If not, then is there an obvious extension point for doing it myself? I'll dig around but I'd appreciate it if anyone has some insight here.

Otherwise, UUID is probably best, if we're willing to make changes to the underlying database. Hilo is the option that requires the least amount of change, though it carries some risk of duplicate IDs in a worst-case scenario.


Top
 Profile  
 
Display posts from previous:  Sort by  
Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 

All times are UTC - 5 hours [ DST ]


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.