steve wrote:
And as for increasing performance, how did you surmise that? My experience has been the exact opposite. Synthetic keys tend to be simple numerics like integers which are very fast to index.
My first schema for an application was a completely normalized, all surrogate key design.
About 10% of my (select) queries are based around a self-join of a large table with supporting joins to between 4 and 8 other tables, sometimes 8-16 (if both ends of the self-join needed the full supporting information). Normally half of these supporting joins were only needed to convert a surrogate key into a displayable value. Another 89% of my queries are trivial single-table queries on indexed columns. The remaining 1% are even larger joins, but rarely needed.
Switching the relations that had a reasonably strong argument for a natural key over to that natural key eliminated more than half of those joins and reduced the query execution time by a factor of 8. The simple queries showed a very slight increase in execution time that, even under load, was typically lost in the network latency from the web server to the user's browser. I didn't notice any measurable change in the updates/inserts/deletes.
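To give a concrete (if simplified) picture of what I mean -- the table and column names below are invented for illustration, not taken from my real schema:

Code:
-- Surrogate-key design: two of the joins exist only to turn
-- status_id and priority_id into something displayable.
SELECT t.title, s.status_name, p.priority_name
FROM   task t
JOIN   status   s ON s.status_id   = t.status_id
JOIN   priority p ON p.priority_id = t.priority_id
WHERE  t.owner_id = 42;

-- Natural-key design: the codes stored in TASK are themselves the
-- displayable values, so those look-up joins simply disappear.
SELECT t.title, t.status_code, t.priority_code
FROM   task t
WHERE  t.owner_id = 42;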
Now maybe two-thirds of the eliminated joins could probably be dealt with by the second-level caching offered by Hibernate. It's definitely something I'll need to experiment with -- these are small (<25 rows) to moderate (<200 rows), relatively constant look-up tables from the perspective of most of the application.
My second-generation schema only uses surrogate keys when no good natural key suggests itself -- and a composite key is almost never considered "good" unless I can't even dream up a case where I'd want to refer to it... and even then I'm cautious. The general schema is also very useful from a development point of view, as ad-hoc queries tend to return much more readable data since you don't need to join into the look-up tables.
Quote:
So why is denormalizing for read inherently a bad thing? It's interesting that you use the example of an address (or maybe that's what the book used). People have addresses; companies have addresses; etc. So most data modelers end up modeling this exactly as you described. But here's the thing. When is the last time you saw an application in which a person(s) and a company(s) shared an address pk value? I've seen one application in my 10+ years of programming where an ADDRESS table row could validly be referenced by more than one defined FK. So why reflexively build in the performance overhead of normalizing this? For storage, right? That's a practical consideration, not a "relation modeling" consideration. Or perhaps you are concerned about having to add a UNIVERSE column to ADDRESS in a few years :)
(Yes that is the example the book used.)
No, no need for a UNIVERSE column, nor do I often take normalization to the extreme that would split city/state off to a separate table referenced by zip_code (assuming US only, for now).
However, in my application I commonly have multiple entities referencing the same address, so that example jumped right out at me. I do agree with you that in general it's not commonly a shared reference -- in which case I'd probably produce a data model with person_address, company_address, etc., tables. Possibly "over-normalization", but it keeps the row size down, which can be a win in some cases, typically in heavily used tables.
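Roughly, the two shapes I have in mind look like this (again, invented names, just a sketch):

Code:
-- Shared ADDRESS table: what I use, since several entities in my
-- schema can legitimately reference the same address row.
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     VARCHAR(100),
    city       VARCHAR(50),
    state      CHAR(2),
    zip        CHAR(10)
);
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    name       VARCHAR(100),
    address_id INTEGER REFERENCES address (address_id)
);

-- Per-entity address tables: if addresses are never shared, PERSON
-- drops its address_id column and the address lives in its own
-- narrow one-to-one table, keeping the main row small.
CREATE TABLE person_address (
    person_id  INTEGER PRIMARY KEY REFERENCES person (person_id),
    street     VARCHAR(100),
    city       VARCHAR(50),
    state      CHAR(2),
    zip        CHAR(10)
);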
With the single exception of the zip-code functional dependency, every time I've denormalized, I've regretted it later. When I've found a place where it appeared denormalization could help improve a poorly performing area, I've also found that something else was more to blame -- perhaps a bad algorithm, perhaps a poorly thought-out query. In some instances it was better to produce a temporary table (or materialized view, depending on your DBMS) for some reporting process, especially with historic data.
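For example, instead of denormalizing the base tables for a historic sales report, something along these lines has worked better for me (invented names; exact syntax varies by DBMS):

Code:
-- Precompute the report once, rather than folding summary columns
-- back into the transactional tables.
CREATE MATERIALIZED VIEW sales_by_month AS
SELECT customer_id,
       EXTRACT(YEAR  FROM order_date) AS order_year,
       EXTRACT(MONTH FROM order_date) AS order_month,
       COUNT(*)          AS order_count,
       SUM(total_amount) AS total_sales
FROM   orders
GROUP  BY customer_id,
          EXTRACT(YEAR  FROM order_date),
          EXTRACT(MONTH FROM order_date);

-- On a DBMS without materialized views, a plain summary table
-- rebuilt by the reporting job serves much the same purpose:
-- CREATE TABLE sales_by_month AS SELECT ... ;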
But I'll have to remember that, because of the language in which I've done most of my database-backed (web) applications, I haven't had a very rich OO environment to work in. Lessons learned there might not apply... Part of the reason I'm looking at Java and Hibernate (and other parts of the Java alphabet soup) is to get a better OO model, and if that means evolving my understanding of DB usage, then I had better do that...
Thank you for your reply.