Adding language support

Martin Kersten · **Joined:** Tue Mar 16, 2004 5:15 am **Posts:** 33

Dear Hibernate fellows,

I am standing in front of a stack of thoughts and would like to get some opinions and advice on them.

The task:

Dear Mister Developer (callsign 007), please add language support to the application. Do it the Happy Happy, Joy Joy way (make everyone happy except yourself) and do it with style. This message will self-destruct in no time.

The thoughts:

Doing some serious thinking (remember, I am a secret agent), I came up with this:

- Adding an object called Language, representing a language.
- Replace any string with a language-enabled version where necessary.

Having a Product POJO it would look like this:

Single language support:
Product.getName(),
Product.getDescription()

Multiple language support:
Product.getDescription(Language) : ProductDescription
ProductDescription.getName()
ProductDescription.getDescription() <-- ?

Scenario of use:

There will be lots of different servers for different namespaces. So when Product.getDescription(Language) is issued, only the needed language version of the strings should be retrieved from the database (or reside cached in memory). I don't want it to iterate over the set of product descriptions in all languages. I would also like to avoid issuing a query just to retrieve a single product name (that would be crazy).

Does anyone have an opinion on this issue or something of value to add? How would you solve the task?

Thanks,

Martin (Kersten)

gavin · **Posted:** Sat Jul 31, 2004 8:33 am

http://blog.hibernate.org/cgi-bin/blosx ... 06/23#i18n

Martin Kersten · **Joined:** Tue Mar 16, 2004 5:15 am **Posts:** 33

Thanks Gavin. I have read this solution, but sadly it does not apply to my problem after all. (It made it into my bookmarks though, since it solves a problem I might face later on.)

There are use cases (transactions) that require names in more than one language to be available, so the language in use needs to be a visible fact.
Imagine adding a new product that editors need to translate. You will have something like:

Product.getName(language1) -> send to client
receive response -> Product.setName(language2)

It is unlikely that every supported language is used by a specific server (co-located / clustered), but most would use at least a subset of the available languages. For example, the Germans would use English and German (translation use case).

So currently I am up to:

1. Provide Product.get/setXXX(Language [,newValue]) and state clearly that there is more than one name, each in a different language.
2. Use the ProductDescription as an implementation detail and never make it available to the user (business layer).
3. Use lazy loading to avoid unnecessary information being cached or loaded (avoids unnecessary pollution of memory).
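
For readers who want to see the shape of points 1 and 2, here is a minimal plain-Java sketch. It is not the code from my project and it leaves Hibernate out entirely: the map stands in for what would be a lazily loaded collection in a real mapping. The Language class with a string code, the field names and the exact method signatures are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value type; the thread only names the concept "Language".
final class Language {
    private final String code;

    Language(String code) { this.code = code; }

    String getCode() { return code; }

    @Override public boolean equals(Object o) {
        return o instanceof Language && ((Language) o).code.equals(code);
    }

    @Override public int hashCode() { return code.hashCode(); }
}

// Implementation detail (point 2): holds the strings of one language
// and is never handed out to the business layer.
final class ProductDescription {
    String name;
    String description;
}

class Product {
    // Assumption: a plain map stands in for the lazily loaded collection.
    private final Map<Language, ProductDescription> descriptions = new HashMap<>();

    // Point 1: the language is an explicit argument of every accessor.
    String getName(Language language) {
        ProductDescription d = descriptions.get(language);
        return d == null ? null : d.name;
    }

    void setName(Language language, String name) {
        descriptions.computeIfAbsent(language, l -> new ProductDescription()).name = name;
    }

    String getDescription(Language language) {
        ProductDescription d = descriptions.get(language);
        return d == null ? null : d.description;
    }

    void setDescription(Language language, String description) {
        descriptions.computeIfAbsent(language, l -> new ProductDescription()).description = description;
    }
}
```

With this shape the translation use case reads as product.getName(english) on the way out and product.setName(german, translated) on the way back, and the caller never sees a ProductDescription.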

I consulted the Hibernate documentation on this issue again and found this statement:

"filter() or createFilter() are also used to efficiently retrieve subsets of a collection without needing to initialize the whole collection". (Chapter 6 - Section 5, page 50 in the middle).

The question is how this works in detail. I never had to filter a collection in a similar solution; it always came down to a query before.

So I guess ProductDescription Product.getDescription(Language) is meant to be implemented this way:

1. Use a one-to-many relation.
2. Specify it as a lazily loaded collection.
3. Access this collection by filtering it by the languages needed (for example the default language and the current one, which is the default scenario).

The question is whether this is a good way to implement it. I am also unsure whether to use one big table for all language strings or to stick with a couple of domain-specific tables.

Are filters smart enough to filter partly cached collections? I mean filtering a collection that is already partly known (because a similar filter was applied earlier). Is it smart enough to recognize that it is about to filter on the primary key constraints, meaning looking for matches to primaryKey(id,language)? You know I would like to avoid queries and unnecessary caching of objects at all costs.

Thanks and sorry to disturb you during the weekend,

Martin (Kersten)

Martin Kersten · **Joined:** Tue Mar 16, 2004 5:15 am **Posts:** 33

Well the solution was not simple but after I read the very good 'Hibernate in Action' book I know what to do.

My thanks to both of you, Christian and Gavin, great piece of work! I enjoyed it a lot, but it was fairly short. I only needed around 5 hours to read it, or better, to eat it ;-).

christian · **Posted:** Fri Aug 06, 2004 5:25 am

5 hours? My final proofread took me about 25 hours and I read as fast as I could without skipping anything.

Martin Kersten · **Joined:** Tue Mar 16, 2004 5:15 am **Posts:** 33

Quote:

5 hours? My final proofread took me about 25 hours and I read as fast as I could without skipping anything.

As I wrote, I read it; you proofread it. Quite a big difference ;-). Since I was no hotshot at Hibernate, it was more like checking the index for words I didn't know, putting down questions, and then reading with some speed-reading techniques. Mostly I checked for words I didn't know and read the first and last sentence of each paragraph plus the pictures / code. That way I could dip into the text in detail where needed (I liked the theory parts very much). So 5 hours is not that much. There were some areas I would have liked more pages about, but I can't really remember which... .

Martin Kersten · **Joined:** Tue Mar 16, 2004 5:15 am **Posts:** 33

Since I got an email asking what all my thinking turned out to, I want to give everyone some information about what I have done.

In the end everything came down to this picture:

Product { {language, name, description} .... some more}
Content {{language, title, content} ... some more}

This gives the following functional dependencies (FDs); after constructing the closure, the generic system looks like this:

Product -> PK, language, name, description
Product -> PK, price, cost (price+cost are not language depending)
Content -> PK, language, title, content, x, y
Content -> PK, x, y

So it ends up with two tables each for product and content. In the end it turned out that the two language-dependent tables look quite similar, so they can be unified by adding type information:

Product -> PK, price, cost
Content -> PK, x, y
Document -> PK, language, type, title, content
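
To make the unified layout concrete, here is a small in-memory sketch in plain Java (no Hibernate, no database). It assumes that the PK in the Document row is the key of the owning Product or Content row and that (type, PK, language) identifies a row; the enum values and class names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical discriminator for the "type" column.
enum DocumentType { PRODUCT, CONTENT }

// One row of the unified table: Document -> PK, language, type, title, content.
final class Document {
    final long ownerId;      // assumption: PK of the owning Product or Content row
    final String language;   // e.g. "en" or "de"
    final DocumentType type;
    final String title;
    final String content;

    Document(long ownerId, String language, DocumentType type, String title, String content) {
        this.ownerId = ownerId;
        this.language = language;
        this.type = type;
        this.title = title;
        this.content = content;
    }
}

// In-memory stand-in for the Document table, keyed by (type, ownerId, language).
final class DocumentTable {
    private final Map<List<Object>, Document> rows = new HashMap<>();

    private static List<Object> key(DocumentType type, long ownerId, String language) {
        return Arrays.asList(type, ownerId, language);
    }

    void put(Document doc) {
        rows.put(key(doc.type, doc.ownerId, doc.language), doc);
    }

    Document find(DocumentType type, long ownerId, String language) {
        return rows.get(key(type, ownerId, language));
    }

    // A product name is simply the title of the PRODUCT-typed document.
    String productName(long productId, String language) {
        Document doc = find(DocumentType.PRODUCT, productId, language);
        return doc == null ? null : doc.title;
    }
}
```

The point of the sketch is only that one lookup by type, key and language replaces the two separate language tables.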

So a product name is the document title associated with a product and carrying the matching type, and that's it. Each product therefore has a list of its associated language-related content. This was the big problem and the center of my considerations: having to access a list to get the appropriate product name or description.

But this isn't a big problem. Most of the modification use cases center only on a product or on its language-related names/descriptions (rarely both). The web pages are generated by a middle tier that has its own cache and is synchronized with the database servers either periodically or over a message channel (mission critical), so this is not much of a problem either. The middle tier (for caching) was introduced to make a Lucene-based search solution possible, but it was extended to cache some more (not much more). It uses plain HSQL/SQL constructs to read more than one entry at once.

The idea behind this is not to use something sophisticated like a distributed cache or the like. It's more like using a distributed database (mostly replication) and using the database and the associated manager (repository + factory aspects) as a synchronization layer.

Since everything is archived using a wiki-style system (who made which change and when, etc.), the Document is actually a DocumentVersion, and here things got even trickier. Imagine adding a version to all of this stuff. What a waste, with all those SQL MAX calls, or the need to add current_version information to the product just to keep track of the latest language id (dude, you need a version per language and product).

In the end I remembered the WikiMedia solution (check it out, it is worth a read) and simply decoupled the version history from the current state. So there is now an additional table called document_history.
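
A rough plain-Java sketch of that split, with a map playing the role of the current-state table and a list playing the role of document_history. The string key per document and language, the author field and the method names are assumptions for illustration; this is not a description of the real schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DocumentVersion {
    final String key;     // hypothetical: identifies document + language, e.g. "product:1:de"
    final int version;
    final String author;  // who made the change
    final String title;

    DocumentVersion(String key, int version, String author, String title) {
        this.key = key;
        this.version = version;
        this.author = author;
        this.title = title;
    }
}

final class VersionedDocuments {
    // Stand-in for the current-state table: exactly one row per key.
    private final Map<String, DocumentVersion> current = new HashMap<>();
    // Stand-in for document_history: superseded versions only.
    private final List<DocumentVersion> history = new ArrayList<>();

    DocumentVersion save(String key, String author, String title) {
        DocumentVersion previous = current.get(key);
        int nextVersion = 1;
        if (previous != null) {
            history.add(previous);              // move the old state into the history
            nextVersion = previous.version + 1;
        }
        DocumentVersion next = new DocumentVersion(key, nextVersion, author, title);
        current.put(key, next);
        return next;
    }

    // Reading the latest state is a direct lookup, no MAX(version) needed.
    DocumentVersion getCurrent(String key) {
        return current.get(key);
    }

    List<DocumentVersion> getHistory(String key) {
        List<DocumentVersion> result = new ArrayList<>();
        for (DocumentVersion v : history) {
            if (v.key.equals(key)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

The normal read path touches only the current state; the history is consulted only when someone actually asks who changed what and when.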

So that is really all there is to say. Ah well, wait, something of interest to all of you: this system is currently not in production use, but it was load tested in its alpha state. It did not suffer from the language-dependent stuff compared to another, non-language-aware solution, which is currently being pushed towards final and is going through beta testing, although two components are still alpha (automatic information synchronization with 3rd-party external information sources). The language-aware solution will follow it once it is final. There isn't much to change. The single-language solution only has the document version solution... .

So I guess you understand the solution. Any comments welcome... .

Cheers,

Martin (Kersten)