
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 6 posts ] 
 Post subject: Do I really need Hibernate Shards?
PostPosted: Mon Mar 26, 2007 6:35 pm 
Beginner

Joined: Fri Apr 22, 2005 5:58 pm
Posts: 26
I have recently gone through the exercise of re-architecting our platform for horizontal scalability. The exercise resulted in the decision to use a sharded database model with our accounts being spread out over N shards.

This was all fine and good, but then I thought about how to implement this from an application POV. What was decided was that we would use a "Global Catalog" database which would basically store configuration information about an account. This would include information regarding what shard the data for that client should be read from or written to.

All requests through our system will first query the GC (Global Catalog), and that initial query will then tell the system what shard to use for the remainder of the transaction. There will never be a transaction which will need to access data from two separate shards, nor will we ever need to "search" shards to figure out where data belongs.

I have looked through the Hibernate Shards docs, and I don't think I am going to need them... but I just wanted a sanity check on this.

All I really need to do is create a SessionFactory for each physical shard. Since my application will know which shard it should be using, I can simply store the factories in a Map. The only thing I will need to do differently during the initialization of each SessionFactory is change the "connection.url" property to point to the correct MySQL server. As long as my application layer retrieves the correct SessionFactory, the remainder of my platform should remain unchanged.
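
A minimal sketch of the Map-of-factories idea described above, under stated assumptions: ShardRegistry and forShard are hypothetical names, and the type parameter F stands in for org.hibernate.SessionFactory so the example compiles without Hibernate on the classpath. It is not taken from the poster's code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry: one lazily built factory per physical shard.
 * F stands in for org.hibernate.SessionFactory (assumption for illustration).
 */
final class ShardRegistry<F> {
    private final Map<String, F> factories = new ConcurrentHashMap<>();
    private final Function<String, F> builder;

    /** builder turns a shard's JDBC URL into a factory. */
    ShardRegistry(Function<String, F> builder) {
        this.builder = builder;
    }

    /** Returns the cached factory for the shard, building it on first use. */
    F forShard(String jdbcUrl) {
        // computeIfAbsent builds at most one factory per URL, even under concurrent access.
        return factories.computeIfAbsent(jdbcUrl, builder);
    }

    /** Number of factories built so far. */
    int size() {
        return factories.size();
    }
}
```

With Hibernate, the builder would presumably create a fresh Configuration per shard, set "hibernate.connection.url" on it, and call buildSessionFactory(), so that no Configuration object is shared between shards.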

The other change is that each account will have its own MySQL schema (or namespace). This means we are going from one gigantic database with all clients and data in shared tables to multiple physical databases with client-specific schemas. This means all Hibernate queries would need to use the fully qualified [schema].[table name]... but I think I can get around this by modifying my HibernateUtil.getSession() method to always execute a "USE [schema name];" prior to returning the Session object. This would basically tell the MySQL server to use that schema for that physical connection until further notice... right?
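
A hedged sketch of the "USE [schema name]" step, assuming plain JDBC access to the connection behind the Session. SchemaSwitcher, useStatement and switchTo are hypothetical names introduced for illustration; only the java.sql calls are real API.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper for selecting an account's schema on a MySQL connection. */
final class SchemaSwitcher {
    private SchemaSwitcher() {}

    /** Builds a USE statement, quoting the name with backticks and doubling any embedded backtick. */
    static String useStatement(String schema) {
        if (schema == null || schema.isEmpty()) {
            throw new IllegalArgumentException("schema name must not be empty");
        }
        return "USE `" + schema.replace("`", "``") + "`";
    }

    /** Makes the schema the default database for this physical connection until it is changed again. */
    static void switchTo(Connection connection, String schema) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(useStatement(schema));
        }
    }
}
```

Because the USE sticks to the physical connection, a pooled connection would carry the previous account's schema with it, so the switch would have to run on every checkout rather than once. JDBC's Connection.setCatalog(String) may be a driver-level alternative, on the assumption that the MySQL driver maps the catalog to the database.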

Does anyone see anything blatantly wrong with this plan? Is there anything internal to Hibernate that I might be missing using this methodology?

Thanks in advance (and sorry for the long post)
Geoff


 Post subject:
PostPosted: Thu Mar 29, 2007 3:23 am 
Contributor

Joined: Fri Feb 11, 2005 8:15 pm
Posts: 31
Hi Geoff,

Sorry for not responding sooner; I've only been monitoring the Shards forums.

I'm of course in no position to do a full-fledged design review of your proposal, but if each request to your application only needs to talk to a single shard, and you can determine which shard that is up front, then I agree: you probably don't need Hibernate Shards. However, if you can envision functionality in your system that will require access to multiple shards, you might still consider using it. Maybe you're sharding your accounts by location (North America, Europe, etc.) but someone wants a listing of all accounts that were opened in 2002. The answer to that query lives on multiple shards, and Hibernate Shards can help with that.
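
To make the multi-shard case concrete, here is a conceptual sketch of a fan-out query: run the same query on every shard and merge the results. This is not Hibernate Shards' API and makes no claim about how it works internally; FanOutQuery and queryAll are hypothetical names, and the shard type S and result type R are placeholders.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

/** Hypothetical fan-out helper: one query, every shard, one merged list. */
final class FanOutQuery {
    private FanOutQuery() {}

    /** Runs the query against each shard in turn and concatenates the results in shard order. */
    static <S, R> List<R> queryAll(Collection<S> shards, Function<S, List<R>> query) {
        List<R> merged = new ArrayList<>();
        for (S shard : shards) {
            merged.addAll(query.apply(shard));
        }
        return merged;
    }
}
```

A plain concatenation like this leaves out cross-shard sorting and paging, which is the sort of work a sharding library would be expected to take on.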

The notion of "locking" the shard for the remainder of the request is interesting. If you're brave enough to look at the code (ShardedSessionImpl) you'll see we already have the ability to lock the shard for the purpose of shard selection (it's an undocumented feature). We should definitely extend this to allow you to lock the shard for the purpose of querying as well.

Max


 Post subject:
PostPosted: Thu Apr 19, 2007 1:47 pm 
Newbie

Joined: Fri Apr 13, 2007 12:25 pm
Posts: 17
I have also been looking into partitioning my application using shards, but have come to the same realization that Geoff has. My question is about the performance implications of storing factories in a map. How many SessionFactory objects can I reasonably expect to keep mapped per application server? 10? 100? How is this mapping strategy working for you, Geoff?


 Post subject:
PostPosted: Thu Apr 19, 2007 2:31 pm 
Contributor

Joined: Fri Feb 11, 2005 8:15 pm
Posts: 31
Hi oregontarheel,

I confess I don't entirely understand your concern. What is the "map" to which you are referring?

It's up to you to decide how many SessionFactories you need (you'll have 1 SessionFactory per physical shard), but whether you have 10 or 100, I don't think a map lookup is going to be your primary concern. If I'm totally off-topic, I apologize; I'm pretty sure I don't understand your problem.

Max


 Post subject:
PostPosted: Thu Apr 19, 2007 3:57 pm 
Newbie

Joined: Fri Apr 13, 2007 12:25 pm
Posts: 17
I apologize for my post being confusing. I was talking about the alternative to using shards, which I have set up as a test right now. Since each request in my application will be using one and only one database, with an identical schema on each database (one per corporate account), there's really no need for me to use shards, from what I gathered. Here's an example of what I'm currently sandboxing (this is an excerpt from my HibernateUtil class):

Code:
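    // Returns the SessionFactory for the account's database, building and caching it on first use.
    // synchronized because sessionFactoryMap is a plain HashMap and the shared configuration is mutated here.
    public static synchronized SessionFactory getSessionFactory(Account a){
       SessionFactory factory = sessionFactoryMap.get(a.getId());
       if(factory == null){
          String account_db = a.getDatabase();
          // Point the shared Configuration at this account's database before building.
          configuration.setProperty("hibernate.connection.url", "jdbc:mysql://" + DB_HOST + "/" + account_db);
          factory = configuration.buildSessionFactory();
          sessionFactoryMap.put(a.getId(), factory);
       }
       return factory;
    }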


(sessionFactoryMap is an instance of a Map implementation; right now I am using a HashMap)

My question is: how expensive are these SessionFactory objects, and at what point am I going to run out of memory? A ballpark figure here would be great... Obviously, if I go down this road much further, some logic will be needed to expire these factories after a certain period of inactivity.
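
A rough sketch of what that expiry logic might look like, under stated assumptions: ExpiringFactoryCache, evictIdle and the injected clock are hypothetical, and F again stands in for SessionFactory so the example needs only the JDK. Eviction is triggered explicitly here (for example from a timer) rather than by a background thread.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache that closes factories not used within maxIdleMillis. */
final class ExpiringFactoryCache<K, F> {
    private static final class Entry<F> {
        final F factory;
        long lastUsed;
        Entry(F factory, long lastUsed) { this.factory = factory; this.lastUsed = lastUsed; }
    }

    private final Map<K, Entry<F>> entries = new HashMap<>();
    private final Function<K, F> builder;   // builds a factory for an account key
    private final Consumer<F> closer;       // releases a factory's resources
    private final LongSupplier clock;       // current time in milliseconds
    private final long maxIdleMillis;

    ExpiringFactoryCache(Function<K, F> builder, Consumer<F> closer, LongSupplier clock, long maxIdleMillis) {
        this.builder = builder;
        this.closer = closer;
        this.clock = clock;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns the factory for key, building it if absent, and records the access time. */
    synchronized F get(K key) {
        long now = clock.getAsLong();
        Entry<F> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(builder.apply(key), now);
            entries.put(key, entry);
        }
        entry.lastUsed = now;
        return entry.factory;
    }

    /** Closes and removes every factory idle for longer than maxIdleMillis; returns how many were evicted. */
    synchronized int evictIdle() {
        long now = clock.getAsLong();
        int evicted = 0;
        for (Iterator<Entry<F>> it = entries.values().iterator(); it.hasNext(); ) {
            Entry<F> entry = it.next();
            if (now - entry.lastUsed > maxIdleMillis) {
                closer.accept(entry.factory);
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    /** Number of factories currently cached. */
    synchronized int size() {
        return entries.size();
    }
}
```

With Hibernate, the closer would presumably be SessionFactory::close and the clock System::currentTimeMillis. The sketch does not track open sessions, so a real version would need to make sure a factory is not closed while a request is still using it.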


 Post subject:
PostPosted: Fri Apr 20, 2007 11:44 am 
Contributor

Joined: Fri Feb 11, 2005 8:15 pm
Posts: 31
I get it, I get it.

I can't tell you how many session factories you'll be able to allocate because I've never built an app with a large enough number of them to worry about this. I will say, however, that one shard per customer might be difficult to maintain (and by "maintain" I'm just talking about the regular TLC that production databases typically require) unless you expect to have a relatively small number of customers. Hopefully that's your expectation.

Max


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.