All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
Author Message
 Post subject: Batch processing application - appropriate for Hibernate 3?
PostPosted: Thu Jul 03, 2008 10:04 am 
Newbie

Joined: Thu Jul 03, 2008 9:48 am
Posts: 2
We are considering using Hibernate 3 for our batch processing application.

The application will process 18 million records from an Oracle 9 database. Some Hibernate objects will be nested 5 deep.

We've looked through the documentation and found nothing to cause concern. As long as the application flushes and clears the first-level cache (the Session) regularly, JDBC batching is turned on, and there is an adequate number of DB connections, we believe that the application will be OK.

Does anyone agree?

Any advice would be much appreciated.
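
For reference, the flush-and-clear pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `UnitOfWork` is a hypothetical stand-in for the `save`, `flush` and `clear` calls on a Hibernate `Session`, and `CountingUnitOfWork` is an in-memory fake that exists only so the sketch runs without a database. In a real Hibernate job the chunk size would normally be kept equal to the `hibernate.jdbc.batch_size` setting.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFlushSketch {

    /** Hypothetical stand-in for the Session calls the pattern relies on. */
    interface UnitOfWork<T> {
        void save(T entity); // register the entity in the first-level cache
        void flush();        // push pending writes to the database
        void clear();        // evict everything from the first-level cache
    }

    /** In-memory fake, used only to make the sketch runnable and observable. */
    static class CountingUnitOfWork<T> implements UnitOfWork<T> {
        final List<T> cache = new ArrayList<>();
        int unflushed = 0;
        int written = 0;
        int flushes = 0;
        int maxCached = 0;

        public void save(T entity) {
            cache.add(entity);
            unflushed++;
            maxCached = Math.max(maxCached, cache.size());
        }

        public void flush() {
            written += unflushed;
            unflushed = 0;
            flushes++;
        }

        public void clear() {
            cache.clear();
        }
    }

    /** Saves every record, flushing and clearing after each full chunk. */
    static <T> void saveAll(Iterable<T> records, UnitOfWork<T> uow, int batchSize) {
        int count = 0;
        for (T record : records) {
            uow.save(record);
            if (++count % batchSize == 0) {
                uow.flush(); // write this chunk
                uow.clear(); // release memory held by the cache
            }
        }
        // Write and release whatever is left over from the last partial chunk.
        uow.flush();
        uow.clear();
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 105; i++) {
            records.add(i);
        }
        CountingUnitOfWork<Integer> uow = new CountingUnitOfWork<>();
        saveAll(records, uow, 20);
        System.out.println("written=" + uow.written + " maxCached=" + uow.maxCached);
    }
}
```

The point of the sketch is that the cache never holds more than one chunk at a time, which is what keeps memory flat over millions of records. Flushing without clearing would write the rows but still leave every entity in the cache.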


 Post subject:
PostPosted: Fri Jul 04, 2008 1:39 pm 
Expert

Joined: Tue May 13, 2008 3:42 pm
Posts: 919
Location: Toronto & Ajax Ontario www.hibernatemadeeasy.com
That's a pretty tall order.

You can certainly do it with Hibernate. With that volume of data, though, I might look at an ETL tool like DataStage. Batch data processing is best done by tools designed for batch data processing, and I'm not sure an ORM tool like Hibernate should be the only technology you look at.

_________________
Cameron McKenzie - Author of "Hibernate Made Easy" and "What is WebSphere?"
http://www.TheBookOnHibernate.com Check out my 'easy to follow' Hibernate & JPA Tutorials


 Post subject: re: hibernate and batch
PostPosted: Sat Jul 05, 2008 2:02 am 
Newbie

Joined: Fri Jul 04, 2008 4:38 pm
Posts: 2
This link provides some information on Hibernate and batch: http://www.hibernate.org/hib_docs/refer ... batch.html

One thing to keep in mind is that ORMs tend not to perform well as the persistence mechanism for batch processing. Finely tuned SQL is incredibly important for performance, which is why you tend to see JDBC, SQLJ, etc. used. With ORMs, the SQL may not be as streamlined as possible because of extra column selects, generic generated SQL instead of hand-tuned SQL, etc. The following article discusses high-performance batch and covers some reasons why ORM technologies are not necessarily a good choice: http://java.sys-con.com/read/415321.htm

In terms of using a product like DataStage, it really depends on your requirements. If you're looking for traditional batch qualities of service like checkpoint and restart, or concurrent execution with OLTP workloads, then DataStage perhaps isn't the best choice. DataStage should be used for staging data :), pre/post-processing data through ETL interactions, etc. On the other hand, if you're processing lots of isolated data, meaning data to which you have exclusive access and which therefore needs fewer qualities of service (checkpoint/restart, OLTP interleave, etc.), DataStage might be the best choice.
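
To make "checkpoint and restart" concrete, here is a minimal sketch of the idea. It is not how any particular product implements it. All names (`CheckpointStore`, `run`, `failAt`) are hypothetical, the checkpoint lives in memory rather than in durable storage, and the `failAt` parameter exists only to simulate a crash.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointRestartSketch {

    /** Hypothetical checkpoint store; a real job would persist this durably. */
    static class CheckpointStore {
        private int lastCommitted = 0;

        int load() {
            return lastCommitted;
        }

        void save(int position) {
            lastCommitted = position;
        }
    }

    /**
     * Processes input from the last checkpoint onward, recording a checkpoint
     * every `interval` records. Throws when record index `failAt` is reached
     * (pass -1 for no failure) to simulate a crash part-way through a chunk.
     */
    static void run(List<String> input, List<String> output,
                    CheckpointStore store, int interval, int failAt) {
        int start = store.load();
        // Discard work done after the last checkpoint, as a rollback would.
        while (output.size() > start) {
            output.remove(output.size() - 1);
        }
        for (int i = start; i < input.size(); i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at record " + i);
            }
            output.add(input.get(i).toUpperCase());
            if ((i + 1) % interval == 0) {
                store.save(i + 1); // everything before this index is committed
            }
        }
        store.save(input.size());
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add("r" + i);
        }
        List<String> output = new ArrayList<>();
        CheckpointStore store = new CheckpointStore();
        try {
            run(input, output, store, 4, 6); // first attempt crashes
        } catch (IllegalStateException e) {
            System.out.println("restarting from checkpoint " + store.load());
        }
        run(input, output, store, 4, -1);    // restart resumes, no reprocessing from zero
        System.out.println(output);
    }
}
```

With 18 million records, the practical benefit is that a failure late in the run costs at most one chunk of rework instead of the whole job.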

Not to plug the technology I work a lot with, but what I've seen work well is WebSphere XD Compute Grid with a well-designed data access layer, where we can leverage both ORM technologies (Hibernate/OpenJPA) and highly optimized SQL via SQLJ, etc. In this model, ORM technologies are used for some of the batch interactions, but for those that have very demanding performance requirements we use SQLJ DAOs instead. You get the checkpoint/restart and OLTP interleave provided by Compute Grid, along with the flexibility of using a simple persistence mechanism like Hibernate when you can and dealing with complex SQL queries when you must. A few customers are also starting to use pureQuery technology from DB2, the benefit being the ability to quickly choose static or dynamic SQL queries via properties, as opposed to the traditional method of compiling and binding the SQL, etc.
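
A data access layer of the kind described above might, as a rough sketch, hide the ORM-versus-tuned-SQL choice behind one interface and pick the implementation per batch step from configuration. Everything here is hypothetical and for illustration only: the `RecordDao` interface, the two placeholder implementations, and the `dao.<step>` property naming are assumptions, not part of Compute Grid, Hibernate, or SQLJ.

```java
import java.util.Properties;

public class DaoSelectionSketch {

    /** Hypothetical DAO contract shared by both persistence strategies. */
    interface RecordDao {
        String describe();
    }

    /** Placeholder for a DAO backed by an ORM such as Hibernate. */
    static class OrmRecordDao implements RecordDao {
        public String describe() {
            return "orm";
        }
    }

    /** Placeholder for a DAO backed by hand-tuned SQL (JDBC, SQLJ, ...). */
    static class TunedSqlRecordDao implements RecordDao {
        public String describe() {
            return "sql";
        }
    }

    /** Chooses the DAO for a batch step; steps default to the ORM-backed DAO. */
    static RecordDao daoFor(String step, Properties config) {
        String strategy = config.getProperty("dao." + step, "orm");
        return "sql".equals(strategy) ? new TunedSqlRecordDao() : new OrmRecordDao();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("dao.settlement", "sql"); // the performance-critical step
        System.out.println(daoFor("settlement", config).describe());
        System.out.println(daoFor("lookup", config).describe());
    }
}
```

The batch steps only ever see `RecordDao`, so a step that turns out to be a bottleneck can be moved to tuned SQL without touching the job logic.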

Some links on Compute Grid:

Compute Grid technology overview: http://www-128.ibm.com/developerworks/w ... ntani.html

Spring Batch versus Compute Grid: http://www-128.ibm.com/developerworks/f ... 4&tstart=0

Programming with Compute Grid: http://www.ibm.com/developerworks/websp ... gnola.html

Development tools for building batch applications: http://www-128.ibm.com/developerworks/f ... 4&tstart=0

Some best practices: http://www-128.ibm.com/developerworks/f ... 7&tstart=0

Sample Eclipse workspace: http://www-128.ibm.com/developerworks/f ... 4&tstart=0


Thanks,
Snehal


 Post subject: re: hibernate and batch
PostPosted: Sat Jul 05, 2008 2:09 am 
Newbie

Joined: Fri Jul 04, 2008 4:38 pm
Posts: 2
Another good link that describes experiences with Hibernate and batch processing: http://www.jroller.com/wxlund/entry/hib ... _and_batch


 Post subject:
PostPosted: Fri Jul 11, 2008 5:12 am 
Newbie

Joined: Thu Jul 03, 2008 9:48 am
Posts: 2
Many thanks for your input. Most helpful.

Michael


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.