baliukas wrote:
It can be nice to have all features in a single framework, and update without load is very useful, but it is very hard to understand this way of implementing batch processing. As I understand it, performance is very important to you, so why do you want to do it on the client? Are you sure this problem exists?
I am not sure what you mean by "update without load". What we want is to load without caching, because we don't plan on updating the loaded records. Batch processing often involves pushing data through a series of steps (a data pipeline), which can mean reading lots of data that is never going to be updated.
I understand where the Hibernate team is coming from in believing that batch processing usually doesn't make sense in Java: why materialize data into the JVM, incurring network overhead and marshalling/unmarshalling costs? This applies to any database application; it is not specific to Hibernate.
Whenever possible we try to formulate the problem so that it can be expressed in SQL, which is by far the fastest way to accomplish batch processing. However, not every problem can be reasonably expressed in SQL. In some situations we rely upon PL/SQL, but as I explained in my email to Gavin, we still have good reasons to resort to Java batch processing in a few situations:
- We need to cleanse large batches of company names and addresses. Cleansing is accomplished with third-party cleansing engines, which are in some cases Java libraries and in other cases C libraries that provide JNI APIs. We could conceivably write Java stored procedures in Oracle to do this, but we are very leery of that approach (we ultimately want to run on DB2 and SQL Server, so we're trying to stay as vendor-neutral as possible). So, instead, we materialize records from the database into the JVM and cleanse from there.
- Similarly, we need to match records to one another. Again, we rely upon third-party matching engines. While the vendors all have different approaches, the problem again requires us to pass records into Java and/or C libraries. The algorithms that direct which records should be compared and in what order can be quite complex, which is all the more reason not to push this into (Java) stored procedures in the database.
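
To make the shape of these jobs concrete, here is a rough sketch of the kind of read-only pipeline I have in mind. It is plain Java with no Hibernate or JDBC in it; BatchPipeline, run and cleanseName are names I made up for illustration, and cleanseName is only a trivial stand-in for a third-party cleansing engine. In a real job the Iterator would presumably wrap a forward-only database cursor and the sink would write the results back out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class BatchPipeline {

    /** Hypothetical stand-in for a third-party cleansing engine call. */
    public static String cleanseName(String raw) {
        // Trim, collapse runs of whitespace, and upper-case the name.
        return raw.trim().replaceAll("\\s+", " ").toUpperCase();
    }

    /**
     * Pulls records from the source one at a time, applies the step to each,
     * and hands the results to the sink in chunks of chunkSize. Only the
     * current chunk is held in memory; nothing read is kept around afterwards.
     * Returns the number of records processed.
     */
    public static <I, O> int run(Iterator<I> source, int chunkSize,
                                 Function<I, O> step, Consumer<List<O>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int total = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            chunk.add(step.apply(source.next()));
            total++;
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                // Start a fresh list so the flushed chunk can be collected.
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // flush the final partial chunk
        }
        return total;
    }

    public static void main(String[] args) {
        // In-memory rows standing in for records streamed from the database.
        List<String> rows = Arrays.asList("  acme   corp ", "Foo  Ltd", "bar inc");
        List<String> written = new ArrayList<>();
        int n = run(rows.iterator(), 2, BatchPipeline::cleanseName, written::addAll);
        System.out.println(n + " records: " + written);
    }
}
```

The point of the sketch is that the records only flow through: each one is read once, passed to the engine, and written out, so a session-level cache of the loaded objects buys us nothing and just costs memory.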