
All times are UTC - 5 hours [ DST ]



 Post subject: Bypassing the cache to improve performance
PostPosted: Tue Aug 31, 2004 3:38 am 
Regular

Joined: Fri Dec 12, 2003 2:09 pm
Posts: 84
Location: San Francisco, USA
Our application processes large volumes of data. For certain batch processing use cases, we have found that the Session cache overhead required to flush() the cache before queries is prohibitively expensive. In the short run, we are fixing things by disabling auto-flush.

However, what we'd really like to do is use Hibernate to load data into the JVM without having persistent objects saved in any caches.

We imagine being able to load individual objects by ID and sets of objects via queries, passing a "no cache" flag to inform Hibernate that it need not concern itself with managing the persistent state of the returned Java objects. We get to continue to utilize the great/fast JDBC mapping and HQL functionality of Hibernate and don't pollute our code with lots of direct JDBC calls.

We also have some use cases where we delete objects via a query, which also triggers a flush. Here, we'd again like a "bypass cache" flag to inform Hibernate that it can bypass the cache and execute the query without flushing.

What do people think about this kind of thing? Perhaps trying to fit this functionality into the Session API is the wrong way to go about this - we'd be happy with some other route to performing bulk load and update operations without incurring the overhead of cache management.


 Post subject:
PostPosted: Tue Aug 31, 2004 3:48 am 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
(1) "no cache" is impossible, since Hibernate would fall over with a stack overflow when your object graph has circular references
(2) you can disable flushing using setFlushMode()
(3) see http://blog.hibernate.org for info on how to control the size of the session cache
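A rough way to picture the flush overhead from the first post, and what point (2) switches off, is the toy model below. It is only a sketch in plain Java: ToySession, FlushCostDemo and every member are hypothetical names, not Hibernate API, and the assumption that a flush compares each cached object against its loaded snapshot is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToySession and its members are hypothetical, not Hibernate API.
class ToySession {
    // Each entry holds {currentValue, snapshotValue} for one cached entity.
    private final List<int[]> cached = new ArrayList<>();
    private boolean autoFlush = true;
    long dirtyChecks = 0; // number of per-entity comparisons done by flushing

    void setAutoFlush(boolean autoFlush) { this.autoFlush = autoFlush; }

    void cache(int value) { cached.add(new int[] { value, value }); }

    // Assumed behaviour: a flush compares every cached entity with its snapshot.
    void flush() {
        for (int[] e : cached) {
            dirtyChecks++;
            if (e[0] != e[1]) e[1] = e[0]; // pretend to write the change
        }
    }

    // With auto-flush on, every query pays for a full flush first.
    void query() {
        if (autoFlush) flush();
    }
}

class FlushCostDemo {
    static long run(boolean autoFlush, int entities, int queries) {
        ToySession s = new ToySession();
        s.setAutoFlush(autoFlush);
        for (int i = 0; i < entities; i++) s.cache(i);
        for (int q = 0; q < queries; q++) s.query();
        return s.dirtyChecks;
    }

    public static void main(String[] args) {
        System.out.println("auto-flush on:  " + run(true, 10_000, 100) + " dirty checks");
        System.out.println("auto-flush off: " + run(false, 10_000, 100) + " dirty checks");
    }
}
```

In this model the cost grows with (cached entities x queries), which is why both disabling the automatic flush and keeping the session cache small help a batch job.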


 Post subject:
PostPosted: Tue Aug 31, 2004 1:13 pm 
Regular

Joined: Fri Dec 12, 2003 2:09 pm
Posts: 84
Location: San Francisco, USA
Gavin, I'm skeptical. I feel like you're using the same kind of logic that doomed EJB to being so bloated for so long: the framework must do <all these things> for the user and absolutely positively cannot be reduced in complexity :)

It seems perfectly reasonable to use only a subset of the functionality for which Hibernate was primarily intended. How about suspending disbelief for a moment and considering how to make this work? How are objects that are uncached any different from regular old detached objects?

Let me assume the risk of an object graph circularity. I will guarantee you that there are no circularities - in some cases, the objects I'm dealing with don't even have any associations.

The fundamental reason to bypass the cache is to improve performance. Adding to the cache, flushing the cache, evicting from the cache, etc., is not free. Cache management adds unnecessary overhead and complexity to the batch processing parts of our application. Hibernate seems to push us to using JDBC directly, which also adds complexity and introduces a data access dichotomy. I'd think Hibernate would encourage its users to avoid this, but in some cases you guys appear to recommend it.

Note that for the more traditional, UI-oriented areas of our app, the "2.5-layer" cache architecture is magnifique!


 Post subject:
PostPosted: Tue Aug 31, 2004 2:38 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 6:10 am
Posts: 8615
Location: Neuchatel, Switzerland (Danish)
read the blog - and call session.clear() some more!

Possibly use the "select new X()" syntax to avoid putting stuff in the cache (but it also limits the update possibilities of these entities).

_________________
Max
Don't forget to rate


 Post subject:
PostPosted: Tue Aug 31, 2004 3:01 pm 
Regular

Joined: Fri Dec 12, 2003 2:09 pm
Posts: 84
Location: San Francisco, USA
I read the blog, thanks. I don't understand why you guys are so insistent that everything must be cached.


 Post subject:
PostPosted: Tue Aug 31, 2004 3:05 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 6:10 am
Posts: 8615
Location: Neuchatel, Switzerland (Danish)
because if we didn't:

x = session.load(Person.class, 42);
y = session.load(Person.class, 42);

x != y, and you would have duplicate objects with the same id loaded by the same session. This gets worse when you have graphs of objects.

That is generally a bad idea in ORM, so a session has a cache to make x == y.

That - and the normal efficiency gains ;)
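
The identity guarantee described here can be sketched in plain Java with no Hibernate dependency. Person, IdentityMapSketch, load and loadUncached are hypothetical names chosen for illustration; a real session does far more than this map lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of this is Hibernate code.
class Person {
    final long id;
    Person(long id) { this.id = id; }
}

class IdentityMapSketch {
    private final Map<Long, Person> firstLevelCache = new HashMap<>();

    // Cached load: the same id always yields the same instance.
    Person load(long id) {
        return firstLevelCache.computeIfAbsent(id, Person::new);
    }

    // Uncached load: every call materializes a fresh copy of the row.
    Person loadUncached(long id) {
        return new Person(id);
    }

    public static void main(String[] args) {
        IdentityMapSketch session = new IdentityMapSketch();
        System.out.println(session.load(42) == session.load(42));
        System.out.println(session.loadUncached(42) == session.loadUncached(42));
    }
}
```

The first comparison prints true and the second prints false: without the map, two loads of id 42 give two independent copies, and a change made to one is invisible to the other.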

_________________
Max
Don't forget to rate


 Post subject:
PostPosted: Tue Aug 31, 2004 3:12 pm 
Regular

Joined: Mon Feb 23, 2004 10:42 pm
Posts: 102
Location: Washington DC
Almost like there should be a HibernateLite!

The power of Hibernate is database independence! There should be a way to leverage this for doing batch and other bulk loading operations.

_________________
Matt Veitas


 Post subject:
PostPosted: Tue Aug 31, 2004 3:40 pm 
Regular

Joined: Fri Dec 12, 2003 2:09 pm
Posts: 84
Location: San Francisco, USA
max wrote:
because if we didn't:

x = session.load(Person.class, 42);
y = session.load(Person.class, 42);

x != y, and you would have duplicate objects with the same id loaded by the same session. This gets worse when you have graphs of objects.

That is generally a bad idea in ORM, so a session has a cache to make x == y.


That would be fine for us, and in fact we can ensure it never happens. Our batch processing is not subject to the uncertainties of user interactions and we're not working with arbitrarily complex object graphs. And remember, I proposed that the Hibernate API be extended to explicitly bypass the Session cache.

I don't need any paternalistic care to ensure I don't have two Java objects representing the same database record. I am not worried about object graph circularities. Hibernate Lite -- when and where can I get it? :)

I understand Hibernate's emphasis on managed persistent data, I just think it could meet our needs (and reach a wider audience) without much work or any fundamental changes.


 Post subject:
PostPosted: Tue Aug 31, 2004 4:02 pm 
CGLIB Developer

Joined: Thu Aug 28, 2003 1:44 pm
Posts: 1217
Location: Vilnius, Lithuania
It can be nice to have all features in a single framework, and update without load is very useful, but it is very hard to understand this way of implementing batch processing. As I understand it, performance is very important for you, so why do you want to do it on the client? Are you sure this problem exists?


 Post subject:
PostPosted: Tue Aug 31, 2004 7:11 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
You would not think it was OK when Hibernate started to fall over with stack overflows because your object graph had circular references.
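
The stack overflow concern can be reproduced with a toy loader in plain Java. This is a sketch under stated assumptions: Node, CycleLoadSketch, ROWS, loadNaive and loadTracked are hypothetical names, the "database" is a two-row map, and associations are resolved eagerly. It is not how Hibernate itself is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; the "database" maps each id to the id it references.
class Node {
    final int id;
    Node ref;
    Node(int id) { this.id = id; }
}

class CycleLoadSketch {
    static final Map<Integer, Integer> ROWS = new HashMap<>();
    static {
        ROWS.put(1, 2); // row 1 references row 2
        ROWS.put(2, 1); // row 2 references row 1: a cycle
    }

    // No registry of loaded objects: following 1 -> 2 -> 1 -> ... never terminates.
    static Node loadNaive(int id) {
        Node n = new Node(id);
        n.ref = loadNaive(ROWS.get(id));
        return n;
    }

    // Registering each object before resolving its association breaks the cycle.
    static Node loadTracked(int id, Map<Integer, Node> loaded) {
        Node existing = loaded.get(id);
        if (existing != null) return existing;
        Node n = new Node(id);
        loaded.put(id, n);
        n.ref = loadTracked(ROWS.get(id), loaded);
        return n;
    }

    public static void main(String[] args) {
        Node a = loadTracked(1, new HashMap<>());
        System.out.println(a.ref.ref == a);
        try {
            loadNaive(1);
        } catch (StackOverflowError e) {
            System.out.println("naive load overflowed the stack");
        }
    }
}
```

The tracked loader needs some registry of objects already being materialized, which is the role the session cache plays in the argument above. Whether that registry has to outlive a single load is the part still under debate in this thread.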


 Post subject:
PostPosted: Tue Sep 07, 2004 3:26 pm 
Regular

Joined: Fri Dec 12, 2003 2:09 pm
Posts: 84
Location: San Francisco, USA
baliukas wrote:
It can be nice to have all features in a single framework, and update without load is very useful, but it is very hard to understand this way of implementing batch processing. As I understand it, performance is very important for you, so why do you want to do it on the client? Are you sure this problem exists?

I am not sure what you mean by "update without load"? We want to load without caching because we don't plan on updating the loaded records. Batch processing often involves pushing data through a series of steps (a data pipeline), which can mean reading lots of data that is not going to be updated.

I understand where the Hib team is coming from in believing that batch processing usually doesn't make sense in Java: why materialize data into the JVM, incurring network overhead and marshalling/demarshalling costs? This applies to any database application; it is not specific to Hibernate.

Whenever possible we try to formulate the problem so that it can be expressed in SQL, which is by far the fastest way to accomplish batch processing. However, not every problem can be reasonably expressed in SQL. In some situations we rely upon PL/SQL, but as I explained in email to Gavin, we still have good reasons to resort to Java batch processing in a few situations:

- We need to cleanse large batches of company names and addresses. Cleansing is accomplished with 3rd party cleansing engines, which are in some cases Java libs and in other cases C libs that provide JNI APIs. We could conceivably write Java stored procedures in Oracle to do this, but we are very leery of that approach (we ultimately want to run on DB2 and SQL Server, so we're trying to stay as vendor-neutral as possible). So, instead, we materialize records from the database into the JVM and cleanse from there.

- Similarly, we need to match records to one another. Again, we rely upon 3rd party matching engines. While the vendors all have different approaches, the problem again requires us to pass records into Java and/or C libraries. The algorithms that direct which records should be compared and in what order can be quite complex, which is all the more reason not to push this into (Java) stored procs in the database.


 Post subject:
PostPosted: Tue Sep 07, 2004 3:28 pm 
Regular

Joined: Fri Dec 12, 2003 2:09 pm
Posts: 84
Location: San Francisco, USA
I'd like to submit an enhancement request to JIRA regarding this topic to gauge how much interest there is from others in the community and give people a chance to vote. But I don't want to do that if Gavin's going to close it right away :)

So what do you guys say?


 Post subject:
PostPosted: Tue Sep 07, 2004 3:40 pm 
Regular

Joined: Mon Feb 23, 2004 10:42 pm
Posts: 102
Location: Washington DC
Got my vote! I still believe that one of the powers of Hibernate is the database independence.

_________________
Matt Veitas


 Post subject:
PostPosted: Tue Sep 07, 2004 7:21 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
I still do not understand how this is different to:

Code:
session.createQuery(...).list();
session.clear();
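
Applied to a batch job, the list-then-clear pattern above might look like the following sketch. It is plain Java with a hypothetical stand-in for the session: ChunkSession, ChunkedBatchSketch, list(from, to) and peakCacheSize are illustrative names, not Hibernate API, and only the cache bookkeeping is modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a session; only the cache bookkeeping is modeled.
class ChunkSession {
    private final Map<Integer, String> cache = new HashMap<>();
    int peakCacheSize = 0;

    // Pretend to run a query for ids [from, to) and cache every returned row.
    List<String> list(int from, int to) {
        List<String> rows = new ArrayList<>();
        for (int id = from; id < to; id++) {
            String row = "record-" + id;
            cache.put(id, row);
            rows.add(row);
        }
        peakCacheSize = Math.max(peakCacheSize, cache.size());
        return rows;
    }

    void clear() { cache.clear(); }
}

class ChunkedBatchSketch {
    // Reads `total` rows in chunks, clearing the cache after each chunk.
    static int process(ChunkSession session, int total, int chunkSize) {
        int processed = 0;
        for (int from = 0; from < total; from += chunkSize) {
            int to = Math.min(from + chunkSize, total);
            for (String row : session.list(from, to)) {
                processed++; // a real pipeline would cleanse or match the row here
            }
            session.clear(); // the cache never grows beyond one chunk
        }
        return processed;
    }

    public static void main(String[] args) {
        ChunkSession session = new ChunkSession();
        int n = process(session, 10_000, 500);
        System.out.println(n + " rows processed, peak cache size " + session.peakCacheSize);
    }
}
```

The cache still exists while each chunk is read, so the per-object bookkeeping cost the original poster objects to is not removed; the pattern only bounds how large the cache can get.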


 Post subject:
PostPosted: Wed Sep 08, 2004 7:12 am 
CGLIB Developer

Joined: Thu Aug 28, 2003 1:44 pm
Posts: 1217
Location: Vilnius, Lithuania
This feature is pointless if you copy the data into a collection yourself. It could be useful if you are going to use a cursor, but it must be better to implement it in some client-independent way, such as a stored procedure.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.