
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 21 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: Using Hibernate to do updates on large amounts of data...
PostPosted: Wed Mar 17, 2004 7:04 am 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
I'm running Hibernate 2.1 on Sybase ASE 12 and am having problems getting performance up on a batch-like update operation.

I do a find() and fetch about 90K instances into a List, then iterate the List and update properties on about 25% of them. Everything seems fine and speedy (when using update()) up until I do the actual flush(). The flush can take up to half an hour for 10K instances! I've tried flushing every 1,000 instances or every 10K instances, but it's still dog slow.

I am using the EHCache implementation and I have cache-enabled the class in the List (with a read-write strategy).

My general session structure looks like this:

tx = session.beginTransaction();

session.find() (and inserted in List)

Iterate List and update properties in some instances.
session.update(changedInstance)

session.flush() (e.g. every 1000/10K instances)

tx.commit();

I suspect that it MAY be a latency issue, since the DB server is based in another country right now. But does latency really affect things *that* much, or is it also a Hibernate-related issue?

Basically, am I using the correct approach for this kind of problem?

An obvious solution would be to go to plain JDBC/SQL (and let the DB do the job), but I use a couple of Hibernate-mapped classes to update the properties, so I would like to keep it all in Hibernate if possible...

Regards

Jonas


 Post subject:
PostPosted: Wed Mar 17, 2004 7:55 am 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Quote:
I suspect that it MAY be a latency issue since the DB server is based in another country right now.


A quick order-of-magnitude estimate shows this to be not unreasonable. Suppose that latency is 1/5th of a second - not unreasonable. With about 25% of the 90K instances dirty, that's roughly 22,500 updates at one round trip each, so we would expect about 1.25 hours.
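Worked out explicitly (a sketch; the 25% dirty fraction and the 0.2 s round trip are the numbers assumed above, not measurements):

Code:
```java
public class LatencyEstimate {

    // Hours needed when every UPDATE costs one synchronous round trip
    static double hoursFor(int updates, double roundTripSeconds) {
        return updates * roundTripSeconds / 3600.0;
    }

    public static void main(String[] args) {
        int dirty = 90000 / 4;                    // ~25% of 90K instances
        System.out.println(hoursFor(dirty, 0.2)); // ~1.25 hours
    }
}
```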

Why don't you try the same thing in direct JDBC and see how long it takes? (P.S. It is not really best to hold all that stuff in memory at once - 90K instances in a single session, for an hour, is huge memory usage.)


 Post subject:
PostPosted: Wed Mar 17, 2004 10:24 am 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
That's true. The latency adds up if every update is done *individually*. But when used like this, one would expect the Hibernate cache to maybe optimize and batch the updates when flushed? Or could that maybe be configured in some way?

We have another plain-JDBC-based product here, and those guys have used the JDBC Statement interface's batching to queue up SQL and then execute it in batches to gain performance. And it seems to work great for them...
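For comparison, that java.sql batching pattern looks roughly like this (a sketch only - the table, columns, and Trade accessors are invented for illustration, not code from this thread):

Code:
```java
PreparedStatement ps = con.prepareStatement(
        "UPDATE trade SET commission = ? WHERE trade_id = ?");
int queued = 0;
for (Iterator it = changed.iterator(); it.hasNext();) {
    Trade t = (Trade) it.next();
    ps.setBigDecimal(1, t.getCommission());   // bind the new values
    ps.setLong(2, t.getId());
    ps.addBatch();                            // queued locally, no round trip yet
    if (++queued % 1000 == 0) {
        ps.executeBatch();                    // 1000 updates in one round trip
    }
}
ps.executeBatch();                            // flush the remainder
con.commit();
```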

I may have to try to test with a JDBC batch implementation also and compare...


 Post subject:
PostPosted: Wed Mar 17, 2004 10:36 am 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Are JDBC batch updates enabled in Hibernate?

Look at the startup log.
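For reference, the switch is the batch-size property in the Hibernate configuration; a nonzero value enables JDBC batch updates (the value below is only an example, and the driver must support JDBC 2 batching):

Code:
```xml
<property name="hibernate.jdbc.batch_size">50</property>
```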


 Post subject:
PostPosted: Wed Mar 17, 2004 11:40 am 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
That's spectacular, Gavin. Thanks for answering. I definitely missed that property. When I enable it and set the batch size to 1K, I get more than double the performance (from 3 updates/sec up to 8 updates/sec), which is good, but we are still missing an order of magnitude :(

When batching via raw JDBC from another part of the app, we get somewhere between 150-200 inserts per second to the same site (but a different DB instance).

If it isn't a latency problem, could it be a memory problem with the objects in the cache? I am running the batch from a servlet deployed locally on Tomcat (my dev environment; production is WLS7), and I have upped the memory options for the Tomcat JVM to 512MB in catalina.bat.

Regards

Jonas


 Post subject:
PostPosted: Wed Mar 17, 2004 5:50 pm 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
In my usual style, I keep having problems (*nothing* is ever easy for me; maybe I'm just plain stupid...)

I am trying to figure out if it's my ehcache settings that need tweaking to get some more juice out of the batching, so I have added the following ehcache.xml to my classpath. The cache name is the same as the name of the domain object I want to cache and iterate over in my List.

Code:
<cache name="com.lehman.mis.model.TradeFinal"
   maxElementsInMemory="50000"
   eternal="false"
   timeToIdleSeconds="900"
   timeToLiveSeconds="3600"
   overflowToDisk="false"
/>


And when starting I get the following exception when my HibernateUtil class tries to configure the cache:

java.lang.NoSuchMethodError
at net.sf.ehcache.CacheException.<init>(CacheException.java:97)
at net.sf.ehcache.CacheManager.configure(CacheManager.java:156)
at net.sf.ehcache.CacheManager.<init>(CacheManager.java:127)
at net.sf.ehcache.CacheManager.create(CacheManager.java:179)
at net.sf.ehcache.CacheManager.getInstance(CacheManager.java:195)
at net.sf.ehcache.hibernate.Plugin.<init>(Plugin.java:92)
at net.sf.ehcache.hibernate.Provider.buildCache(Provider.java:89)
at net.sf.hibernate.cfg.Configuration.configureCaches(Configuration.java:1067)
at net.sf.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:738)
at com.lehman.mis.utils.HibernateUtil.<clinit>(HibernateUtil.java:28)
at com.lehman.mis.servlet.DataHandler.UpdateCommissionOnZeroTrades(DataHandler.java:1116)
at com.lehman.mis.servlet.DataHandler.createReport(DataHandler.java:694)
at com.lehman.mis.servlet.MISServlet.running(MISServlet.java:412)
at com.lehman.mis.servlet.FeedReader.run(FeedReader.java:73)
at com.lehman.mis.servlet.MISServlet.init(MISServlet.java:69)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:935)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:823)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:3422)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:3623)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1188)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:754)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1188)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:363)
at org.apache.catalina.core.StandardService.start(StandardService.java:497)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:2190)
at org.apache.catalina.startup.Catalina.start(Catalina.java:512)
at org.apache.catalina.startup.Catalina.execute(Catalina.java:400)
at org.apache.catalina.startup.Catalina.process(Catalina.java:180)
at java.lang.reflect.Method.invoke(Native Method)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:203)

My util class for Hibernate follows the ThreadLocal pattern suggested on the site.

Could this be a thread-related problem, or have I totally missed something in ehcache.xml? Is it me, or is it strange that I get an exception inside ehcache's own exception class?


 Post subject:
PostPosted: Thu Mar 18, 2004 3:54 am 
Hibernate Team

Joined: Thu Dec 18, 2003 9:55 am
Posts: 1977
Location: France
ThreadLocal and the cache work perfectly together.


I have this:
ehcache.xml
Code:
<ehcache>

    <!-- Sets the path to the directory where cache .data files are created.

         If the path is a Java System Property it is replaced by
         its value in the running VM.

         The following properties are translated:
         user.home - User's home directory
         user.dir - User's current working directory
         java.io.tmpdir - Default temp file path -->
    <diskStore path="java.io.tmpdir"/>


    <!--Default Cache configuration. These will be applied to caches programmatically created through
        the CacheManager.

        The following attributes are required for defaultCache:

        maxInMemory       - Sets the maximum number of objects that will be created in memory
        eternal           - Sets whether elements are eternal. If eternal,  timeouts are ignored and the element
                            is never expired.
        timeToIdleSeconds - Sets the time to idle for an element before it expires. Is only used
                            if the element is not eternal.
        timeToLiveSeconds - Sets the time to live for an element before it expires. Is only used
                            if the element is not eternal.
        overflowToDisk    - Sets whether elements can overflow to disk when the in-memory cache
                            has reached the maxInMemory limit.

        -->
    <defaultCache
        maxElementsInMemory="10000"
        eternal="false"
        timeToIdleSeconds="120"
        timeToLiveSeconds="120"
        overflowToDisk="true"
        />
       
    <cache name="com.auchan.protoj2ee.bo.Reception"
        maxElementsInMemory="300"
        eternal="false"
        timeToIdleSeconds="300"
        timeToLiveSeconds="600"
        overflowToDisk="true"
   />

...
</ehcache>



hibernate.cfg.xml
Code:
<property name="hibernate.cache.provider_class">net.sf.ehcache.hibernate.Provider</property>


mapping file.hbm.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 2.0//EN" "http://hibernate.sourceforge.net/hibernate-mapping-2.0.dtd">
<hibernate-mapping>
   <class name="com.auchan.protoj2ee.bo.Reception" table="REC_RECEPTION">
   <cache usage="read-write" />
      <id column="ID_RECEPTION" name="receptionId" >
         <generator class="sequence">
            <param name="sequence">S_REC_RECEPTION</param>
         </generator>         
      </id>
.....

   </class>
</hibernate-mapping>



if this can help you...


 Post subject:
PostPosted: Thu Mar 18, 2004 5:52 am 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
I must stop working late nights...

I tried to configure the cache with a malformed ehcache.xml file, and I did not have the defaultCache configuration there either, which I expect was a problem.

Now the cache configures just fine, but I get no performance increase from the batching at all, so I guess this is not the issue.

I'm running out of options here. I may have to go back to pure JDBC for this if I can't get closer to the performance my colleague gets.

Just to recap if anyone cares:

1. I'm using find() to populate a List with 90K mapped objects. The class has a read-write cache configured.

2. I iterate the objects in the list and update about 25% of them. I evict() the ones I don't update from the session cache.

3. *Occasionally* (about 5-10 times per transaction) I execute a find() for another mapped class to get some parameters for my calculations on the primary object. Does this maybe flush() the cache to not get stale data? Even so, it should not decrease the performance by an order of magnitude...

4. I update() the trade I have changed in the iteration.

5. Every 1K updates I flush() the session.

Batching (hibernate.jdbc.batch_size) is configured to 1000 in cfg.xml.
My DB host has a round-trip latency (ping) of 350-500ms, so I really need the batching to work.

My colleague gets 150-200 inserts/sec with the JDBC batch implementation and similar latency figures. I get 5-8 updates/sec... This means that every flush() pretty consistently takes about 2 minutes right now.
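As a cross-check on those numbers (the figures are the assumed rates from this post, nothing newly measured):

Code:
```java
public class FlushTimeCheck {

    // Seconds one flush of `updates` rows takes at `ratePerSec` updates/sec
    static double secondsPerFlush(int updates, double ratePerSec) {
        return updates / ratePerSec;
    }

    public static void main(String[] args) {
        // 1000 updates per flush at ~8 updates/sec
        System.out.println(secondsPerFlush(1000, 8.0)); // ~125 s, i.e. about 2 minutes
    }
}
```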

This is my ehcache.xml

Code:
<ehcache>

    <!-- Sets the path to the directory where cache .data files are created.

         If the path is a Java System Property it is replaced by
         its value in the running VM.

         The following properties are translated:
         user.home - User's home directory
         user.dir - User's current working directory
         java.io.tmpdir - Default temp file path -->
    <diskStore path="java.io.tmpdir"/>


    <!--Default Cache configuration. These will be applied to caches programmatically created through
        the CacheManager.

        The following attributes are required for defaultCache:

        maxInMemory       - Sets the maximum number of objects that will be created in memory
        eternal           - Sets whether elements are eternal. If eternal,  timeouts are ignored and the element
                            is never expired.
        timeToIdleSeconds - Sets the time to idle for an element before it expires. Is only used
                            if the element is not eternal.
        timeToLiveSeconds - Sets the time to live for an element before it expires. Is only used
                            if the element is not eternal.
        overflowToDisk    - Sets whether elements can overflow to disk when the in-memory cache
                            has reached the maxInMemory limit.

        -->
    <defaultCache
        maxElementsInMemory="10000"
        eternal="false"
        timeToIdleSeconds="120"
        timeToLiveSeconds="120"
        overflowToDisk="true"
        />
       
    <cache name="com.lehman.mis.model.TradeFinal"
        maxElementsInMemory="50000"
        eternal="false"
        timeToIdleSeconds="3600"
        timeToLiveSeconds="3600"
        overflowToDisk="true"
   />
</ehcache>


My memory options are set to 512MB for the JVM Tomcat runs on, and I don't get any memory errors when running.

The session is managed with the ThreadLocal pattern, and I wrap all of the above in a Transaction I get from the session (tx = session.beginTransaction(), tx.commit()).

Oh, and the servlet I am running from implements SingleThreadModel. Since I basically suck at threading, I have no idea what implications this has for Hibernate, but it needs to be there to stop the appserver from spawning multiple threads and thus multiple batch processes.


 Post subject:
PostPosted: Thu Mar 18, 2004 6:56 am 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Quote:
Does this maybe flush() the cache to not get stale data?


Probably. Try setFlushMode(FlushMode.COMMIT).

Disable ehcache completely. You don't need it, right?

Use a completely new session for each batch.

You should not see a significant difference between Hibernate and direct JDBC under these conditions. Especially not over a slow link.
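Put together, that advice looks roughly like this (a sketch against the Hibernate 2.x API; the query string and chunking are placeholders, not verified code from this thread):

Code:
```java
Session session = sessionFactory.openSession();     // fresh session per batch
session.setFlushMode(FlushMode.COMMIT);             // no auto-flush before queries
Transaction tx = session.beginTransaction();

List batch = session.find("from TradeFinal ...");   // one chunk, not all 90K
// ... mutate the loaded instances; no explicit update() needed ...

tx.commit();      // the single flush happens here
session.close();  // drop the session cache before the next batch
```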


 Post subject:
PostPosted: Thu Mar 18, 2004 1:01 pm 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
I disabled EHCache by removing the hibernate.cache.provider_class property from cfg.xml, removing the cache element from the domain object mapping, and removing the ehcache.xml config file from the classpath.

Since I restart Tomcat every time I run the batch right now, and I use the ThreadLocal pattern to get currentSession() before the transaction, it seems to me I get a fresh session every time I restart.

I have enabled batching in cfg.xml and I enable FlushMode.COMMIT right after I get the session.

cfg.xml looks like this:

Code:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE hibernate-configuration
    PUBLIC "-//Hibernate/Hibernate Configuration DTD//EN"
    "http://hibernate.sourceforge.net/hibernate-configuration-2.0.dtd">

<hibernate-configuration>

    <session-factory>
        <property name="dialect">net.sf.hibernate.dialect.SybaseDialect</property>
        <property name="hibernate.default_schema">dbo</property>
        <property name="connection.driver_class">com.sybase.jdbc2.jdbc.SybDriver</property>
        <property name="connection.username">un</property>
        <property name="connection.password">pw</property>
        <property name="connection.url">jdbc:sybase:Tds:DBserver</property>
        <property name="hibernate.jdbc.batch_size">1000</property>

mappings....

    </session-factory>

</hibernate-configuration>



I've tried both a direct JDBC connection (as above) and a DBCP JNDI DataSource I have configured in server.xml (which has been working great up until now).

Sadly, I still get exactly the same performance. It's one of those times when you start to suspect you are not even deploying to the right place, since the performance doesn't seem to be affected by anything I do. But I have checked and re-checked and done clean installs, and it does pick up the other changes I make...

Frustrating to say the least.

BTW, I use the jConnect 5.5 driver, which according to Sybase supports my DB server 100%.

I think I have just officially run out of things to try. Thanks, Gavin, for trying to help me, but this is probably more of a stupid user error than anything else. If I ever find out why this does not work, I'll get back to this thread...


 Post subject: Re: Using Hibernate to do updates on large amounts of data..
PostPosted: Sat Mar 20, 2004 6:04 pm 
Proxool Developer

Joined: Tue Aug 26, 2003 10:42 am
Posts: 373
Location: Belgium
jonteponte wrote:
My general session structure looks like this:

tx = session.beginTransaction();

session.find() (and inserted in List)

Iterate List and update properties in some instances.
session.update(changedInstance)

session.flush() (e.g. every 1000/10K instances)

tx.commit();


If your code still looks as you described above, try to refactor it as follows (explanations follow):
Code:
tx = session.beginTransaction();

session.find() (and inserted in List)

Iterate List and update properties in some instances.

tx.commit();


1/
If the instances you update are loaded by the previous find(), then they are associated with the current session. There is no need to explicitly tell Hibernate to update() them - it will detect the changes automatically at flush time.

2/
Don't flush periodically inside your loop - let Hibernate decide when it must be done (at tx.commit() or before HQL queries).
Why? Because flushing requires time proportional to (number of entities * number of properties) - see the 10th post in thread http://forum.hibernate.org/viewtopic.php?t=929064.
So it's better to do it only once, at the end of your updates.

But still... this single flush over 90K instances will take time - and look at your local CPU; it will be high. This is Hibernate iterating through the entities and their properties, looking for changes...

The only solution is to reduce the number of objects you hold in the session. You could clear() the session at some points in the process...


 Post subject:
PostPosted: Tue Mar 23, 2004 4:14 pm 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
Thanks for picking this up again!

I tried the above suggestion, and the part where it loads/iterates/updates my 90K objects is fast (40 seconds for the find() and another 30 seconds or so of iterating/updating). But when I try to commit the transaction, it grinds to a total stop. I've waited 1 hour and get nothing. I get nothing else either: no logs, no memory errors. Nothing. When enabling show_sql, everything seems fine until the end, where Hibernate executes the correct update query on my large table. But it stalls there, it seems.

It's obvious that this task is too big for my configuration, which is:

- Tomcat 4.1.29 (jvm max 512MB). 1GB of mem on machine (1.4GHz P4)
- Sybase ASE 12 on Solaris.
- JDK 1.3.1_06

How can I divide this task into smaller chunks? Can I keep the instances in the List from the first find() and then iterate a chunk of them in separate transactions/sessions in some way, so I don't have to keep all of the objects in the session? When you say I could clear() the session, wouldn't that mean I lose the instances in my List?

I am going to try the Query interface's setFirstResult() and setMaxResults(), fetch discrete parts of the table at a time, and do each of those batches in a separate transaction/session.
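The chunk arithmetic for that approach can be sketched as follows (chunk size and total are just the figures from this thread; each pair would feed setFirstResult()/setMaxResults()):

Code:
```java
public class ChunkPlanner {

    // Returns {firstResult, maxResults} pairs covering `total` rows in chunks
    static int[][] plan(int total, int chunkSize) {
        int chunks = (total + chunkSize - 1) / chunkSize;   // ceiling division
        int[][] out = new int[chunks][2];
        for (int i = 0; i < chunks; i++) {
            out[i][0] = i * chunkSize;                               // setFirstResult()
            out[i][1] = Math.min(chunkSize, total - i * chunkSize);  // setMaxResults()
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] p = plan(90000, 5000);
        System.out.println(p.length + " chunks, last starts at " + p[p.length - 1][0]);
        // prints "18 chunks, last starts at 85000"
    }
}
```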


 Post subject:
PostPosted: Tue Mar 23, 2004 6:04 pm 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
I tried the approach hinted at above, where I extract chunks of data and wrap each iteration/update in a separate Transaction. After the commit(), I clear() the session before going back to fetch the next chunk of data.

In the beginning this is quite fast (5K instances in 10 s). Looking at the memory consumption, it just seems to go up all the time from 150MB, then starts to hit the roof of 512MB somewhere in the middle of the 90K instances. And there it all grinds to a halt. Again.

The question is: where is my memory leak? I clear() the session on every iteration of 5K instances. Does the initial query still hold and allocate memory for a resultset for the objects I am not interested in (e.g. the first 5K objects when I am fetching the second batch of objects, from 5K to 10K)?

It certainly seems so...


 Post subject:
PostPosted: Tue Mar 23, 2004 6:25 pm 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
I just thought of something.

Do I need to close the session and get a new one from the Factory to start with a "clean" session on each batch?

I.e. does my current approach maybe leave stuff in the session that mounts up, even though I clear() on every 5K batch?


 Post subject:
PostPosted: Wed Mar 24, 2004 5:26 am 
Beginner

Joined: Fri Jan 02, 2004 7:07 pm
Posts: 35
After reading some JDBC docs and discussing this with my colleagues, we have concluded that what probably happens is that the underlying resultset accesses and allocates memory for all instances up to the ones we are interested in, even though we do not want to access them.

That pretty much makes Hibernate not useful in this context. I am going to try a direct JDBC approach and see if I can get more control that way. Or maybe I just have to surrender and put in the hours I will need to design a complicated SQL query that does it all. That IS what databases are built for, so maybe my current efforts have just been futile...

But I suck at SQL so it's not my dream scenario.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.