 Post subject: Performance issue: BatchProcess, Huge object tree
PostPosted: Mon Jun 04, 2007 2:58 pm 
Newbie

Joined: Mon Jan 09, 2006 3:57 pm
Posts: 4
Hi all,

I am a big fan of Hibernate and I suggested in my project that we use Hibernate for a batch process. I wasn't sure about it, but I was keen because Hibernate would make life easier. And it did, for a while.

However, problems started as the data volume increased. Allow me to explain in detail.

    The app is a batch process invoked on a Unix box as a Java application (main class).
    The app uses a single long-lived Session, used by the single DAO instance.
    We have three classes mapped as shown below.


We get XML from different sources that we process to build that tree.
Initially (total object count ~600) we used to create the tree and save the Volume object, and everything was fine.
But when the tree size grew (total object count ~2500), we had to save the children first and then the parents, because otherwise the process took so long we couldn't tell whether anything was happening at all.
Then came a BIG XML file with a total object count of 50k+. Now the session was getting too fat, and each insert was taking close to 3 seconds after the first 3000 records.

I read somewhere that session.clear() will remove all objects from the session and make it light again.

I wrote a DAO interface method to clear the session, but the application has to call it manually after every 1000 records, which is not elegant.
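
For reference, the usual way to keep a Hibernate session from growing in a batch job is to flush and clear it every N records from inside the loop itself, rather than exposing a separate "clear" method the caller has to remember. A minimal sketch of that pattern follows (the sessionFactory, batchSize and entity names here are illustrative, not taken from the actual application):

Code:
// Standard Hibernate batch-insert pattern: flush the pending SQL and
// clear the first-level cache every batchSize entities, so the session
// never holds more than one batch of objects.
// (Works best together with hibernate.jdbc.batch_size, see below.)
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int batchSize = 50;
for (int j = 0; j < files.size(); j++) {
    session.save(files.get(j));
    if (j % batchSize == 0) {
        session.flush();   // execute the queued inserts
        session.clear();   // detach everything so dirty checking stays cheap
    }
}
tx.commit();
session.close();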

There is another problem too. When I try to do something like

Code:
folder = dao.loadFolder(id);
files = folder.getFiles();
it = files.iterator();


it takes more than 90 seconds to load the iterator. I did set the batch-size on the collection, but it seems to be ignored.

I have removed cascade altogether, but the effect may not be acceptable.

Here are my questions (I know my implementation needs to be fine-tuned):

1. For the save/update part, is there an elegant way to limit the size of the session and remove old objects automatically, or some pattern I could use to solve this more elegantly?
2. Can loading the iterator be improved by some method?
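
A side note on batching, not from the original post: the batch-size attribute on a <bag> only controls how lazy collections are fetched in batches; it does not batch inserts. JDBC-level statement batching is configured separately, along these lines (the value is illustrative):

Code:
<!-- JDBC statement batching, independent of the batch-size attribute on
     collection mappings (which only affects lazy fetching).
     Sequence-generated ids, as used here, do not prevent JDBC batching. -->
<property name="hibernate.jdbc.batch_size">50</property>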

Hibernate version: 3

Mapping documents:

Code:
<class name="Volume" table="VOL">
      <id name="id" column="ID" type="long">
         <generator class="sequence" >
            <param name="sequence">VOL_SEQ</param>
         </generator>
      </id>
      
      <bag name="FOL" cascade="all" inverse="true">
         <key column="VOL_ID" />
         <one-to-many class="Folder"/>
      </bag>
      ...
      ...
      ...
</class>

<class name="Folder" table="FOL">
      <id name="id" column="ID" type="long">
         <generator class="sequence" >
            <param name="sequence">FOL_SEQ</param>
         </generator>
      </id>
      
      <bag name="P3" cascade="all" inverse="true"  batch-size="200">
         <key column="FOL_ID" />
         <one-to-many class="File"/>
      </bag>
      ...
      ...
      ...
</class>

<class name="File" table="FIL">
      <id name="id" column="ID" type="long">
         <generator class="sequence" >
            <param name="sequence">FIL_SEQ</param>
         </generator>
      </id>
      
      ...
      ...
      ...
</class>

   





Name and version of the database you are using: Oracle 10g


 Post subject:
PostPosted: Mon Jun 04, 2007 5:44 pm 
Newbie

Joined: Mon Jan 09, 2006 3:57 pm
Posts: 4
150 views and no reply? :(


 Post subject:
PostPosted: Mon Jun 04, 2007 11:43 pm 
Expert

Joined: Tue Jan 30, 2007 12:45 am
Posts: 283
Location: India
Hi sarath_it,

Give us the time taken by each layer, the SQL execution time, the SQL itself and the table definitions, and we will see what best we can do.

But generally I don't think a 50K record size is the problem.

_________________
Dharmendra Pandey


 Post subject:
PostPosted: Tue Jun 05, 2007 10:10 am 
Newbie

Joined: Mon Jan 09, 2006 3:57 pm
Posts: 4
Here is some sample test code.

Code:
         // First loop: insert 50000 File records, timing each insert.
         for (int j = 0; j < 50000; j++) {
            File f = new File(j);
            // set some attribs
            ...
            ...
            ...
            t = System.currentTimeMillis();
            FileDao.newFile(f, new Long(350), new Long(450));
            System.out.println(j + "    " + (System.currentTimeMillis() - t));
         }

         // Second loop: walk the Volume -> Folder -> File tree and update each File.
         t = System.currentTimeMillis();
         Volume volume = volumeDao.loadById(new Long(450));
         List fdl = volume.getFolders();
         for (Iterator iter = fdl.iterator(); iter.hasNext();) {
            Folder df = (Folder) iter.next();
            List cfl = df.getFiles();
            Iterator iterator = cfl.iterator();
            System.out.println(" Loaded iterator    " + (System.currentTimeMillis() - t));
            int i = 0;
            for (; iterator.hasNext();) {
               File cf = (File) iterator.next();
               Long id = cf.getId();
               t = System.currentTimeMillis();
               FileDao.updateStatus(id, Status.INIT);
               System.out.println(id + "    " + (System.currentTimeMillis() - t));
               if (i % 500 == 0) {
                  System.out.println("Clearing Session cache");
                  batchDao.flushDBResources();
               }
               i++;
            }
         }



In the first for loop I am trying to insert 50000 records.

The loop outputs something like:

0 126
1 64
2 69
3 61
...
...
...
4500 2106
4501 2213
4502 2249
...
...


The second loop outputs:
0 6933
Clearing Session cache
1 47
2 45
3 49
4 42
..
...


 Post subject:
PostPosted: Tue Jun 05, 2007 5:46 pm 
Expert

Joined: Sat Jan 17, 2004 2:57 pm
Posts: 329
Location: In the basement in my underwear
So getFiles() is returning 50,000 records?

Here are a few things you can try:

- Set FlushMode.MANUAL if you're doing any lookups (i.e. I don't know what updateStatus does).
- Rather than calling getFiles() on the object, execute a query using a scrollable result set to get the files. That way you won't immediately load 50,000 objects into your session.
- As you're scrolling through the result set, do your data manipulation and every x records flush and CLEAR your session. I don't know what flushDBResources() is doing, but if you are simply flushing your session you're going to run into issues.

What is most likely happening is that you're loading a bunch of data into your session, and then every time Hibernate decides it needs to flush, it goes through every loaded entity and checks whether it is dirty. That tends to chew up a lot of time as your session grows.

Alternatively you can try using a StatelessSession, which doesn't do automatic dirty checking, but you lose some features (Interceptors, the event model, etc.), I believe.
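
To make the scroll-and-clear suggestion concrete, here is a rough sketch; the HQL string, the property path and the setStatus() call are illustrative guesses at the poster's model, not taken from the real application:

Code:
// Scroll through the files with a query instead of initializing the
// Folder.getFiles() collection, and flush/clear the session every N rows
// so dirty checking never has to walk a huge session.
session.setFlushMode(FlushMode.MANUAL);      // no automatic flushing during queries
Transaction tx = session.beginTransaction();

ScrollableResults rows = session
        .createQuery("from File f where f.folder.volume.id = :volId")
        .setLong("volId", 450L)
        .scroll(ScrollMode.FORWARD_ONLY);     // stream rows instead of loading a huge list

int count = 0;
while (rows.next()) {
    File f = (File) rows.get(0);
    f.setStatus(Status.INIT);                // the per-row data manipulation
    if (++count % 500 == 0) {
        session.flush();                     // push the pending updates
        session.clear();                     // evict entities from the session
    }
}
session.flush();
session.clear();
tx.commit();

A StatelessSession would remove the need for the flush/clear calls entirely, since it keeps no first-level cache and does no dirty checking, but updates then have to be issued explicitly with update().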

