
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 8 posts ] 
Author Message
 Post subject: Bulk Insert (multi-row Insert)
PostPosted: Tue Jul 25, 2006 4:01 pm 
Newbie

Joined: Mon Jun 26, 2006 10:39 am
Posts: 5
Hi,


My project persists its object model with Hibernate. We are now building an import layer on top of the object model to load different kinds of raw data into the object-model database.

This requires persisting a large number of objects (rows), possibly up to 75 million. So I used Hibernate's batch-insert approach: save each object to the session, then flush and commit to the database every 5,000 objects. This improved performance, but it is still not within acceptable limits.

I am using MySQL as my database, and the MySQL performance-tuning guidance says that a multi-row insert in one single SQL statement (bulk insert) is a lot faster than multiple single-row insert statements.
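
For reference, the multi-row form mentioned above puts several value groups into one INSERT statement. A minimal sketch of how such a statement could be assembled for a JDBC PreparedStatement follows; `MultiRowInsert`, the table name and the column names are hypothetical and are not part of Hibernate or MySQL Connector/J.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not a Hibernate or Connector/J API.
final class MultiRowInsert {

    // Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for two rows.
    static String build(String table, List<String> columns, int rows) {
        if (columns.isEmpty() || rows < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group with a placeholder per column.
        String group = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        // Repeat the group once per row, all inside a single statement.
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rows, group));
    }

    public static void main(String[] args) {
        // Table and column names below are made up for the example.
        System.out.println(build("test_result", List.of("test_id", "status"), 3));
        // prints: INSERT INTO test_result (test_id, status) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```

The parameters would then be bound row by row, in column order. Whether this actually beats JDBC batching for a given schema has to be measured.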

How can I configure bulk insert in Hibernate? I have read a couple of posts, but it is still not clear to me whether this is supported, or whether the Hibernate team has any plans to support it.

If not, is there any other way I can improve insert performance while keeping Hibernate as the persistence layer?

_________________
Thanks,
Amala


 
 Post subject:
PostPosted: Tue Jul 25, 2006 5:11 pm 
Newbie

Joined: Fri Jun 30, 2006 5:30 pm
Posts: 13
Here are a couple of ideas for you:

- StatelessSession can be a bit faster than a normal Session; you should give that a shot.
- Are you persisting objects that have Hibernate-managed relationships? If so, this may be preventing the batching from working properly.
- Are you sure auto-commit is turned off on the DB connection?
- Try flushing every 5k but committing only every 50k or 100k.
- Are you using a generated sequence for the IDs? If so, you will need a sequence strategy that caches values.
- Is your app multi-threaded? You will see better performance if you can run multiple threads at once.
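
The "flush every 5k, commit every 50k" idea from the list above can be sketched as follows. `UnitOfWork` is a hypothetical stand-in for the few Session/Transaction calls involved (save, flush, clear, commit), so that the loop stays self-contained. A real import would call the Hibernate session and transaction directly.

```java
// Hypothetical stand-in for the Hibernate calls used by the loop below.
interface UnitOfWork {
    void save(Object entity);
    void flush();   // push pending inserts to the database
    void clear();   // drop references so memory does not keep growing
    void commit();  // end the current transaction (and begin the next one)
}

final class BatchImporter {

    // Flushes every flushEvery rows and commits every commitEvery rows;
    // a final flush/commit covers any trailing partial batch.
    static void importAll(Iterable<?> rows, UnitOfWork uow, int flushEvery, int commitEvery) {
        if (flushEvery < 1 || commitEvery % flushEvery != 0) {
            throw new IllegalArgumentException("commitEvery must be a multiple of flushEvery");
        }
        int count = 0;
        for (Object row : rows) {
            uow.save(row);
            count++;
            if (count % flushEvery == 0) {
                uow.flush();
                uow.clear();
            }
            if (count % commitEvery == 0) {
                uow.commit();
            }
        }
        if (count % commitEvery != 0) {
            if (count % flushEvery != 0) {
                uow.flush();
                uow.clear();
            }
            uow.commit();
        }
    }
}

// Counting fake, used only to show how often each call happens.
final class CountingUnitOfWork implements UnitOfWork {
    int saves, flushes, clears, commits;
    public void save(Object entity) { saves++; }
    public void flush() { flushes++; }
    public void clear() { clears++; }
    public void commit() { commits++; }
}
```

With 12 rows, flushEvery = 5 and commitEvery = 10, the fake records 12 saves, 3 flushes and 2 commits. The right intervals for a real 75-million-row import would need to be found by measurement.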

What kind of throughput are you seeing?


 
 Post subject:
PostPosted: Wed Jul 26, 2006 4:07 am 
Beginner

Joined: Mon Jul 03, 2006 5:40 am
Posts: 20
Location: Russia
I also need to create many objects very fast. These objects should be linked to other objects mapped to tables. If the linked entity does not exist, I create a new entity and then create my main object.
So I'm trying to use "batch" mode to speed up the process. I'm using this code:
Code:
Session session = HibernateUtil.getSessionFactory().getCurrentSession();
session.beginTransaction();
..............
TestBean test = null;
try {
    test = DBManager.getTest(session, suiteBean.getId(), testName);
} catch (ObjectNotFoundException ex1) {
    // no such test in this suite yet: create it and attach it to the suite
    test = new TestBean(testName);
    session.save(test);
    suiteBean.getTests().add(test);
    session.update(suiteBean);
}
... some code to create "testResultBean" object and linked entities....

session.save(testResultBean);
if (number % JDBC_BATCH_SIZE == 0) {
    // flush a batch of inserts and release memory:
    session.flush();
    session.clear();
}
...........
session.getTransaction().commit();


And this is what I get on Hibernate 3.1:
Quote:
Found two representations of same collection: com....SuiteBean.tests
org.hibernate.HibernateException: Found two representations of same collection: com.....SuiteBean.tests
at org.hibernate.engine.Collections.processReachableCollection(Collections.java:153)
at org.hibernate.event.def.FlushVisitor.processCollection(FlushVisitor.java:37)
at org.hibernate.event.def.AbstractVisitor.processValue(AbstractVisitor.java:101)
at org.hibernate.event.def.AbstractVisitor.processValue(AbstractVisitor.java:61)
at org.hibernate.event.def.AbstractVisitor.processEntityPropertyValues(AbstractVisitor.java:55)
at org.hibernate.event.def.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:124)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:195)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:76)
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:35)
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:954)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1526)
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
at org.hibernate.impl.CriteriaImpl.uniqueResult(CriteriaImpl.java:305)


The problem disappears if I remove
Code:
    session.clear();

but then each subsequent batch of N rows is inserted more and more slowly:
e.g. the first 50 entities are created in 0.2 sec,
the next 50 entities in 0.4 sec,
then 0.7, etc. The time grows significantly!
I have to create about 2000 objects in one working cycle, so without the clear() call it becomes VERY (!) slow once the row index goes beyond, say, 500-800.

I'm calling session.flush() to use JDBC batch mode effectively, as suggested in one of the tutorials. Once ~50 objects have been created in the JVM, I call flush() so that they are written to the DB. That part works fine; the problem at the moment is with session.clear().
I have read about Session.setFlushMode(), but I don't want to change the default mode.
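
For context, JDBC batch mode depends on configuration as well as on the flush() calls. The sketch below collects two settings that are commonly involved: Hibernate's `hibernate.jdbc.batch_size` property and MySQL Connector/J's `rewriteBatchedStatements` URL parameter, which asks the driver to rewrite batched inserts into multi-row inserts. The `BatchSettings` class, the host and the database name are hypothetical, and whether these settings help in this particular setup is an assumption to verify. The Hibernate documentation also notes that JDBC-level insert batching is disabled when an identity identifier generator is used.

```java
import java.util.Properties;

// Hypothetical helper that only assembles configuration values.
final class BatchSettings {

    // Hibernate property controlling how many statements go into one JDBC batch.
    static Properties hibernateProperties(int batchSize) {
        Properties props = new Properties();
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        return props;
    }

    // Appends Connector/J's rewriteBatchedStatements flag to a JDBC URL,
    // using '?' or '&' depending on whether the URL already has parameters.
    static String withRewrittenBatches(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "rewriteBatchedStatements=true";
    }

    public static void main(String[] args) {
        // Host and database name are placeholders.
        System.out.println(withRewrittenBatches("jdbc:mysql://localhost:3306/importdb"));
        System.out.println(hibernateProperties(50));
    }
}
```

A batch size of 50 matches the "~50 objects per flush" described above, so each flush() would correspond to roughly one JDBC batch.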

Note: without the clear() call everything works for me, just slower and slower with each 100 results, and with clear() I get this exception.
I think I need to find the root cause of that exception.

I know that another option could be using a stateless session, but the documentation says:
Quote:
Collections are ignored by a stateless session.

So if I try to work with linked objects, I get another error:
Code:
suiteBean.getTests().add(test); 

Code:
failed to lazily initialize a collection of role: com....beans.SuiteBean.tests, no session or session was closed

org.hibernate.LazyInitializationException: failed to lazily initialize a collection of role: com....beans.SuiteBean.tests, no session or session was closed

at org.hibernate.collection.AbstractPersistentCollection.throwLazyInitializationException(AbstractPersistentCollection.java:358)
at org.hibernate.collection.AbstractPersistentCollection.throwLazyInitializationExceptionIfNotConnected(AbstractPersistentCollection.java:350)
at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:343)
at org.hibernate.collection.AbstractPersistentCollection.write(AbstractPersistentCollection.java:183)
at org.hibernate.collection.PersistentSet.add(PersistentSet.java:165)
at com....harvester.Harvester.loadXMLFile(Harvester.java:226)
at com....harvester.Harvester.loadFiles(Harvester.java:369)
at com....harvester.Harvester.<init>(Harvester.java:346)
at com....harvester.Harvester.main(Harvester.java:393)


Does this mean that I can't use a stateless session in my case and should go back to a stateful one and fight that "Found two representations of same collection" error?


 
 Post subject:
PostPosted: Wed Jul 26, 2006 10:21 am 
Newbie

Joined: Mon Jun 26, 2006 10:39 am
Posts: 5
Found two representations of same collection: com....SuiteBean.tests
org.hibernate.HibernateException: Found two representations of same collection: com.....SuiteBean.tests
at org.hibernate.engine.Collections.processReachableCollection(Collections.java:153)
at org.hibernate.event.def.FlushVisitor.processCollection(FlushVisitor.java:37)
at org.hibernate.event.def.AbstractVisitor.processValue(AbstractVisitor.java:101)
at org.hibernate.event.def.AbstractVisitor.processValue(AbstractVisitor.java:61)
at org.hibernate.event.def.AbstractVisitor.processEntityPropertyValues(AbstractVisitor.java:55)
at org.hibernate.event.def.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:124)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:195)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:76)
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:35)
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:954)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1526)
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)
at org.hibernate.impl.CriteriaImpl.uniqueResult(CriteriaImpl.java:305)


The above exception looks like one that is raised while querying, not while committing, so it is probably coming from the DBManager.getTest method.

After the session was cleared with clear(), the query may have gone back to the DB for the collection of tests and somehow found duplicate collections while you were requesting a uniqueResult. I think you need to check your 'getTest' method and the relations.

I hope this helps.

_________________
Thanks,
Amala


 
 Post subject:
PostPosted: Thu Jul 27, 2006 12:39 am 
Beginner

Joined: Mon Jul 03, 2006 5:40 am
Posts: 20
Location: Russia
Yes, this happens when the call to uniqueResult() is performed and the session cache is automatically flushed.
Here's the code for DBManager.getTest():
Code:
public static TestBean getTest(Session session, Long suiteId,
        String testName) throws ObjectNotFoundException {

    // look up the test by name within the given suite
    Criteria criteria = session.createCriteria(TestBean.class)
            .createAlias("suites", "suite")
            .add(Expression.eq("suite.id", suiteId))
            .add(Expression.eq("name", testName));

    TestBean bean = (TestBean) criteria.uniqueResult();
    if (bean == null) {
        throw new ObjectNotFoundException(testName, TestBean.class.getName());
    }
    return bean;
}

Quote:
When the session got cleared using clear() method, it might be requesting db for the collection of tests and found some how some duplicate collections and you were requesting a uniqueResult.

I don't understand how that can happen. WHY does it find two collections? It looks like a real bug.
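
One possible reading, which nothing in this thread confirms: after session.clear() the suiteBean held by the import loop is no longer tracked by the session, so a later load of the same suite produces a second object for the same row, each with its own tests collection. The toy below only illustrates that identity effect. `ToySession` and `Suite` are hypothetical and are not Hibernate's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical entity with a collection, standing in for SuiteBean.
final class Suite {
    final long id;
    final Set<String> tests = new HashSet<>();
    Suite(long id) { this.id = id; }
}

// Toy first-level cache: one instance per id until clear() is called.
final class ToySession {
    private final Map<Long, Suite> identityMap = new HashMap<>();

    // Returns the cached instance for this id, or "loads" a fresh one.
    Suite get(long id) {
        return identityMap.computeIfAbsent(id, key -> new Suite(key));
    }

    // Forgets every tracked instance, like evicting everything from the session.
    void clear() {
        identityMap.clear();
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        Suite before = session.get(1L);
        System.out.println(before == session.get(1L)); // true: same tracked instance
        session.clear();
        Suite after = session.get(1L);
        System.out.println(before == after);           // false: two objects, same id
    }
}
```

If that reading is right, re-fetching suiteBean from the session after each clear(), instead of reusing the old reference, would be one thing to try.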

Back to the stateless session: do I have to avoid it if I need to update collections (assign tests to suites, in my case)?


 
 Post subject:
PostPosted: Thu Jul 27, 2006 9:58 am 
Newbie

Joined: Mon Jun 26, 2006 10:39 am
Posts: 5
Hi alskar,

We are drifting away from the original question here. Could you please create a new post for your problem so that it can be discussed further?


 
 Post subject:
PostPosted: Fri Jul 28, 2006 1:11 am 
Beginner

Joined: Mon Jul 03, 2006 5:40 am
Posts: 20
Location: Russia
Well, I think these are the typical problems you run into when you try to perform a quick mass insert.


 
 Post subject:
PostPosted: Mon Jul 31, 2006 4:27 pm 
Newbie

Joined: Fri Jun 30, 2006 5:30 pm
Posts: 13
Did you try any of my suggestions? We just went through the same thing and those were what we found to help.


 
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.