I have a use case whereby I need to insert a 10 million records into a single (wide) table at one time each month [I can't change the schema]. Hibernate is perfect for the rest of the application, so I'm hoping to use it for this as well.
But the performance that I'm seeing with Hibernate is quite a bit slower then using straight JDBC. I know Hibernate is very performant, so I'm assuming that I need to change the mapping, configuration or session management and am hoping that someone might be able to provide some hints beyond what I've tried. In terms of performance, I'm concerned with the time it takes to do the inserts and am not worried about SessionFactory configuration time because it will likely be irrelevant given the quantity of inserts I need to do.
For reference I've spent quite a bit of time with the online docs and have read most of Christian Bauer and Gavin King's new book "Java Persistence with Hibernate".
I've tried several different styles and options with Hibernate and found that using the StatelessSession and batching the inserts provides the best performance, although other methods are close behind. Here's the code:
Code:
Configuration configuration = new Configuration().configure();
configuration.setProperty( "hibernate.jdbc.batch_size", batchSize );
configuration.setProperty( "hibernate.cache.use_second_level_cache", "false" );
SessionFactory sessionFactory = configuration.buildSessionFactory();
StatelessSession session = sessionFactory.openStatelessSession();
for( int i = 0; i < batches; i++ ) {
Transaction tx = session.getTransaction();
tx.begin();
for( int i = 0; i <= batchSize; i++ ) {
Object myObject = new ...
session.insert( myObject );
}
tx.commit();
}
session.close();
sessionFactory.close();
Next fastest is to use a regular session (cache off) and batch the inserts as described in "Java Persistence with Hibernate". Here's the code:
Code:
Configuration configuration = new Configuration().configure();
configuration.setProperty( "hibernate.jdbc.batch_size", batchSize );
configuration.setProperty( "hibernate.cache.use_second_level_cache", "false" );
SessionFactory sessionFactory = configuration.buildSessionFactory();
Session session = sessionFactory.openSession();
Transaction tx = session.getTransaction();
tx.begin();
for( int i = 0; i <= totalCycles; i++ ) {
Object myObject = new ...
session.save( myObject );
if( i % batchSize == 0 ) {
session.flush();
session.clear();
}
}
tx.commit();
session.close();
sessionFactory.close();
Next fastest is using the regular session and threading the batches. I did expect this to be faster then the above two, but the results could be hampered by a less-than-great client box. Here's the code:
Code:
ExecutorService executorService = Executors.newFixedThreadPool( numberOfThreads );
Configuration configuration = new Configuration().configure();
configuration.setProperty( "hibernate.jdbc.batch_size", batchSize );
configuration.setProperty( "hibernate.cache.use_second_level_cache", "false" );
SessionFactory sessionFactory = configuration.buildSessionFactory();
List tasks = new ArrayList();
for( int i = 0; i < batches; i++ ) {
tasks.add( new BatchCycleCallable() );
}
executorService.invokeAll( tasks );
sessionFactory.close();
executorService.shutdown();
private class BatchCycleCallable
implements Callable {
public Object call() throws Exception {
Session session = sessionFactory.openSession();
Transaction tx = session.getTransaction();
tx.begin();
for( int i = 0; i <= batchSize; i++ ) {
Object myObject = new ...
session.save( myObject );
}
tx.commit();
session.close();
return null;
}
}
I've also tried a few other styles, but they're all slower then those described above. I've also tried JPA versions, but have experienced even less performance.
I can't use HQL to insert the records because Hibernate only supports "INSERT ... SELECT" and none of the values that I need to insert exist in the DB already.
The XML configuration for the entity is pretty basic:
Code:
<class name="..." table="...">
<composite-id name="id" class="...">
<key-property name="..." column="..."/>
<key-property name="..." column="..."/>
<key-property name="..." column="..."/>
<key-property name="..." column="..."/>
</composite-id>
<property name="..." column="..."/>
<property name="..." column="..."/>
....and many more, but no collections or relationships...
</class>
Here's the hibernate.cfg.xml:
Code:
<hibernate-configuration>
<session-factory>
<property name="hibernate.connection.driver_class">com.microsoft.sqlserver.jdbc.SQLServerDriver</property>
<property name="hibernate.connection.url">...</property>
<property name="hibernate.connection.username">...</property>
<property name="hibernate.connection.password">...</property>
<property name="hibernate.dialect">org.hibernate.dialect.SQLServerDialect</property>
<property name="hibernate.c3p0.min_size">10</property>
<property name="hibernate.c3p0.max_size">50</property>
<property name="hibernate.c3p0.timeout">600</property>
<property name="hibernate.c3p0.max_statements">1000</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
<property name="show_sql">false</property>
<property name="hibernate.jdbc.batch_size">50</property>
<mapping resource="....hbm.xml"/>
</session-factory>
</hibernate-configuration>
I'm experiemented with different thread pool sizes (hibernate.c3p0.max_size) and statement cache sizes (hibernate.c3p0.max_statements), but they didn't effect the results very much within the ranges that I tried.
I'm using SQL Server 2000 on a pretty decent box. I'm using the latest version of Hibernate available: 3.2.1 on JDK 1.5.0_04 (Sun's Impl).
I know there's a ton of variables, but any assistance in how to improve Hibernate's performance would be appreciated.
As a reference, here's the jist of the JDBC code that I'm using to compare: It uses a PreparedStatement and executes the inserts in batches, in multiple threads. It gets it's connections from c3p0, which uses the identical config as with the Hibernate examples above. I've tweaked the number of threads and batchsize to get it pretty performant. Here's the code (not so pretty, but hopefully it will do for comparison sake):
Code:
ExecutorService executorService = Executors.newFixedThreadPool( numberOfThreads );
List tasks = new ArrayList();
for( int i = 0; i < batches; i++ ) {
tasks.add( new BatchCallable() );
}
executorService.invokeAll( tasks );
executorService.shutdown();
private class BatchCycleCallable
implements Callable {
public Object call() throws Exception {
Connection connection = null;
PreparedStatement preparedStatement;
boolean originalAutoCommitState = false;
try {
connection = ...getC3P0Connection();
originalAutoCommitState = connection.getAutoCommit();
connection.setAutoCommit( false );
preparedStatement = connection.prepareStatement( sql );
for( int i = 0; i <= batchSize; i++ ) {
preparedStatement.set...
preparedStatement.set...
preparedStatement.addBatch();
}
preparedStatement.executeBatch();
connection.commit();
} catch( Exception e ) {
e.printStackTrace();
if( connection != null ) {
connection.rollback();
}
} finally {
if( connection != null ) {
connection.setAutoCommit( originalAutoCommitState );
connection.close();
}
}
return null;
}
}
Thanks in advance.
-peter