We have one table that contains consumer records. There are going to be vastly more consumers than anything else in our system (millions). We load up consumer data sometimes in batches of hundreds of thousands.
Our current consumer mapping uses an id field like so:
Code:
<class name="com.nimblefish.core.data.Consumer">
<meta attribute="generated-class">com.nimblefish.core.dto.ConsumerDTO</meta>
<meta attribute="concrete-generated-class">true</meta>
<id name="id" type="long" unsaved-value="null" >
<generator class="native"/>
</id>
<property name="prefix" type="string"/>
<property name="fname" type="string"/>
<property name="lname" type="string"/>
...
</class>
And our saving code (this is just a prototype!) is:
Code:
for (int i = 0; i < numRecordsToHandle; i++) {
Vector record = reader.read();
Consumer consumer = new Consumer(list);
consumer.setPrefix((String)record.get(0));
consumer.setFname((String)record.get(1));
consumer.setLname((String)record.get(2));
...
// one database hit per consumer?! sigh....
// will this batch?! how do we tell?
session.save(consumer);
Clearly this calls out for some kind of optimized JDBC.
Or does it? What's the best thing to do here? Should we be doing our own ID assignment, and just doing massive bulk JDBC inserts?
It's our impression that with Postgres and native id generation, each session.save is doing at least two JDBC calls (to get a new sequence value and to insert the new consumer record), and possibly more. Is there a more efficient ID generator for this case?
Is there anything else we should be aware of if we wind up doing our own ID assignment? I envision some strategy such as allocating IDs in blocks of a thousand (in a separate transaction so as not to lock the consumer-ID table for the whole duration of the insert), and then just doing a mass set of inserts. If an insert block fails, that block of IDs might get lost since more IDs might have been allocated in the meantime (we are going to be doing an awful lot of consumer loading in our job handling system), and it would be a lot of trouble to track allocation. Are there any other good patterns here?
Cheers!
Rob