Hi,
I am using BioJava to read protein files into a SQL Server database. The database and object structure is as follows: a protein file can have many (read: hundreds of) 'model-chain-residues', and a model-chain-residue can have many (read: dozens to hundreds of) atoms.
I have the database structure in place to handle all of this, and I can load the protein data into SQL using Hibernate 3.2.5. The problem is that as the object structure builds up within a given session (for a given protein), the program slows down dramatically; the object hierarchy is getting very 'full'. Parsing and dumping a single 325 KB file into the database can easily take 20 minutes, and I have 4,000 protein files to load! :-) And a 325 KB protein file is small compared to the larger ones! I am also reasonably sure I will run out of heap space.
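To make the structure concrete, here is a stripped-down, plain-Java sketch of the three classes. It is only an illustration: the accessor names match the ones used in the loading code in this post, but the `number` field name is an assumption, and no Hibernate mapping is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Top of the hierarchy: one Protein owns many ModelChainResidues.
class Protein {
    private final List<ModelChainResidue> modelChainResidues =
            new ArrayList<ModelChainResidue>();

    public List<ModelChainResidue> getModelChainResidues() {
        return modelChainResidues;
    }
}

// Middle level: belongs to one Protein, owns many Atoms.
class ModelChainResidue {
    private final int number; // hypothetical field name for the constructor argument
    private Protein protein;
    private final List<Atom> atoms = new ArrayList<Atom>();

    public ModelChainResidue(int number) {
        this.number = number;
    }

    public int getNumber() { return number; }
    public Protein getProtein() { return protein; }
    public void setProtein(Protein protein) { this.protein = protein; }
    public List<Atom> getAtoms() { return atoms; }
}

// Leaf level: belongs to exactly one ModelChainResidue.
class Atom {
    private ModelChainResidue modelChainResidue;

    public ModelChainResidue getModelChainResidue() { return modelChainResidue; }
    public void setModelChainResidue(ModelChainResidue mcr) {
        this.modelChainResidue = mcr;
    }
}
```

Both directions of each link are set by hand, so every object stays reachable from the Protein for as long as the Protein itself is referenced.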
Each of the SQL tables has an auto-increment field as the primary key. In the appropriate hbm.xml files I have the following settings:
Code:
...
<generator class="identity" />
...
<many-to-one name=... fetch="select" cascade="save-update">
Here is pseudo-code of what I have that DOES work, but slows considerably per protein as more and more ModelChainResidues and Atoms are added:
Code:
SessionFactory factory = HibernateUtil.getSessionFactory();
Session session = factory.getCurrentSession();
try {
    Protein p = new Protein();
    for (int i = 0; i < numOfModelsChainsResidues; i++) {
        // Link each residue to its protein in both directions.
        ModelChainResidue mcr = new ModelChainResidue(i+1);
        mcr.setProtein(p);
        p.getModelChainResidues().add(mcr);
        ...
        for (int j = 0; j < numOfAtoms; j++) {
            // Link each atom to its residue in both directions.
            Atom a = new Atom();
            a.setModelChainResidue(mcr);
            mcr.getAtoms().add(a);
            ...
            // Save and flush after every single atom.
            session.save(a);
            session.flush();
        }
    }
}
What I would like to do (just to load the data into the database) is to write each atom to the database, then remove the atom that was just loaded from the object structure but NOT from the database, so that at most one atom is in the object structure at a time (with a similar approach for ModelChainResidue). Yet when I use code like this:
Code:
SessionFactory factory = HibernateUtil.getSessionFactory();
Session session = factory.getCurrentSession();
try {
    Protein p = new Protein();
    ...
    for (int i = 0; i < numOfModelsChainsResidues; i++) {
        ModelChainResidue mcr = new ModelChainResidue(i+1);
        mcr.setProtein(p);
        p.getModelChainResidues().add(mcr);
        ...
        for (int j = 0; j < numOfAtoms; j++) {
            Atom a = new Atom();
            a.setModelChainResidue(mcr);
            mcr.getAtoms().add(a);
            ...
            session.save(a);
            session.flush();
            mcr.getAtoms().remove(a);        // <-- added
            a.setModelChainResidue(null);    // <-- added
        }
        // I'd also like to have this kind of logic.
        // p.getModelChainResidues().remove(mcr);   // <-- wanted
        // mcr.setProtein(null);                    // <-- wanted
    }
}
... I (of course) get a NOT NULL exception. Specifically: "org.hibernate.PropertyValueException: not-null property references a null or transient value: data.entities.Atom.ModelChainResidue"
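To pin down the behaviour I am after: once objects have been written, the session should stop tracking them, without my having to null out the parent reference. Below is a minimal sketch of that idea using a hypothetical stand-in class, `TrackingSession`, which is NOT Hibernate's API. Hibernate's `Session` does have `flush()`, `clear()` and `evict(Object)`, but whether they are the right tool for my mappings is an assumption on my part. The batch size and the `BatchingDemo` helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a persistence session (not Hibernate's Session).
class TrackingSession {
    private final List<Object> tracked = new ArrayList<Object>();
    private int flushedUpTo = 0;  // how many tracked objects are already written
    private int written = 0;      // total objects "written" so far
    private int peakTracked = 0;  // largest number of objects tracked at once

    // Start tracking an object; nothing is written yet.
    void save(Object o) {
        tracked.add(o);
        peakTracked = Math.max(peakTracked, tracked.size());
    }

    // "Write" every tracked object that has not been written yet.
    void flush() {
        written += tracked.size() - flushedUpTo;
        flushedUpTo = tracked.size();
    }

    // Forget all tracked objects; what was written stays written.
    void clear() {
        tracked.clear();
        flushedUpTo = 0;
    }

    int getWritten() { return written; }
    int getPeakTracked() { return peakTracked; }
    int getTrackedCount() { return tracked.size(); }
}

class BatchingDemo {
    // Save residues * atomsPerResidue objects, flushing and clearing
    // every batchSize saves so the tracked set stays bounded.
    static TrackingSession load(int residues, int atomsPerResidue, int batchSize) {
        TrackingSession session = new TrackingSession();
        int saved = 0;
        for (int i = 0; i < residues; i++) {
            for (int j = 0; j < atomsPerResidue; j++) {
                session.save(new Object()); // stands in for an Atom
                if (++saved % batchSize == 0) {
                    session.flush();
                    session.clear();
                }
            }
        }
        session.flush(); // write any remainder smaller than one batch
        session.clear();
        return session;
    }
}
```

In this sketch the number of tracked objects never exceeds `batchSize`, no matter how many atoms are loaded, and the atoms are never unlinked from their parent, so no not-null constraint comes into play.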
Any ideas how I can accomplish what I need to do?
Thanks! I'll bet it's fairly easy, but I've only been using Hibernate for a couple of days and it has me stumped!
Paul