-->
These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 11 posts ] 
Author Message
 Post subject: How to make a very large query with limited RAM
PostPosted: Fri Dec 16, 2005 12:00 pm 
Regular
Regular

Joined: Fri May 13, 2005 4:08 pm
Posts: 64
What is the best way to iterate through a table of 60,000 records, where each record has a map to a few hundred other records, in order to process them one at a time?

In total, the 60,000 row table and the 42 million row table must be processed. Having NHibernate pull in all these records into memory at once overruns my 2GB RAM capacity (not surprisingly). I suspect lazy-loading the map to the larger table is only delaying the inevitable, because as I process each record the other table will get slowly loaded until it's fully in memory -- or I run out.

I wish there was a ISession.CreateQuery construct of some type that would allow me to pass in a delegate that would process each record as it came in, rather than NHibernate having to get ALL the records and return them ALL to me. Functionality like that of foreach in C#, where as soon as I am done with the current record, it is evicted.

The only idea I can think of to resort to is to not use NHibernate at all for this one data operation. To use the SqlDataReader myself and pull the various tables I need in as necessary. But I sure wish NHibernate could help me here.

If there is a way and I'm ignorant, please tell me how to do it.

Thanks.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Dec 16, 2005 12:18 pm 
Senior
Senior

Joined: Wed Jun 15, 2005 4:17 am
Posts: 156
At any rate this processing is overkill. Even if you use just ADO.NET you still have to transfer a large amount of data from the database.

I think this huge processing is a very good candidate for a stored procedure, and do all the processing in the db server

Cheers,
Radu


Top
 Profile  
 
 Post subject:
PostPosted: Fri Dec 16, 2005 2:09 pm 
Contributor
Contributor

Joined: Thu May 12, 2005 8:45 am
Posts: 226
I absolutely agree.

I did a data processing app almost two years ago, using NHibernate 0.2. We had to have processing code run from a Windows Service, controlled by Remoting, so that it could run in the background. Once in a while there would be too much data and the service would die with an OutOfMemoryException (4 GB RAM). No, the problem was not a bug in NH which has since been fixed, it is a fundamentally bad choice in technology for the job at hand.

I would never do that again. ORM is not the solution for data processing. Use a stored procedure. This is truly what they are for.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Dec 16, 2005 3:19 pm 
Regular
Regular

Joined: Fri May 13, 2005 4:08 pm
Posts: 64
Maybe "processing" wasn't the right word. I'm basically exporting my database into a data file format for researchers. So yes, I have to pull the data from the database. Stored procedures don't apply here.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Dec 16, 2005 3:45 pm 
Contributor
Contributor

Joined: Thu May 12, 2005 8:45 am
Posts: 226
Your database product should have an export tool. Some databases also allow stored procedures to write to a file as well. This needs to be kept as close to the DB as possible, and ORM is definitely not the way to go. If you must do it in app code, I would use an IDbDataReader and just loop through it and write to a file stream.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Dec 16, 2005 4:05 pm 
Regular
Regular

Joined: Fri May 13, 2005 4:08 pm
Posts: 64
It's a binary file format that requires a Win32 dll, along with some computation on the data to create some metadata at the same time. I certainly could use the IDbDataReader, and I think that would solve the memory and performance problems, but it'll be a shame to not be able to leverage the nice object model that NHibernate usually exposes the database with.


Top
 Profile  
 
 Post subject:
PostPosted: Sat Dec 17, 2005 10:00 am 
Newbie

Joined: Fri Dec 16, 2005 11:21 am
Posts: 7
It would be very nice to have an Enumerator property in ISession... just like iterator() in Hibernate. Try to check if you can create your own Session implementation with this property... in the get you would use the DataReader... Let me know if you tried and if it works...


Top
 Profile  
 
 Post subject: Workaround
PostPosted: Sun Dec 18, 2005 12:45 am 
Regular
Regular

Joined: Fri May 13, 2005 4:08 pm
Posts: 64
Well, I worked around the problem (for now) using this code:
Code:
      IEnumerable<Emar.Evaluating.Evaluation> IEmarManager.GetDataSetEvaluations(int questionnaireId)
      {
         System.Collections.IList evalIds =
            Session.CreateQuery("select e.Id from Evaluation e where e.IncludeInDataSet = 1 and e.Questionnaire.Id = ?")
            .SetInt32(0, questionnaireId)
            .List();
         foreach (int evalId in evalIds)
         {
            Evaluation eval = Get<Evaluation>(evalId);
            yield return eval;
            Session.Evict(eval);
         }
      }


This code gets a list of all primary keys, then gets each individual row, and then evicts it. This effectively cuts down on the amount of RAM consumed. However, RAM still creeps up, even if I use ISession.Clear(). This got the job done, but it still took ~1GB of RAM for the query.

Oh, and this is using C# 2.0's yield return construct.


Top
 Profile  
 
 Post subject:
PostPosted: Sun Dec 18, 2005 7:29 am 
Contributor
Contributor

Joined: Wed May 11, 2005 4:59 pm
Posts: 1766
Location: Prague, Czech Republic
giorgio wrote:
It would be very nice to have an Enumerator property in ISession... just like iterator() in Hibernate. Try to check if you can create your own Session implementation with this property... in the get you would use the DataReader... Let me know if you tried and if it works...


There is such a method, it's called Enumerable (the Java method is called iterate).


Top
 Profile  
 
 Post subject:
PostPosted: Mon Dec 19, 2005 1:31 pm 
Newbie

Joined: Fri Dec 16, 2005 11:21 am
Posts: 7
sergey wrote:
giorgio wrote:
It would be very nice to have an Enumerator property in ISession... just like iterator() in Hibernate. Try to check if you can create your own Session implementation with this property... in the get you would use the DataReader... Let me know if you tried and if it works...


There is such a method, it's called Enumerable (the Java method is called iterate).


I see it's not implemented yet... :oops: :? :(


Top
 Profile  
 
 Post subject:
PostPosted: Mon Dec 19, 2005 4:58 pm 
Contributor
Contributor

Joined: Thu May 12, 2005 8:45 am
Posts: 226
Where do you see that?


Top
 Profile  
 
Display posts from previous:  Sort by  
Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 11 posts ] 

All times are UTC - 5 hours [ DST ]


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.