Performance problems with large 1-m Collections

tengvig · **Joined:** Wed Dec 17, 2003 10:01 am **Posts:** 4

I'm currently using OJB as an O-R bridge, but want to switch to Hibernate because of the richer feature set.

Before switching, I decided to do a little performance comparison, and the results surprised me. I hope someone can find a bug in my Hibernate example or point to some relevant information for tuning my code.

The test uses 2 tables: a Basket table and a BasketItem table. One Basket can have many items (similar to the Blog/BlogItem example in the documentation).

In my test I have 100 Baskets with 1000 BasketItems in each Basket - a total of 100.000 rows.

I iterate through all Basket and BasketItem objects and measure the elapsed time with 3 different methods:

Method 1: Using plain JDBC with the following pseudocode:

Code:

Baskets = select * from Basket
foreach basket in Baskets {
    Items = select * from BasketItem where basketId = basket.id
    foreach item in Items {
         doSomethingWithItem
    }
}

This results in 101 select statements and takes about 8 seconds to finish. CPU and memory usage is moderate (measured by watching the Windows Task Manager).
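For readers who want Method 1 as real code, here is a minimal sketch of the 1 + N select loop in plain JDBC. It assumes the EVAL_BASKET and EVAL_BASKET_ITEM tables posted later in this thread; the class name, the method name, and the way the Connection is obtained are illustrative assumptions, not the original poster's test code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: walks every basket and its items with 1 + N selects.
final class JdbcBasketScan {
    static final String BASKETS_SQL = "select ID from EVAL_BASKET";
    static final String ITEMS_SQL =
        "select ID, ITEM_NAME, PRICE, CATEGORY from EVAL_BASKET_ITEM where BASKET_ID = ?";

    // Returns the number of select statements executed (1 + number of baskets).
    static int scan(Connection con) throws SQLException {
        int selects = 0;
        try (PreparedStatement baskets = con.prepareStatement(BASKETS_SQL);
             PreparedStatement items = con.prepareStatement(ITEMS_SQL);
             ResultSet b = baskets.executeQuery()) {
            selects++;
            while (b.next()) {
                // One item query per basket, reusing the same prepared statement.
                items.setInt(1, b.getInt("ID"));
                try (ResultSet i = items.executeQuery()) {
                    selects++;
                    while (i.next()) {
                        // Stand-in for doSomethingWithItem: read the row and discard it.
                        i.getString("ITEM_NAME");
                    }
                }
            }
        }
        return selects;
    }
}
```

Nothing is retained between rows here, which is one reason the plain JDBC run stays cheap in memory.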

Method 2:
Use OJB and

Code:

Query query = new QueryByCriteria(Basket.class, null);
Iterator i = broker.getIteratorByQuery(query);

foreach i: 
    foreach i.basketItems: doSomeThingWithItem()

This takes about 13 seconds to finish. Memory use is about 120 MB, roughly 3 times as much as the plain JDBC code, and CPU usage is slightly higher - this is still acceptable.

Method 3:
Use Hibernate... BTW: I'm using Hibernate 2.1.
I've tried different setups here, all with miserable results...

I'm using a List as the 1-m Collection, and I've tried the following:

Use outer-join and fetch: This was a total disaster (no big surprise, as this is not encouraged in the Best Practices section of the documentation). It took about 120 seconds to finish and used about 540 MB of memory and 100% CPU. The problem with fetch is that it's not possible to get an iterator - I have to read all objects into memory first (at least I've not been able to get an iterator for fetch queries without reading everything into memory).

Second try: Use session.iterate("From BASKET in class Basket").
This is much better, but it still takes about 22 seconds to finish and, worse, it uses about 500 MB of memory and 90% CPU.

I've experimented with different settings for lazy, cascade and batch-size parameters in the mapping file, but I've got no significant performance improvements with different settings.

Does anyone have a tip to improve performance? Iterating over large 1-m collections must be a common use case.

michael · **Posted:** Wed Dec 17, 2003 10:47 am

How shall we say anything about this if you neither post mappings, code, nor anything else about what you did with hibernate?

tengvig · **Joined:** Wed Dec 17, 2003 10:01 am **Posts:** 4

gloeglm wrote:

How shall we say anything about this if you neither post mappings, code, nor anything else about what you did with hibernate?

Good point :)

Here are the Oracle tables:

Code:

Table EVAL_BASKET:

   ID INTEGER NOT NULL PRIMARY KEY
   OPCO_NAME VARCHAR(60) NOT NULL
   SHOPPING_DATE DATE NOT NULL


Table EVAL_BASKET_ITEM:

   ID INTEGER NOT NULL PRIMARY KEY
   ITEM_NAME VARCHAR(120) NOT NULL
   PRICE NUMBER(12,2) NOT NULL
   CATEGORY INTEGER NOT NULL
   BASKET_ID INTEGER NOT NULL

Here is the Hibernate mapping:

Code:

<hibernate-mapping>
    <class name="ojbeval.Basket" table="EVAL_BASKET">
        <id name="id" type="int" unsaved-value="null" >
            <column name="ID" sql-type="INTEGER" not-null="true"/>
            <generator class="increment"/>
        </id>

        <property name="opcoName">
            <column name="OPCO_NAME" sql-type="varchar(60)" 
                          not-null="true"/>
        </property>
        
        <property name="shoppingDate">
            <column name="SHOPPING_DATE" sql-type="DATE_TIME" 
                          not-null="true"/>
        </property>

        <set 
            name="items" 
            inverse="true" 
            lazy="false"
            batch-size="1000"
            cascade="all">
            
            <key column="BASKET_ID"/>
            <one-to-many class="ojbeval.BasketItem"/>
        </set>
        
    </class>


    <!-- Basket item -->
    <class name="ojbeval.BasketItem" table="EVAL_BASKET_ITEM">
        <id name="id" type="int" unsaved-value="null" >
            <column name="ID" sql-type="INTEGER" not-null="true"/>
            <generator class="increment"/>
        </id>

        <property name="itemName">
            <column name="ITEM_NAME" sql-type="varchar(120)" 
                         not-null="true"/>
        </property>
        
        <property name="price">
            <column name="PRICE" sql-type="DECIMAL" not-null="true"/>
        </property>
        
        <property name="itemCategory">
            <column name="CATEGORY" sql-type="INTEGER" 
                         not-null="true"/>
        </property>
        
        <many-to-one 
            name="basket"
            column="BASKET_ID" 
            not-null="true"/>
    </class>
</hibernate-mapping>

Basket class:

Code:

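import java.io.Serializable;
import java.util.Date;
import java.util.Set;

public class Basket implements Serializable {
   private int id;
   private String opcoName;
   private Date shoppingDate;
   private Set items;   // BasketItem instances, mapped by <set name="items">

   public Basket() {
   }

   // getters and setters omitted
}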

BasketItem class:

Code:

import java.io.Serializable;
import java.math.BigDecimal;

public class BasketItem implements Serializable {
   private int id;
   private String itemName;
   private BigDecimal price;
   private int itemCategory;

   private int basketId;
   private Basket basket;   // mapped by <many-to-one name="basket">

   public BasketItem() {
   }

   // getters and setters omitted
}

I've tried to iterate by using
session.iterate("from EVAL_BASKET in class Basket") and

Code:

Query q = session.createQuery("from Basket as b " +
                                              "left outer join fetch b.HItems ");
i = q.list().iterator();

Of these, the first attempt has been the more successful one.

gavin · **Posted:** Wed Dec 17, 2003 11:14 am

This is an absurd usecase. Applications should not load 100 000 rows from a database in a single transaction. What do you plan to do with them? Display 100 000 rows to the user?

Of course you will see low performance if you fill the session cache up with 100 000 objects.

Nevertheless, if you want behavior that is more like your direct JDBC code:

(1) make sure you use lazy fetching for the collection!
(2) use a find("from Basket"), not iterate(), no fetch
(3) evict the items from the session cache as you finish processing them!

Most of all, use the Hibernate log to understand exactly what you are doing in terms of what SQL is being executed. Neither of your attempts so far actually performs the same SQL as your JDBC code!
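Point (3) is easiest to see with a toy model. The sketch below is not Hibernate code: `ToySession` is a hypothetical stand-in for a session cache that simply remembers every object it loads until `evict` is called, which is the behavior the advice above relies on. It compares the peak number of cached items when walking 100 baskets of 1000 items with and without eviction.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a session cache (an assumption for illustration,
// not Hibernate's implementation).
final class ToySession {
    private final Map<Object, Boolean> cache = new IdentityHashMap<>();
    private int peak = 0;

    // Simulates lazily loading one basket's items: every loaded object is cached.
    List<Object> loadItems(int count) {
        List<Object> items = new ArrayList<>(count);
        for (int n = 0; n < count; n++) {
            Object item = new Object();
            cache.put(item, Boolean.TRUE);
            items.add(item);
        }
        peak = Math.max(peak, cache.size());
        return items;
    }

    // Plays the role of session.evict(item) in the advice above.
    void evict(Object item) {
        cache.remove(item);
    }

    int peakCachedObjects() {
        return peak;
    }

    // Walks baskets * itemsPerBasket objects and returns the peak cache size.
    static int run(int baskets, int itemsPerBasket, boolean evictAfterUse) {
        ToySession session = new ToySession();
        for (int b = 0; b < baskets; b++) {
            for (Object item : session.loadItems(itemsPerBasket)) {
                // ... process the item here ...
                if (evictAfterUse) {
                    session.evict(item);
                }
            }
        }
        return session.peakCachedObjects();
    }
}
```

In this model `ToySession.run(100, 1000, false)` peaks at 100000 cached objects, while `ToySession.run(100, 1000, true)` never holds more than 1000. Real memory figures depend on what the actual session keeps per object, so treat the numbers as a shape, not a measurement.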

michael · **Posted:** Wed Dec 17, 2003 11:19 am

Set the collection to lazy = "true".

Load the Baskets with find.

Iterate over the baskets and for every basket call getItems().iterator() and iterate over them.

As Gavin said, evict each item from the session cache with session.evict(item) every time you finish with one.

tengvig · **Joined:** Wed Dec 17, 2003 10:01 am **Posts:** 4

gloeglm wrote:

Set the collection to lazy = "true".

Load the Baskets with find.

Iterate over the baskets and for every basket call getItems().iterator() and iterate over them.

As Gavin said, evict each item from the session cache with session.evict(item) every time you finish with one.

Just tried that. This resulted in much better memory usage: about 200 MB. The run time increased from about 22 seconds to about 36 seconds.

gavin · **Posted:** Wed Dec 17, 2003 11:29 am

Well, you are obviously doing much else wrong then.

How about you spend some time learning Hibernate instead of running absurd performance tests - then you will be able to understand how to write your absurd test correctly.

Just a suggestion.

gavin · **Posted:** Wed Dec 17, 2003 11:33 am

Oh LOL.

For example, understanding what batch-size does would help.

Code:

<set
            name="items"
            inverse="true"
            lazy="false"
            batch-size="1000"
            cascade="all"> 

Remove the batch-size attribute for a start, or set it to some sensible number as recommended in the documentation.
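The thread never shows the generated SQL, so the following is only a sketch of why a batch size of 1000 looks odd for 100 baskets. It assumes that batch-size lets a single select initialize up to that many collections and that the keys appear as a list of placeholders; the class, the method names, and the exact SQL text are illustrative assumptions, not Hibernate's actual output.

```java
// Idealized model of collection batch fetching (assumptions stated above).
final class BatchFetchSketch {
    // Number of collection-loading selects if each one covers up to batchSize collections.
    static int selectsNeeded(int collections, int batchSize) {
        return (collections + batchSize - 1) / batchSize;
    }

    // Illustrative statement shape only: one key placeholder per collection in the batch.
    static String itemSelect(int batchSize) {
        StringBuilder sql = new StringBuilder(
            "select * from EVAL_BASKET_ITEM where BASKET_ID in (");
        for (int n = 0; n < batchSize; n++) {
            sql.append(n == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }
}
```

Under these assumptions, `selectsNeeded(100, 10)` is 10 statements with 10 placeholders each, whereas `itemSelect(1000)` carries 1000 key placeholders for a table that only has 100 baskets. Enabling SQL logging, as suggested earlier in the thread, is the way to see what is really generated.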

geirhe · **Posted:** Wed Dec 17, 2003 3:13 pm

Gavin: I don't much care for your tone.

Some of us have to use the data we store for statistical purposes as well as one-row accesses. This means that accessing several hundred thousand rows is _not_ an "absurd usecase". In size at least, the dataset is small. It would be a good design practice to use the same persistence mechanism for any statistical calculations as for actually storing the rows.

The test done by tengvig is relevant to his work, and full of errors done by a novice. That is why he asked - he knows he is a novice. He doesn't see why the performance should be so bad. I am going to be in the same position in a couple of weeks.

Telling him to "spend some time learning hibernate" and then answering your own message with something that tells tengvig that he doesn't understand Hibernate is extremely unhelpful and very arrogant. tengvig knows he doesn't know Hibernate. He is using something else, and looking to learn Hibernate to see if it is a viable replacement.

I am very sorry I asked tengvig to have a look at Hibernate. It might have a better feature set, but I have never seen anyone being treated so poorly because they are asking a question.

Gavin: If you can't cope with novices, it might be better if you go sit quietly in a corner and do something you can cope with.

christian · **Posted:** Wed Dec 17, 2003 3:26 pm

His usecase is not suited for any ORM software, you just don't load 100.000 objects into memory. Use a stored procedure and implement the logic in the database. This is not a realistic behavior for a general purpose information system, but a very specific reporting/aggregation function. Use the right tool for the job, that would be the first step. The original poster didn't.

Performance testing is of course critical, but what happens here is premature optimization. You don't have to test something like "load simple objects and see how fast Hibernate can mangle ResultSets". We did that for you and say so whenever we can. The Hibernate forum is full of discussions about Hibernate performance. The original poster should have searched, but didn't.

Performance testing is not something you do in half an hour. The test of the original poster is flawed in so many ways that it is not possible to explain the errors in detail. The simple rule is: Learn the tool first, then optimize. It is likely that a software with many thousand users is already optimized for good performance, even by default.

Finally, don't complain if we can't take you by the hand and walk you through the basics. There is plenty of documentation; obviously the original poster has not even read the reference documentation or any chapter or section about "performance".

The deal is simple: You try to educate yourself as well as possible and we help you in our free time if you are stuck.

EOD

gavin · **Posted:** Wed Dec 17, 2003 5:12 pm

Quote:

Gavin: I don't much care for your tone.

Quote:

extremely unhelpful and very arrogant

eh??

Well ... I'm reading back over ... I didn't actually see that my tone was that bad .... pretty good for me at 2.30 AM - I don't actually know what I said that was rude .... perhaps a misinterpretation ?

I dunno. I was actually just trying to give some practical advice:

* don't test absurd usecases (this is one, as Christian correctly argues)
* learn the tool properly before performance testing it (this means reading the documentation properly)
* actually look at the SQL that is being run!

There were a bunch of things wrong with the code above, and time spent experimenting and reading the documentation was what was called for. Is it wrong to tell people to do what they really need to do?

Quote:

Gavin: If you can't cope with novices, it might be better if you go sit quietly in a corner and do something you can cope with.

Damn!! I thought I was helping.....

Oh well. This "relationships with other humans" thing was never my strong point ... better get back to coding....

tengvig · **Joined:** Wed Dec 17, 2003 10:01 am **Posts:** 4

Christian: I don't think you understood the original question, or maybe I didn't explain it clearly enough. I don't want to load 100.000 objects into memory. Basically what I have is a large 1-m collection, and I want to iterate through it and do something to each object. What I'm doing with the objects isn't easy to accomplish with a stored procedure, so I have to do it with Java.

I agree that this isn't necessarily the most common thing to do in a general purpose information system, but it's not that unusual either!

As for searching the Hibernate forums, of course I did that! There are a lot of postings about performance there, but if you search for performance AND collections there aren't many hits - and none of them were relevant for my case. As for reading the documentation: of course I have! I have not studied it thoroughly, but enough to run different types of tests on my use-case.

I see that you point to flaws in many ways in my code, but it would be nice to point out just a few of them instead of saying that they exist. The only constructive suggestion was to evict objects (from gloeglm) - I tried it and it helped, but not enough. I already stated in the original post that I've tried different settings for batch-size without any significant improvement, so laughing out loud at the particular setting that I posted doesn't help much.

As for performance testing in general: This was never intended to be a generic performance test of Hibernate. I've not bothered to test all the normal stuff - I take it for granted that Hibernate performs well in those cases. I wanted to test special cases that I need in my application to see if Hibernate performs well with those - which it didn't.

I'm not out for some bashing of Hibernate, I want to see if it's usable for my case and any help with that would've been nice.

So far, most of what I've seen is: your use case is stupid and your code is full of errors - besides this kind of problem isn't suited for ORM tools.

What I read is: Hibernate isn't suited for iterating over large 1-m collections - use some other tool for that. If that's the case, why not just say it? If it's not the case, I'd love to get some pointers in the right direction!

gavin · **Posted:** Wed Dec 17, 2003 6:12 pm

Quote:

I don't want to load 100.000 objects into memory. Basically what I have is a large 1-m collection, and I want to iterate through it and do something to each object.

i.e. you want to load 100 000 rows. This is always going to be slow in Java. JDBC was taking you 13 seconds! I bet you would see an order-of-magnitude improvement by using a stored procedure.

Quote:

The only constructive suggestion was to evict objects (from gloeglm)

I'm sorry? My (1), (2) and (3) were not constructive?

Quote:

I see that you point to flaws in many ways in my code, but it would be nice to point out just a few of them instead of saying that they exist

So far I told you four things that were wrong with your test. I have no idea how many other things you have wrong.

Quote:

I already stated in the original post that I've tried different settings for batch-size without any significant improvement, so laughing out loud at the particular setting that I posted doesn't help much.

Did you look at the generated SQL?? And see how absurd it looks with batch-size="1000"? You would laugh too!

Quote:

What I read is: Hibernate isn't suited for iterating over large 1-m collections - use some other tool for that. If that's the case, why not just say it?!

Hibernate is not suitable for iterating over 100 000 objects in a transaction. I said that in the first post. Nor is any other Java persistence framework, or direct JDBC. This kind of problem should not be handled in Java. That is my strong view.

Of course, with correct code, I am quite confident that Hibernate will perform as fast or faster than OJB. (I have performance tested against OJB on a number of occasions and we always seem to win comfortably.)

But it is your responsibility to learn to use Hibernate for yourself. We do not do handholding here. This means: knuckle down and read the documentation, learn how to enable SQL logging, experiment with different things, learn the effects of the different things.

We have tens of thousands of users and simply don't have time to re-explain the basics to each person. We put an enormous amount of work into the documentation for this reason. Please use it.

TIA.

michael · **Posted:** Wed Dec 17, 2003 6:27 pm

Hi,

I have just redone the test the original poster described here, using exactly the setup Gavin and I suggested. This took exactly 20 seconds using Hibernate and FirebirdSQL, does 101 SQL selects, and I still have not spent more than 2 minutes of work on it. There is definitely an error in your test.

michael · **Posted:** Wed Dec 17, 2003 6:41 pm

Evicting on every loop and adding an index to the parent id column has reduced the runtime to 4 seconds.
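For the original poster's schema, the "parent id column" would presumably be BASKET_ID on EVAL_BASKET_ITEM, since every per-basket item query filters on it. A minimal sketch of adding such an index through JDBC follows; the index name and the helper class are hypothetical, and the statement shown is the common `create index` form rather than anything taken from the test above.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for indexing the foreign key used by the per-basket item query.
final class BasketItemIndex {
    // IDX_BASKET_ITEM_BASKET is an assumed name; pick one that fits your conventions.
    static final String CREATE_INDEX_SQL =
        "create index IDX_BASKET_ITEM_BASKET on EVAL_BASKET_ITEM (BASKET_ID)";

    static void create(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.executeUpdate(CREATE_INDEX_SQL);
        }
    }
}
```

Without an index on that column, each of the 101 selects may have to scan all 100.000 item rows to find one basket's 1000 items.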