Loading deep object graphs

exortech · **Joined:** Tue Nov 13, 2007 7:16 pm **Posts:** 5

We have a problem where we need to load a deep object graph. Lazy loading produces thousand of individual sql statements, whereas eager fetch generates a cartesian product containing millions of rows. Both approaches are infeasible as they create serious performance problems. Really what we need is one select per table which will correctly hydrate the object graph.

What's the best approach for dealing with this problem?

The problem is not dissimilar to Mats Helander ORM challenge:
http://www.matshelander.com/wordpress/?p=55

The main difference is that we need to load the data for a single customer/user -- loading the data for all customers is infeasible as there are 100,000 customers in the database.

Another constraint is that we don't have bidirectional references in our domain model (ie. from Mats example, you can't get from an OrderLine back to an Order).

To be specific, our domain looks like this:
- A user has multiple contact lists
- Each contact list contains multiple contacts
- Each contact has multiple addresses, email addresses, telephone numbers, etc.

For a single user, we need to eagerly load all contact lists along with the all of the nested child entities. It is a simple matter to compose separate SQL statements for each table joining in the ids for the parent objects. What I'm not clear on is how to coax NHibernate to do this.

I've looked at MultiQuery, but I'm not clear on how to use it to solve this situation? Can anyone recommend the best approach? Our current solution is to bypass nHibernate entirely and write some mappers to wire up the object graph by hand. Hardly ideal.

We're using NHibernate 1.2 GA.

merge_s.rottem · **Joined:** Tue Aug 23, 2005 5:52 am **Posts:** 335

Look at the JOIN FETCH feature of HQL and FetchMode in the Criteria API.

Symon.

exortech · **Joined:** Tue Nov 13, 2007 7:16 pm **Posts:** 5

Thanks for your suggestion, but using JOIN FETCH will produce the massive cartesian product that I mentioned previously. We considered this option but discarded it because of the terrible performance it produces. Really, as far as I'm aware, JOIN FETCH is really only an option for joining a small number of small tables. When the resultset is potentially more than 100 rows from any joined table, the cartesian product quickly starts to get out of hand.

I'm looking for another option, which is why I was looking at multi-query. But as far as I can determine, NHibernate just can't handle this fairly simple and common problem.

_carl_ · **Posted:** Thu Nov 15, 2007 12:10 pm

Unfortunately, I don't think it is possible in NHibernate currently.. I would really need this feature as well. Does anyone know how (and if) this is done in the java Hibernate?

paulh · **Posted:** Fri Nov 16, 2007 4:43 am

Consider some form of hybrid strategy, i.e. do a certain amount of joins so that the initial data load is not too massive and then leave the rest of it lazy so that individual selects occur. What the balance would be for this is a matter of experimentation but you can roughly calculate the size of the cartersian product + no of selects if you assume some rough ratios between each of the tables e.g. customer -> order -> orderline is 1 -> 100 -> 10 which meands each customer retreive would get 10K rows.

The other thing to consider is slightly more lateral - why do you need all of this data at the same time? Is there a way of refactoring the business process so that you can concentrate on one bit of the object graph at a time so that you minimize the amount of data in memory.

If you can't, then it might be that the problem you are solving would be better addressed with something like a data warehouse rather than loading individual records into memory.

Regards

_carl_ · **Posted:** Fri Nov 16, 2007 11:01 am

Paul,

this is exactly the way I'm doing it right now.
Still it would be nice to have the ability to just fetch the ID of the parent along with the wanted child collection.

An example in a multiquery I have in my application I do the following queries:

"from Kpi k left join fetch k.Columns where k in (:kpis)"
"from Kpi k left join fetch k.Comments in k in (:kpis)"
"from Kpi k left join fetch k.Formulas in k in (:kpis)"

Where "kpis" are a list of kpis that have already been fetched.
In this case it fetches a lot of unnecessary data since Kpi contains many columns (20) which have already been fetched. The only thing it actually needs from the Kpi is the ID.

Is this problem solved in javas Hibernate? If so, would it be hard to add this features?

exortech · **Joined:** Tue Nov 13, 2007 7:16 pm **Posts:** 5

Paul,
Thanks for your response and for confirming that this is not supported by NHibernate. Your suggestion of doing an initial load using joins and then lazy loading the rest bears consideration, but won't work in our situation. The majority of the data is at the bottom of the tree, in the child tables, and it would result in way too many selects. Unlike Mats' ORM challenge, we have multiple child tables (email addresses, telephone numbers, postal addresses, etc) which need to be loaded. Also, reducing the volume of data loaded is also not an option.

The only feasible, performant option that I can see is to bypass NHibernate and to compose the SQL that we need. What's the recommended approach for doing this? It would be nice to be able to load our objects into session and leverage the mapping that is already specified in the hbm files rather than handcoding our own mappers. But I get the impression that this won't be easy. Is there a way to do this?

Basically what I'm looking for is the flexibility of iBatis with the CRUD support (plus other stuff) provided by nHibernate. Any recommendations?

exortech · **Joined:** Tue Nov 13, 2007 7:16 pm **Posts:** 5

Carl,
With your multiquery, in addition to the extra columns, don't you also end up with some pretty non-performant SQL as I assume that your :kpis collection could potentially have a lot of values in it. This would make for one nasty 'select ... where in (long list of kpis)'.

Tolomaüs · **Joined:** Sat Jul 21, 2007 3:56 pm **Posts:** 27

Exortech,

Can't you use "subselect" fetchmode?

(See chapter 15.1 from the documentation)

_carl_ wrote:

An example in a multiquery I have in my application I do the following queries:

"from Kpi k left join fetch k.Columns where k in (:kpis)"
"from Kpi k left join fetch k.Comments in k in (:kpis)"
"from Kpi k left join fetch k.Formulas in k in (:kpis)"

Where "kpis" are a list of kpis that have already been fetched.
In this case it fetches a lot of unnecessary data since Kpi contains many columns (20) which have already been fetched. The only thing it actually needs from the Kpi is the ID.

Carl,

"select k.ID from Kpi k left join fetch k.Columns where k in (:kpis)" should solve your problem, assuming the Kpi's were already loaded before.

exortech · **Joined:** Tue Nov 13, 2007 7:16 pm **Posts:** 5

Can't you use "subselect" fetchmode?

Subselect fetch mode won't work because it still results in n+1 selects. In Mats' ORM challenge example of loading all orders and their order lines for all customers, subselect fetchmode would still result in 1 SQL query to load the orders for each customer and 1 SQL query to load the orderlines for each order. Assuming 100 customers and 100 Orders, this would result in:
- 1 SQL to load customers
- 100 SQLs to load the orders for each customer
- 100 SQLs to load the orderlines for each order
giving 201 SQL queries (and this is for a small database).

So, the question stands: if NHibernate can't generate the appropriate SQL, what's the best way to leverage NHibernate to build an object graph using the SQL I supply?

jchapman · **Joined:** Thu Feb 09, 2006 1:30 pm **Posts:** 172

exortech wrote:

Subselect fetch mode won't work because it still results in n+1 selects. In Mats' ORM challenge example of loading all orders and their order lines for all customers, subselect fetchmode would still result in 1 SQL query to load the orders for each customer and 1 SQL query to load the orderlines for each order. Assuming 100 customers and 100 Orders, this would result in:
- 1 SQL to load customers
- 100 SQLs to load the orders for each customer
- 100 SQLs to load the orderlines for each order
giving 201 SQL queries (and this is for a small database).

I would actually like to see the performance difference from this. Are you sure it is actually a noticeable difference to use 100 SQL statements to load the orders for each customer as opposed to loading them with 1 statement which says where ID IN (<long list>)?

Lets think about this for a second. If you are claiming NHibernate makes a separate connection for each of the query statements (which I do not believe it does) then the performance will seriously degrade. However, if there are many statements made as part of a single connection and statement, would it really make much of a difference? I realize more bandwidth is needed to send the SQL, but that shouldn't be too significant.

Think about how SQL Server works. If you give it a list of 100 items with an IN statement, the query is either going to perform 100 index seeks to find exactly the items it needs, or it will scan records filtering the records where the values do not match the IN list. The database will determine the optimum mechanism to select the data.

If it performs 100 separate seeks then it is effectively the same thing as running 100 SQL statements. There might be some added overhead of making entirely new statements, but is that overhead really noticeable?

I would actually be curious in the results of this. Obviously you would need a sizable database to actually run a valid test. This really only applies when we are pulling data for hundreds of customers where there are millions of orders.

Tolomaüs · **Joined:** Sat Jul 21, 2007 3:56 pm **Posts:** 27

exortech wrote:

Subselect fetch mode won't work because it still results in n+1 selects.

exortech,

No more than three SQL queries would be necessary:

One query for retrieving a subset of the customers:

Code:

select ... from customer where <your conditions>

One query for retrieving all orders of the retrieved customers:

Code:

select ... from order where order.customerid in (select customerid from customer where <your conditions>)

And one for retrieving the orderlines:

Code:

select ... from orderline where orderline.orderid in (select orderid from order where order.customerid in (select customerid from customer where <your conditions>))

Of course, it could be that performance-wise this is not the best solution in your case, but at least it's possible.

_carl_ · **Posted:** Mon Nov 19, 2007 10:54 am

Tolomaüs wrote:

Carl,

"select k.ID from Kpi k left join fetch k.Columns where k in (:kpis)" should solve your problem, assuming the Kpi's were already loaded before.

Yes, that do work most of the time. However, for those Kpi that doesn't have any Columns the collection is not initialized and a separate SQL statement is made when accessing that collection. So that (almost) puts me back to n+1 problem again. Is this intended to work this way or a bug?

Tolomaüs · **Joined:** Sat Jul 21, 2007 3:56 pm **Posts:** 27

Carl,

Indeed, I have had the same behaviour before. I looked into the code and for me it seems like a bug. Nhibernate should initialize an empty collection in this case, instead of not initializing it at all.

_carl_ · **Posted:** Mon Dec 03, 2007 11:05 am

Tolomaüs wrote:

Carl,

Indeed, I have had the same behaviour before. I looked into the code and for me it seems like a bug. Nhibernate should initialize an empty collection in this case, instead of not initializing it at all.

I've created a jira for this: NH-1211

Please vote for it.