Hi,
The nub of this post:
can I *dynamically* control/override when subselect is used when eagerly fetching collections? Right now, I believe that:
1. If lazy=true, fetch=subselect is hardcoded in the hbm for collections, then there is no way to turn off eager fetching via subselect for any query (either critieria or hql-based). This causes performance of queries that don't need to fetch the collections to degrade.
2. If the hbm does NOT specify lazy=true, fetch=subselect for the collections, then there is no way to ever get fetch=subselect behavior for any query (either critieria or hql-based). This causes performce of queries that need to efficiently fetch the collections to degrade (eager joins suffer from cartesian explosion).
Is my understanding correct?
Details: As an example, I have a hierarchical class 'Node' that has three collections (see Node hbm snippet at bottom), and must support two distinct use cases:
1. Deep Narrow Query: Grab all nodes for a given day (runDay=1/1/2010).
Workflow accesses all collections, so need to efficiently fetch the Node collections (childNodes/events/values). Sample HQL: from Node n where n.runDay = '1/1/2010'
2. Shallow Wide Query: Gather all 'root' nodes for a given week (1/1/2010 <= runDay < 1/8/2010), where 'root' is defined as node that has no parent. In this case,
none of the collections are accessed as part of the workflow, so they don't need to be fetched. Sample HQL: from Node n where n.parentNode is null and '1/1/2010' <= runDay and runDay < '1/8/2010'
Performance:
Code:
HBM
Lazy Fetch DeepNarrow ShallowWide
false subselect 11 sec 34 sec
true - (select) 71 sec* 6 sec // 3N+1 Select on lazy fetches
true join 600 sec** 6 sec // cartesian explosion on eager join
* - lazy query, but as access all collections in the DeepNarrow result, results in 3N+1 selects overall.
** - via hql, I overrode lazy=true with eager fetch of all collections, resulting in cartesian explosion (roughly 10*10*10=1000 fold explosion) and horrible performance.
Thus the ideal solution is to use lazy=false, fetch=subselect strategy for the Deep Narrow query (11 sec), and lazy=true for the Shallow Wide query (6 sec).
But there is no way to do this?Note that if there is not, then I have essentially have to choose one of the strategies (probably lazy=false, fetch=subselect), and live with the at least 5-fold degradation (34 sec compared to 6 secs for the Shallow Wide query). Ouch!
And if I can't control subselect dynamically, I'm open to other suggestions as to how to get acceptable performance for *both* of the above workflows?
Thanks,
Brad
---------------------------------------------------------------------
I have a data structure that looks like (ignore lazy/fetch attribs for now):
Code:
<class name="Node" table="Node" lazy="false">
<id name="nodeId" column="NodeId" type="integer" unsaved-value="0">
<generator class="native"/>
</id>
<property name="runDay" column="RunDay" type="Calendar" not-null="true"/>
<!-- Many properties and many-to-one and one-to-one associations have been removed to make this example succinct -->
<many-to-one name="parentNode" class="Node" column="ParentNodeId" cascade="none" not-null="false"
index="IDX_PARENT_NODE_ID" lazy="false" fetch="join" />
<!-- 3 collections -->
<bag name="childNodes" inverse="true" cascade="none" lazy="false" fetch="subselect">
<key column="ParentNodeId"/>
<one-to-many class="Node"/>
</bag>
<set name="events" table="NodeEvent" cascade="all-delete-orphan" lazy="false" fetch="subselect">
<key column="NodeId" not-null="true"/>
<one-to-many class="NodeEvent"/>
</set>
<map name="values" table="NodePropertyValue" cascade="all-delete-orphan" lazy="false" fetch="subselect">
<key column="NodeId"/>
<index-many-to-many column="PropertyTypeId" class="NodeType"/>
<element column="PropertyValue" type="text"/>
</map>
</class>