All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 7 posts ] 
 Post subject: Indexing a large object graph
Posted: Wed Nov 10, 2010 10:33 am
Newbie

Joined: Tue Nov 09, 2010 4:42 pm
Posts: 7
Hi there,

I'm having trouble indexing a very large object graph. With a limited test data set, everything works fine. With live data, however, indexing is very slow and quickly fails with a heap space error.

Essentially, the problem is this:
I have a Person entity and a Cage entity. A Person has a set of Cages, which is marked @ContainedIn. Each Cage in turn references the Person who created it, and that association is marked @IndexedEmbedded. There are a number of other entities involved, but I think this situation demonstrates the problem. Some Person objects have created many thousands of Cages, and each Cage has a number of other @IndexedEmbedded objects as well.

When I call session.index(aPerson), Hibernate Search seems to follow the @ContainedIn associations and index all of those entities as well. That is, without my ever calling session.index(aCage), I end up with thousands of cages indexed. This makes indexing each person take an exceptionally long time, and it does a lot of redundant work because the same objects get indexed repeatedly. Furthermore, loading all those entities into memory eventually exhausts the heap on the more well-connected Person objects, even when I call session.flushToIndexes() and session.clear() after each Person is indexed.
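
To make the memory concern concrete, here is a toy model in plain Java (it is not Hibernate code). It assumes that indexing a person pulls that person and every cage reachable through @ContainedIn into the session, and that clearing the session after each person drops the count back to zero. The class name, method name, and numbers are invented for illustration.

```java
// Toy model, not Hibernate code: tracks how many entities sit in the session
// while each person is indexed, with the session cleared after every person.
final class PeakLoadModel {

    // cagesPerPerson[i] = number of cages assumed reachable from person i
    static int peakLoaded(int[] cagesPerPerson) {
        int peak = 0;
        for (int cages : cagesPerPerson) {
            int loaded = 1 + cages;          // the person plus every cage pulled in with it
            peak = Math.max(peak, loaded);
            // flushToIndexes() and clear() would run here, emptying the session
        }
        return peak;
    }

    public static void main(String[] args) {
        int[] cagesPerPerson = {3, 12, 40_000, 7};   // made-up numbers
        System.out.println("peak entities in session: " + peakLoaded(cagesPerPerson));
    }
}
```

Under those assumptions, clearing between persons only bounds memory by the single largest person graph, which would be consistent with the heap error appearing on the most well-connected persons.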

I've tried indexing the Cage objects first, hoping that Hibernate Search wouldn't re-index them when it reached the Person objects, but that hasn't helped.

If indexing merely took a long time, that would be one thing. But I can't complete even a single indexing run, because the heap runs out.

Is there some way to prevent this behaviour? I tried disabling automatic indexing, but that didn't make a difference.


 Post subject: Re: Indexing a large object graph
Posted: Thu Nov 11, 2010 5:35 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Are you using the depth parameter of @IndexedEmbedded? It should let you control how deep the object graph is traversed.


 Post subject: Re: Indexing a large object graph
Posted: Thu Nov 11, 2010 11:10 am
Newbie

Joined: Tue Nov 09, 2010 4:42 pm
Posts: 7
Yes, I use the depth parameter on the @IndexedEmbedded fields. However, the issue doesn't seem to be with those entities but with the @ContainedIn collections. That is, if I remove all the @IndexedEmbedded annotations from Person but leave the @ContainedIn annotation on person.getCages(), then calling session.index(aPerson) ALSO indexes every object in the person's cages collection. I'm not sure why that happens.

I do have lazy fetching enabled on the person.getCages() collection, so it fetches the person's set of cages each time it indexes a person. Trying to fetch everything in the initial query, with numerous join fetches, causes the DBMS to run out of table space trying to create the result set.
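
As a rough illustration of why the join-fetch attempt can exhaust the DBMS, the plain-Java sketch below estimates the size of the result set. It assumes the query join-fetched several of a person's cage collections at once with outer joins, in which case SQL returns one row per combination of elements. The class name and the collection sizes are hypothetical.

```java
// Back-of-the-envelope estimate, assuming several collections of one person
// are outer-join-fetched in a single query (one row per combination).
final class JoinFetchRows {

    static long rowsForPerson(int[] collectionSizes) {
        long rows = 1;
        for (int size : collectionSizes) {
            // An empty collection still yields one row (of NULLs) in an outer join.
            rows = Math.multiplyExact(rows, (long) Math.max(1, size));
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] sizes = {200, 200, 200, 200, 200};   // five cage sets, made-up sizes
        System.out.println("rows for one person: " + rowsForPerson(sizes));
    }
}
```

Even modest collections multiply quickly under this assumption, so fetching the sets lazily, or one collection per query, keeps the result set proportional to the sum of the sizes rather than their product.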


 Post subject: Re: Indexing a large object graph
Posted: Thu Nov 11, 2010 11:30 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
It's probably best if you post your annotated classes.


 Post subject: Re: Indexing a large object graph
Posted: Thu Nov 11, 2010 11:48 am
Newbie

Joined: Tue Nov 09, 2010 4:42 pm
Posts: 7
Okay, here's the majority of the relevant code. There's obviously a lot more, but this at least shows the relationship between the two classes. I've cut out the rest of the corresponding ManyToOnes in Cage, but they're essentially identical. Let me know if you need more than this and I'll pastebin the whole classes. The ContainedIn annotations are currently commented out to allow indexing to run at all. If I uncomment those, that's when the problem occurs.

Code:
@Entity
@Indexed
@Table(name = "PERSON")
public class Person implements java.io.Serializable, Selectable {
<snip>
   @OneToMany(fetch = FetchType.LAZY, mappedBy = "personByBarcodeCreatorKey")
   //@ContainedIn
   public Set<Cage> getCagesForBarcodeCreatorKey() {
      return this.cagesForBarcodeCreatorKey;
   }
   public void setCagesForBarcodeCreatorKey(Set<Cage> cagesForBarcodeCreatorKey) {
      this.cagesForBarcodeCreatorKey = cagesForBarcodeCreatorKey;
   }

   @OneToMany(fetch = FetchType.LAZY, mappedBy = "personByTerminatedBy")
   //@ContainedIn
   public Set<Cage> getCagesForTerminatedBy() {
      return this.cagesForTerminatedBy;
   }
   public void setCagesForTerminatedBy(Set<Cage> cagesForTerminatedBy) {
      this.cagesForTerminatedBy = cagesForTerminatedBy;
   }
   
   @OneToMany(fetch = FetchType.LAZY, mappedBy = "personByCageOwnerKey")
   @OrderBy("barcodeValue desc")
   //@ContainedIn
   public Set<Cage> getCagesForCageOwnerKey() {
      return this.cagesForCageOwnerKey;
   }
   public void setCagesForCageOwnerKey(Set<Cage> cagesForCageOwnerKey) {
      this.cagesForCageOwnerKey = cagesForCageOwnerKey;
   }

   @OneToMany(fetch = FetchType.LAZY, mappedBy = "personByModifiedBy")
   //@ContainedIn
   public Set<Cage> getCagesForModifiedBy() {
      return this.cagesForModifiedBy;
   }
   public void setCagesForModifiedBy(Set<Cage> cagesForModifiedBy) {
      this.cagesForModifiedBy = cagesForModifiedBy;
   }

   @OneToMany(fetch = FetchType.LAZY, mappedBy = "personByFinancedBy")
   //@ContainedIn
   public Set<Cage> getCagesForFinancedBy() {
      return this.cagesForFinancedBy;
   }
   public void setCagesForFinancedBy(Set<Cage> cagesForFinancedBy) {
      this.cagesForFinancedBy = cagesForFinancedBy;
   }
<snip>
}


Code:
@Entity
@Indexed
@Table(name = "CAGE", uniqueConstraints = @UniqueConstraint(columnNames = "BARCODE_VALUE"))
public class Cage implements java.io.Serializable, Selectable {
<snip>
   @ManyToOne(fetch = FetchType.LAZY)
   @JoinColumn(name = "BARCODE_CREATOR_KEY", nullable = false)
   @NotNull
   @IndexedEmbedded(depth = 1)
   public Person getPersonByBarcodeCreatorKey() {
      return this.personByBarcodeCreatorKey;
   }
   public void setPersonByBarcodeCreatorKey(Person personByBarcodeCreatorKey) {
      this.personByBarcodeCreatorKey = personByBarcodeCreatorKey;
   }
<snip>
}


 Post subject: Re: Indexing a large object graph
Posted: Tue Nov 16, 2010 12:19 pm
Newbie

Joined: Tue Nov 09, 2010 4:42 pm
Posts: 7
Anyone have any ideas?


 Post subject: Re: Indexing a large object graph
Posted: Tue Nov 16, 2010 12:45 pm
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
There is a lot going on in your code, and I think it would help if you could post the complete code for the classes involved.
Using @ContainedIn can be quite expensive. Each time a Person entity changes, the Lucene documents for the cages in all these different sets have to be updated. Lazy fetching won't help much, because in the end all the cages need to be retrieved anyway.
What do your searches look like? Do you search for both persons and cages? Do you really need the @ContainedIn functionality?
Have you checked the log for anything unusual?
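
The update cost can be sketched with a toy model in plain Java. This is not a description of Hibernate Search internals: it simply assumes that a change to one Person requires rebuilding that person's document plus the documents of the cages in its @ContainedIn sets. Whether a cage that appears in several sets is rebuilt once or once per set is left open, so the sketch reports both counts. All names and ids are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy cost model, not Hibernate Search internals.
final class ContainedInCost {

    // Optimistic count: the person's document plus one rebuild per distinct cage.
    static int minDocuments(List<Set<Long>> cageIdSets) {
        Set<Long> distinct = new HashSet<>();
        for (Set<Long> ids : cageIdSets) {
            distinct.addAll(ids);
        }
        return 1 + distinct.size();
    }

    // Pessimistic count: a cage is rebuilt once for every set it appears in.
    static int maxDocuments(List<Set<Long>> cageIdSets) {
        int total = 1;
        for (Set<Long> ids : cageIdSets) {
            total += ids.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical cage ids for two of the five sets on Person.
        List<Set<Long>> sets = List.of(Set.of(1L, 2L, 3L), Set.of(3L, 4L));
        System.out.println(minDocuments(sets) + " to " + maxDocuments(sets) + " documents");
    }
}
```

Either way the count grows with the size of the sets, so for a person with many thousands of cages a single change touches many thousands of documents.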

--Hardy


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.