
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 3 posts ] 
 Post subject: Hibernate Search: Eliminating redundancy?
PostPosted: Tue Feb 09, 2010 11:19 pm 
Regular

Joined: Wed Dec 17, 2003 1:58 pm
Posts: 102
Hi all,
I index a large number of documents, nearly 800k of one specific class right now, and my index takes up 85 MB. I'd love to get this down if possible. There is a lot of redundancy in the data indexed per document: each indexed object has a many-to-one relationship to a child object, there are only a few thousand of these child objects, and most of the data in each document actually comes from the child object. So my question is: is it possible to have, in effect, a many-to-one relationship in the index, to help eliminate this redundancy?
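To make the redundancy concrete, here is a minimal plain-Java sketch of how a many-to-one association ends up denormalized in a document-oriented index: the child's fields are copied into every parent document. The `Parent`/`Child` records, the field names and the `child.` prefix are hypothetical illustrations, not the poster's actual mapping or Hibernate Search's API (the prefix only loosely mimics how embedded fields are usually namespaced).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: each parent becomes one flat "document"
 * (field name -> value) that embeds a copy of its child's fields.
 */
public class FlattenDemo {

    // Illustrative entities; names are assumptions, not from the original post.
    record Child(long id, String description) {}
    record Parent(long id, String title, Child child) {}

    /** Builds one flat document per parent, copying the many-to-one child's fields into it. */
    static Map<String, String> toDocument(Parent p) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", Long.toString(p.id()));
        doc.put("title", p.title());
        // The same child text is repeated in every parent document that references it.
        doc.put("child.description", p.child().description());
        return doc;
    }

    public static void main(String[] args) {
        Child shared = new Child(1, "long shared description");
        List<Map<String, String>> docs = new ArrayList<>();
        for (long i = 0; i < 3; i++) {
            docs.add(toDocument(new Parent(i, "title " + i, shared)));
        }
        docs.forEach(System.out::println);
    }
}
```

With 800k parents and only a few thousand children, most `child.*` values repeat many times across documents, which is the redundancy the question is about.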


 Post subject: Re: Hibernate Search: Eliminating redundancy?
PostPosted: Wed Feb 10, 2010 5:58 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
The index world is not relational, so the quick answer is no.
However, you shouldn't worry much about it, because the content is not stored in the index as-is (unless you explicitly mark a field as stored): the strings are tokenized and each token is stored as a term. These terms are unique, so if each of your documents were written by the same author named "alcyon", "alcyon" is not going to take more disk space whether it appears once or is repeated 1000 times. There are just some additional "pointers" from each document to the relevant term, but you'd have the same thing in a relational world.

So in short, even if that were possible, it wouldn't make your index noticeably smaller.
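The term deduplication described above can be sketched with a toy inverted index in plain Java. This is a simplification for illustration only, assuming whitespace tokenization and lowercasing; it is not Lucene's real on-disk format, and the class and method names are hypothetical. Each distinct term is stored once in a dictionary, and every document containing it only adds a posting (a document id), i.e. the "pointer" mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: one dictionary entry per distinct term, one posting per (term, document) pair. */
public class TermDictionaryDemo {

    // term -> ids of the documents containing that term (the "postings")
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Tokenizes on whitespace, lowercases, records postings, and returns the new document id. */
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Ids are assigned in increasing order, so checking the last posting
            // is enough to skip a term repeated within the same document.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Number of distinct terms stored in the dictionary. */
    public int distinctTerms() {
        return postings.size();
    }

    /** Total number of postings (document "pointers") across all terms. */
    public int totalPostings() {
        int n = 0;
        for (List<Integer> docs : postings.values()) n += docs.size();
        return n;
    }

    /** Document ids containing the given term, or an empty list. */
    public List<Integer> docsFor(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }

    public static void main(String[] args) {
        TermDictionaryDemo index = new TermDictionaryDemo();
        for (int i = 0; i < 1000; i++) {
            index.addDocument("alcyon");
        }
        System.out.println("distinct terms: " + index.distinctTerms());
        System.out.println("postings: " + index.totalPostings());
    }
}
```

Running `main` indexes 1000 documents by "alcyon": the dictionary holds the term once, while the postings grow by one small entry per document, which is the cost that would remain even with a relational-style mapping.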

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Hibernate Search: Eliminating redundancy?
PostPosted: Wed Feb 10, 2010 6:15 am 
Regular

Joined: Wed Dec 17, 2003 1:58 pm
Posts: 102
Ah gotcha, good to know. Thanks!


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.