
All times are UTC - 5 hours [ DST ]



 Post subject: Any Statistics on Hibernate Search Indexing ?
PostPosted: Thu Feb 21, 2008 5:49 pm 
Newbie

Joined: Wed Feb 20, 2008 6:42 pm
Posts: 14
Hi,

Does anybody have any numbers on the speed of the indexing process using HSearch? I have the indexing process running on an IBM blade and my numbers are as follows:

Records : 10 million+
Time taken : 18 days.

My indexed entity is relatively big, but only 5 of its fields are indexed. The bean has 8 associated objects and I am NOT indexing them.

Any help is highly appreciated.

[Update: from http://www.hibernate.org/15.html:
"Many people try to benchmark Hibernate. All public benchmarks we have seen so far had (and most still have) serious flaws."

It seems I've asked a pretty stupid question!]

Best regards,
Anil


 Post subject:
PostPosted: Sun Feb 24, 2008 8:26 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
I don't consider your question stupid; having to reindex all data is a real-world problem.
I've had a similar problem:

Records: 4 Million
Time taken: 3 months (estimate after 3 days)

Mine was probably slower than yours because a "single" record is actually a very complex entity, with a dozen many-to-many associated objects going into the same index.

We solved this problem with some tweaking and recoding:
1) Correct lazy/eager settings for the fields of the entities
2) Caching for entities that were referred to very often (look-up tables and similar)
3) More threads: one reads the PKs, 30 load the entities, 10 convert entities to Lucene Documents, and one writes to the index (each step is connected to the next by a 1000-element queue)
(The number of threads really depends on the machine, database, network performance and cache configuration)
4) Some custom-built StringBridges for the more complex fields
5) Tweaking the IndexWriter parameters (Lucene 2.3) to merge lots of documents into the index at the same time (not one by one)

Time taken: 20 minutes.
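
To make the thread layout in step 3 more concrete, here is a minimal sketch of that kind of staged pipeline using only java.util.concurrent. It is not my actual code and it does not touch Hibernate or Lucene: the class name IndexingPipeline, the run method, and the string-based "load" and "convert" steps are all hypothetical stand-ins, and the "index" is just an in-memory list. It only shows how bounded queues connect the stages and how the end of the stream is passed along.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class IndexingPipeline {
    // Sentinel object marking the end of the stream; each worker consumes exactly one.
    private static final Object END = new Object();

    // Starts `workers` threads that take items from `in`, apply `step`, and put the result on `out`.
    private static void startStage(ExecutorService pool, int workers, int nextWorkers,
            BlockingQueue<Object> in, BlockingQueue<Object> out, Function<Object, Object> step) {
        AtomicInteger active = new AtomicInteger(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    for (Object item = in.take(); item != END; item = in.take()) {
                        out.put(step.apply(item)); // blocks while the next queue is full
                    }
                    // The last worker of this stage to finish stops every worker of the next stage.
                    if (active.decrementAndGet() == 0) {
                        for (int n = 0; n < nextWorkers; n++) {
                            out.put(END);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    // Runs the pipeline and returns the "indexed" documents (their order is not deterministic).
    public static List<String> run(int recordCount, int loaders, int converters, int queueCapacity) {
        BlockingQueue<Object> pks = new ArrayBlockingQueue<>(queueCapacity);
        BlockingQueue<Object> entities = new ArrayBlockingQueue<>(queueCapacity);
        BlockingQueue<Object> documents = new ArrayBlockingQueue<>(queueCapacity);
        List<String> index = new ArrayList<>();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Stage 1: a single thread reads the primary keys (assumption: simply 1..recordCount).
            pool.execute(() -> {
                try {
                    for (long pk = 1; pk <= recordCount; pk++) {
                        pks.put(pk);
                    }
                    for (int n = 0; n < loaders; n++) {
                        pks.put(END);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Stage 2: loaders turn a PK into an entity (stand-in for a database read).
            startStage(pool, loaders, converters, pks, entities, pk -> "entity-" + pk);
            // Stage 3: converters turn an entity into a document (stand-in for building a Lucene Document).
            startStage(pool, converters, 1, entities, documents, entity -> "doc(" + entity + ")");
            // Stage 4: the calling thread is the single writer, so the index needs no locking.
            for (Object doc = documents.take(); doc != END; doc = documents.take()) {
                index.add((String) doc);
            }
            return index;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("indexing interrupted", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Thread counts and queue size taken from step 3; tune them for your own machine.
        System.out.println(run(10_000, 30, 10, 1000).size());
    }
}
```

The bounded queues are the important part: a fast stage blocks on put() instead of filling the heap with entities that the slower stage has not processed yet, and keeping a single writer means the index itself is never accessed concurrently.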

I just finished updating my code to H.Search 3.0.1 and will share it as soon as it is "cleaned up" and testable. Unfortunately I'll have to do it in my spare time, so my estimate is to release some alpha code in mid-March.

My current code uses the internal API of Hibernate Search. The idea is to submit a patch, so I have to change some of the design.
I could use some help with the benchmark and testing setup to verify it all; I'm studying the tests in Hibernate Search now.


 Post subject:
PostPosted: Mon Feb 25, 2008 11:10 am 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
Generally speaking, when indexing tens of millions of records takes half a day or more, something is very likely wrong, and it is usually due to an improper fetching strategy (i.e. how you read data from the database).

Disclaimer: indexing time depends heavily on the complexity of the dataset.

_________________
Emmanuel


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.