All times are UTC - 5 hours [ DST ]



 Post subject: hibernate search batch indexing slow.
PostPosted: Wed Aug 20, 2008 9:43 am 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
Hibernate Version
3.3.0 GA
Hibernate Search 3.1.0 Beta 1

Mapping documents:
Code:
class Post
{
    @DocumentId
    private Long id;
   
    @IndexedEmbedded(depth=1, prefix="creator_")
    private User creator;

    @IndexedEmbedded(depth=1, prefix="editor_")   
    private User lastEditor;

    @Field(index=Index.UN_TOKENIZED)
    @DateBridge(resolution=Resolution.MILLISECOND)
    private Timestamp createTime;

    @Field(index=Index.UN_TOKENIZED)
    @DateBridge(resolution=Resolution.MILLISECOND)
    private Timestamp modifyTime;

    @Field(index=Index.TOKENIZED, store=Store.YES)
    @Boost(value=3.5f)
    private String message;

    @Field(index=Index.TOKENIZED, store=Store.YES)
    @Boost(value=2.0f)
    private Integer viewCount;
}


Code between sessionFactory.openSession() and session.close():
Code:
Session session = this.getHibernateTemplate().getSessionFactory().getCurrentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
// Scrollable results avoid loading too many objects into memory
ScrollableResults results = fullTextSession.createCriteria(Post.class)
    .setFetchSize(BATCH_SIZE)
    .scroll(ScrollMode.FORWARD_ONLY);
int index = 0;
while (results.next()) {
    ++index;
    fullTextSession.index(results.get(0)); // index each element
    if ((index % BATCH_SIZE) == 0) {
        fullTextSession.flushToIndexes(); // apply changes to indexes
        fullTextSession.clear(); // clear since the queue is processed
    }
}

private static final Integer BATCH_SIZE = 1000;



I am using the file system directory provider.

Here are my settings.
Code:
hibernate.search.Post.optimizer.operation_limit.max=10000
hibernate.search.Post.optimizer.transaction_limit.max=1000
hibernate.search.Post.indexwriter.batch.merge_factor=256
hibernate.search.Post.indexwriter.batch.ram_buffer_size=1024



It takes 3 to 4 hours to batch-index 3 million posts.



 Post subject:
PostPosted: Thu Aug 21, 2008 9:03 am 
Hibernate Team
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

What do you actually expect? Do you have some sort of benchmark or requirement you have to reach? Even with 4 hours of indexing time you are still indexing about 208 entities per second. That's not too bad if you ask me.
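
The rate quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the class and method names are made up here, and the 3 million posts and the 3 to 4 hour range are the figures from the first post.

```java
class IndexingRate {
    // Whole entities indexed per second, rounded down.
    static long entitiesPerSecond(long entities, long hours) {
        long seconds = hours * 3600L;
        return entities / seconds;
    }

    public static void main(String[] args) {
        long posts = 3_000_000L; // figure taken from the first post
        System.out.println("4 hours: " + entitiesPerSecond(posts, 4) + " posts/s");
        System.out.println("3 hours: " + entitiesPerSecond(posts, 3) + " posts/s");
    }
}
```

The result is rounded down, so the 4 hour case gives the 208 mentioned above.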

What hardware are you using? Especially how much RAM do you have? The more RAM the better.

When it comes to performance tuning, you have to distinguish the actual indexing time from the time needed to retrieve the objects from the DB. For example, you are using CacheMode.IGNORE, which means that Hibernate will not use the second-level cache at all. This makes perfect sense for standalone entities; however, when you index embedded entities it might be an advantage to cache Users. The idea is that there are far fewer users in the system than posts, and many posts will be from the same user. If the user is already in the cache, it might save another DB round trip. Just a thought ;-)
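
The caching idea can be pictured with a toy model in plain Java. It does not use Hibernate at all: the map merely stands in for a second-level cache, loadUserFromDb is a hypothetical placeholder for a database round trip, and the creator ids are invented.

```java
import java.util.HashMap;
import java.util.Map;

class UserCacheSketch {
    // Counts how often the simulated database lookup is called.
    static int dbLoads = 0;

    // Hypothetical stand-in for loading a User row from the database.
    static String loadUserFromDb(long userId) {
        dbLoads++;
        return "user-" + userId;
    }

    // Resolves the creator of every post and returns the number of
    // simulated database loads, with or without a shared cache.
    static int countLoads(long[] creatorIds, boolean useCache) {
        dbLoads = 0;
        Map<Long, String> cache = new HashMap<>();
        for (long id : creatorIds) {
            if (useCache) {
                cache.computeIfAbsent(id, key -> loadUserFromDb(key));
            } else {
                loadUserFromDb(id);
            }
        }
        return dbLoads;
    }

    public static void main(String[] args) {
        long[] creators = {1, 2, 1, 3, 2, 1}; // six posts written by three users
        System.out.println("without cache: " + countLoads(creators, false));
        System.out.println("with cache:    " + countLoads(creators, true));
    }
}
```

How much a real cache helps depends on how many distinct users appear and on how the cache is configured, which this model ignores.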

--Hardy


 Post subject:
PostPosted: Thu Aug 21, 2008 1:48 pm 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
My expected result is somewhere around 20 minutes.


I have 2 GB of RAM; 1 GB is used by the application server.


 Post subject:
PostPosted: Fri Aug 22, 2008 4:42 am 
Hibernate Team
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

2GB of which 1GB is used by the app server seems very little. I am sure you can gain a lot by going to 4 or even better 8GB.

Of course the processor is important too. Switching to a quad core, for example, could make a huge difference.

20 minutes seems to be a tough goal though and can definitely not be achieved by making a simple code change.

--Hardy


 Post subject:
PostPosted: Fri Aug 22, 2008 8:49 am 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
Hi, thanks.

Except I am not indexing from scratch. I am indexing only the changes between the existing Lucene directory and the DB.


 Post subject:
PostPosted: Fri Aug 22, 2008 1:56 pm 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
btw i set

ram_buffered_size to be 4096


 Post subject:
PostPosted: Sat Aug 23, 2008 1:25 pm 
Hibernate Team
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
cablepuff wrote:

Except I am not indexing from scratch. I am indexing only the changes between the existing Lucene directory and the DB.


And that takes 4 hours? I thought your initial index creation took 4 hours? And why do you not rely on automatic index synchronization?

--Hardy


 Post subject:
PostPosted: Sun Aug 24, 2008 3:16 am 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
I want to be able to index asynchronously on a different server (the indexes are built in a batch process, for example every day).


 Post subject:
PostPosted: Sun Aug 24, 2008 5:27 am 
Hibernate Team
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Why not use the JMS Master/Slave configuration then? Just set the refresh period of the index to once a day.


 Post subject:
PostPosted: Sun Aug 24, 2008 5:31 am 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi cablepuff,
Quote:
my expected result is somewhere around 20 minutes.

I can't promise anything, of course, as it depends on far too many factors, but I believe you could go much faster. You have to check and tweak some more settings:

a) The creator and lastEditor fields should be loaded in the criteria, so that you can scroll over the Post entities without having to load the embedded fields later on (unless you are trying to keep the users in a cache, but that depends on how many different users are expected to be active and on the cache configuration).
The choice about caching the users may also depend on how expensive it is to load one. If they have no other fields that are going to be loaded, I suggest you avoid the cache and load them in the same scrollable.

b) merge factor = 256 doesn't necessarily make you faster; higher is better, but you should also try lower values. I've had good results with 10000.

c) You gave a hint of 1GB RAM to the indexer; did you remember to allocate more RAM to your JVM? What are your JVM switches? You say 1GB goes to your application server, but you shouldn't allocate all the RAM available to the JVM to the indexer: it will slow down a lot because of GC triggering all the time.
BTW, don't set it to 4GB if you don't have it, it really will kill performance; you're saving yourself only because flushToIndexes occurs before the trigger.

d) The .setFetchSize(BATCH_SIZE) doesn't need to match your ((index % BATCH_SIZE) == 0).
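
Point c) can be turned into a quick sanity check run before indexing starts. The sketch below is an assumption-laden illustration, not part of Hibernate Search: it treats ram_buffer_size as megabytes (which is how 1024 is read as 1GB in this thread), and both the RamBufferCheck helper and the 50% threshold are arbitrary choices made here.

```java
class RamBufferCheck {
    // Returns true when the configured buffer (in MB) stays within the given
    // fraction of the JVM heap limit (in bytes).
    static boolean fitsInHeap(long bufferMb, long maxHeapBytes, double maxFraction) {
        long bufferBytes = bufferMb * 1024L * 1024L;
        return bufferBytes <= (long) (maxHeapBytes * maxFraction);
    }

    public static void main(String[] args) {
        // Upper bound of the heap, normally controlled by the -Xmx switch.
        long maxHeap = Runtime.getRuntime().maxMemory();
        for (long mb : new long[] {64, 1024, 4096}) {
            System.out.println(mb + " MB within half the heap: "
                + fitsInHeap(mb, maxHeap, 0.5));
        }
    }
}
```

With a 1GB heap, for instance, both 1024 and 4096 fail this check, which matches the warning about the garbage collector running all the time.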

Quote:
I want to be able to index asynchronously on a different server (the indexes are built in a batch process, for example every day).

Yes, you can do that. Did you read about the JMS indexing configuration in the reference?
Reading the book may help too.

_________________
Sanne
http://in.relation.to/


Last edited by sanne.grinovero on Sun Aug 24, 2008 5:43 am, edited 1 time in total.

 Post subject:
PostPosted: Sun Aug 24, 2008 5:40 am 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
cablepuff wrote:
btw i set

ram_buffered_size to be 4096


BTW it should be
Code:
ram_buffer_size

maybe you just misspelled the parameter.

Hardy, should we log a warning for unrecognized settings in the configuration? I don't think it would be easy.

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Sun Aug 24, 2008 6:07 am 
Hibernate Team
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

it would be nice to get some warnings, but it might not be so easy. To make this work one would need some sort of central parameter repository. At the moment the parameters are read ad hoc when needed.
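
The "central parameter repository" idea could look roughly like the sketch below. It is not how Hibernate Search reads its configuration; ConfigKeyCheck, KNOWN_SUFFIXES and unknownKeys are hypothetical names, and the registry holds only the four settings quoted in the first post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

class ConfigKeyCheck {
    // Hypothetical registry: only the suffixes that appear in this thread.
    static final Set<String> KNOWN_SUFFIXES = new HashSet<>(Arrays.asList(
        "optimizer.operation_limit.max",
        "optimizer.transaction_limit.max",
        "indexwriter.batch.merge_factor",
        "indexwriter.batch.ram_buffer_size"));

    // Returns, in sorted order, the keys under the prefix whose suffix
    // is not present in the registry.
    static List<String> unknownKeys(Properties props, String prefix) {
        List<String> unknown = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)
                    && !KNOWN_SUFFIXES.contains(key.substring(prefix.length()))) {
                unknown.add(key);
            }
        }
        Collections.sort(unknown);
        return unknown;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.Post.indexwriter.batch.merge_factor", "256");
        props.setProperty("hibernate.search.Post.indexwriter.batch.ram_buffered_size", "4096");
        for (String key : unknownKeys(props, "hibernate.search.Post.")) {
            System.out.println("WARN unrecognized setting: " + key);
        }
    }
}
```

A check like this only works if every valid key is registered in one place, which is exactly the difficulty described above when parameters are read ad hoc.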

--Hardy


 Post subject:
PostPosted: Sun Aug 24, 2008 7:11 am 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
s.grinovero wrote:
Hi cablepuff,
Quote:
my expected result is somewhere around 20 minutes.

I can't promise anything, of course, as it depends on far too many factors, but I believe you could go much faster. You have to check and tweak some more settings:

a) The creator and lastEditor fields should be loaded in the criteria, so that you can scroll over the Post entities without having to load the embedded fields later on (unless you are trying to keep the users in a cache, but that depends on how many different users are expected to be active and on the cache configuration).
The choice about caching the users may also depend on how expensive it is to load one. If they have no other fields that are going to be loaded, I suggest you avoid the cache and load them in the same scrollable.

Hmm, how do you load them in the same scrollable result?

Quote:
b) merge factor = 256 doesn't necessarily make you faster; higher is better, but you should also try lower values. I've had good results with 10000.

c) You gave a hint of 1GB RAM to the indexer; did you remember to allocate more RAM to your JVM? What are your JVM switches? You say 1GB goes to your application server, but you shouldn't allocate all the RAM available to the JVM to the indexer: it will slow down a lot because of GC triggering all the time.
BTW, don't set it to 4GB if you don't have it, it really will kill performance; you're saving yourself only because flushToIndexes occurs before the trigger.

d) The .setFetchSize(BATCH_SIZE) doesn't need to match your ((index % BATCH_SIZE) == 0).



b.) Hmm, maybe I am going to try 8192.

c.) Interesting.

d.) I followed the documentation guide (on the Hibernate documentation page).
Quote:
I want to be able to index asynchronously on a different server (the indexes are built in a batch process, for example every day).

Yes, you can do that. Did you read about the JMS indexing configuration in the reference?

Quote:
Reading the book may help too.


 Post subject:
PostPosted: Wed Aug 27, 2008 11:07 pm 
Regular
Regular

Joined: Tue Apr 01, 2008 5:39 pm
Posts: 61
What's the difference between fetch size and batch size in this case?
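
One way to picture the two numbers: the fetch size given to setFetchSize is a hint to the JDBC driver about how many rows to pull per round trip, while the batch size in the loop decides how often flushToIndexes() and clear() run. The toy model below is plain Java with hypothetical names and invented row counts; it only counts events and does not touch Hibernate or a database.

```java
class FetchVsFlushSketch {
    // Simulated driver round trips needed to stream `rows` rows when
    // `fetchSize` rows are pulled at a time (ceiling division).
    static int roundTrips(int rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize;
    }

    // Flushes triggered inside a loop that flushes after every `batchSize`
    // items, mirroring the (index % BATCH_SIZE) == 0 test. Items left over
    // after the last full batch are not counted here.
    static int flushes(int rows, int batchSize) {
        int count = 0;
        for (int index = 1; index <= rows; index++) {
            if (index % batchSize == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int rows = 2500; // invented number of posts
        System.out.println("round trips with fetch size 100: " + roundTrips(rows, 100));
        System.out.println("flushes with batch size 1000:    " + flushes(rows, 1000));
    }
}
```

In this model the two values can be tuned separately: changing the fetch size changes only the number of round trips, and changing the batch size changes only how often the index queue is flushed and the session cleared.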


 Post subject:
PostPosted: Mon Sep 08, 2008 5:45 pm 
Beginner
Beginner

Joined: Tue May 03, 2005 11:45 pm
Posts: 43
cablepuff, did you get this resolved? I'm having similar issues, but with only 600k rows. I let it run all night with no luck, on a decently fast machine with 4 GB of RAM, although I only had 2 GB allocated to my process.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.