-->
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
 Post subject: Indexing with lazy fields, or offline text extraction
PostPosted: Fri May 11, 2012 1:07 pm 
Newbie

Joined: Fri Dec 17, 2010 11:14 am
Posts: 11
We followed the instructions here: https://community.jboss.org/wiki/Hibern ... Extraction. I was expecting that any attachments that we indexed (which were included as "lazy" fields) would get done in a separate thread. However, this didn't seem to be the case, or at least we found that the thread containing the transaction was still blocked when the commit happened.

We wound up switching to an async worker (hibernate.search.worker.execution=async), which solved our problem (although some things aren't instantly searchable).

Is my understanding of what the lazy boolean does completely wrong, or did I miss something?

For versions:
hibernate: 3.6.5.Final
hibernate-search: 3.4.0.Final


 Post subject: Re: Indexing with lazy fields, or offline text extraction
PostPosted: Wed May 16, 2012 7:06 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

Here are a couple of comments which might point you in the right direction and clear up some of your questions.

First off, the code you are referring to is based on contributed wiki content and is not something we in the Hibernate team directly recommend or support. Let's look at some of the bits and pieces here.

Quote:
I was expecting that any attachments that we indexed (which were included as "lazy" fields) would get done in a separate thread.


Why did you think that? By default, indexing happens synchronously within the transaction. If you want indexing to happen outside the committing thread, you should use async execution. Even better, however, is to use the mass indexer API (check the online documentation). The mass indexer is explicitly written for indexing large data sets, using multiple threads for object loading and indexing. This is the approach I would recommend in your case.
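As a rough sketch of the async worker settings mentioned above, expressed as a plain java.util.Properties object: the class and method names are made up for illustration, and the thread pool and queue values are assumptions, not recommendations. The property keys are the worker options as I remember them from the 3.x reference documentation, so please double-check them against the documentation for your version.

```java
import java.util.Properties;

// Hypothetical helper class, for illustration only.
public class AsyncWorkerConfig {

    // Collects the Hibernate Search worker settings for asynchronous indexing.
    public static Properties asyncWorkerProperties() {
        Properties props = new Properties();
        // Hand index work to a background thread instead of the committing thread.
        props.setProperty("hibernate.search.worker.execution", "async");
        // Assumed example values: number of worker threads and the bound on queued work.
        props.setProperty("hibernate.search.worker.thread_pool.size", "2");
        props.setProperty("hibernate.search.worker.buffer_queue.max", "100");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be copied into hibernate.properties.
        asyncWorkerProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The same key/value pairs can go straight into hibernate.properties or persistence.xml. With async execution the commit no longer waits for the index update, which is why new data is not instantly searchable.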

Quote:
Is my understanding of what the lazy boolean does completely wrong, or did I miss something?


It depends on what your exact expectations were, but I think they were indeed wrong :-) The lazy option is Lucene specific. It allows Document field data to be loaded lazily. This is relevant at search time, or when you try to retrieve data from the document. However, this option is not relevant at indexing time. Think about it: to make something searchable I have to index it. I cannot wait until someone searches for it and then lazily index it ;-) When you want to make your entity and text data searchable, you need to access all relevant data there and then. The choice you have is to do it synchronously or asynchronously via the worker configuration, or to use the mass indexer API, which gives you a whole bunch of configuration options to fine-tune the indexing performance.
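To make the synchronous versus asynchronous choice concrete, here is a minimal plain-Java sketch. It does not use Hibernate Search at all: extractAndIndex is a hypothetical stand-in for text extraction plus the index write, and the 200 ms sleep is an arbitrary assumption. It only shows which thread pays for the work in each mode.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: models the two execution modes, not Hibernate Search internals.
public class IndexingModes {

    // Hypothetical stand-in for slow text extraction followed by an index write.
    public static String extractAndIndex(String attachment) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "indexed:" + attachment;
    }

    // Synchronous mode: the calling (committing) thread does the work and is blocked.
    public static String commitSync(String attachment) {
        return extractAndIndex(attachment);
    }

    // Asynchronous mode: the work is queued on a pool and the caller returns at once.
    public static CompletableFuture<String> commitAsync(String attachment, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> extractAndIndex(attachment), pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        long start = System.nanoTime();
        commitSync("report.pdf");
        System.out.println("sync commit took ms: " + (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        CompletableFuture<String> pending = commitAsync("report.pdf", pool);
        System.out.println("async commit took ms: " + (System.nanoTime() - start) / 1_000_000);

        // The document only becomes "searchable" once the background work completes.
        System.out.println(pending.join());
        pool.shutdown();
    }
}
```

In both modes all the data is read and processed at indexing time. The only difference is whether the committing thread waits for it, which matches the trade-off you observed after switching to the async worker.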

Hope this helps,
Hardy


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.