-->

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
Author Message
 Post subject: Hibernate Search: Offline text extraction
PostPosted: Wed Oct 31, 2007 8:27 am 
Beginner

Joined: Fri Aug 29, 2003 10:01 am
Posts: 34
Location: florence, italy
We are using Hibernate 3.2.5 with Search 3.0, and it works fine.

Now we have some entities that refer to large PDF documents whose text must be extracted before indexing. The extraction takes some time, and it holds the transaction open until it completes. What is the appropriate way to handle this?

We have set org.hibernate.worker.execution to async. Index writing is indeed asynchronous, but text extraction is not, since it happens when the getter method (annotated with @Field) is called. We have searched the Hibernate Search documentation and the forum and found plenty of material on scaling the index writing process, but nothing about queueing the text extraction process.

We could build a @ClassBridge that puts the text extraction in a separate queue, but maybe there is a simpler way.
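
To make the "separate queue" idea concrete, here is a minimal sketch in plain Java. It is not Hibernate Search API: ExtractionQueue, enqueue and indexedTextFor are hypothetical names, and a ConcurrentHashMap stands in for the real index. It only shows the shape of the approach, where the caller returns immediately and the slow extraction runs on a dedicated worker thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper: runs slow text extraction off the transaction thread. */
class ExtractionQueue implements AutoCloseable {

    // Single daemon worker, so extractions are processed one at a time, in order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "text-extraction");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for the real index (assumption): entity id -> extracted text.
    private final Map<String, String> indexedText = new ConcurrentHashMap<>();

    /**
     * Returns immediately. The extractor runs on the worker thread, and its
     * result is stored afterwards, also on the worker thread.
     */
    CompletableFuture<Void> enqueue(String entityId, Supplier<String> extractor) {
        return CompletableFuture
                .supplyAsync(extractor, worker)
                .thenAcceptAsync(text -> indexedText.put(entityId, text), worker);
    }

    /** Returns the stored text, or null if extraction has not finished yet. */
    String indexedTextFor(String entityId) {
        return indexedText.get(entityId);
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

A bridge built along these lines would call enqueue(...) from the indexing callback and hand the extracted text to the index instead of a map. The extractor should only capture data that is already loaded (for example the PDF bytes), since it runs outside the original session.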


 Post subject:
PostPosted: Wed Oct 31, 2007 5:30 pm 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
Now this is quite problematic.
I initially did the extraction at the same time as the indexing, but that does not work well at all, since indexing is post-transactional and some lazy objects might need initialization.

What might work, I think, is to write a LazilyGeneratedField that implements org.apache.lucene.document.Fieldable and parses the PDF lazily. That way the field processing would only happen during the async process.

If it works, I would appreciate it if you could create a wiki page in the community area to capture the knowledge.
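
A rough sketch of the lazy part of that idea, in plain Java so that it compiles without Lucene. This is an assumption-laden stand-in, not the real thing: an actual LazilyGeneratedField would implement org.apache.lucene.document.Fieldable, while this class only mirrors its stringValue() accessor. The extractor parameter and isExtracted() are hypothetical, added for illustration.

```java
import java.util.function.Supplier;

/**
 * Stand-in for a lazily generated Lucene field (sketch only).
 * Nothing is extracted at construction time; the work is deferred
 * until the indexer first asks for the value.
 */
final class LazilyGeneratedField {
    private final String name;
    private final Supplier<String> extractor; // e.g. a PDF-to-text call (hypothetical)
    private String cached;
    private boolean extracted;

    LazilyGeneratedField(String name, Supplier<String> extractor) {
        this.name = name;
        this.extractor = extractor;
    }

    String name() {
        return name;
    }

    /** True once the extractor has been run. */
    synchronized boolean isExtracted() {
        return extracted;
    }

    /** Runs the extractor on the first call only, then returns the cached text. */
    synchronized String stringValue() {
        if (!extracted) {
            cached = extractor.get();
            extracted = true;
        }
        return cached;
    }
}
```

Creating the field inside the transaction is then cheap, and the parsing cost is paid by whichever thread first calls stringValue(), which with async execution would be the indexing worker. Because of the lazy-initialization problem mentioned above, the extractor should capture already-loaded data rather than an uninitialized proxy.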

_________________
Emmanuel


 Post subject:
PostPosted: Thu Nov 01, 2007 12:56 pm 
Beginner

Joined: Fri Aug 29, 2003 10:01 am
Posts: 34
Location: florence, italy
Thanks for your support. I am already testing your solution and preparing a wiki contribution.


 Post subject:
PostPosted: Tue Nov 06, 2007 9:04 am 
Beginner

Joined: Fri Aug 29, 2003 10:01 am
Posts: 34
Location: florence, italy
Published here:

http://www.hibernate.org/432.html

Cheers


 Post subject:
PostPosted: Tue Nov 06, 2007 10:47 am 
Hibernate Team

Joined: Sun Sep 14, 2003 3:54 am
Posts: 7256
Location: Paris, France
Nice, thanks!

_________________
Emmanuel


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.