
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 3 posts ] 
 Post subject: PDF document highlight
Posted: Wed Jan 18, 2012 8:27 am
Newbie

Joined: Wed Dec 28, 2011 8:13 am
Posts: 3
Hi,

Is there a way to highlight the search text in a PDF document using Hibernate Search?
If anybody knows how, please help.


 Post subject: Re: PDF document highlight
Posted: Wed Jan 18, 2012 11:45 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

The first step is the search and highlighting itself. There you might be interested in this thread. Make sure to have a look at the Lucene Highlighter API to see how this works.

The bigger question is the PDF document. Do you really want to highlight inside the PDF itself? Or are you indexing the PDF content and want to search and highlight that content?

Either way, neither Lucene nor Hibernate Search can index a PDF as is. You need to extract the actual text first (e.g. via Apache Tika). Then you can index and search on this extracted text. The offsets needed for highlighting will be relative to this extracted text. I am not sure whether you could easily use them as offsets into the PDF itself (by the way, I am not even aware of a library that can manipulate PDFs).
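
To make the point about offsets concrete, here is a minimal plain-Java sketch. It is not the Lucene Highlighter and it does not use Tika; it assumes the PDF text has already been extracted into a String, and it uses a simple case-insensitive substring match as a stand-in for a real query. The class and method names (`ExtractedTextHighlighter`, `findOffsets`, `highlight`) are made up for illustration. It only shows that the offsets you get back are positions in the extracted text, not positions in the PDF file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: highlights a term in text
// that was already extracted from a PDF (e.g. by Tika) before indexing.
public class ExtractedTextHighlighter {

    // Returns {start, end} character offsets of every case-insensitive,
    // non-overlapping match. Offsets are relative to extractedText.
    static List<int[]> findOffsets(String extractedText, String term) {
        List<int[]> hits = new ArrayList<>();
        if (term.isEmpty()) {
            return hits;
        }
        int i = 0;
        while (i + term.length() <= extractedText.length()) {
            if (extractedText.regionMatches(true, i, term, 0, term.length())) {
                hits.add(new int[] {i, i + term.length()});
                i += term.length(); // skip past this match
            } else {
                i++;
            }
        }
        return hits;
    }

    // Wraps every match in the given pre/post markers, using the offsets above.
    static String highlight(String extractedText, String term, String pre, String post) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] hit : findOffsets(extractedText, term)) {
            out.append(extractedText, last, hit[0])
               .append(pre)
               .append(extractedText, hit[0], hit[1])
               .append(post);
            last = hit[1];
        }
        return out.append(extractedText.substring(last)).toString();
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF.
        String extracted = "Hibernate Search indexes text. Search again.";
        System.out.println(highlight(extracted, "search", "<b>", "</b>"));
    }
}
```

The highlighted result is a new string (for example an HTML snippet shown next to the search hit). Mapping the same offsets back into the PDF would need extra bookkeeping during extraction, which this sketch does not attempt.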

Hope this helps.

--Hardy


 Post subject: Re: PDF document highlight
Posted: Wed Jan 25, 2012 8:25 am
Newbie

Joined: Wed Dec 28, 2011 8:13 am
Posts: 3
Thanks Hardy...


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.