Hibernate Community • View topic - PDF document highlight

All times are UTC - 5 hours [ DST ]

PDF document highlight

hiber.nation

Post subject: PDF document highlight

Posted: Wed Jan 18, 2012 8:27 am

Newbie

Joined: Wed Dec 28, 2011 8:13 am
Posts: 3

Hi,

Is there a way to highlight the search text in a PDF document using Hibernate Search?
If anybody knows how, kindly help.

hardy.ferentschik

Post subject: Re: PDF document highlight

Posted: Wed Jan 18, 2012 11:45 am

Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden

Hi,

The first step is the search and highlighting itself. There you might be interested in this thread. Make sure to have a look at the Lucene Highlighter API to see how this works.

The bigger question is the PDF document. Do you really want to highlight inside the PDF itself? Or are you indexing the PDF content and want to search and highlight this content?

Either way, neither Lucene nor Hibernate Search can index a PDF as is. You need to extract the actual text first (e.g. via Tika). Then you can index and search on this extracted text. The offsets (needed for highlighting) will be relative to this extracted text. I am not sure whether you could easily use them as offsets into the PDF itself (btw, I am not even aware of a library which can manipulate PDF).
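
To make the point about offsets concrete, here is a minimal sketch in plain Java. It is not the Lucene Highlighter and not Hibernate Search code: the class `ExtractedTextHighlighter` and its methods are hypothetical names, and simple case-insensitive string matching stands in for a real analyzed query. It assumes the PDF text has already been extracted into a `String` (for example by Tika) and only shows that the match positions refer to that string, not to anything inside the PDF file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: highlights a term in text that was already extracted from a PDF.
public class ExtractedTextHighlighter {

    /** A match as [start, end) character offsets into the extracted text. */
    public static final class Match {
        public final int start;
        public final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Finds non-overlapping, case-insensitive occurrences of term in extractedText. */
    public static List<Match> findMatches(String extractedText, String term) {
        List<Match> matches = new ArrayList<>();
        if (term.isEmpty()) {
            return matches;
        }
        int i = 0;
        while (i + term.length() <= extractedText.length()) {
            // regionMatches compares in place, so offsets stay aligned with the original string
            if (extractedText.regionMatches(true, i, term, 0, term.length())) {
                matches.add(new Match(i, i + term.length()));
                i += term.length();
            } else {
                i++;
            }
        }
        return matches;
    }

    /** Wraps every match in <b>...</b>, leaving the rest of the text unchanged. */
    public static String highlight(String extractedText, String term) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Match m : findMatches(extractedText, term)) {
            out.append(extractedText, last, m.start);
            out.append("<b>").append(extractedText, m.start, m.end).append("</b>");
            last = m.end;
        }
        out.append(extractedText, last, extractedText.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF; no PDF parsing happens here.
        String extracted = "Hibernate Search builds on Lucene. Lucene indexes plain text.";
        System.out.println(findMatches(extracted, "lucene"));
        System.out.println(highlight(extracted, "lucene"));
    }
}
```

The highlighted result is a marked-up copy of the extracted text, which is easy to show on a results page. Mapping the same offsets back onto positions in the original PDF is a separate problem that this sketch does not attempt.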

Hope this helps.

--Hardy

hiber.nation

Post subject: Re: PDF document highlight

Posted: Wed Jan 25, 2012 8:25 am

Newbie

Joined: Wed Dec 28, 2011 8:13 am
Posts: 3

Thanks Hardy...
