How to handle Indexing?

Cr13 · **Joined:** Wed Apr 02, 2014 8:04 am **Posts:** 6

Hello!

In my database I have lots of Word and PDF documents stored as byte[].
To make these files searchable, I want to do a "batch indexing".
How can I handle this?
Do I have to load the byte[] into a FileStream and parse it as a LucenePdfDocument? What comes next?
I read the part in "Hibernate in Action", but so far I haven't got it...

Could someone help me out?

Thanks!!!

sanne.grinovero · **Posted:** Thu Apr 10, 2014 8:04 am

To index PDF documents you need to mark those byte[] properties with the @TikaBridge annotation.
I hope the examples in this chapter clarify things:

http://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#d0e4335
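
For orientation, here is a minimal mapping sketch. It assumes Hibernate Search 4.x with JPA annotations and the optional Tika libraries on the classpath. The entity name `StoredDocument` and its field names are hypothetical and only serve as illustration, so adapt them to your own model.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

// Hypothetical entity holding one Word or PDF file as a byte[].
@Entity
@Indexed
public class StoredDocument {

    @Id
    @GeneratedValue
    private Long id;

    // @TikaBridge asks Hibernate Search to run the bytes through Apache Tika
    // and index the extracted text instead of the raw binary content.
    @Lob
    @Field
    @TikaBridge
    private byte[] content;

    public Long getId() {
        return id;
    }

    public byte[] getContent() {
        return content;
    }

    public void setContent(byte[] content) {
        this.content = content;
    }
}
```

With a mapping along these lines there should be no need to open streams or build Lucene documents by hand. The bridge does the parsing whenever the entity is indexed.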

Cr13 · **Joined:** Wed Apr 02, 2014 8:04 am **Posts:** 6

Hi!
Thanks for your reply!

So, I don't need a FieldBridge, but a "TikaBridge"?
But what's the difference? I don't get what a "TikaBridge" does. The example uses a class called "Mp3TikaMetadataProcessor", but I can't find the implementation of that class. Very strange...

sanne.grinovero · **Posted:** Fri Apr 11, 2014 10:10 am

Hi,
a @TikaBridge is just a special kind of FieldBridge which helps you integrate with Apache Tika, a project able to parse binary streams such as PDF and office documents (or even MP3s, as in our example) and extract the text.
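
To make the idea of a bridge more concrete, here is a rough conceptual sketch in plain Java. It is not Hibernate Search's actual API: `BinaryTextExtractor`, `PlainTextExtractor` and `toIndexFields` are made-up names, and the toy extractor only decodes UTF-8 text, where a Tika-backed bridge would detect and parse the real file format.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BridgeSketch {

    /** Hypothetical stand-in for a bridge: converts a binary property value into indexable text. */
    interface BinaryTextExtractor {
        String extractText(byte[] content);
    }

    /**
     * Toy extractor that only understands UTF-8 plain text.
     * A Tika-backed bridge would instead detect the format (PDF, Word, MP3, ...) and parse it.
     */
    static final class PlainTextExtractor implements BinaryTextExtractor {
        @Override
        public String extractText(byte[] content) {
            return content == null ? "" : new String(content, StandardCharsets.UTF_8);
        }
    }

    /** Builds the field name to text pairs that an indexer would add to the index document. */
    static Map<String, String> toIndexFields(String fieldName, byte[] content,
                                             BinaryTextExtractor extractor) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(fieldName, extractor.extractText(content));
        return fields;
    }

    public static void main(String[] args) {
        // The stored bytes play the role of the byte[] property on the entity.
        byte[] stored = "quarterly report".getBytes(StandardCharsets.UTF_8);
        Map<String, String> fields = toIndexFields("content", stored, new PlainTextExtractor());
        System.out.println(fields);
    }
}
```

The point is only the division of labour: the entity keeps the binary data, and the bridge decides which text ends up in the index for that field.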

Which version of Hibernate Search are you using? Tika support is a relatively new feature. Also, the Tika libraries are optional: make sure you add them to your application.