
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
 Post subject: How to handle Indexing?
Posted: Wed Apr 02, 2014 8:09 am
Newbie

Joined: Wed Apr 02, 2014 8:04 am
Posts: 6
Hello!

In my database I have lots of Word and PDF documents stored as byte[].
To make these files searchable, I want to do a "batch indexing".
How can I handle this?
Do I have to load the byte[] into a FileStream and parse it as a LucenePdfDocument? What comes next?
I read the relevant part of "Hibernate in Action", but so far I haven't understood it.

Could someone help me out?

Thanks!!!


 Post subject: Re: How to handle Indexing?
Posted: Thu Apr 10, 2014 8:04 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
To index PDF documents you need to mark those byte[] properties with the @TikaBridge annotation.
I hope the examples in this chapter make it clearer:

http://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#d0e4335
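For readers who want to see the shape of such a mapping, here is a minimal sketch, assuming Hibernate Search 4.x with JPA annotations and the Tika libraries on the classpath. The entity name `StoredDocument` and the field name `content` are hypothetical and not taken from this thread. The `metadataProcessor` attribute used in the MP3 example of the reference guide is optional and is left out, since only the extracted text is indexed here.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.TikaBridge;

// Hypothetical entity: the class and field names are illustrative only.
@Entity
@Indexed
public class StoredDocument {

    @Id
    @GeneratedValue
    private Long id;

    // The raw Word/PDF bytes; @TikaBridge asks Tika to extract the text,
    // which is then indexed in the Lucene field "content".
    @Lob
    @Field
    @TikaBridge
    private byte[] content;

    public Long getId() { return id; }

    public byte[] getContent() { return content; }

    public void setContent(byte[] content) { this.content = content; }
}
```

With a mapping like this in place, rows that already exist in the database can usually be indexed in one pass with the mass indexer, for example `Search.getFullTextSession(session).createIndexer().startAndWait()`. Treat this as a starting point to check against the reference guide for your version.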

_________________
Sanne
http://in.relation.to/


 Post subject: Re: How to handle Indexing?
Posted: Fri Apr 11, 2014 1:33 am
Newbie

Joined: Wed Apr 02, 2014 8:04 am
Posts: 6
Hi!
Thanks for your reply!

So, I don't need a FieldBridge, but a "TikaBridge"?
But what's the difference? I don't get what a "TikaBridge" does. The example uses a class called "Mp3TikaMetadataProcessor", but I can't find the implementation of that class. Very strange.


 Post subject: Re: How to handle Indexing?
Posted: Fri Apr 11, 2014 10:10 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
a @TikaBridge is just a special kind of FieldBridge that helps you integrate with Apache Tika, a project able to parse binary streams such as PDF and office documents (or even MP3s, as in our example) to extract the text.

Which version of Hibernate Search are you using? Tika support is a relatively new feature. Also, the Tika libraries are optional: make sure you add them to your application.
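Since the Tika libraries are optional, a quick way to confirm they are really visible to the application is to look the classes up at runtime. The following is only a sketch: the helper class `TikaClasspathCheck` is hypothetical, and the two class names it checks are the Tika auto-detecting parser and the Tika PDF parser.

```java
// Minimal sketch: report whether the optional Tika classes are visible at runtime.
// TikaClasspathCheck is a hypothetical helper, not part of Hibernate Search or Tika.
class TikaClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.tika.parser.AutoDetectParser", // generic entry point (tika-core)
            "org.apache.tika.parser.pdf.PDFParser"     // PDF support (tika-parsers in Tika 1.x)
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

If either class is reported as missing, the Tika dependency has probably not been added to the application, which matches the advice above.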

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.