Hi all,
I would like to add a new module to my Hibernate Search (HS) application to index PDF and HTML files. I don't want to store the whole text in my database; I just want to store it in the index. While searching the forums I have found several alternatives, for example:
https://community.jboss.org/wiki/HibernateSearchAndOfflineTextExtraction
http://twproject.blogspot.com.es/2007/11/using-hibernate-search-with-complex.html
I think the approach in the first link is the right one, but before I start implementing it, I would like to know whether anybody has had the same problem and how they solved it.
I have about 300,000 records, and each one has one to three PDF/HTML files (up to roughly 900,000 files in total), so I think offline extraction is a good idea.
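Roughly, offline extraction could look like the following minimal sketch, assuming the files live on local disk: each file's text is extracted once, ahead of time, into a sidecar `.txt` file, so the indexing run only reads plain text instead of parsing documents. `OfflineExtractor`, `extractToSidecar` and the `<file name>.txt` naming are hypothetical. The sketch handles HTML only, using the JDK's built-in Swing HTML parser; PDF would need a separate library such as Apache PDFBox or Apache Tika, which is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for an offline extraction step (HTML only; PDF not handled).
class OfflineExtractor {

    // Strips markup from an HTML document with the JDK's built-in Swing parser,
    // joining the non-blank text fragments with single spaces.
    static String htmlToText(Reader html) throws IOException {
        StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String fragment = new String(data).trim();
                if (fragment.isEmpty()) {
                    return;
                }
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(fragment);
            }
        };
        // 'true' tells the parser to ignore any charset declared inside the document.
        new ParserDelegator().parse(html, callback, true);
        return sb.toString();
    }

    // Writes the extracted text next to the source file as "<file name>.txt".
    // Files whose sidecar is already at least as new as the source are skipped,
    // so a long batch run over all records can be restarted cheaply.
    static Path extractToSidecar(Path source) throws IOException {
        Path sidecar = source.resolveSibling(source.getFileName() + ".txt");
        if (Files.exists(sidecar)
                && Files.getLastModifiedTime(sidecar)
                        .compareTo(Files.getLastModifiedTime(source)) >= 0) {
            return sidecar;
        }
        String text;
        try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            text = htmlToText(reader);
        }
        Files.writeString(sidecar, text, StandardCharsets.UTF_8);
        return sidecar;
    }
}
```

A batch job would call `extractToSidecar` for every attached file, and an entity getter could then read the sidecar text and expose it to the index, so the text never needs a database column. How exactly to wire that getter into Hibernate Search (for example an indexed non-persistent property or a custom field bridge) is left open here.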
Any ideas are welcome.
Thanks in advance,
Hibernator,