All times are UTC - 5 hours [ DST ]

 Post subject: HibernateSearch and extracted text from file (huge String)
PostPosted: Sat Feb 04, 2012 5:22 pm 
Newbie

Joined: Sat Feb 04, 2012 5:12 pm
Posts: 4
I have one field which is a @Lob, and I store my extracted text content (extracted with Tika) in it. The content is stored in the DB, but for some long texts (approx. 1,000,000 chars) Hibernate Search doesn't index this field. For short documents, the content is indexed. No exception is thrown during CRUD operations. Can anyone help? My field config is:

Code:
@Field(
    name = "un_tok_searchableTextContent",
    index = Index.UN_TOKENIZED,
    store = Store.NO)
@Lob
public String getSearcheableTextContent() {
    return _searcheableTextContent;
}


I also set this property in my applicationContext.xml:

Code:
<!-- default is 10000 -->
<prop key="hibernate.search.default.indexwriter.max_field_length">10000000</prop>


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Sun Feb 05, 2012 8:53 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
Are you sure you mean to map this large string as UN_TOKENIZED?

How do you know it's not being indexed? Is it because you can't find it after indexing? Did you check the index contents with a tool like Luke?
http://code.google.com/p/luke/

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Tue Feb 07, 2012 8:00 am 
Newbie

Joined: Sat Feb 04, 2012 5:12 pm
Posts: 4
1. Yes, I am sure, because how else could I search for a specific phrase (full-text search)? Or is there another way to do this? I am also indexing this field in other ways (tokenized, n-gram tokenized, etc.), and that works for those large strings, as I can see in Luke.

2. Yes, I checked it with Luke, and there are, for example, only 7 fields, although I inserted 15 docs. All other indexes are OK (for example, I see 15 indexed titles in Luke).

I also tried to use logging to see what is happening, but I didn't get any error or any useful info. My setup looks like this:
Code:
log4j.logger.org.hibernate.search=INFO
log4j.logger.org.hibernate.search.type=ALL
log4j.logger.org.hibernate.search=debug
log4j.logger.org.apache.lucene=INFO
log4j.logger.org.apache.lucene.analysis.standard.StandardAnalyzer=debug
log4j.logger.org.apache.lucene.index.IndexWriter=debug
log4j.logger.org.apache.lucene.type=ALL


Any idea?


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Tue Feb 07, 2012 11:17 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
log4j.logger.org.apache.lucene=INFO

Lucene doesn't use Log4J so it won't log anything.

Quote:
Yes, I checked it through Luke, and there are for example only 7 fields, but I inserted 15 docs

You're not necessarily going to get a new field for each document, there is no relation.

Quote:
1. Well, I am sure, because how else can I search for a specific phrase (full-text search)?

Yes, you could use a PhraseQuery.

UN_TOKENIZED means that the field will match only exact queries, such as a TermQuery on the whole value, and it is often not practical for a long string; this use case is actually so unlikely that I think we don't have a test covering it. I'll add one for the sake of completeness, but I'd suggest you look into PhraseQuery or other Query options, as what you're trying to do would be very inefficient and not very flexible.
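
To illustrate the difference, here is a toy model in plain Java. It is not Lucene or Hibernate Search code: the class `FieldTermsSketch` and its methods are made up for this example, and the "analyzer" is just a crude lowercase-and-split stand-in. It only shows why an untokenized value behaves as one single term while an analyzed value yields one term per word.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model (hypothetical, not Lucene code) of how a field value becomes index terms. */
final class FieldTermsSketch {

    /** Untokenized: the whole value is kept as one single term. */
    static List<String> untokenizedTerms(String value) {
        return Collections.singletonList(value);
    }

    /** Tokenized: crude stand-in for an analyzer (lowercase, split on non-letters/digits). */
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Term lookup: matches only if the query term equals an indexed term exactly. */
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String text = "Hibernate Search indexes extracted text";
        // A single word cannot match the one huge untokenized term.
        System.out.println(termMatches(untokenizedTerms(text), "extracted"));
        // Only the complete, identical value matches the untokenized term.
        System.out.println(termMatches(untokenizedTerms(text), text));
        // After tokenization, individual words are searchable.
        System.out.println(termMatches(tokenizedTerms(text), "extracted"));
    }
}
```

With a document of a million characters, the untokenized variant would require the query to contain that entire string, which is why it is rarely useful for full-text search.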

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Tue Feb 07, 2012 11:57 am 
Newbie

Joined: Sat Feb 04, 2012 5:12 pm
Posts: 4
Quote:
Lucene doesn't use Log4J so it won't log anything.

What about Hibernate Search? Is this the right config:
Code:
log4j.logger.org.hibernate.search=INFO
log4j.logger.org.hibernate.search.type=ALL
log4j.logger.org.hibernate.search=debug


Quote:
You're not necessarily going to get a new field for each document, there is no relation.

Can you explain this to me? If I un_tokenize titles, I have as many titles as inserted docs (one title for every doc). How is it possible not to have the same number of un_tokenized text fields?

Quote:
Yes you could use a PhraseQuery.

UN_TOKENIZED means that the field will match only exact queries, such as a TermQuery on the whole value, and it is often not practical for a long string; this use case is actually so unlikely that I think we don't have a test covering it. I'll add one for the sake of completeness, but I'd suggest you look into PhraseQuery or other Query options, as what you're trying to do would be very inefficient and not very flexible.

Maybe I misunderstood the Hibernate Search in Action book, but I thought that I must have an UN_TOKENIZED field to search for a phrase in the text content (with a slop factor). Or am I wrong?

By the way, thanks for the fast replies.


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Wed Feb 08, 2012 12:44 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Code:
log4j.logger.org.hibernate.search=debug

This should be correct.

Quote:
Can you explain this to me? If I un_tokenize titles, I have as many titles as inserted docs (one title for every doc). How is it possible not to have the same number of un_tokenized text fields?

Ah, OK. If you mean untokenized fields, then that's likely correct. But Luke cannot always extract all values back from the index, especially if they are not STORED as well.

Quote:
Maybe I misunderstood the Hibernate Search in Action book, but I thought that I must have an UN_TOKENIZED field to search for a phrase in the text content (with a slop factor). Or am I wrong?

No, that's not correct. PhraseQuery requires the text to be tokenized (analyzed).
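
As a rough sketch of why that is, the plain-Java toy below matches a phrase against token positions. Everything in it is an assumption made for illustration: `PhraseMatchSketch`, `analyze` and `phraseMatches` are hypothetical names, the whitespace "analyzer" is far simpler than a real one, and the slop rule is simplified to in-order terms only (slop here is the number of extra positions allowed between the phrase terms), whereas a real PhraseQuery also handles reordering.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy positional phrase matcher (hypothetical, not the Lucene implementation). */
final class PhraseMatchSketch {

    /** Crude stand-in for an analyzer: lowercase and split on whitespace. */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** Returns the first index >= from where tokens holds term, or -1 if absent. */
    private static int indexOfFrom(List<String> tokens, String term, int from) {
        for (int i = from; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                return i;
            }
        }
        return -1;
    }

    /** In-order phrase match; slop is the total number of extra positions allowed. */
    static boolean phraseMatches(List<String> tokens, List<String> phrase, int slop) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(phrase.get(0))) {
                continue;
            }
            int last = start;
            boolean found = true;
            // Taking the earliest next occurrence minimizes the span for this start.
            for (int j = 1; j < phrase.size() && found; j++) {
                int next = indexOfFrom(tokens, phrase.get(j), last + 1);
                if (next < 0) {
                    found = false;
                } else {
                    last = next;
                }
            }
            int extraPositions = (last - start) - (phrase.size() - 1);
            if (found && extraPositions <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Hibernate Search can index large extracted text";
        List<String> analyzed = analyze(text);
        List<String> untokenized = Collections.singletonList(text);

        System.out.println(phraseMatches(analyzed, analyze("extracted text"), 0));
        System.out.println(phraseMatches(analyzed, analyze("index text"), 0));
        System.out.println(phraseMatches(analyzed, analyze("index text"), 2));
        // The untokenized field has a single term, so no phrase term can match it.
        System.out.println(phraseMatches(untokenized, analyze("extracted text"), 0));
    }
}
```

The last call is the situation from the original question: with only one term in the field there are no word positions to compare, so a phrase search has nothing to work with.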

_________________
Sanne
http://in.relation.to/


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Thu Feb 09, 2012 7:45 am 
Newbie

Joined: Sat Feb 04, 2012 5:12 pm
Posts: 4
Thanks for the reply. As you explained, I had misunderstood the concept. Everything works fine now with a phrase query on the analyzed field, so there is no longer any need for me to index such a large string un_tokenized. Thank you for your help.


 Post subject: Re: HibernateSearch and extracted text from file (huge String)
PostPosted: Thu Feb 09, 2012 9:03 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Great, thank you for letting me know.

_________________
Sanne
http://in.relation.to/

