-->
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 5 posts ] 
 Post subject: Newbie: tokenize or not
PostPosted: Mon Feb 16, 2009 4:16 pm 
Newbie

Joined: Fri Aug 17, 2007 2:11 pm
Posts: 19
I have this indexed property

Quote:

@Column(name = "ITEM_CODE")
@NotNull
@Field(index = Index.UN_TOKENIZED, store = Store.YES)
public String getItemCode() {
    return this.itemCode;
}


and two other fields, which are tokenized.
The item code holds unique values without spaces, such as DSCH2 or RMAV3000.

I build this query parser in my search method:

Code:
String searchQuery = this.itemKeyFields;

String[] itemFields = {"itemCode", "modelName", "modelDescription"};

// Per-field boosts: a match on the item code weighs twice as much as the others.
Map<String, Float> boostPerField = new HashMap<String, Float>(3);
boostPerField.put("itemCode", 4f);
boostPerField.put("modelName", 2f);
boostPerField.put("modelDescription", 2f);

QueryParser parser =
        new MultiFieldQueryParser(itemFields, new StandardAnalyzer(), boostPerField);



With this setup, if I enter dsc as the search keyword, I get no results, but if I enter dsc*, I get the results I want because it performs a wildcard search.

My question is: should my itemCode field be tokenized too?
If I tokenize it I would get my results, but is that the right approach for a field like this?


Other than that, the speed of Hibernate Search is phenomenal !!!

Thanks
Franco


 Post subject:
PostPosted: Mon Feb 16, 2009 7:22 pm 
Newbie

Joined: Fri Aug 17, 2007 2:11 pm
Posts: 19
I now set the field to

Code:
@Field
public String getItemCode() {
    return this.itemCode;
}


Then I reindexed the entity, used only this field in the MultiFieldQueryParser, and tried the search again.

I still have to put a wildcard at the end of the search term.

So how do I run a wildcard query (without the user needing to type an asterisk) across multiple fields when, as in my example, one of those fields is a unique identifier?


 Post subject:
PostPosted: Tue Feb 17, 2009 3:27 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, you could try using Luke to "see" what Analyzers do and to experiment with them.
Generally they clean up the noise and split the text into pieces, so

Code:
"Hibernate Search in Action" -> { "hibernate","search","in","action" }

or even

Code:
"Hibernate Search in Action" -> { "hibernate","search","action" }

when using smarter Analyzers.

The same process is applied to your query (of course you should use the same analyzer), and then Lucene will see how many tokens your query has in common with each document.
These terms however have to fully match, or you have to explicitly use a wildcard expression.
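
To make the full-match rule concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: `TermMatchSketch` and its methods are hypothetical names, and the sorted set only stands in for an index's list of terms. It shows why `dsc` finds nothing while a prefix lookup (which is what `dsc*` asks for) finds `dsch2`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Toy model of an index's term list. This is NOT Lucene code; it only
// illustrates exact term matching versus prefix (wildcard) matching.
final class TermMatchSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    // Store a code as a single lowercase term.
    void index(String code) {
        terms.add(code.toLowerCase(Locale.ROOT));
    }

    // A plain term query only hits when the whole term is identical.
    boolean exactMatch(String query) {
        return terms.contains(query.toLowerCase(Locale.ROOT));
    }

    // A trailing-wildcard query hits every term that starts with the prefix.
    List<String> prefixMatch(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        // In sorted order, all terms sharing the prefix sit next to each other,
        // starting at the first term that is >= the prefix.
        for (String term : terms.tailSet(p)) {
            if (!term.startsWith(p)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TermMatchSketch idx = new TermMatchSketch();
        idx.index("DSCH2");
        idx.index("RMAV3000");
        System.out.println(idx.exactMatch("dsc"));   // false
        System.out.println(idx.prefixMatch("dsc"));  // [dsch2]
    }
}
```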

In your case the codes shouldn't be analyzed: usually you don't want codes to be processed, because they could end up split at digits, dots or other symbols. You may still want to normalize the letter case, however.

If you don't want your users to have to add the "*" wildcard, you can either add it yourself, write your own custom QueryParser, or build the Query programmatically by combining WildcardQuery with other queries.
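
The "add it yourself" option could look roughly like the sketch below, which rewrites the user's input before it is handed to the query parser. It is plain Java with no Lucene dependency; `WildcardTerms`, `usesQuerySyntax` and `appendWildcards` are hypothetical names, the list of special characters is an assumption rather than a complete copy of the query syntax, and the lowercasing assumes the indexed terms are lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: turns plain words into trailing-wildcard terms.
final class WildcardTerms {
    // Assumed set of characters that signal deliberate query syntax.
    private static final String SYNTAX_CHARS = "+-~:\"*?()[]{}^!\\";

    // True if the input looks like hand-written query syntax.
    static boolean usesQuerySyntax(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (SYNTAX_CHARS.indexOf(input.charAt(i)) >= 0) {
                return true;
            }
        }
        for (String token : input.trim().split("\\s+")) {
            if (token.equals("AND") || token.equals("OR") || token.equals("NOT")) {
                return true;
            }
        }
        return false;
    }

    // "dsc rmav" becomes "dsc* rmav*"; syntax-bearing input is returned as typed.
    static String appendWildcards(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty() || usesQuerySyntax(trimmed)) {
            return trimmed;
        }
        List<String> terms = new ArrayList<>();
        for (String term : trimmed.toLowerCase(Locale.ROOT).split("\\s+")) {
            terms.add(term + "*");
        }
        return String.join(" ", terms);
    }
}
```

The result of `appendWildcards(userInput)` would then be passed to the parser in place of the raw input. Leaving syntax-bearing input untouched keeps explicit queries such as `itemCode:dsch2` working as the user typed them.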

_________________
Sanne
http://in.relation.to/


 Post subject:
PostPosted: Tue Feb 17, 2009 3:50 am 
Newbie

Joined: Fri Feb 13, 2009 6:17 am
Posts: 5
For us the best way to search in multiple fields was to introduce a new field (e.g. "summary" or "all") containing all the values to be searched in. This also helps when you want to search across different types of entities (e.g. Vehicle and BusinessPartner).

In your case this would be:
Code:
@Field
public String getSummary()
{
  return itemCode + " " + modelName + " " + modelDescription;
}
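
One caveat with plain `+` concatenation: Java renders a null operand as the text "null", so an entity with a missing description would get the word "null" indexed. Below is a minimal sketch of a null-safe join; `SummaryText.join` is a hypothetical helper, not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for building the combined "summary" text.
final class SummaryText {
    // Joins the non-null, non-blank values with single spaces.
    static String join(String... values) {
        List<String> parts = new ArrayList<>();
        for (String value : values) {
            if (value != null && !value.trim().isEmpty()) {
                parts.add(value.trim());
            }
        }
        return String.join(" ", parts);
    }
}
```

With it, the getter body would become `return SummaryText.join(itemCode, modelName, modelDescription);`.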


I would not tokenize fields like itemCode, because it is just a code. Only fields containing several words should be tokenized.

To search you can change the query parsing - just extend QueryParser and use your own.

Something like:
Code:
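// Body of the overridden parse(String query) method.
// If the input contains no special Lucene syntax, treat every word as a prefix.
if (StringUtils.containsNone(query, "+-~:\"%"))
{
   query = query.toLowerCase();
   // Trim and split on runs of whitespace so that extra spaces
   // do not produce empty terms (which would become a bare "*").
   String[] strings = query.trim().split("\\s+");

   BooleanQuery wild = new BooleanQuery();
   for (String term : strings)
   {
      // Every word must match the start of some term in the summary field.
      wild.add(new WildcardQuery(new Term(FIELD_SUMMARY, term + "*")), BooleanClause.Occur.MUST);
   }

   return wild;
}
else
{
   // Otherwise let the standard parser handle the query.
   return super.parse(query);
}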


--
Chris


 Post subject:
PostPosted: Tue Feb 17, 2009 11:22 am 
Newbie

Joined: Fri Aug 17, 2007 2:11 pm
Posts: 19
@Chris, I used your solution with a wildcard query and it worked great. Thanks.

@Sanne, it was my misunderstanding: I did not realize that codes are split at numbers and other symbols. My item code could have a / in it.


In the book Hibernate Search, a field named EAN is indexed untokenized. I assumed my item code had a similar meaning, which is why I went in that direction.


I have lots more reading to do...


Thanks to both of you for your replies, I appreciate your time.

Franco


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.