These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



This topic is locked, you cannot edit posts or make further replies. [ 3 posts ]
 Post subject: Analyser or Index update to ignore certain terms
Posted: Wed Mar 18, 2015 5:45 pm
Newbie

Joined: Tue Jul 08, 2014 3:27 pm
Posts: 6
We are using Hibernate Search for our customer application. Name and address are searchable fields. I am looking for suggestions on how to handle the following scenarios.

1. Abbreviations: terms like LLC or Inc in names. These are very generic and appear in most names. If such a term is present in the field, how can I exclude it from search, or avoid writing it to the index at all?

2. How to handle preferred names of people, like Bob for Robert, Mike for Michael, Josh for Joshua, etc. If Bob X is searched, I want to return Robert X as well, because it is the same name.

3. Similar to the two scenarios above, there are abbreviations in addresses: Rd for Road, St for Street, etc. How should I handle them so that I don't miss search results because of abbreviations?

Any suggestions are welcome, thank you!!


 Post subject: Re: Analyser or Index update to ignore certain terms
Posted: Mon Mar 23, 2015 5:33 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
a keyword you want to skip is called a "stopword". Lucene has an out-of-the-box TokenFilter for that, and you can define a custom Analyzer that uses it like this:

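Code:
// Declare this on an indexed entity class, then apply it to a field with
// @Field plus @Analyzer(definition = "customanalyzer")
@AnalyzerDef(name = "customanalyzer",
   tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
   filters = {
      // fold accented characters to ASCII, e.g. "é" becomes "e"
      @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
      @TokenFilterDef(factory = LowerCaseFilterFactory.class),
      // drop every term listed in stoplist.properties (e.g. "llc", "inc")
      @TokenFilterDef(factory = StopFilterFactory.class, params = {
         @Parameter(name = "words", value = "stoplist.properties"),
         @Parameter(name = "ignoreCase", value = "true")
      })
   })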


You'll have to list the terms you want ignored in the referenced file, one term per line. The file is a resource you add to your application.
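To make the effect of that filter chain concrete, here is a minimal plain-Java simulation. It does not use Lucene: splitting on non-letter/non-digit characters stands in for StandardTokenizer, and stripping accent marks only approximates ASCIIFoldingFilter. The class name and the stopword set (standing in for the contents of stoplist.properties) are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch of the "customanalyzer" chain, for illustration only (no Lucene).
final class StopwordSketch {

    // Hypothetical contents of stoplist.properties (one term per line in the real file).
    static final Set<String> STOPWORDS = Set.of("llc", "inc");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Rough stand-in for StandardTokenizer: split on runs of non-letters/non-digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields a leading empty string if text starts with a separator
            }
            // Approximate ASCII folding: decompose, then drop combining marks.
            String folded = Normalizer.normalize(raw, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
            String lower = folded.toLowerCase(Locale.ROOT); // LowerCaseFilter
            if (!STOPWORDS.contains(lower)) {               // StopFilter
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Acme Widgets, LLC")); // [acme, widgets]
        System.out.println(analyze("Caf\u00e9 Inc."));    // [cafe]
    }
}
```

Because the same analyzer runs at indexing and at query time, "LLC" and "Inc" never reach the index and are also dropped from the user's query, so they neither match nor affect scoring.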

Mapping "Bob" to "Robert" can be handled in a similar way, by using the SynonymFilter. The same applies to address abbreviations such as "Rd" for "Road" or "St" for "Street".

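Code:
// Add to the filters list of the @AnalyzerDef, after LowerCaseFilterFactory,
// so that lower-cased rules such as "bob, robert" match the tokens
@TokenFilterDef(factory = SynonymFilterFactory.class, params = {
    // this path is from Hibernate Search's own test suite; point it at your own resource
    @Parameter(name = "synonyms",
        value = "org/hibernate/search/test/analyzer/synonyms.properties")
})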


Code:
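import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;
import org.hibernate.search.annotations.*; // AnalyzerDef, TokenizerDef, TokenFilterDef, Parameter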
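Lucene's SynonymFilterFactory reads a Solr-style file by default, where comma-separated terms on one line are treated as equivalent and (with the default expand=true) each term is expanded to the whole group. The plain-Java sketch below only simulates that behaviour to show what such a file could contain for the nickname and address cases; it is not Lucene's implementation, and the class name and the rule lines are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java simulation (not Lucene) of equivalence-style synonym expansion.
final class SynonymSketch {

    // Hypothetical lines of a Solr-style synonyms file: one equivalence group per line.
    static final List<String> RULES = List.of(
            "bob, robert",
            "mike, michael",
            "josh, joshua",
            "rd, road",
            "st, street");

    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> map = new HashMap<>();
        for (String line : lines) {
            Set<String> group = new LinkedHashSet<>();
            for (String term : line.split(",")) {
                group.add(term.trim().toLowerCase(Locale.ROOT));
            }
            for (String term : group) {
                map.put(term, group); // every member maps to the whole group
            }
        }
        return map;
    }

    static final Map<String, Set<String>> SYNONYMS = parse(RULES);

    // Expand one lower-cased token into itself plus its synonyms.
    static Set<String> expand(String token) {
        return SYNONYMS.getOrDefault(token, Set.of(token));
    }

    public static void main(String[] args) {
        for (String token : List.of("bob", "road", "x")) {
            System.out.println(token + " -> " + expand(token));
        }
    }
}
```

Since "bob" and "robert" end up as the same group of terms on both the indexing and the query side, a search for "Bob X" can match a document containing "Robert X". Keep in mind that abbreviations can be ambiguous ("St" is also used for "Saint"), so a broad rule may return some unwanted matches.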

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Analyser or Index update to ignore certain terms
Posted: Tue Mar 24, 2015 10:09 am
Newbie

Joined: Tue Jul 08, 2014 3:27 pm
Posts: 6
Thank you, Sanne!!


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.