Analyser or Index update to ignore certain terms

pxpatil · **Joined:** Tue Jul 08, 2014 3:27 pm **Posts:** 6

We are using Hibernate Search for our customer application. Name and address are searchable fields. I am looking for suggestions on how to handle the following scenarios.

1. Abbreviations like LLC or Inc in names. These terms are very generic and appear in most names. When such a term is present in a field, how can I keep it out of the search, or avoid writing it to the index at all?

2. Preferred names of people, like Bob for Robert, Mike for Michael, Josh for Joshua, etc. If Bob X is searched, I want Robert X returned as well, because it implies the same name.

3. Similar to the two scenarios above, there are abbreviations in addresses: Rd for Road, St for Street, etc. How should I handle these so that I don't miss search results because of abbreviations?

Please suggest and thank you !!

sanne.grinovero · **Posted:** Mon Mar 23, 2015 5:33 am

Hi,
a keyword you want to skip is called a "stopword", and Lucene has an out-of-the-box TokenFilter for that. You can define a custom Analyzer that uses it like this:

Code:

@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stoplist.properties"),
            @Parameter(name = "ignoreCase", value = "true")
        })
    })

You'll have to list the terms you want ignored in the referenced properties file, which is a resource to add to your application.
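To make the effect of the stopword step concrete, here is a minimal plain-Java sketch of what such an analyzer chain does to a name. It is not Lucene's StopFilter and does not use Hibernate Search; the class name `StopwordSketch`, the method `analyze`, and the two stop words are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene's StopFilter implementation.
class StopwordSketch {

    // Hypothetical stop list; in the real setup these terms live in the
    // resource file referenced by the "words" parameter.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("llc", "inc"));

    // Split on non-alphanumeric characters, lowercase each token,
    // then drop every token that is in the stop list.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Acme Holdings LLC"));   // prints [acme, holdings]
        System.out.println(analyze("Acme Holdings, Inc.")); // prints [acme, holdings]
    }
}
```

Both spellings end up as the same two terms, so the generic suffix neither reaches the index nor influences matching.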

Replacing "Bob" with "Robert" can be handled in a similar way, by using the SynonymFilter:

Code:

@TokenFilterDef(factory = SynonymFilterFactory.class, params = {
    @Parameter(name = "synonyms",
        value = "org/hibernate/search/test/analyzer/synonyms.properties")
})
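As a rough picture of what synonym handling achieves, the plain-Java sketch below maps short forms to one canonical form before comparison. It is not Lucene's SynonymFilter; the class name `SynonymSketch`, the method `normalize`, and the mapping entries are assumptions for illustration, and the address entries only suggest that the same idea might cover the Rd/Road case from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only; this is NOT Lucene's SynonymFilter implementation.
class SynonymSketch {

    // Hypothetical mapping from short forms to a canonical form; in the real
    // setup these rules live in the resource named by the "synonyms" parameter.
    static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("bob", "robert");
        CANONICAL.put("mike", "michael");
        CANONICAL.put("josh", "joshua");
        CANONICAL.put("rd", "road");
        CANONICAL.put("st", "street");
    }

    // Split on whitespace, lowercase, and replace each known short form
    // with its canonical form; unknown tokens pass through unchanged.
    static List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty()) {
                continue;
            }
            tokens.add(CANONICAL.getOrDefault(token, token));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Bob Smith"));    // prints [robert, smith]
        System.out.println(normalize("Robert Smith")); // prints [robert, smith]
        System.out.println(normalize("12 Main St"));   // prints [12, main, street]
    }
}
```

The match only works if the same normalisation is applied to both the indexed text and the query text. Ambiguous short forms need care: "St" can also stand for "Saint", so a name field and an address field may want different synonym lists.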

The imports for the factories used above:

Code:

import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;

pxpatil · **Joined:** Tue Jul 08, 2014 3:27 pm **Posts:** 6

Thank you Sanne !!