Hi,
a keyword you want skipped is called a "stopword", and Lucene ships an out-of-the-box TokenFilter for that. You can define a custom Analyzer that uses it like this:
Code:
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stoplist.properties"),
            @Parameter(name = "ignoreCase", value = "true")
        })
    })
You'll have to list the terms you want ignored in the referenced file, one term per line; the file is a resource to add to your application.
Replacing "Bob" with "Robert" can be handled in a similar way, by adding the SynonymFilter to the filters list.
Code:
@TokenFilterDef(factory = SynonymFilterFactory.class, params = {
    @Parameter(name = "synonyms",
        value = "org/hibernate/search/test/analyzer/synonyms.properties")
})
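These are the imports for the Lucene factories used above:
Code:
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;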
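If it helps to see what such a filter chain does to the text, here is a rough plain-Java sketch of the same steps (tokenize, fold accents, lowercase, drop stopwords, replace synonyms). It does not use Lucene at all: the class name AnalysisChainSketch, the analyze helper, and the sample stopword and synonym entries are made up for illustration, and the tokenizer and accent folding are only approximations of what the real factories do.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; it does not use or reproduce Lucene's classes.
public class AnalysisChainSketch {

    // Hypothetical helper: mimics tokenize -> ASCII-fold -> lowercase -> stop -> synonym.
    public static List<String> analyze(String text,
                                       Set<String> stopwords,
                                       Map<String, String> synonyms) {
        List<String> out = new ArrayList<>();
        // Crude tokenizer: split on anything that is not a letter or a digit.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces one empty piece
            }
            // Approximate ASCII folding by stripping combining accent marks.
            String folded = Normalizer.normalize(raw, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            String token = folded.toLowerCase(Locale.ROOT);
            if (stopwords.contains(token)) {
                continue; // stopword: the token is dropped
            }
            // Synonym: the token is replaced when a mapping exists.
            out.add(synonyms.getOrDefault(token, token));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("the", "a", "of");
        Map<String, String> synonyms = Map.of("bob", "robert");
        // \u00e9 is an accented "e", as in "cafe" written with an accent.
        System.out.println(analyze("The caf\u00e9 of Bob", stopwords, synonyms));
        // prints [cafe, robert]
    }
}
```

Note that in this sketch the lowercasing runs before the stopword and synonym lookups, so the sample entries are written in lowercase. The same ordering concern is likely to apply to the real chain when the synonym filter is placed after LowerCaseFilterFactory.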