I have an analyzer mapped with the following configuration:
Code:
@AnalyzerDef(
    name = "3gramanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "3"),
                @Parameter(name = "maxGramSize", value = "3")
            })
    }
)
and have applied it to the following field in my business object:
Code:
@Fields({
    @Field(name = "description",
        analyzer = @Analyzer(definition = "freetextanalyzer")),
    @Field(name = "description_ngram",
        boost = @Boost(value = 0.5f),
        analyzer = @Analyzer(definition = "3gramanalyzer"))
})
@Column(name = "description")
public String getDescription()
{
    return this.description;
}
After loading a batch of canned data into my application, I can see in the Lucene index (via Luke) that the 3-gram tokens are indexed correctly. However, when I use the same analyzer to query the index, no matches are returned (for instance, a search for "abdominal" does not match a document containing "abdomen").
Code:
public List<Study> search(String rawQuery, int aPageSize, int aPageIndex, Account currentUser) {
    List<Study> results = new ArrayList<Study>();
    FullTextSession ftSession = Search.getFullTextSession(getSession());
    SearchFactory searchFactory = ftSession.getSearchFactory();

    // Default analyzer for the entity, with the n-gram analyzer
    // applied only to the description_ngram field.
    Analyzer standardAnalyzer = searchFactory.getAnalyzer(Study.class);
    Analyzer ngramAnalyzer = searchFactory.getAnalyzer("3gramanalyzer");
    PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(standardAnalyzer);
    wrapper.addAnalyzer("description_ngram", ngramAnalyzer);

    QueryParser parser =
        new MultiFieldQueryParser(FT_SEARCH_FIELDS, wrapper);
    Query luceneQuery = null;
    // Check the rawQuery parameter (the original referenced an undefined "query" variable).
    if (rawQuery.length() > 0) {
        try {
            luceneQuery = parser.parse(rawQuery);
            logger.info("Parsed lucene query: " + luceneQuery.toString());
        } catch (ParseException pe) {
            throw new RuntimeException("Unable to parse: " + rawQuery, pe);
        }
    } else {
        // An empty query string returns everything.
        luceneQuery = new MatchAllDocsQuery();
    }

    FullTextQuery ftQuery =
        ftSession.createFullTextQuery(luceneQuery, Study.class);
    ftQuery.setMaxResults(aPageSize);
    ftQuery.setFirstResult(aPageIndex);
    logger.info("Executing lucene query: " + ftQuery.toString());
    results = ftQuery.list();
    return results;
}
I added some logging just before the list() call to see the Lucene syntax of the query being run. The logged query contains the following:
description_ngram:"abd bdo dom omi min ina nal"
This looks to my eyes like an exact phrase match query in Lucene syntax. Is my usage incorrect, or is there a bug in the NGramTokenFilter class?
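To show why I would expect a match, here is a minimal plain-Java sketch. It does not use Lucene at all; the class and method names are hypothetical and only mimic what I understand a 3-gram filter with lowercasing to produce. It splits both terms into 3-grams and lists the grams they share:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Hibernate Search.
class NGramSketch {

    // Returns the lowercased character n-grams of a term, in order of appearance.
    static List<String> ngrams(String term, int size) {
        List<String> grams = new ArrayList<String>();
        String lower = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i + size <= lower.length(); i++) {
            grams.add(lower.substring(i, i + size));
        }
        return grams;
    }

    // Returns the n-grams that occur in both terms, in the order they occur in the first.
    static Set<String> shared(String a, String b, int size) {
        Set<String> common = new LinkedHashSet<String>(ngrams(a, size));
        common.retainAll(ngrams(b, size));
        return common;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abdominal", 3)); // grams of the query term
        System.out.println(ngrams("abdomen", 3));   // grams of the indexed term
        System.out.println(shared("abdominal", "abdomen", 3));
    }
}
```

The two terms share abd, bdo and dom, so a query that ORs the individual grams should find the document. A phrase query that requires all seven grams of "abdominal" cannot match, because "abdomen" never produces omi, min, ina or nal.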
Another user reported a similar issue and has not received a response:
viewtopic.php?f=9&t=999041