NgramTokenFilter

jordan002 · **Joined:** Fri Sep 11, 2009 4:58 pm **Posts:** 3

I have an analyzer mapped with the following configuration:

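```java
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.NGramFilterFactory;
import org.apache.solr.analysis.StandardFilterFactory;
import org.apache.solr.analysis.StandardTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(
        name = "3gramanalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            // emit every 3-character substring of each (lowercased) token
            @TokenFilterDef(factory = NGramFilterFactory.class,
                params = {
                    @Parameter(name = "minGramSize", value = "3"),
                    @Parameter(name = "maxGramSize", value = "3")
                })
        }
)
```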
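For reference, the tokens this chain should emit for a single word can be approximated in plain Java. This is a simplified model, not Lucene's NGramTokenFilter source: it assumes the word has already been split out by the tokenizer and lowercased, and it only covers the minGramSize = maxGramSize = 3 case configured above. The class name `TrigramDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TrigramDemo {

    // Every contiguous substring of length n, left to right: a rough model of
    // an n-gram filter with minGramSize == maxGramSize == n on one token.
    static List<String> ngrams(String token, int n) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start + n <= token.length(); start++) {
            grams.add(token.substring(start, start + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abdominal", 3)); // [abd, bdo, dom, omi, min, ina, nal]
        System.out.println(ngrams("abdomen", 3));   // [abd, bdo, dom, ome, men]
    }
}
```

For "abdominal" this yields the same seven grams that show up in the logged query further down.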

and have applied it to the following field in my business object:

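```java
import javax.persistence.Column;

import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Boost;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Fields;

    // Index the same property twice: once as plain text, once as 3-grams
    @Fields({
            @Field(name = "description",
                   analyzer = @Analyzer(definition = "freetextanalyzer")),
            @Field(name = "description_ngram",
                   boost = @Boost(value = 0.5f),
                   analyzer = @Analyzer(definition = "3gramanalyzer"))
    })
    @Column(name = "description")
    public String getDescription()
    {
        return this.description;
    }
```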

After loading a bunch of canned data into my application, I can see in the Lucene index (via Luke) that the 3-gram tokens are indexed correctly. But when I use the same analyzer to query the index, nothing matches; for instance, a search for "abdominal" does not find "abdomen".
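The two words do share index terms, which suggests the problem lies in how the query is built rather than in the index. A small plain-Java check (hypothetical class `SharedGrams`, no Lucene involved, assuming the analyzer reduces each word to its lowercase trigrams) makes the overlap explicit:

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SharedGrams {

    // Distinct character n-grams of one lowercased word
    // (simplified model of the "3gramanalyzer" chain for a single token).
    static Set<String> gramSet(String token, int n) {
        Set<String> grams = new LinkedHashSet<>();
        String t = token.toLowerCase();
        for (int i = 0; i + n <= t.length(); i++) {
            grams.add(t.substring(i, i + n));
        }
        return grams;
    }

    // Grams present in both the query word and the indexed word.
    static Set<String> shared(String queryWord, String indexedWord, int n) {
        Set<String> common = gramSet(queryWord, n);
        common.retainAll(gramSet(indexedWord, n));
        return common;
    }

    public static void main(String[] args) {
        System.out.println(shared("abdominal", "abdomen", 3)); // [abd, bdo, dom]
    }
}
```

So a query that matched on individual grams would find "abdomen"; a query that requires all seven grams of "abdominal" cannot.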

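```java
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.hibernate.search.FullTextQuery;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.hibernate.search.SearchFactory;

public List<Study> search(String rawQuery, int aPageSize, int aPageIndex, Account currentUser) {
        FullTextSession ftSession = Search.getFullTextSession(getSession());
        SearchFactory searchFactory = ftSession.getSearchFactory();

        // Default analyzer for Study fields, n-gram analyzer for description_ngram
        Analyzer standardAnalyzer = searchFactory.getAnalyzer(Study.class);
        Analyzer ngramAnalyzer = searchFactory.getAnalyzer("3gramanalyzer");
        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(standardAnalyzer);
        wrapper.addAnalyzer("description_ngram", ngramAnalyzer);

        QueryParser parser =
            new MultiFieldQueryParser(FT_SEARCH_FIELDS, wrapper);

        Query luceneQuery;
        if (rawQuery.length() > 0) {
            try {
                luceneQuery = parser.parse(rawQuery);
                logger.info("Parsed lucene query: " + luceneQuery.toString());
            } catch (ParseException pe) {
                throw new RuntimeException("Unable to parse: " + rawQuery, pe);
            }
        } else {
            luceneQuery = new MatchAllDocsQuery();
        }

        FullTextQuery ftQuery =
            ftSession.createFullTextQuery(luceneQuery, Study.class);
        ftQuery.setMaxResults(aPageSize);
        // setFirstResult() expects a row offset, not a page number
        ftQuery.setFirstResult(aPageIndex * aPageSize);
        logger.info("Executing lucene query: " + ftQuery.toString());
        @SuppressWarnings("unchecked")
        List<Study> results = ftQuery.list();

        return results;
    }
```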

I added some logging just before the list() call to see the Lucene syntax of the query being run. The query contains the following:

description_ngram:"abd bdo dom omi min ina nal"

To me this looks like an exact-phrase query in Lucene syntax. Is my usage incorrect, or is there a bug in the NGramTokenFilter class?
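The logged query explains the behaviour: in the Lucene 2.x versions used here, QueryParser turns a single word that the analyzer splits into several tokens into a phrase query, so a document must contain all seven grams at consecutive positions. The plain-Java simulation below contrasts that phrase rule with an "any gram matches" rule. It assumes, as the quoted phrase suggests, that each gram occupies its own consecutive position; the class and method names are hypothetical and no Lucene classes are used. (Later Lucene releases, from 3.1 on, added `QueryParser.setAutoGeneratePhraseQueries(boolean)` to control this.)

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PhraseVsTerms {

    // Trigrams of one lowercase word, in position order.
    static List<String> trigrams(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= word.length(); i++) {
            grams.add(word.substring(i, i + 3));
        }
        return grams;
    }

    // Phrase semantics: all query grams must appear, in order,
    // at consecutive positions of the indexed field.
    static boolean phraseMatches(List<String> queryGrams, List<String> docGrams) {
        return Collections.indexOfSubList(docGrams, queryGrams) >= 0;
    }

    // OR-of-terms semantics: one shared gram is enough.
    static boolean anyTermMatches(List<String> queryGrams, List<String> docGrams) {
        return !Collections.disjoint(queryGrams, docGrams);
    }

    public static void main(String[] args) {
        List<String> query = trigrams("abdominal");
        List<String> doc = trigrams("abdomen");
        System.out.println(phraseMatches(query, doc));  // false
        System.out.println(anyTermMatches(query, doc)); // true
    }
}
```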

Another user reported a similar issue, which has not received a response yet:
viewtopic.php?f=9&t=999041

sanne.grinovero · **Posted:** Sat Sep 12, 2009 9:06 am

Hi jordan,

> I added a bit of logging right before the list() method call to view the lucene syntax of the query being run. I see the following in the search query:
>
> description_ngram:"abd bdo dom omi min ina nal"
>
> This looks to my eyes like an exact phrase match query in Lucene syntax. Is my usage incorrect, or is there a bug in the NGramTokenFilter class?

You're right, and Seide in the other topic (https://forum.hibernate.org/viewtopic.php?f=9&t=999041) is also right and has provided the solution.
In the next version we might introduce something less verbose; proposals are welcome, as are documentation patches and blog posts.
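The other thread's solution is not reproduced here, so what follows is only a sketch of one common workaround, not necessarily what Seide suggested: bypass QueryParser for the n-gram field, run the input through "3gramanalyzer" yourself, and add each resulting gram as a `TermQuery` with `BooleanClause.Occur.SHOULD` to a `BooleanQuery`, optionally calling `setMinimumNumberShouldMatch` so that a single shared gram is not enough. To stay self-contained, the Java below only simulates that matching rule over in-memory strings; the class name, sample descriptions and the threshold of 3 are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MinShouldMatchDemo {

    // Distinct trigrams of every whitespace-separated, lowercased word:
    // a rough stand-in for tokenizer + LowerCaseFilter + 3-gram filter.
    static Set<String> trigrams(String text) {
        Set<String> grams = new LinkedHashSet<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            for (int i = 0; i + 3 <= word.length(); i++) {
                grams.add(word.substring(i, i + 3));
            }
        }
        return grams;
    }

    // Simulates a BooleanQuery with one SHOULD clause per query gram and a
    // minimum number of SHOULD clauses that must match.
    static List<String> search(String query, List<String> descriptions, int minShouldMatch) {
        Set<String> queryGrams = trigrams(query);
        List<String> hits = new ArrayList<>();
        for (String description : descriptions) {
            Set<String> docGrams = trigrams(description);
            int matched = 0;
            for (String gram : queryGrams) {
                if (docGrams.contains(gram)) {
                    matched++;
                }
            }
            if (matched >= minShouldMatch) {
                hits.add(description);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> descriptions = Arrays.asList("abdomen", "abduction", "thorax");
        System.out.println(search("abdominal", descriptions, 3)); // [abdomen]
    }
}
```

The threshold trades recall for precision: with a minimum of 1, "abduction" would also match, because it shares the gram "abd".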