All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
 Post subject: NgramTokenFilter
Posted: Fri Sep 11, 2009 5:11 pm
Newbie

Joined: Fri Sep 11, 2009 4:58 pm
Posts: 3
I have an analyzer mapped with the following configuration:

Code:
@AnalyzerDef(
        name = "3gramanalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = NGramFilterFactory.class,
                params = {
                    @Parameter(name = "minGramSize", value = "3"),
                    @Parameter(name = "maxGramSize", value = "3")
            })
        }
)


and have applied it to the following field in my business object:

Code:
    @Fields({
            @Field(name = "description",
                   analyzer = @Analyzer(definition = "freetextanalyzer")),
            @Field(name = "description_ngram",
                   boost = @Boost(value = 0.5f),
                   analyzer = @Analyzer(definition = "3gramanalyzer"))
   })
   @Column(name = "description")
   public String getDescription()
   {
      return this.description;
   }


After loading a bunch of canned data into my application, I can observe in the Lucene index (via Luke) that the 3-gram tokens are indexed correctly. However, when I use the same analyzer to query the index, no matches appear (for instance, a search for "abdominal" does not match documents containing "abdomen").
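
To make the expectation concrete, here is a minimal plain-Java sketch of what a 3-gram analysis of the two example words produces. It is only a model of the idea, not Lucene's NGramTokenFilter; the class and method names (TrigramSketch, ngrams, sharedGrams) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of 3-gram tokenization; this is NOT Lucene's NGramTokenFilter.
class TrigramSketch {

    // Returns every substring of length n, left to right, after lowercasing.
    static List<String> ngrams(String term, int n) {
        String lower = term.toLowerCase();
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }

    // Returns the grams present in both terms, in the order they occur in the first term.
    static Set<String> sharedGrams(String a, String b, int n) {
        Set<String> shared = new LinkedHashSet<String>(ngrams(a, n));
        shared.retainAll(ngrams(b, n));
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abdominal", 3)); // [abd, bdo, dom, omi, min, ina, nal]
        System.out.println(ngrams("abdomen", 3));   // [abd, bdo, dom, ome, men]
        System.out.println(sharedGrams("abdominal", "abdomen", 3)); // [abd, bdo, dom]
    }
}
```

The two words share three grams, which is why a match between them would be expected if the query only required some of the grams to be present.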

Code:
public List<Study> search(String rawQuery, int aPageSize, int aPageIndex, Account currentUser) {
        List<Study> results = new ArrayList<Study>();
        FullTextSession ftSession = Search.getFullTextSession(getSession());
        SearchFactory searchFactory = ftSession.getSearchFactory();

        Analyzer standardAnalyzer = searchFactory.getAnalyzer(Study.class);
        Analyzer ngramAnalyzer = searchFactory.getAnalyzer("3gramanalyzer");
        PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(standardAnalyzer);
        wrapper.addAnalyzer("description_ngram", ngramAnalyzer);

        QueryParser parser =
            new MultiFieldQueryParser(FT_SEARCH_FIELDS, wrapper);

        Query luceneQuery = null;
        if (rawQuery.length() > 0) {
            try {
                luceneQuery = parser.parse(rawQuery);
                logger.info("Parsed lucene query: " + luceneQuery.toString());
            } catch (ParseException pe) {
                throw new RuntimeException("Unable to parse: " + rawQuery, pe);
            }
        } else {
            luceneQuery = new MatchAllDocsQuery();
        }

        FullTextQuery ftQuery =
            ftSession.createFullTextQuery(luceneQuery, Study.class);
        ftQuery.setMaxResults(aPageSize);
        ftQuery.setFirstResult(aPageIndex);
        logger.info("Executing lucene query: " + ftQuery.toString());
        results = ftQuery.list();

        return results;
    }


I added a bit of logging right before the list() method call to view the lucene syntax of the query being run. I see the following in the search query:

description_ngram:"abd bdo dom omi min ina nal"

This looks to my eyes like an exact phrase match query in Lucene syntax. Is my usage incorrect, or is there a bug in the NGramTokenFilter class?
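
For illustration, the difference between phrase semantics and "any gram" semantics can be modeled in plain Java, without Lucene. This is a simplified sketch under the assumption that each gram occupies one consecutive position in the indexed field; the class and method names (PhraseVsAnyGramSketch, matchesAsPhrase, matchesAnyGram) are hypothetical and are not part of Lucene or Hibernate Search.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of two matching strategies over gram lists; not Lucene code.
class PhraseVsAnyGramSketch {

    // Phrase semantics: all query grams must occur in the indexed grams,
    // consecutively and in the same order.
    static boolean matchesAsPhrase(List<String> indexed, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(indexed, query) >= 0;
    }

    // OR semantics: at least one query gram occurs among the indexed grams.
    static boolean matchesAnyGram(List<String> indexed, List<String> query) {
        return !Collections.disjoint(indexed, query);
    }

    public static void main(String[] args) {
        // Grams of the indexed word "abdomen".
        List<String> indexed = Arrays.asList("abd", "bdo", "dom", "ome", "men");
        // Grams of the query word "abdominal", as shown in the logged query.
        List<String> query = Arrays.asList("abd", "bdo", "dom", "omi", "min", "ina", "nal");

        System.out.println(matchesAsPhrase(indexed, query)); // false
        System.out.println(matchesAnyGram(indexed, query));  // true
    }
}
```

Under this model the phrase form can only match text that contains the whole word "abdominal", which is consistent with the missing results described above; a query that combines the grams as optional terms would match "abdomen" through the shared grams.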

This user also reported a similar issue, and has not been responded to:
https://forum.hibernate.org/viewtopic.php?f=9&t=999041


 Post subject: Re: NgramTokenFilter
Posted: Sat Sep 12, 2009 9:06 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi jordan,
Quote:
I added a bit of logging right before the list() method call to view the lucene syntax of the query being run. I see the following in the search query:

description_ngram:"abd bdo dom omi min ina nal"

This looks to my eyes like an exact phrase match query in Lucene syntax. Is my usage incorrect, or is there a bug in the NGramTokenFilter class?

You're right, and Seide, in the other topic (https://forum.hibernate.org/viewtopic.php?f=9&t=999041), is right too and has provided the solution there.
In the next version we might introduce something less verbose; proposals are welcome, as are documentation patches and blog posts.

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.