-->
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
Author Message
 Post subject: SynonymFilterFactory and NGramFilterFactory
PostPosted: Mon Jun 22, 2009 2:07 pm 
Newbie

Joined: Mon Jun 22, 2009 12:20 pm
Posts: 4
Hi!

I'm trying to use a SynonymFilterFactory together with an NGramFilterFactory (while indexing). At search time I only use the NGramFilterFactory in my analyzer. Here are my definitions:
Code:
@AnalyzerDef(
         name="ngramanalyzer",
         tokenizer = @TokenizerDef(factory=StandardTokenizerFactory.class),
         filters = {
            @TokenFilterDef(factory=StandardFilterFactory.class),
            @TokenFilterDef(factory=LowerCaseFilterFactory.class),
            @TokenFilterDef(factory=ISOLatin1AccentFilterFactory.class),
            @TokenFilterDef(factory=StopFilterFactory.class,
                  params = @Parameter(name="words",value="germanstopword.txt")),
            @TokenFilterDef(factory=NGramFilterFactory.class,
                  params = { @Parameter(name="minGramSize", value="3"),
                         @Parameter(name="maxGramSize", value="3")})
         }
   ),
   @AnalyzerDef(
         name="ngramanalyzer_index",
         tokenizer = @TokenizerDef(factory=StandardTokenizerFactory.class),
         filters = {
            @TokenFilterDef(factory=StandardFilterFactory.class),
            @TokenFilterDef(factory=LowerCaseFilterFactory.class),
            @TokenFilterDef(factory=ISOLatin1AccentFilterFactory.class),
            @TokenFilterDef(factory=StopFilterFactory.class,
                  params = @Parameter(name="words",value="germanstopword.txt")),
            @TokenFilterDef(factory=SynonymFilterFactory.class,
                  params = { @Parameter(name="synonyms", value="synonyms.txt"),
                           @Parameter(name="ignoreCase", value="true"),
                           @Parameter(name="expand", value="true")}
            ),
            @TokenFilterDef(factory=NGramFilterFactory.class,
                  params = { @Parameter(name="minGramSize", value="3"),
                         @Parameter(name="maxGramSize", value="3")})
         }
   ),
...
...
@Column(name = "prodname")
   @Fields( {
         @Field(index = Index.TOKENIZED),
         @Field(index = Index.TOKENIZED,
               name = "name_ngram",
               store=Store.YES,
               analyzer = @Analyzer(definition = "ngramanalyzer_index"))
         })
   private String name;


Indexing seems to work fine. When I define synonyms like "mikrowelle,microwelle", I get "cro|ell|ell|icr|ikr|kro|lle|lle|mic|mik|owe|row|row|wel|wel" in my index. But at query time I only get entities containing "mikrowelle".
Code:
ftem = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
      Analyzer entityScopedAnalyzer = ftem.getSearchFactory().getAnalyzer("ngramanalyzer");
      final QueryParser ngramParser = new QueryParser("name_ngram", entityScopedAnalyzer);
      try {
         Query ngramQuery = ngramParser.parse(query.trim());
         FullTextQuery hibQuery = ftem.createFullTextQuery(ngramQuery, SearchResult.class);
         resultList = hibQuery.getResultList();


ngramQuery.toString() returns
Code:
name_ngram:"mik ikr kro row owe wel ell lle"


Any suggestions what goes wrong? Thanks in advance!
--
Titus


 Post subject: Re: SynonymFilterFactory and NGramFilterFactory
PostPosted: Wed Jun 24, 2009 2:14 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
I ran into the same issue a few days ago. Basically, the query generated by the QueryParser is not suitable for n-grams: every n-gram of the query has to be in the index for an entity to match, because the n-grams are combined as "and" rather than "should". The token "ikr" in your query is only found in entities containing the word "mikrowelle", so those are the only ones compatible with your search criteria.

I used the Analyzer directly to build up a BooleanQuery from the tokens and it's working now, but I'm not sure if this is the best solution... kinda promised myself to look into it as soon as I have more time. I'll be glad to hear if you find a better solution.
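
To make the difference between "and" and "should" concrete, here is a rough sketch in plain Java. It deliberately does not use Lucene or Hibernate Search; the class and method names are made up for illustration, and the "document" is simply assumed to contain only the trigrams of "microwelle". With Lucene itself, the equivalent idea would be a BooleanQuery whose term clauses are all optional (SHOULD) instead of required.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, no Lucene involved.
class NGramMatchSketch {

    // Split a term into overlapping n-grams, the way an n-gram filter with
    // minGramSize == maxGramSize == n would.
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    // "and" semantics: every query gram must be present in the document.
    static boolean matchesAll(Set<String> docGrams, List<String> queryGrams) {
        return docGrams.containsAll(queryGrams);
    }

    // "should" semantics: grams are optional; the number of shared grams
    // can serve as a crude relevance score.
    static long countShared(Set<String> docGrams, List<String> queryGrams) {
        return queryGrams.stream().filter(docGrams::contains).count();
    }

    public static void main(String[] args) {
        // Assumption: a document that only contains the grams of "microwelle".
        Set<String> doc = new HashSet<>(ngrams("microwelle", 3));
        List<String> query = ngrams("mikrowelle", 3);

        System.out.println(query);                   // the eight trigrams of the query
        System.out.println(matchesAll(doc, query));  // false: "mik", "ikr", "kro" are missing
        System.out.println(countShared(doc, query)); // 5: row, owe, wel, ell, lle
    }
}
```

With required grams the document is rejected because of three missing trigrams, while optional grams still match it on the five trigrams the two spellings share.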

_________________
Sanne
http://in.relation.to/

