All times are UTC - 5 hours [ DST ]



 Post subject: Synonym Search
Posted: Tue Jan 31, 2012 8:27 am
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I would like to implement a synonym search. I don't want to store the synonyms in the index; I just want to apply the synonym analyzer at search time.

It would be something like this:

Code:
create query
use synonym analyzer
get results


I have been looking at some examples in the documentation, but I don't really know how to do it.

Any ideas?

Thanks in advance,

Hibernator


 Post subject: Re: Synonym Search
Posted: Wed Feb 01, 2012 5:25 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi Hibernator :-)

The easiest approach is to build your synonym analyzer using the @AnalyzerDef annotation. The documentation has examples of how to build custom analyzers with this annotation. These analyzer definitions are global, and merely defining one does not put it to use. Normally you would also use the analyzer at indexing time by referencing it from the @Field annotation (just as an example).
In your case, however, you want to use the analyzer at query time. I assume you want to use a QueryParser to create the query. A query parser takes the analyzer to use as a constructor argument, and Hibernate Search's SearchFactory has a getAnalyzer(String name) method to retrieve a named analyzer (the one you defined with @AnalyzerDef). Is this what you are after?

Last but not least, in most cases it is recommended to use the same analyzer for indexing and searching. I don't know your use case, but I thought it was worth pointing out ;-)

--Hardy


 Post subject: Re: Synonym Search
Posted: Wed Feb 01, 2012 6:25 am
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi Hardy,

I have been trying this out and got something working.

This is my analyzer definition:

Code:
@AnalyzerDef(name = "synonymFilter",
      charFilters = {
            @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
                  @Parameter(name = "mapping", value = "mapping-chars.properties")
            })
      },
      tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
      filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class, params = {
                  @Parameter(name = "words", value = "stoplist.properties"),
                  @Parameter(name = "ignoreCase", value = "true")
            }),
            @TokenFilterDef(factory = SynonymFilterFactory.class, params = {
                  @Parameter(name = "synonyms", value = "synonyms.txt"),
                  @Parameter(name = "ignoreCase", value = "true")
            })
      }
)



And, as you said, my query looks like this. Basically, I use the synonym analyzer to parse the terms.

Code:
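// Imports needed by this snippet (Lucene 3.x / Hibernate Search)
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.hibernate.search.Search;

// session: the open Hibernate Session; q: the user's search text;
// campos: the names of the fields to search
Analyzer analyzer = Search.getFullTextSession(session).getSearchFactory().getAnalyzer("synonymFilter");

BooleanQuery[] Querys = new BooleanQuery[campos.length];

for (int cont = 0; cont < campos.length; cont++) {
    BooleanQuery andQuery = new BooleanQuery();

    // Run the search text through the synonym analyzer for this field
    TokenStream tokenStream = analyzer.tokenStream(campos[cont], new StringReader(q));
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

    tokenStream.reset();
    while (tokenStream.incrementToken()) {
        String term = charTermAttribute.toString();
        // Occur.MUST requires every token, synonyms included, to match (AND)
        andQuery.add(new TermQuery(new Term(campos[cont], term)), Occur.MUST);
    }
    tokenStream.end();
    tokenStream.close();

    Querys[cont] = andQuery;
}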
   


Next point: my synonyms.txt contains the following lines:

Code:
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball, lobos


And it creates a query like this one:
(+titulo.titulo:foozball +titulo.titulo:foosball +titulo.titulo:lobos) (AND operator)

But what I really need is this:
(titulo.titulo:foozball titulo.titulo:foosball titulo.titulo:lobos) (OR operator)

Any ideas?

PS: I just want to show how the synonym query works in case anybody does not know how to implement it :)

Edit: I have tried the expand parameter with both true and false. I need expand = true, but the query keeps using the AND operator...
Edit 2: I have changed my query to use Occur.SHOULD instead of Occur.MUST, and now it seems to be working fine.
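
For anyone wondering why switching from MUST to SHOULD matters, here is a minimal plain-Java sketch. It is only a model: it does not use Lucene or SynonymFilterFactory, and the names SynonymQuerySketch, expand, render and matches are made up for illustration. It assumes that a group of comma-separated synonyms is fully expanded (the expand = true behaviour) and that a document is just a set of tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model only; hypothetical names, no Lucene classes involved.
final class SynonymQuerySketch {

    enum Occur { MUST, SHOULD }

    /** Returns the synonym group that contains the term, or just the term itself. */
    static Set<String> expand(List<String> synonymLines, String term) {
        String wanted = term.trim().toLowerCase();
        for (String rawLine : synonymLines) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            Set<String> group = new LinkedHashSet<>();
            for (String entry : line.split(",")) {
                group.add(entry.trim().toLowerCase());
            }
            if (group.contains(wanted)) {
                return group;
            }
        }
        Set<String> single = new LinkedHashSet<>();
        single.add(wanted);
        return single;
    }

    /** Renders the clauses in the same notation as the queries printed above. */
    static String render(String field, Set<String> terms, Occur occur) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            clauses.add((occur == Occur.MUST ? "+" : "") + field + ":" + term);
        }
        return "(" + String.join(" ", clauses) + ")";
    }

    /** MUST: every term has to be in the document. SHOULD: one term is enough. */
    static boolean matches(Set<String> docTokens, Set<String> terms, Occur occur) {
        if (occur == Occur.MUST) {
            return docTokens.containsAll(terms);
        }
        for (String term : terms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#Examples:", "ipod, i-pod, i pod", "foozball , foosball, lobos");
        Set<String> terms = expand(lines, "foosball");
        Set<String> doc = new LinkedHashSet<>(Arrays.asList("foosball", "table"));
        for (Occur occur : Occur.values()) {
            System.out.println(render("titulo.titulo", terms, occur)
                    + " matches: " + matches(doc, terms, occur));
        }
    }
}
```

With MUST, a document that contains only "foosball" is rejected because the other two synonyms are missing; with SHOULD, any one of the three is enough, which is what a synonym search needs.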

Thanks in advance!


 Post subject: Re: Synonym Search
Posted: Fri Feb 03, 2012 4:21 am
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi all,

Everything seems to work fine. I may end up using a big synonym file (700 MB), and I don't think it would be a good idea to store it in the src/main/resources folder of my Maven EAR project. I have been thinking about putting the file somewhere else and referencing it from the EAR project. The problem is that I define the file path in the analyzer annotation.

How could I reference a synonyms.txt file that lives in an external folder?

Thanks in advance,

Hibernator


 Post subject: Re: Synonym Search
Posted: Fri Feb 03, 2012 5:54 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

First, a comment on your query. You are using a BooleanQuery and joining the individual term queries via Occur.MUST, so it is no wonder you get AND logic. I am wondering why you create a token stream and go to all this effort. Couldn't you just create a QueryParser and pass it the analyzer? The parser will run each term of the query through your analyzer and automatically add the synonyms to the search.

Regarding your other question: have you tried specifying a fully qualified path? I think it just opens a File with the specified parameter. Personally, I would work with two different files: a small one that is part of the build and used for testing, and one that is referenced in production.
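
To illustrate the two-file idea, here is a rough plain-Java sketch of choosing between an external production file and the small bundled one. The class SynonymFileLocator and the system property name synonyms.path are hypothetical and not part of Hibernate Search. Note also that an annotation value has to be a compile-time constant, so a lookup like this only helps if the analyzer is defined in code (for example through a programmatic mapping) rather than via @AnalyzerDef; that wiring is not shown here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the property name and default file name are assumptions.
final class SynonymFileLocator {

    static final String PROPERTY = "synonyms.path";
    static final String BUNDLED_DEFAULT = "synonyms.txt";

    /**
     * Returns the configured path as an absolute path if it points at a
     * readable regular file, otherwise the name of the bundled test file.
     */
    static String resolve(String configuredPath) {
        if (configuredPath != null && !configuredPath.trim().isEmpty()) {
            Path candidate = Paths.get(configuredPath.trim()).toAbsolutePath().normalize();
            if (Files.isRegularFile(candidate) && Files.isReadable(candidate)) {
                return candidate.toString();
            }
        }
        return BUNDLED_DEFAULT;
    }

    public static void main(String[] args) {
        // e.g. start the server with -Dsynonyms.path=<absolute path> in production
        System.out.println(resolve(System.getProperty(PROPERTY)));
    }
}
```

Falling back to the bundled file keeps tests self-contained, while production can point at the large file without rebuilding the EAR.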

--Hardy


 Post subject: Re: Synonym Search
Posted: Fri Feb 03, 2012 6:13 am
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

Thank you for your response. I'll try the full path for my synonyms.txt.

About my query: I have to search different fields depending on the user's selection, so the only way I found to do that (I am not an expert...) was to use a BooleanQuery. What I mean is that the search fields can change at run time.

Any suggestions?

Thanks in advance!


 Post subject: Re: Synonym Search
Posted: Fri Feb 03, 2012 8:37 am
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi, another question:

How can I set the fully qualified path from my JBoss deploy folder?

Thanks in advance,

Hibernator


 Post subject: Re: Synonym Search
Posted: Tue Feb 07, 2012 11:47 am
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Quote:
How can I set the fully qualified path from my JBoss deploy folder?

Don't you know the path? You might want to use some sort of filtering technique to filter the right path into the source file (the path will differ between test and production, and maybe also between different production environments).

--Hardy


 Post subject: Re: Synonym Search
Posted: Fri Feb 17, 2012 8:49 am
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I only know how to reference the synonym file using the annotations. The synonym file is in my EAR project, and I would like to use one from outside the EAR project.

@AnalyzerDef(.....
@TokenFilterDef(factory = SynonymFilterFactory.class, params = {
@Parameter(name = "synonyms", value = "synonyms.txt"),
@Parameter(name = "ignoreCase", value = "true")


It would be perfect, as you said, to use one file for testing and another file for the production environment, but I do not know how to do that.

Are there any references on using an external synonym file together with a test one?

Thanks in advance,

