-->

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 18 posts ]  Go to page 1, 2  Next
 Post subject: Hibernate search multiple languages
PostPosted: Sun Nov 15, 2009 8:56 pm 
Newbie

Joined: Sun Nov 15, 2009 8:22 pm
Posts: 19
Hello!
I have tried to set up a multi-language application using Hibernate Search. I am having some trouble and would be very thankful if you could have a look at it and give me some help!

The Arrangement class
Code:
@Indexed
public class Arrangement  {

   private Set<Text> summarytexts = new HashSet<Text>();
   private Set<Tag> tags = new HashSet<Tag>();
   
   public Arrangement() {}


   @ManyToMany(targetEntity=Tag.class, cascade={CascadeType.PERSIST, CascadeType.MERGE}, fetch=FetchType.LAZY)
   @JoinTable(name="ARRANGEMENT_TAG", joinColumns={@JoinColumn(name="arrangement_id")}, inverseJoinColumns={@JoinColumn(name="arrangementtag_id")})
   @IndexedEmbedded
   @Boost(2.5f)
   public Set<Tag> getTags() {
      return tags;
   }

   public void setTags(Set<Tag> arrangementtags) {
      this.tags = arrangementtags;
   }
   

   @OneToMany(cascade=CascadeType.ALL,  fetch=FetchType.EAGER)
   @OrderBy("language ASC")
   @JoinTable(name="ARRANGEMENT_SUMMARY", joinColumns={@JoinColumn(name="arrangement_id")}, inverseJoinColumns={@JoinColumn(name="text_id")})
   @Boost(1.3f)
   @Field(
      name="summary",
      index=Index.TOKENIZED,
      store=Store.YES,
      bridge = @FieldBridge(impl=I18FieldBridge.class,
         params = @Parameter(name="prefix", value="summary")))
   public Set<Text> getSummarytexts() {
      return summarytexts;
   }
   
   public void setSummarytexts(Set<Text> summarytexts) {
      this.summarytexts = summarytexts;
   }
}


A fieldbridge
Code:
public class I18FieldBridge implements FieldBridge, ParameterizedBridge {


   public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
      
      Set<Text> texts = (Set<Text>) value;
      
      for (Text text : texts) {
      
         if (text == null) {
            return;
         }
         
         Field field = new Field(
               prefix + "_" + text.getLanguage(),
               text.getWord(),
               luceneOptions.getStore(),
               luceneOptions.getIndex(),
               luceneOptions.getTermVector());
         Float boost = luceneOptions.getBoost();
         field.setBoost(boost);
         document.add(field);
      }
   }

   
   private String prefix;

    public void setParameterValues(Map parameters) {
        this.prefix = (String) parameters.get("prefix");
    }

}



The Text class.
Code:
@Table(name="TEXT")
@AnalyzerDefs({
   @AnalyzerDef(name = "SWE",
      tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = {
      @TokenFilterDef(factory = LowerCaseFilterFactory.class),
      @TokenFilterDef(factory = StopFilterFactory.class, params = {
         @Parameter(name = "words", value = "stopwords_swe.properties"),
         @Parameter(name = "ignoreCase", value = "true") }),
      @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "Swedish") })
   }),
   @AnalyzerDef(name = "ENG",
         tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = {
         @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
         @TokenFilterDef(factory = LowerCaseFilterFactory.class),
         @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stopwords_eng.properties"),
            @Parameter(name = "ignoreCase", value = "true") }),
         @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "English") })
      }),
   @AnalyzerDef(name = "onsearchAnalyzerSWE",
            tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SynonymFactory.class, params = {
               @Parameter(name = "ignoreCase", value = "true"),
                    @Parameter(name = "expand", value = "true"),
                    @Parameter(name = "synonyms", value = "synonyms_swe.properties")}),
            
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
               @Parameter(name = "language", value = "Swedish") })
         }),
})
@AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
public class Text extends Base {

   private String word;
   private Language language;


   @Enumerated(EnumType.STRING)
   public Language getLanguage() {
      return language;
   }
   public void setLanguage(Language language) {
      this.language = language;
   }


   @Column(nullable=true)
   public String getWord() {
      return word;
   }
   public void setWord(String word) {
      this.word = word;
   }
}


The LanguageDiscriminator
Code:
public class LanguageDiscriminator implements Discriminator {

    public String getAnanyzerDefinitionName(Object value, Object entity, String field) {
          return ((Text) entity).getLanguage().name();
    }
}



So, I have simplified it a bit, but for example an Arrangement has two Text objects in it: one with an English summary and one with a Swedish one.
The field bridge makes sure that each Text object is indexed under a field named summary_SWE or summary_ENG, and when I search I specify which summary field to query depending on the language of the search term.
The language discriminator is used to apply a different analyzer to each Text object depending on the value of its language property.
So far so good. I would like to know if I am using the correct approach here, but my problem is this:

It seems that the SnowballPorterFilterFactory isn't applied at indexing time; it is only applied when I search and use the "onsearchAnalyzerSWE". The other filters in the SWE and ENG analyzer definitions seem to be used, but not the snowball one.

Can anyone explain why? Or what I should check?

Thankful for any help, I have been trying for quite a long time.
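
For reference, the field naming convention described in this post can be sketched in plain Java, without Hibernate Search or Lucene on the classpath. This is only an illustration under assumptions: the `LocalizedFields` class and its methods are hypothetical names, and the `Language` enum is a stand-in for the poster's own enum (only SWE and ENG appear in the thread). Unlike the bridge above, the sketch skips a missing text instead of returning from the loop.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class LocalizedFields {

    private LocalizedFields() {}

    // Same convention as the field bridge: prefix + "_" + language name.
    static String fieldName(String prefix, Language language) {
        return prefix + "_" + language.name();
    }

    // Builds one index field per language; entries without text are skipped.
    static Map<String, String> toFields(String prefix, Map<Language, String> textsByLanguage) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<Language, String> entry : textsByLanguage.entrySet()) {
            if (entry.getValue() != null) {
                fields.put(fieldName(prefix, entry.getKey()), entry.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<Language, String> summaries = new EnumMap<>(Language.class);
        summaries.put(Language.SWE, "en kort sammanfattning");
        summaries.put(Language.ENG, "a short summary");
        // Index side: one field per language.
        System.out.println(toFields("summary", summaries));
        // Query side: pick the field that matches the language of the search term.
        System.out.println(fieldName("summary", Language.ENG));
    }
}

// Hypothetical stand-in for the poster's Language enum; only SWE and ENG appear in the thread.
enum Language { SWE, ENG }
```

The same `fieldName` helper can be used on the query side, so that indexing and searching cannot drift apart on how the field is spelled.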


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 9:00 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

Looks OK at first sight. Why do you think that the snowball analyzers are not applied? Have you looked at the generated index with Luke?
Is there anything in the log file?
Maybe add some debug trace to your discriminator. Is "((Text) entity).getLanguage().name();" really returning "ENG" or "SWE"?
It might only look like "the other analyzers get applied", since the StandardAnalyzer is applied if null is returned. Unless you are saying that you can show that stopwords_swe.properties and stopwords_eng.properties get applied.
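
A minimal sketch of the kind of debug trace meant here, written as a standalone class so that it runs without Hibernate Search. The class name `DiscriminatorTrace`, the method `analyzerNameFor` and the `Language` enum are assumptions for illustration; in the real application the same logic would sit inside the discriminator method shown in the first post.

```java
import java.util.logging.Logger;

final class DiscriminatorTrace {

    private static final Logger LOG = Logger.getLogger(DiscriminatorTrace.class.getName());

    private DiscriminatorTrace() {}

    // Returns the analyzer definition name for a language, or null when there is none to choose.
    static String analyzerNameFor(Language language) {
        String name = (language == null) ? null : language.name();
        // Log every call, so a missing log line shows that the discriminator is never invoked.
        LOG.info("discriminator: language=" + language + ", analyzer definition=" + name);
        return name;
    }

    public static void main(String[] args) {
        analyzerNameFor(Language.SWE);
        analyzerNameFor(null);
    }
}

// Hypothetical stand-in for the poster's Language enum.
enum Language { SWE, ENG }
```

If no log line shows up during indexing, the discriminator is not being called at all; if the logged name is null or does not match an @AnalyzerDef name, the mapping is the thing to check.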

--Hardy


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 9:11 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi sore,
the problem is that by using a FieldBridge you override any annotations on the type, so you now have the responsibility to provide the mapping to the index, and consequently the AnalyzerDiscriminator is ignored.
You should select the analyzer on
Code:
Set<Text> getSummarytexts()
.

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 9:11 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
oops hi Hardy, sorry we were writing at the same time.

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 9:20 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Seems you had the better answer, though :)


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 11:08 am 
Newbie

Joined: Sun Nov 15, 2009 8:22 pm
Posts: 19
Thank you!
Nice to be getting somewhere. If you could just give me some more pointers :)
Should I use the AnalyzerDiscriminator on getSummarytexts()?

Code:
@AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
public Set<Text> getSummarytexts() {
   return summarytexts;
}


I don't know how to solve it that way, because the LanguageDiscriminator then receives a Set<Text>, and I can't "discriminate" which analyzer to use because the set contains Text objects with different languages.

Or should I somehow set the analyzer in the fieldbridge? If so, how could I do that?


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 2:19 pm 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Have you tried to use @IndexedEmbedded on "Set<Text> getSummarytexts" and adding a ClassBridge to the class Text?


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Mon Nov 16, 2009 5:19 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
As Hardy said you need a ClassBridge to dynamically set your field names, but you also need to map your fields to analyzers.
I needed something like this, not very clean code but you can reuse it on many entities (or even configure it as your global Analyzer) if you follow a convention to label your fields per language:
Code:
public class LocalizedAnalyzer extends Analyzer {
   
   private final Analyzer italianAnalyzer = ...
   private final Analyzer englishAnalyzer = ...
   ...
   private final Analyzer globalAnalyzer = ...

   @Override
   public TokenStream tokenStream(String fieldName, Reader reader) {
      if (fieldName.endsWith( "_IT" )) return italianAnalyzer.tokenStream(fieldName, reader);
      if (fieldName.endsWith( "_UK" )) return englishAnalyzer.tokenStream(fieldName, reader);
      ...
      return globalAnalyzer.tokenStream(fieldName, reader);
   }

}
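
The suffix-based dispatch used by LocalizedAnalyzer can be tried out without Lucene using a small plain-Java model. This is a sketch under stated assumptions, not the real analyzer: the "analyzers" are simple functions from text to tokens, the class `SuffixDispatch` and its methods are hypothetical, and `toyEnglishTokens` is a deliberately crude stand-in for a stemmer (it only strips a trailing "s"), not the Snowball algorithm.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SuffixDispatch {

    // Toy "analyzers" keyed by field name suffix; real code would delegate to Lucene analyzers.
    private final Map<String, Function<String, List<String>>> bySuffix = new LinkedHashMap<>();
    private final Function<String, List<String>> fallback;

    SuffixDispatch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    SuffixDispatch register(String suffix, Function<String, List<String>> analyzer) {
        bySuffix.put(suffix, analyzer);
        return this;
    }

    // Picks the first analyzer whose suffix matches the field name, else the fallback.
    List<String> analyze(String fieldName, String text) {
        for (Map.Entry<String, Function<String, List<String>>> entry : bySuffix.entrySet()) {
            if (fieldName.endsWith(entry.getKey())) {
                return entry.getValue().apply(text);
            }
        }
        return fallback.apply(text);
    }

    // Fallback: lower-case and split on whitespace.
    static List<String> lowercaseTokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Crude stand-in for a stemmer: strips one trailing "s". NOT the Snowball algorithm.
    static List<String> toyEnglishTokens(String text) {
        return lowercaseTokens(text).stream()
                .map(t -> t.length() > 1 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SuffixDispatch dispatch = new SuffixDispatch(SuffixDispatch::lowercaseTokens)
                .register("_ENG", SuffixDispatch::toyEnglishTokens);
        System.out.println(dispatch.analyze("summary_ENG", "Concerts Tickets"));
        System.out.println(dispatch.analyze("title", "Concerts Tickets"));
    }
}
```

The point of the design is that the language decision is driven purely by the field name, so it keeps working when a bridge writes the fields itself, provided the naming convention is followed on both the indexing and the query side.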

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Tue Nov 17, 2009 11:48 am 
Newbie

Joined: Sun Nov 15, 2009 8:22 pm
Posts: 19
Thanks again for your help.
I seem to be getting closer to a solution, but still some trouble...

I have made an analyzer like you suggested, s.grinovero, and a class bridge. They both work separately, but if I use the class bridge the analyzer never gets called. This is my setup now.

The Arrangement class
Code:
class Arrangement {

...

@IndexedEmbedded
public Set<Text> getSummarytexts() {
      return summarytexts;
}

...

}

The Text class
Code:
@Analyzer(impl = LocalizedAnalyzer.class)
@ClassBridge(name="textbridge", index=Index.UN_TOKENIZED, impl=TextBridge.class )
class Text {

...

@Column(nullable=true)
public String getWord() {
   return word;
}

...

}


In this setup the fields get the correct names, i.e. SUMMARY_SWE or SUMMARY_ENG, but the analyzer is never called (I checked this with a breakpoint and with Luke).
It doesn't make a difference if I put the analyzer on getSummarytexts().

Feels like I'm close at least :)


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Tue Nov 17, 2009 12:04 pm 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
hi,

have you tried to set the analyzer parameter in the @ClassBridge annotation? By just setting it on the entity you are setting the default analyzer that is used for @Field. In your case you want to set the analyzer for the ClassBridge.

--Hardy


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Tue Nov 17, 2009 12:57 pm 
Newbie

Joined: Sun Nov 15, 2009 8:22 pm
Posts: 19
Thanks Hardy,
do you mean like this:

Code:
@ClassBridge(name="textbridge",
      index=Index.UN_TOKENIZED,
      impl=TextBridge.class,
      analyzer= @Analyzer(impl = LocalizedAnalyzer.class))
public class Text extends Base {


   private String word;
...

   @Column(nullable=true)
   public String getWord() {
      return word;
   }
...
}


The Arrangement has no changes.

That doesn't work :(
The analyzer still isn't used. Any ideas?
There is no @ManyToOne mapping back to Arrangement in the Text class (so a Text doesn't know which Arrangement collection it belongs to). Could that be a problem?


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Tue Nov 17, 2009 1:06 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
I was not expecting this, and I'm surprised as I did something similar. Version used?
Could you open an issue and attach a testcase? Much easier to inspect :-)

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Wed Nov 18, 2009 5:52 am 
Newbie

Joined: Sun Nov 15, 2009 8:22 pm
Posts: 19
I have tried both Hibernate Core 3.3.2 GA and 3.2.6 GA (and followed the version recommendations in the compatibility matrix).
I will try debugging it some more, and if I don't find the error I will try to find some time to make a test case.


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Wed Nov 18, 2009 6:58 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
And what about the Hibernate Search version?

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: Hibernate search multiple languages
PostPosted: Wed Nov 18, 2009 7:31 am 
Newbie

Joined: Sun Nov 15, 2009 8:22 pm
Posts: 19
I followed this
https://www.hibernate.org/30.html#A3

so I have tried
3.0.1 GA with core 3.2.6 GA
3.1.1 GA with core 3.3.2 GA


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.