behavior of @AnalyzerDiscriminator on collections

misieq · **Joined:** Tue Jan 18, 2011 9:01 am **Posts:** 4

Hello all!

I am currently working on i18n of certain properties of our JPA entities. I wanted to use @AnalyzerDiscriminator on a 'language' field of my localized entities as follows (I only include the important part of the code):

Code:

@Entity
@Indexed
@AnalyzerDefs({
      @AnalyzerDef(name = "en", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef(params = { @Parameter(name = "language", value = "English") }, factory = SnowballPorterFilterFactory.class) }),
      @AnalyzerDef(name = "fr", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef(params = { @Parameter(name = "language", value = "French") }, factory = SnowballPorterFilterFactory.class) }) })
@Analyzer(definition = "fr")
public class IndexedEntity {

   @Id
   @GeneratedValue(strategy = GenerationType.AUTO)
   private long id;

   @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL)
   @MapKey(name = "language")
   @IndexedEmbedded
   private Map<String, IndexedEntityI18N> i18n = new HashMap<String, IndexedEntityI18N>();

[...]
}

@Entity
public class IndexedEntityI18N {

   @Id
   @GeneratedValue(strategy = GenerationType.AUTO)
   private long id;

   private String language;

   @Field
   private String name;

   @AnalyzerDiscriminator(impl = I18NDiscriminator.class)
   public String getLanguage() {
      return language;
   }
[...]
}

public class I18NDiscriminator implements Discriminator {

   @Override
   public String getAnalyzerDefinitionName(Object value, Object entity,
         String field) {
      return (String) value;
   }

}

The idea is to index the same field several times using a map of localized objects. The analyzer discriminator was meant to select the appropriate language analyzer via the 'language' property (i.e. if language is set to 'en' we use the 'en' analyzer, etc.). I was really happy with this design :) Unfortunately it doesn't work like this. After some debugging I found out that the analyzer is determined only once for all collection elements (as they all share the same index key).
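To make the observation concrete, here is a minimal sketch in plain Java. It is a simplified model and not Hibernate Search's actual code: the class `AnalyzerPerFieldSketch`, the method `resolveAnalyzers`, and the "first analyzer seen for a field name is kept" rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (an assumption, not the library's implementation):
// one analyzer name is remembered per index key (field name).
class AnalyzerPerFieldSketch {

    /** Each element is {language, fieldName}; returns fieldName -> analyzer name. */
    static Map<String, String> resolveAnalyzers(List<String[]> elements) {
        Map<String, String> analyzerByField = new LinkedHashMap<>();
        for (String[] element : elements) {
            String language = element[0];
            String fieldName = element[1];
            // Assumed rule: once a field name has an analyzer, later elements do not change it.
            analyzerByField.putIfAbsent(fieldName, language);
        }
        return analyzerByField;
    }

    public static void main(String[] args) {
        // Both map entries write to the same key: only one analyzer is recorded.
        System.out.println(resolveAnalyzers(Arrays.asList(
                new String[] {"en", "i18n.name"},
                new String[] {"fr", "i18n.name"})));
        // One key per language: each key gets its own analyzer.
        System.out.println(resolveAnalyzers(Arrays.asList(
                new String[] {"en", "i18n.name.en"},
                new String[] {"fr", "i18n.name.fr"})));
    }
}
```

In this model, a shared key leaves a single analyzer for every element, while per-language keys keep the 'en' and 'fr' analyzers apart.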

The solution, then, is to 'localize' the index key as well. This way it will be unique for each collection element and the analyzer will be determined separately for each of them. For example, using a class bridge like this (I know it is not generic, but it's just an example):

Code:

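import org.apache.lucene.document.Document;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;

public class I18NBridge implements FieldBridge {

   @Override
   public void set(String name, Object value, Document document,
         LuceneOptions options) {
      IndexedEntityI18N i18nInfos = (IndexedEntityI18N) value;
      // Append the language to the field name so each language gets its own index key
      options.addFieldToDocument("i18n.name." + i18nInfos.getLanguage(),
            i18nInfos.getName(), document);
   }

}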

Now we will have separate 'i18n.name.fr' and 'i18n.name.en' index keys indexed with correct analyzers.
The only problem is that this approach complicates search queries if we want to search all language versions at once.
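As a rough sketch of what searching all languages involves, the helper below expands a base field name into one clause per language. The class `LocalizedFieldsSketch` and its methods are hypothetical names, the output only imitates Lucene's `field:term OR field:term` query syntax, and no escaping of the search term is attempted. Each clause would still have to be analyzed with the analyzer belonging to its own field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the per-language field names used by the bridge above.
class LocalizedFieldsSketch {

    /** Returns e.g. [i18n.name.en, i18n.name.fr] for base field "i18n.name". */
    static List<String> localizedFields(String baseField, List<String> languages) {
        List<String> fields = new ArrayList<>();
        for (String language : languages) {
            fields.add(baseField + "." + language);
        }
        return fields;
    }

    /** Joins one "field:term" clause per language with OR (term is not escaped). */
    static String anyLanguageQuery(String baseField, List<String> languages, String term) {
        List<String> clauses = new ArrayList<>();
        for (String field : localizedFields(baseField, languages)) {
            clauses.add(field + ":" + term);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // prints: i18n.name.en:table OR i18n.name.fr:table
        System.out.println(anyLanguageQuery("i18n.name", Arrays.asList("en", "fr"), "table"));
    }
}
```

The list of languages has to be known at query time, which is the extra complexity mentioned above.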

The question is: is this behavior on collections a bug, or is it intentional (and thus has a logical explanation)?

Thanks in advance for your opinions!

Best regards,
Michal

sanne.grinovero · **Posted:** Wed May 18, 2011 9:34 am

The problem is mainly that you don't want to mix different analyzers on the same field: when you perform a query, you want to analyze the user's query text with the same analyzer as the field you're matching against.
If you were to apply different analyzers (one per language) to the same field, all their output would be stored in that one field and matching it properly would be a mess. You would likely not get the results you expect; it might look like it works, but it would be hardly predictable.
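A toy illustration of that mismatch, in plain Java: the two "analyzers" below are made-up suffix rules, not real Snowball stemming, and `MixedAnalyzerSketch` is a hypothetical name. The point is only that a token written by one analyzer may not be found when the query text goes through another.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

class MixedAnalyzerSketch {

    // Toy rules (assumptions for illustration only, not Snowball):
    // the "English" rule strips a trailing "s", the "French" rule strips a trailing "es".
    static final UnaryOperator<String> TOY_EN =
            w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
    static final UnaryOperator<String> TOY_FR =
            w -> w.endsWith("es") ? w.substring(0, w.length() - 2) : w;

    /** True if the analyzed query text equals one of the tokens stored in the field. */
    static boolean matches(Set<String> indexedTokens, UnaryOperator<String> queryAnalyzer,
            String queryText) {
        return indexedTokens.contains(queryAnalyzer.apply(queryText.toLowerCase()));
    }

    public static void main(String[] args) {
        Set<String> field = new HashSet<>();
        field.add(TOY_FR.apply("tables"));                     // stored token: "tabl"
        System.out.println(matches(field, TOY_EN, "tables"));  // query token "table": false
        System.out.println(matches(field, TOY_FR, "tables"));  // query token "tabl": true
    }
}
```

With one field per language, the query side can always pick the analyzer that produced the stored tokens.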

I agree the code is a bit verbose; if you have suggestions for improvements, they're welcome.