Hello all!
I am currently working on i18n of certain properties of our JPA entities. I wanted to use @AnalyzerDiscriminator on a 'language' field of my localized entities as follows (I only include the important part of the code):
Code:
@Entity
@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "en",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            // The parameter name is "language"; its value selects the Snowball stemmer
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                params = { @Parameter(name = "language", value = "English") }) }),
    @AnalyzerDef(name = "fr",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                params = { @Parameter(name = "language", value = "French") }) }) })
@Analyzer(definition = "fr")
public class IndexedEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;

    // One localized child entity per language, keyed by its 'language' property
    @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL)
    @MapKey(name = "language")
    @IndexedEmbedded
    private Map<String, IndexedEntityI18N> i18n = new HashMap<String, IndexedEntityI18N>();
    [...]
}
@Entity
public class IndexedEntityI18N {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;

    private String language;

    @Field
    private String name;

    @AnalyzerDiscriminator(impl = I18NDiscriminator.class)
    public String getLanguage() {
        return language;
    }
    [...]
}
public class I18NDiscriminator implements Discriminator {
    @Override
    public String getAnalyzerDefinitionName(Object value, Object entity,
            String field) {
        // The value of the 'language' property doubles as the analyzer definition name
        return (String) value;
    }
}
The idea is to index the same field several times using a map of localized objects. The analyzer discriminator was meant to select the appropriate language analyzer via the 'language' property (i.e. if language is set to 'en' we use the 'en' analyzer, etc.). I was really happy with this design :) Unfortunately, it doesn't work like this. After some debugging I found out that the analyzer is determined only once for all collection elements (as they all have the same index keys).
The solution, then, is to 'localize' the index key as well. This way it will be unique for each collection element and the analyzer will be determined separately for each of them. For example, using a class bridge like this (I know it is not generic, but it's just an example):
Code:
public class I18NBridge implements FieldBridge {
    @Override
    public void set(String name, Object value, Document document,
            LuceneOptions options) {
        IndexedEntityI18N i18nInfos = (IndexedEntityI18N) value;
        // Append the language to the field name so each translation gets its own index key
        options.addFieldToDocument("i18n.name." + i18nInfos.getLanguage(),
                i18nInfos.getName(), document);
    }
}
Now we will have separate 'i18n.name.fr' and 'i18n.name.en' index keys indexed with correct analyzers.
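To make the difference concrete, here is a toy model in plain Java. It is not Hibernate Search's internal code: the class and method names are made up, and it simply assumes that one analyzer is remembered per index key, with the first element keeping the slot (an arbitrary choice for the illustration).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnalyzerPerFieldSketch {

    // Toy model only: maps each index key to the analyzer name chosen for it.
    // Assumption: a single analyzer is kept per index key.
    static Map<String, String> analyzersByField(List<String> languages, boolean localizeKey) {
        Map<String, String> analyzers = new LinkedHashMap<>();
        for (String language : languages) {
            String fieldName = localizeKey ? "i18n.name." + language : "i18n.name";
            // With a shared key there is only one slot; here the first language keeps it.
            analyzers.putIfAbsent(fieldName, language);
        }
        return analyzers;
    }

    public static void main(String[] args) {
        // Shared key: both elements compete for the single "i18n.name" entry.
        System.out.println(analyzersByField(List.of("en", "fr"), false));
        // Localized keys: each element gets its own entry and its own analyzer.
        System.out.println(analyzersByField(List.of("en", "fr"), true));
    }
}
```

With the shared key the map ends up with a single entry, whereas the localized keys give one entry per language, which is the effect the class bridge achieves.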
The only problem is that this approach complicates search queries if we want to search all language versions at once.
The question is: is this behavior on collections a bug, or is it intentional (and thus has a logical explanation)?
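One way to keep such a cross-language search manageable might be to generate the localized field names and hand them to a multi-field query (for instance Lucene's MultiFieldQueryParser or the onFields(...) step of the Hibernate Search query DSL). The sketch below only builds the field names; the helper and its class name are hypothetical, and the list of supported languages is assumed to be known up front.

```java
import java.util.List;

public class LocalizedFields {

    // Hypothetical helper: expands a field prefix into one index key per language,
    // e.g. "i18n.name" + ["en", "fr"] -> ["i18n.name.en", "i18n.name.fr"].
    static String[] localizedFields(String prefix, List<String> languages) {
        return languages.stream()
                .map(language -> prefix + "." + language)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] fields = localizedFields("i18n.name", List.of("en", "fr"));
        // These names could then be passed to a multi-field query.
        System.out.println(String.join(", ", fields));
    }
}
```

The query still has to list every language, but at least the list is built in one place.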
Thanks in advance for your opinions!
Best regards,
Michal