Hello
What is the best way to replace umlauts (vowels with two dots on top) with ae, oe or ue?
For example: Mueller instead of Müller.
Registering MappingCharFilterFactory from Solr 1.4 as a @TokenFilterDef obviously raises a compile error, since it is a char filter and not a token filter.
(Is Solr 1.4 compatible with Hibernate Search 3.1.1.GA at all?)
A FieldBridge that replaces the individual characters before the value is indexed seems to do the job:
Code:
@Column
@Field(index = Index.TOKENIZED)
@FieldBridge(impl = UmlautBridge.class)
@Analyzer(definition = "customanalyzer")
private String name;
...
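import org.hibernate.search.bridge.StringBridge;

public class UmlautBridge implements StringBridge {
    // Transliterates lower-case German umlauts before the value reaches the analyzer.
    @Override
    public String objectToString(Object value) {
        if (value == null) return null;
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("UmlautBridge only supports String values");
        }
        // replace() substitutes literally; replaceAll() would treat the first argument as a regex.
        return ((String) value)
            .replace("ä", "ae")
            .replace("ö", "oe")
            .replace("ü", "ue");
    }
}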
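For reference, here is a minimal, self-contained sketch of the same folding outside Hibernate Search, extended to upper-case umlauts and ß. The class name `UmlautFolder` and its `fold` method are hypothetical and not part of any library, and mapping ß to ss is an assumption about what is wanted here. The characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hibernate Search, Lucene or Solr.
final class UmlautFolder {

    // One entry per character to transliterate.
    private static final Map<Character, String> MAPPING = new HashMap<Character, String>();
    static {
        MAPPING.put('\u00e4', "ae"); // a-umlaut
        MAPPING.put('\u00f6', "oe"); // o-umlaut
        MAPPING.put('\u00fc', "ue"); // u-umlaut
        MAPPING.put('\u00c4', "Ae"); // capital a-umlaut
        MAPPING.put('\u00d6', "Oe"); // capital o-umlaut
        MAPPING.put('\u00dc', "Ue"); // capital u-umlaut
        MAPPING.put('\u00df', "ss"); // sharp s (assumption, not in the original bridge)
    }

    private UmlautFolder() {
    }

    // Returns the input with every mapped character replaced; null stays null.
    static String fold(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = MAPPING.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The bridge's objectToString could simply delegate to `UmlautFolder.fold(...)`, which would keep the indexing side and the query side on one mapping table.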
But I'd rather apply the character mapping from within a StandardTokenizerFactory than beforehand.
The same concern applies to query generation:
Code:
UmlautBridge u = new UmlautBridge();
String umlautless = u.objectToString("Müller");
query = parser.parse(String.format("name:%s*", umlautless));
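A side note on the snippet above: pasting the term straight into the query string breaks as soon as it contains query syntax characters such as `:` or `(`. Below is a rough, self-contained sketch of building the prefix query text with escaping. `PrefixQueryText` and its methods are hypothetical names. The escaping is hand-rolled only to keep the sketch free of dependencies; in real code Lucene's static `QueryParser.escape` should do the same job. The sketch assumes the term has already been folded and contains no whitespace.

```java
// Hypothetical helper; the special-character list follows the classic Lucene query syntax.
final class PrefixQueryText {

    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    private PrefixQueryText() {
    }

    // Puts a backslash in front of every query syntax character.
    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Builds "field:term*" from an already folded, whitespace-free term.
    static String build(String field, String foldedTerm) {
        return field + ":" + escape(foldedTerm) + "*";
    }
}
```

With this, `PrefixQueryText.build("name", umlautless)` would replace the `String.format` call, and for "Mueller" it yields `name:Mueller*`.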
Is there any way to apply the field bridge other than programmatically?
Thanks,
Christian