All times are UTC - 5 hours [ DST ]



 Post subject: Replace Umlauts as ae, oe, ue?
PostPosted: Fri Feb 05, 2010 3:46 am 
Newbie

Joined: Thu Feb 04, 2010 3:10 am
Posts: 2
Hello

What is the best way to replace umlauts (vowels with two dots on top) with ae, oe or ue?
e.g.: Mueller instead of Müller.


Registering MappingCharFilterFactory from Solr 1.4 as a @TokenFilterDef obviously raises a syntax error.
(Is Solr 1.4 compatible with Hibernate Search 3.1.1.GA at all?)

A FieldBridge to define the replacement of single characters for an index seems to do the job:
Code:

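   @Column
   @Field(index = Index.TOKENIZED)
   @FieldBridge(impl = UmlautBridge.class)
   @Analyzer(definition = "customanalyzer")
   private String name;

...

import org.hibernate.search.bridge.StringBridge;

public class UmlautBridge implements StringBridge {

   @Override
   public String objectToString(Object value) {
      if (value == null) return null;

      if (!(value instanceof String)) throw new IllegalArgumentException("UmlautBridge only supports String values");

      // replace() takes literal text; replaceAll() would compile each argument as a regex.
      // Note: only the lowercase umlauts are mapped; Ä, Ö, Ü and ß pass through unchanged.
      return ((String) value)
        .replace("ä", "ae")
        .replace("ö", "oe")
        .replace("ü", "ue");
   }

}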

But I'd rather apply the character mapping from within the analyzer (alongside the StandardTokenizerFactory) than beforehand in a bridge.
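The bridge above maps only the three lowercase umlauts. As a rough sketch of a broader mapping, the helper below also transliterates Ä, Ö, Ü and ß. That these four should be handled, and that ß should become "ss", are assumptions not stated in this thread. The class name UmlautFolder is hypothetical and is not part of Hibernate Search or Lucene. Unicode escapes are used so the source does not depend on the compiler's file encoding.

```java
// Hypothetical helper for illustration only; not a Hibernate Search or Lucene class.
final class UmlautFolder {

    private UmlautFolder() {
    }

    /**
     * Transliterates German umlauts and sharp s. All other characters
     * are copied unchanged. Returns null for null input.
     */
    static String fold(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00e4': out.append("ae"); break; // a-umlaut
                case '\u00f6': out.append("oe"); break; // o-umlaut
                case '\u00fc': out.append("ue"); break; // u-umlaut
                case '\u00c4': out.append("Ae"); break; // A-umlaut (assumed mapping)
                case '\u00d6': out.append("Oe"); break; // O-umlaut (assumed mapping)
                case '\u00dc': out.append("Ue"); break; // U-umlaut (assumed mapping)
                case '\u00df': out.append("ss"); break; // sharp s (assumed mapping)
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

A bridge could delegate to UmlautFolder.fold(...) instead of chaining replacements. Decomposed forms (a base letter followed by a combining diaeresis) are not covered by this sketch.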

A similar concern for the query generation:
Code:
UmlautBridge u = new UmlautBridge();
String umlautless = u.objectToString("Müller");
query = parser.parse(String.format("name:%s*", umlautless));

Any chance to apply the field bridge other than programmatically?
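As long as the mapping is not part of the analyzer, the query side has to repeat it by hand. The sketch below shows one way to build the prefix query string so that it uses the same replacements as the bridge and also escapes characters that have a special meaning in the Lucene query syntax. It assumes a single-word input and assumes that the analyzer named "customanalyzer" lowercases tokens; neither is confirmed in this thread. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not a Hibernate Search or Lucene class.
final class PrefixQueryStrings {

    // Characters with a special meaning in the classic Lucene query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    private PrefixQueryStrings() {
    }

    /** Same three replacements as the UmlautBridge in the post above. */
    static String foldUmlauts(String s) {
        return s.replace("\u00e4", "ae")
                .replace("\u00f6", "oe")
                .replace("\u00fc", "ue");
    }

    /** Prefixes every special character with a backslash. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /**
     * Builds "field:term*" from a single-word user input.
     * Lowercasing happens first so that uppercase umlauts are folded too
     * (assumes the index-time analyzer lowercases as well).
     */
    static String prefixQuery(String field, String userInput) {
        String lowered = userInput.toLowerCase(Locale.GERMAN);
        return field + ":" + escape(foldUmlauts(lowered)) + "*";
    }
}
```

The result of PrefixQueryStrings.prefixQuery("name", input) could then be handed to parser.parse(...) in place of the String.format call above.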

Thanks,
Christian


 Post subject: Re: Replace Umlauts as ae, oe, ue?
PostPosted: Fri Feb 05, 2010 11:25 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Search 3.1.1.GA depends on Solr 1.3. We still have to upgrade to a later version.
Have you tried adding a copy of MappingCharFilterFactory to your project (see http://svn.apache.org/viewvc/lucene/sol ... iew=markup)? I am not sure whether you end up pulling even more classes in, but it might be an alternative. Once Search upgrades to a new version of Solr you can get rid of these added classes.

The problem with your field bridge approach is that it will only apply at indexing time. The umlaut replacement really has to happen in the analyzer and this analyzer has to be used for indexing as well as searching. If you don't want to copy or add a modified version of MappingCharFilterFactory you could always add the umlaut replacement into your custom analyzer.
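To illustrate why the replacement has to run on both sides, here is a minimal sketch with a toy in-memory index. It does not use Lucene or Hibernate Search at all; TinyIndex and its analyze method are hypothetical stand-ins for the index and the analyzer. The terms are folded when a document is added, so a lookup that skips the same folding misses the document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index for illustration only; not a Lucene or Hibernate Search class.
final class TinyIndex {

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stand-in for an analyzer: lowercases, folds umlauts, splits on whitespace. */
    static String[] analyze(String text) {
        String folded = text.toLowerCase(Locale.GERMAN)
                .replace("\u00e4", "ae")
                .replace("\u00f6", "oe")
                .replace("\u00fc", "ue");
        return folded.trim().split("\\s+");
    }

    /** Indexing path: every term goes through analyze(). */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Looks the term up exactly as given, without any analysis. */
    Set<Integer> searchRaw(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    /** Query path that reuses analyze(), mirroring the indexing path. */
    Set<Integer> search(String text) {
        Set<Integer> result = new HashSet<>();
        for (String term : analyze(text)) {
            result.addAll(searchRaw(term));
        }
        return result;
    }
}
```

After add(1, "Hans Müller"), a call to searchRaw("müller") finds nothing because the stored term is "mueller", while search("Müller") and search("Mueller") both find document 1.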

--Hardy


 Post subject: Re: Replace Umlauts as ae, oe, ue?
PostPosted: Fri Feb 05, 2010 11:47 am 
Newbie

Joined: Thu Feb 04, 2010 3:10 am
Posts: 2
Thank you for the link, Hardy,

But I'm afraid it takes more than adding this class, because
Code:
MappingCharFilterFactory extends BaseCharFilterFactory

and Hibernate Search expects a filter based on BaseTokenFilterFactory.

- Christian


 Post subject: Re: Replace Umlauts as ae, oe, ue?
PostPosted: Sat Feb 06, 2010 6:56 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Would you like to look into this? Trunk was recently upgraded to support Lucene 2.9; some help upgrading Solr as well would be welcome.

_________________
Sanne
http://in.relation.to/

