All times are UTC - 5 hours [ DST ]



 Post subject: filter special chars
Posted: Thu Aug 13, 2009 9:36 am
Newbie

Joined: Tue Jul 14, 2009 6:13 am
Posts: 12
Hi, is there any way to filter special characters such as "Ø" out of the fields I want to index?
So far I have tried a StopFilter with "Ø" in its stop list, but it doesn't work: when I search the index for this symbol with Luke, I still get results.

Can anyone help please?


 Post subject: Re: filter special chars
Posted: Thu Aug 13, 2009 12:33 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
a StopFilter only removes tokens that match a stop word completely, not partially.
So a StopFilter configured with, for example, the letter "A" will not remove words that contain "A" along with other characters.
Putting your symbol in the stop list will therefore remove it only when it stands alone as a token. Is that what you need?
If you want to convert accented characters (like "èé" to "ee"), there are filters available in the Lucene and Solr distributions. If you want to remove characters from inside tokens (like "WØhatØ is thiØs strØnge chØr?" to "What is this strnge chr?"), you'll need to write a custom TokenFilter.
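
To make the three cases above concrete, here is a minimal plain-Java sketch. It only models the behaviour and does not use the Lucene API: the class and method names (`TokenCleanupSketch`, `dropStopTokens`, `stripChar`, `foldAccents`) are made up for illustration, and a real solution would live inside a Lucene TokenFilter. The accent folding here relies on `java.text.Normalizer`, which is an assumption about one possible approach, not what the Lucene or Solr filters do internally.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the behaviours discussed above; none of this is Lucene code.
class TokenCleanupSketch {

    // Models a stop filter: a token is dropped only if the WHOLE token is a stop word.
    static List<String> dropStopTokens(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Models a custom character-stripping filter: removes one character from
    // inside every token and discards tokens that end up empty.
    static List<String> stripChar(List<String> tokens, char unwanted) {
        List<String> cleaned = new ArrayList<>();
        for (String token : tokens) {
            String stripped = token.replace(String.valueOf(unwanted), "");
            if (!stripped.isEmpty()) {
                cleaned.add(stripped);
            }
        }
        return cleaned;
    }

    // Models accent folding: decompose to base letter + combining marks (NFD),
    // then drop the combining marks.
    static String foldAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // \u00D8 is "Ø"; escapes keep the source file independent of its encoding.
        List<String> tokens = Arrays.asList("W\u00D8hat", "\u00D8", "is");
        Set<String> stopWords = new HashSet<>(Arrays.asList("\u00D8"));

        System.out.println(dropStopTokens(tokens, stopWords)); // standalone symbol removed only
        System.out.println(stripChar(tokens, '\u00D8'));       // symbol removed everywhere
        System.out.println(foldAccents("\u00E8\u00E9"));       // accented e's folded to plain e's
    }
}
```

One caveat with the folding approach: "Ø" has no Unicode decomposition, so `foldAccents` leaves it unchanged. It has to be stripped or mapped explicitly.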

_________________
Sanne
http://in.relation.to/


 Post subject: Re: filter special chars
Posted: Fri Aug 14, 2009 5:28 am
Newbie

Joined: Tue Jul 14, 2009 6:13 am
Posts: 12
s.grinovero wrote:
Hi,
a StopFilter only removes tokens that match a stop word completely, not partially.
So a StopFilter configured with, for example, the letter "A" will not remove words that contain "A" along with other characters.

At the moment I need to solve this first problem: there are standalone characters I want to remove, like the special character I mentioned. But putting this symbol in the stop list doesn't remove it from the index, whereas putting ordinary letters in the stop list works. Perhaps I have an encoding problem? Where should I start looking to solve it?

s.grinovero wrote:
Putting your symbol in the stop list will therefore remove it only when it stands alone as a token. Is that what you need?
If you want to convert accented characters (like "èé" to "ee"), there are filters available in the Lucene and Solr distributions. If you want to remove characters from inside tokens (like "WØhatØ is thiØs strØnge chØr?" to "What is this strnge chr?"), you'll need to write a custom TokenFilter.

Do you have a how-to that explains how to write my own TokenFilter?

thanks :)


 Post subject: Re: filter special chars
Posted: Fri Aug 14, 2009 12:47 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Yes, you might have an encoding problem, or your analyzer is converting the bad character into something different, so it no longer matches your stop list and is not discarded.
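
One way to tell these two causes apart is to look at the actual code points of the token and of the stop-list entry. The following is a small plain-Java sketch for that. The class and method names (`CodePointDumpSketch`, `describe`, `misdecodeUtf8AsLatin1`) are hypothetical helpers, not part of Lucene or Hibernate Search, and the UTF-8 read as ISO-8859-1 mix-up is just one assumed example of an encoding problem.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper: shows which code points a string really contains.
class CodePointDumpSketch {

    // Renders every code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    // Simulates one common encoding mistake: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String symbol = "\u00D8"; // "Ø", written as an escape on purpose

        System.out.println(describe(symbol));                        // U+00D8
        System.out.println(describe(misdecodeUtf8AsLatin1(symbol))); // U+00C3 U+0098
    }
}
```

If the token in the index and the stop-list entry print different code points, the stop word can never match, whatever the cause turns out to be.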

To build a TokenFilter correctly, I suggest you look at the Lucene and Solr sources; they have lots of examples, and you might find something close to what you need in the sandbox or contrib directories of these projects.

If that doesn't help, I suggest you ask on the Lucene forums, as they might know a better solution than I do :)

_________________
Sanne
http://in.relation.to/


 Post subject: Re: filter special chars
Posted: Mon Aug 17, 2009 4:59 am
Newbie

Joined: Tue Jul 14, 2009 6:13 am
Posts: 12
The answer is:
I have to put the lower-case variant of this symbol (ø instead of Ø) into my stop list, because the LowerCaseFilter runs right before the StopFilter :) Now it works.
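
For anyone who finds this later, the ordering effect can be reproduced with a minimal plain-Java sketch. It only models a lower-casing step followed by a whole-token stop step; it does not use the Lucene classes, and the names `FilterOrderSketch` and `lowercaseThenStop` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of "lower-case first, then apply the stop list".
class FilterOrderSketch {

    // Each token is lower-cased BEFORE it is compared with the stop words,
    // so the stop words themselves must be given in lower case.
    static List<String> lowercaseThenStop(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lowered = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lowered)) {
                kept.add(lowered);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // \u00D8 is "Ø" and \u00F8 is "ø".
        List<String> tokens = Arrays.asList("Size", "\u00D8", "10");

        Set<String> upperStop = new HashSet<>(Arrays.asList("\u00D8"));
        Set<String> lowerStop = new HashSet<>(Arrays.asList("\u00F8"));

        // With the upper-case stop word the symbol survives (as its lower-case form).
        System.out.println(lowercaseThenStop(tokens, upperStop).size()); // 3
        // With the lower-case stop word the symbol is removed.
        System.out.println(lowercaseThenStop(tokens, lowerStop).size()); // 2
    }
}
```

The same reasoning applies to any filter that rewrites tokens ahead of the stop step: the stop list has to contain the form the token has at that point in the chain.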


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.