s.grinovero wrote:
Hi,
A StopFilter only removes tokens that match a stop word completely, not partially.
So a StopFilter configured with, for example, the letter "A" will remove the token "A" but will not remove words that contain the char "A" along with other characters.
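To illustrate the whole-token matching described above, here is a minimal sketch in plain Java. It does not use the Lucene API; the class `StopMatchSketch` and the method `removeStopTokens` are made-up names, and the token list stands in for what an analyzer would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class StopMatchSketch {
    // Keep every token that is not, as a whole, contained in the stop set.
    // A token that merely contains a stop word as a substring is kept.
    static List<String> removeStopTokens(List<String> tokens, Set<String> stopSet) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopSet.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("A", "Apple", "BANANA", "A");
        // Only the standalone "A" tokens are dropped.
        System.out.println(removeStopTokens(tokens, Collections.singleton("A")));
        // prints [Apple, BANANA]
    }
}
```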
At the moment I need to solve exactly this problem. There are standalone chars I want to remove, like the special char I mentioned. But putting this symbol in the stop list doesn't remove it from the index. By contrast, putting any letter in the stop list works. Perhaps I have an encoding problem? If so, where should I start to solve it?
s.grinovero wrote:
So the stop list will only remove your symbol when it stands alone as a token; is that what you need?
If you want to convert accented characters (like "èé" to "ee"), there are some filters available in the Lucene and Solr distributions. If you want to remove chars from inside tokens (like "WØhatØ is thiØs strØnge chØr?" to "What is this strnge chr?"), you'll need to write a custom TokenFilter.
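As a rough sketch of what such a custom filter has to do, the plain Java below strips one unwanted char from every token and drops tokens that end up empty. It is not a Lucene TokenFilter: `CharStripSketch`, `stripChar` and `filter` are hypothetical names, and in a real filter the same per-token logic would presumably live inside the `incrementToken()` method of a `TokenFilter` subclass. The char is written as the escape `\u00D8` so the example does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharStripSketch {
    // Remove every occurrence of `unwanted` from one token's text.
    static String stripChar(String term, char unwanted) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c != unwanted) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Strip the char from each token; a token that consisted only of the
    // unwanted char becomes empty and is dropped entirely.
    static List<String> filter(List<String> tokens, char unwanted) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = stripChar(token, unwanted);
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "W\u00D8hat\u00D8", "is", "thi\u00D8s", "str\u00D8nge", "ch\u00D8r?", "\u00D8");
        System.out.println(filter(tokens, '\u00D8'));
        // prints [What, is, this, strnge, chr?]
    }
}
```

Note that the standalone symbol disappears here as well, so a filter of this kind would cover both the embedded and the standalone case without relying on the stop list.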
Do you have a how-to where I can read about writing my own TokenFilter?
thanks :)