Hi everyone,
I'm using Hibernate Search 3.4 and I would like to exclude numbers from my index. Is there a way to do it using the standard filters? I've checked the documentation (and the various filters in org.apache.solr.analysis) but found nothing, so I started developing a custom filter. Here is the code:
Code:
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.analysis.FilteringTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NumbersRemoverFilter extends FilteringTokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private static Log log = LogFactory.getLog(NumbersRemoverFilter.class.getName());

    public NumbersRemoverFilter(boolean enablePositionIncrements, TokenStream input) {
        super(enablePositionIncrements, input);
    }

    @Override
    protected boolean accept() throws IOException {
        for (int i = 0; i < termAtt.buffer().length; i++) {
            if (Character.isLetter(termAtt.charAt(i))) {
                log.debug("accept true " + new String(termAtt.buffer()) + " length = " + termAtt.buffer().length);
                return true;
            }
        }
        log.debug("accept false " + new String(termAtt.buffer()) + " length = " + termAtt.buffer().length);
        return false;
    }
}
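A note on the loop above: `termAtt.buffer()` returns the attribute's internal array, which is reused from token to token and is usually longer than the current term. So `termAtt.buffer().length` is the buffer's capacity, not the term length (that is `termAtt.length()`). The JDK-only sketch below shows the intended check bounded by the term length. `LetterCheck`, `containsLetter` and `termText` are hypothetical names used only for illustration, and the plain `char[]` stands in for the Lucene buffer. Inside the filter, `accept()` would then presumably reduce to `return LetterCheck.containsLetter(termAtt.buffer(), termAtt.length());`, and the debug message could log `termAtt.toString()` rather than `new String(termAtt.buffer())`.

```java
/**
 * JDK-only sketch of the check NumbersRemoverFilter.accept() is meant to make.
 * The char[] stands in for termAtt.buffer() and `length` for termAtt.length().
 * Class and method names are hypothetical.
 */
final class LetterCheck {

    private LetterCheck() {
    }

    /** Returns true if any of the first {@code length} chars is a letter. */
    static boolean containsLetter(char[] buffer, int length) {
        for (int i = 0; i < length; i++) { // stop at the term length, not buffer.length
            if (Character.isLetter(buffer[i])) {
                return true;
            }
        }
        return false;
    }

    /** Returns only the term text, ignoring the unused tail of the buffer. */
    static String termText(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Simulate a reused buffer: it first held "abcd", now holds the term "42".
        char[] buffer = new char[8];
        "abcd".getChars(0, 4, buffer, 0);
        "42".getChars(0, 2, buffer, 0);
        int length = 2;

        System.out.println(termText(buffer, length));              // prints 42
        System.out.println(containsLetter(buffer, length));        // prints false: term rejected
        System.out.println(containsLetter(buffer, buffer.length)); // prints true: stale "cd" is scanned
    }
}
```

The last line of `main` shows why the bound matters: scanning the whole array picks up leftover characters from an earlier, longer token.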
And this is the factory:
Code:
import org.apache.lucene.analysis.TokenStream;

public class NumbersRemoverFilterFactory extends org.apache.solr.analysis.BaseTokenFilterFactory {
    public TokenStream create(TokenStream input) {
        return new NumbersRemoverFilter(true, input);
    }
}
I've used the filter in my analyzer like this:
Code:
...
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = NumbersRemoverFilterFactory.class),
        @TokenFilterDef(factory = LengthFilterFactory.class, params = {
            @Parameter(name = "min", value = "4"),
            @Parameter(name = "max", value = "20")
        }),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stoplist.properties")
        })
    })
@Analyzer(definition = "customanalyzer")
...
With my custom filter in place, searches no longer return any results :-(
Is there something wrong with my filter?
There's also something I don't understand: the filter logs each token, so I expected to see every token of my document in the log, but only some of them show up. Am I missing something?
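A possible explanation for the partial logging, which I haven't verified against this exact setup: as far as I know, Lucene 3.x's `CharTermAttribute.charAt(i)` checks `i` against the term length, not against the size of `buffer()`. If that holds, any token with no letters (e.g. `2011`) makes the loop run past the end of the term, `charAt` throws an `IndexOutOfBoundsException`, and analysis of that document stops there, which would also fit searches returning nothing. The sketch below reproduces the pattern with the JDK's `StringBuilder`, whose `charAt` is likewise checked against `length()` rather than `capacity()`. It is an analogy, not Lucene code, and the class and method names are made up.

```java
/**
 * Analogy only: StringBuilder has a capacity (like termAtt.buffer().length)
 * and a length (like termAtt.length()), and its charAt is bounds-checked
 * against the length. Names are hypothetical.
 */
final class CapacityVsLengthDemo {

    private CapacityVsLengthDemo() {
    }

    /** Mirrors the original loop: bounded by capacity, so it can read past the term. */
    static boolean acceptByCapacity(StringBuilder term) {
        for (int i = 0; i < term.capacity(); i++) {
            // Throws StringIndexOutOfBoundsException once i reaches term.length().
            if (Character.isLetter(term.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Bounded by the term's actual length: never reads past the term. */
    static boolean acceptByLength(CharSequence term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.isLetter(term.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Capacity 16, length 4: a digits-only "term".
        StringBuilder term = new StringBuilder(16).append("2011");

        System.out.println(acceptByLength(term)); // prints false: term rejected cleanly
        try {
            acceptByCapacity(term);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("capacity-bounded loop threw " + e.getClass().getSimpleName());
        }
    }
}
```

Bounding the loop by `length()` avoids the exception and rejects the all-digit term as intended, while terms that start with a letter are accepted by both versions, which would explain why only some tokens reach the log.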
Thanks in advance,
Andrea