Hibernate Community • View topic - Hibernate search analyzers

All times are UTC - 5 hours [ DST ]

Hibernate search analyzers

malejpavouk

Post subject: Hibernate search analyzers

Posted: Thu Feb 04, 2010 7:02 am

Newbie

Joined: Sun Dec 28, 2008 2:58 pm
Posts: 5

Hi,

I am new to Hibernate Search and I have a pretty basic question :-) I have this entity:

Code:

@Indexed
@AnalyzerDef(name = "customAnalyzer",
    tokenizer = @TokenizerDef(factory = HTMLStripStandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class)
    })
public class ArticleDetails extends Article {
    private static final long serialVersionUID = 1L;
    @Field(index = Index.TOKENIZED, boost = @Boost(5f))
    @Analyzer(definition="customAnalyzer")
    private String heading;
    @Field(index = Index.TOKENIZED)
    @Analyzer(definition="customAnalyzer")
    private String text;
    private String articleUntransformed;
    private Set<Attachment> attachments;
.
.
.

What I want to do is: remove diacritics (my articles are in Czech, so they contain letters like ěščřžýáíéůú), then strip the HTML tags, because the articles are stored as HTML (<p>I am article</p>), and finally convert everything to lowercase.
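To make the expected result concrete, here is a minimal plain-Java sketch of the three steps (strip tags, fold diacritics, lowercase). It does not use Lucene or Solr at all; the class name `ExpectedTerms` and its methods are made up for illustration, and the regex-based tag stripping is a crude assumption, not a real HTML parser. It only shows which terms I would expect to end up in the index.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExpectedTerms {
    // Replace anything that looks like an HTML tag with a space (crude, illustration only).
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Decompose accented letters (NFD) and drop the combining marks, e.g. "š" becomes "s".
    static String foldDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Strip tags, fold diacritics, lowercase, then split on anything that is not a letter or digit.
    static List<String> terms(String html) {
        String text = foldDiacritics(stripTags(html)).toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [sifra, i, am, article]
        System.out.println(terms("<h2>Šifra</h2><p>I am article</p>"));
    }
}
```

So for the sample above, no tag names (h2, p) and no accented forms should survive as terms.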

But when I look at the index with Luke, the top-ranking terms include words like h2, p, div, and využívá (Czech for "uses"). I think that with this mapping they should not be there.

Another problem: when I search for the word "sifra" (the Czech word is "šifra", meaning cipher, so the only difference is the diacritics), the articles that use the word "šifra" are not found.
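One thing that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: the accent filter in the mapping is named after ISO Latin 1, and several Czech letters are not part of ISO-8859-1 at all. The sketch below (plain Java, hypothetical class name `Latin1Check`) just lists which characters of a string cannot be encoded in ISO-8859-1. It says nothing about what the Solr filter actually does with them.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Returns the characters of s that ISO-8859-1 cannot encode, in their original order.
    static String outsideLatin1(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (!latin1.canEncode(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(outsideLatin1("ěščřžýáíéůú")); // prints ěščřžů
        System.out.println(outsideLatin1("šifra"));       // prints š
    }
}
```

If the filter really is limited to Latin 1 characters, "š" would be left untouched at both index and query time, which would be consistent with "sifra" not matching "šifra".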

This is the query code.

Code:

    public List<ArticleDetails> find(String s) {
        FullTextSession fullTextSession =
                Search.getFullTextSession(factory.getCurrentSession());
        Transaction tx = fullTextSession.beginTransaction();
        // create native Lucene query, analyzing the input with the analyzer used for indexing
        String[] fields = new String[]{"heading", "text"};
        MultiFieldQueryParser parser = new MultiFieldQueryParser(fields,
                fullTextSession.getSearchFactory().getAnalyzer("customAnalyzer"));
        org.apache.lucene.search.Query query;
        try {
            query = parser.parse(s);
        } catch (ParseException ex) {
            tx.rollback(); // do not leave the transaction open when the input cannot be parsed
            Logger.getLogger(ArticleDetailsHibernateDAO.class.getName()).log(Level.SEVERE, null, ex);
            throw new IllegalArgumentException(ex);
        }
        // wrap Lucene query in a org.hibernate.Query
        org.hibernate.Query hibQuery =
                fullTextSession.createFullTextQuery(query, ArticleDetails.class);
        @SuppressWarnings("unchecked") // list() is untyped; the query is restricted to ArticleDetails
        List<ArticleDetails> result = hibQuery.list();
        tx.commit(); // end the transaction that was begun above
        return result;
    }

Thanks.

Pavel

malejpavouk

Post subject: Re: Hibernate search analyzers

Posted: Thu Feb 04, 2010 7:28 am