
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
 Post subject: Hibernate search analyzers
Posted: Thu Feb 04, 2010 7:02 am 
Newbie

Joined: Sun Dec 28, 2008 2:58 pm
Posts: 5
Hi,

I am new to Hibernate Search and I have a pretty stupid question :-) I have this entity:

Code:
@Indexed
@AnalyzerDef(name = "customAnalyzer",
tokenizer = @TokenizerDef(factory =
HTMLStripStandardTokenizerFactory.class),
filters = {
    @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
    @TokenFilterDef(factory = LowerCaseFilterFactory.class)
})
public class ArticleDetails extends Article {
    private static final long serialVersionUID = 1L;
    @Field(index = Index.TOKENIZED, boost = @Boost(5f))
    @Analyzer(definition="customAnalyzer")
    private String heading;
    @Field(index = Index.TOKENIZED)
    @Analyzer(definition="customAnalyzer")
    private String text;
    private String articleUntransformed;
    private Set<Attachment> attachments;
.
.
.


What I want to do is: remove diacritics (my articles are in Czech, so they contain letters like ěščřžýáíéůú), then strip HTML tags, because the articles are stored as HTML (<p>I am article</p>), and finally convert everything to lowercase.
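To make the intended behaviour concrete, here is a Lucene-free sketch of those three steps in plain Java. It is not how Hibernate Search or Solr implement them: the class name `AnalysisChainSketch`, the regex tag stripper and the use of `java.text.Normalizer` for diacritic removal are all assumptions chosen for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, Lucene-free illustration of the analysis chain described above.
class AnalysisChainSketch {

    public static List<String> analyze(String html) {
        // 1. Naive tag stripping: replace anything between < and > with a space.
        //    A regex is not a real HTML parser (entities, comments, scripts are ignored).
        String text = html.replaceAll("<[^>]*>", " ");
        // 2. Diacritic removal: NFD splits "š" into "s" + combining caron,
        //    then all combining marks (\p{M}) are deleted.
        text = Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        // 3. Locale-independent lowercasing.
        text = text.toLowerCase(Locale.ROOT);
        // 4. Tokenize on runs of characters that are neither letters nor digits.
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("<h2>Šifra</h2><p>Článek využívá šifru</p>"));
    }
}
```

Running it prints `[sifra, clanek, vyuziva, sifru]`: no tag names survive, and "šifra" and "sifra" end up as the same term, which is what the mapping in the question was expected to achieve.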

But when I look at the index with Luke, the top-ranking terms include words like h2, p, div and využívá (Czech for "uses"), and I think that with this mapping none of them should be there.

Another problem: when I search for the word "sifra" (the Czech word is "šifra", meaning "cipher", so the only difference is the diacritic), the articles containing "šifra" are not found.
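As its name suggests, and as the author's follow-up confirms, ISOLatin1AccentFilterFactory only folds accented characters from ISO-8859-1 (Latin-1), i.e. code points up to U+00FF. The check below, using a hypothetical helper `Latin1Check.splitByLatin1` that is not part of any library, splits the Czech letters from the question by that criterion.

```java
// Hypothetical helper: partitions letters by whether they belong to ISO-8859-1 (Latin-1).
class Latin1Check {

    // Returns {letters inside Latin-1, letters outside Latin-1}.
    public static String[] splitByLatin1(String letters) {
        StringBuilder inside = new StringBuilder();
        StringBuilder outside = new StringBuilder();
        for (char c : letters.toCharArray()) {
            // Latin-1 occupies exactly the code points U+0000..U+00FF.
            if (c <= 0xFF) {
                inside.append(c);
            } else {
                outside.append(c);
            }
        }
        return new String[]{inside.toString(), outside.toString()};
    }

    public static void main(String[] args) {
        String[] parts = splitByLatin1("ěščřžýáíéůú");
        System.out.println("in Latin-1:     " + parts[0]);
        System.out.println("not in Latin-1: " + parts[1]);
    }
}
```

It reports `ýáíéú` as Latin-1 and `ěščřžů` as not. Since "š" is outside Latin-1, "šifra" would presumably be indexed with its caron intact, while the query "sifra" is analyzed to "sifra", so the two terms never match.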

This is the query code.
Code:
    @SuppressWarnings("unchecked")
    public List<ArticleDetails> find(String s) {
        FullTextSession fullTextSession =
                Search.getFullTextSession(factory.getCurrentSession());
        Transaction tx = fullTextSession.beginTransaction();
        try {
            // create native Lucene query; the user's input is parsed with the same
            // analyzer as the indexed fields, so query terms are normalized the same way
            String[] fields = new String[]{"heading", "text"};
            MultiFieldQueryParser parser = new MultiFieldQueryParser(fields,
                    fullTextSession.getSearchFactory().getAnalyzer("customAnalyzer"));
            org.apache.lucene.search.Query query = parser.parse(s);
            // wrap Lucene query in a org.hibernate.Query
            org.hibernate.Query hibQuery =
                    fullTextSession.createFullTextQuery(query, ArticleDetails.class);
            List<ArticleDetails> result = hibQuery.list();
            tx.commit();
            return result;
        } catch (ParseException ex) {
            // end the transaction before reporting the invalid query
            tx.rollback();
            Logger.getLogger(ArticleDetailsHibernateDAO.class.getName())
                    .log(Level.SEVERE, "Cannot parse query: " + s, ex);
            throw new IllegalArgumentException(ex);
        }
    }


Thanks.

Pavel


 Post subject: Re: Hibernate search analyzers
Posted: Thu Feb 04, 2010 7:28 am 
Newbie

Joined: Sun Dec 28, 2008 2:58 pm
Posts: 5
malejpavouk wrote:
...


Oops, this one was my mistake. I didn't realize that Latin-1 contains only some of the Czech letters, so I wrote my own DiacriticsFilter and now everything works fine. Sorry for spamming the forum with stupid questions :-)
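The DiacriticsFilter itself is not shown in the thread. One possible core for such a filter is an explicit mapping table for Czech letters, sketched below; the class `CzechFolding` and its table are hypothetical, and the Lucene TokenFilter plumbing that would apply `fold` to each token's text is deliberately omitted.

```java
// Hypothetical folding table for Czech letters; not the author's DiacriticsFilter.
class CzechFolding {

    // FROM.charAt(i) is replaced by TO.charAt(i); both strings have 30 characters.
    private static final String FROM = "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ";
    private static final String TO   = "acdeeinorstuuyzACDEEINORSTUUYZ";

    public static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int idx = FROM.indexOf(c);
            // Characters not in the table (including plain ASCII) pass through unchanged.
            sb.append(idx >= 0 ? TO.charAt(idx) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Příliš žluťoučký kůň"));
    }
}
```

`fold("Příliš žluťoučký kůň")` returns `"Prilis zlutoucky kun"`. A table is explicit but covers only the letters listed; decomposing to Unicode NFD and removing combining marks handles any decomposable character, and more recent Lucene versions also ship an ASCIIFoldingFilter intended for this kind of folding.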

Pavel


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.