-->

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 2 posts ] 
Author Message
 Post subject: Hibernate Search and cyrillic characters
PostPosted: Thu Aug 23, 2012 5:48 am 
Newbie

Joined: Thu Aug 23, 2012 5:41 am
Posts: 1
Hello! I have just started using Hibernate Search. It works for me, but the results are odd.

I have the following domain object:

Code:
@Table(name = "ADVERT")
@Entity(name = "Advert")
@Indexed
@AnalyzerDef(name = "customanalyzer",
   tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
   filters = {
      @TokenFilterDef(factory = LowerCaseFilterFactory.class),
      @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
         @Parameter(name = "language", value = "Russian"),
         @Parameter(name = "protected", value = "test.txt")
      })
   },
   charFilters = { @CharFilterDef(factory = HTMLStripCharFilterFactory.class) })
public class JpaAdvert implements Advert {

   @Id
   @GeneratedValue
   private long id;

   @ManyToOne(fetch = FetchType.EAGER)
   @JoinColumn(name = "category_id", referencedColumnName = "id")
   private JpaCategory category = new JpaCategory();

   @Field
   @Analyzer(definition = "customanalyzer")
   private String title = "";

   @Field
   @Analyzer(definition = "customanalyzer")
   private String description = "";

   // remaining fields and accessors omitted
}


I want to search for adverts whose title is similar to the title of a given advert. This is what I do:

Code:
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(
      getJpaTemplate().getEntityManagerFactory().createEntityManager());
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
      .buildQueryBuilder().forEntity(JpaAdvert.class).get();
// keyword query on the "title" field, matching the title of the reference advert
org.apache.lucene.search.Query query = qb.keyword().onFields("title")
      .matching(((SimilarAdvertFilter) filter).getAdvert().getTitle())
      .createQuery();
// wrap the Lucene query in a javax.persistence.Query
Query persistenceQuery =
      fullTextEntityManager.createFullTextQuery(query, JpaAdvert.class);
setRange(persistenceQuery, 0, 8);
// execute search
return persistenceQuery.getResultList();


I do get a list of adverts back, but most of them are not relevant to what I need.

For example, if the advert title is "Sell the best phone", I get an advert titled "The best animal". The word "the" exists in both titles, but the titles have very little to do with each other!

Is there a way to sort by some weight or score, so that the most relevant adverts come first?

Please help!


 Post subject: Re: Hibernate Search and cyrillic characters
PostPosted: Fri Aug 24, 2012 9:51 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

one thing I notice is that your analyzer definition has no stop word filter. Without one, common words such as "the" are indexed and matched like any other term. You need a stop word filter for Russian. Have a look at the Lucene source of RussianAnalyzer to see how that analyzer is built internally - http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/ru/RussianAnalyzer.java
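
To make the effect concrete, here is a minimal plain-Java sketch of what a stop word filter does in an analysis chain. It is not Lucene code: the class name StopWordDemo, the tiny English stop list, and the third title "The red car" are assumptions made up for illustration. It lowercases and tokenizes two titles, then lists the terms they share, with and without stop word removal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class StopWordDemo {

    // Tiny illustrative stop list (assumption); real stop lists are much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of"));

    // Lowercase the text, split on non-letter characters,
    // and optionally drop stop words. Keeps first-seen order.
    static Set<String> analyze(String text, boolean removeStopWords) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (removeStopWords && STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(token);
        }
        return terms;
    }

    // Terms that appear in both titles after analysis.
    // A keyword query matches a document as soon as one term is shared.
    static Set<String> sharedTerms(String a, String b, boolean removeStopWords) {
        Set<String> shared = analyze(a, removeStopWords);
        shared.retainAll(analyze(b, removeStopWords));
        return shared;
    }

    public static void main(String[] args) {
        String query = "Sell the best phone";

        // Titles from the question: "the" and "best" are both shared.
        System.out.println(sharedTerms(query, "The best animal", false)); // [the, best]
        System.out.println(sharedTerms(query, "The best animal", true));  // [best]

        // Hypothetical title that shares only a stop word with the query.
        System.out.println(sharedTerms(query, "The red car", false)); // [the]
        System.out.println(sharedTerms(query, "The red car", true));  // []
    }
}
```

In this sketch, a title that shares only "the" with the query stops matching once stop words are removed. "The best animal" still matches through "best", so a stop filter removes noise matches but does not by itself decide how the remaining matches are ranked.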

You probably want to replicate that filter chain, including the stop filter, in your @AnalyzerDef.
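
As a rough sketch of what that could look like, the definition from the question can be extended with Solr's StopFilterFactory, placed after lowercasing and before stemming (the same relative order RussianAnalyzer uses). The imports assume the org.apache.solr.analysis packages used by Hibernate Search releases of that period, and the resource name russian_stop.txt is hypothetical: you would have to supply your own stop word file on the classpath.

```java
import org.apache.solr.analysis.HTMLStripCharFilterFactory;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.StandardTokenizerFactory;
import org.apache.solr.analysis.StopFilterFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.CharFilterDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "customanalyzer",
    charFilters = { @CharFilterDef(factory = HTMLStripCharFilterFactory.class) },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        // Drop stop words after lowercasing and before stemming.
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            // Hypothetical classpath resource, one stop word per line.
            @Parameter(name = "words", value = "russian_stop.txt"),
            @Parameter(name = "ignoreCase", value = "true")
        }),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "Russian"),
            @Parameter(name = "protected", value = "test.txt")
        })
    })
public class JpaAdvert implements Advert {
    // @Entity, @Indexed and the fields stay as in the question
}
```

Analysis is applied at indexing time as well as at query time, so documents indexed with the old definition would most likely need to be reindexed before the change shows up in search results.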

--Hardy


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.