All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 8 posts ] 
 Post subject: language analyser class and annotations
PostPosted: Wed May 11, 2011 10:23 am 
Beginner

Joined: Mon Apr 04, 2011 12:08 pm
Posts: 32
I defined an analyzer this way:

Code:
@Entity
@Table(name = "Entity")
@Indexed
@AnalyzerDef(name = "entityAnalyser",
      tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
      filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                  params = { @Parameter(name = "language", value = "French") }),
            @TokenFilterDef(factory = PhoneticFilterFactory.class,
                  params = { @Parameter(name = "encoder", value = "DoubleMetaphone") }),
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = NGramFilterFactory.class,
                  params = { @Parameter(name = "minGramSize", value = "3"),
                             @Parameter(name = "maxGramSize", value = "3") })
      },
      charFilters = { @CharFilterDef(factory = HTMLStripCharFilterFactory.class) }
)
@Analyzer(definition="entityAnalyser")
public class Entity implements java.io.Serializable {


I would like to use the FrenchAnalyzer.

Is there a way to extend the FrenchAnalyzer through annotations? Or do I have to either recreate the FrenchAnalyzer with annotations like the ones above, or subclass FrenchAnalyzer and write Java code that does what the above annotations are doing?


 
 Post subject: Re: language analyser class and annotations
PostPosted: Wed May 11, 2011 11:32 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
sorry, but I don't think the analyzer you have defined makes much sense. Filters are applied in the order in which they are declared, so it will stem the words using French rules, then transform those stems into the way they would be pronounced in English, then ASCII-fold the phonetic codes, and finally split them into 3-grams of something that is no longer a word at all!
The stop filter also has no French word list configured, so it is very unlikely that you are removing what you expect.
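
To make the ordering point concrete, here is a small plain-Java sketch. It does not use Lucene or Hibernate Search at all: the four methods are simplified stand-ins for the real filters, and `toyStem` in particular is a made-up one-rule stemmer, not Snowball. It only shows that the same filters give different tokens depending on the order in which they run.

Code:
```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

final class FilterOrderDemo {

    // Stand-in for a lower-casing filter.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Stand-in for ASCII folding: decompose, then drop combining marks.
    static List<String> asciiFold(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}", ""));
        }
        return out;
    }

    // Toy stemmer (hypothetical, NOT Snowball): strips one trailing "s".
    static List<String> toyStem(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean plural = t.length() > 1 && t.endsWith("s");
            out.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }

    // Stand-in for an n-gram filter with minGramSize = maxGramSize = 3.
    static List<String> triGrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int i = 0; i + 3 <= t.length(); i++) out.add(t.substring(i, i + 3));
        }
        return out;
    }

    // Applies the filters in the order given, like a declared filter chain.
    @SafeVarargs
    static List<String> analyze(List<String> tokens, UnaryOperator<List<String>>... chain) {
        List<String> current = tokens;
        for (UnaryOperator<List<String>> filter : chain) current = filter.apply(current);
        return current;
    }

    // Word-level filters first, n-grams last.
    static List<String> wordsFirst(List<String> tokens) {
        return analyze(tokens, FilterOrderDemo::lowerCase, FilterOrderDemo::asciiFold,
                FilterOrderDemo::toyStem, FilterOrderDemo::triGrams);
    }

    // Same filters, but n-grams first: the stemmer now sees fragments.
    static List<String> nGramsFirst(List<String> tokens) {
        return analyze(tokens, FilterOrderDemo::triGrams, FilterOrderDemo::toyStem,
                FilterOrderDemo::asciiFold, FilterOrderDemo::lowerCase);
    }

    public static void main(String[] args) {
        // "\u00C9l\u00E8ves" is the French word "Eleves" with its accents, written with escapes.
        List<String> input = Arrays.asList("\u00C9l\u00E8ves");
        System.out.println(wordsFirst(input));  // prints [ele, lev, eve]
        System.out.println(nGramsFirst(input)); // prints [ele, lev, eve, ve]
    }
}
```

In the second chain the stemmer turns the fragment "ves" into "ve", a token that corresponds to nothing in the original word. Filters that reason about whole words (stop words, stemming, phonetic encoding) are only meaningful while the tokens are still whole words.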

We have a tool in the test suite source code; have a look at
org.hibernate.search.test.util.AnalyzerUtils

It lets you inspect what your analyzer's output is going to look like.

To use the FrenchAnalyzer, define a different analyzer containing only the token filters you need. You can define several analyzers and apply them to different entities or different properties, or even index the same property multiple times into different fields using different analyzers. Make sure that at search time you use, for each field, the same analyzer that you used for indexing. If you use the QueryBuilder API ("DSL" in the reference manual), it will apply the proper analyzer to each field, working it out from what was used at indexing time.
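
As an illustration of the advice above, a leaner definition applied to a single property might look like the following. This is only a sketch under assumptions: it presumes Hibernate Search 3.x with the Solr analysis factories on the classpath, and the entity `Article`, the property `description` and the analyzer name `frenchText` are made up for the example.

Code:
```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.solr.analysis.ASCIIFoldingFilterFactory;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.StandardTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

// Hypothetical entity; all names here are illustrative only.
@Entity
@Indexed
@AnalyzerDef(name = "frenchText",
      tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
      filters = {
            // Filters run top to bottom, always on whole words.
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
                  params = { @Parameter(name = "language", value = "French") }),
            // Fold accents last so the stemmer still sees them.
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class)
      })
public class Article implements java.io.Serializable {

    @Id
    private Long id;

    // Only this property is indexed with the French analyzer definition.
    @Field(analyzer = @Analyzer(definition = "frenchText"))
    private String description;

    // getters and setters omitted
}
```

Other properties can point at other @AnalyzerDef names in the same way. The @Analyzer annotation also has an impl attribute that takes an analyzer class, which may be enough if the stock FrenchAnalyzer already does what you need; whether your version can instantiate it directly is worth checking.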

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: language analyser class and annotations
PostPosted: Wed May 11, 2011 12:00 pm 
Beginner

Joined: Mon Apr 04, 2011 12:08 pm
Posts: 32
Yes, judging by the results, I am aware my filters are not OK. I appreciate any help on this; it would probably have been my next question.

I know Hibernate Search is basically a wrapper around Lucene, but maybe there should be more examples in the docs, at least so we know what we need to learn about how Lucene works.

Maybe it's not the docs themselves, but rather the fact that Hibernate Search is not widely used yet, so there are not many blogs with use cases around to help.


 
 Post subject: Re: language analyser class and annotations
PostPosted: Wed May 11, 2011 12:28 pm 
Beginner

Joined: Mon Apr 04, 2011 12:08 pm
Posts: 32
Quote:
sorry, but I don't think the analyzer you have defined makes much sense. Filters are applied in the order in which they are declared, so it will stem the words using French rules, then transform those stems into the way they would be pronounced in English, then ASCII-fold the phonetic codes, and finally split them into 3-grams of something that is no longer a word at all!
The stop filter also has no French word list configured, so it is very unlikely that you are removing what you expect.


Quote:
We have a tool in the test suite source code; have a look at
org.hibernate.search.test.util.AnalyzerUtils

It lets you inspect what your analyzer's output is going to look like.


Is there no way to get this output in debug logging when indexing?

It's not easy to set up tests if a database is needed behind them.
Maybe I need to take a vacation, but docs and blogs on this seem to be a bit lacking.


 
 Post subject: Re: language analyser class and annotations
PostPosted: Wed May 11, 2011 1:48 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
It's not easy to set up tests if a database is needed behind them.

If you had looked at the test suite, you'd have seen that we test it without setting up a database; also, AnalyzerUtils is a simple function calling a helper in Lucene.

Quote:
Maybe I need to take a vacation, but docs and blogs on this seem to be a bit lacking.

There is plenty of information about Hibernate Search, but you won't find much about analyzers specifically, as there's nothing special about them in Hibernate Search: we are just delegating to Lucene's standard analyzers. You could at least read the books about it: both Hibernate Search in Action and Lucene in Action have in-depth explanations of how analysis is performed.

Quote:
Is there no way to get this output in debug logging when indexing?

No, as that's Lucene's business and it doesn't use a logger. The hints I gave you make it trivial to write your own tests.

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: language analyser class and annotations
PostPosted: Thu May 12, 2011 5:02 am 
Beginner

Joined: Mon Apr 04, 2011 12:08 pm
Posts: 32
s.grinovero wrote:
If you had looked at the test suite, you'd have seen that we test it without setting up a database; also, AnalyzerUtils is a simple function calling a helper in Lucene.


Just for information, there is no sources jar on the JBoss Maven repository for hibernate-search-testing 3.4 Final.
I will look elsewhere.

s.grinovero wrote:
There is plenty of information about Hibernate Search, but you won't find much about analyzers specifically, as there's nothing special about them in Hibernate Search: we are just delegating to Lucene's standard analyzers.


Well, I think it would be great if the Hibernate Search docs delegated to the Lucene docs as well :)

s.grinovero wrote:
Quote:
Is there no way to get this output in debug logging when indexing?

No, as that's Lucene's business and it doesn't use a logger. The hints I gave you make it trivial to write your own tests.


Thanks for the information.


 
 Post subject: Re: language analyser class and annotations
PostPosted: Thu May 12, 2011 5:08 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Good points, thank you.
I didn't know that the sources for testing were not being deployed; anyway, the source code is available: http://www.hibernate.org/subprojects/search/sourcecode
Contributions to the docs are welcome as well :)

I just proposed moving this utility from testing to the main jar so it's easier for everyone to use.

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: language analyser class and annotations
PostPosted: Fri May 13, 2011 9:41 am 
Beginner

Joined: Mon Apr 04, 2011 12:08 pm
Posts: 32
s.grinovero wrote:
Good points, thank you.
I didn't know that the sources for testing were not being deployed; anyway, the source code is available: http://www.hibernate.org/subprojects/search/sourcecode
Contributions to the docs are welcome as well :)

I just proposed moving this utility from testing to the main jar so it's easier for everyone to use.


Yep, otherwise the sources jar can't be attached through Maven.


 
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.