Hibernate Community • View topic - Analyzer problem with "_"

All times are UTC - 5 hours [ DST ]

Analyzer problem with "_"

PatPat

Post subject: Analyzer problem with "_"

Posted: Wed Mar 21, 2012 4:55 am

Newbie

Joined: Thu Nov 10, 2011 9:41 am
Posts: 2

Hi all,

I have a small problem indexing multiple fields that contain the names of projects, companies and other organizational units.
These names can contain any character, including underscores.
The fields are indexed using annotations. I am using Hibernate Search 3.4.1 and Lucene 3.1.0.

What I want is for the fields to be split into terms at every _. I noticed that when StandardAnalyzer with Version.LUCENE_31 is
used programmatically, nothing is split. The Version.LUCENE_30 analyzer, on the other hand, splits the same way the annotated fields do:
the annotated values are split only if they do not contain any digits.

Input; expected value
asdf_asdf; asdf, asdf
asdf_333; asdf, 333
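
To make the expected values above concrete, here is a minimal plain-Java sketch of the splitting I am after. It does not use Lucene at all; the class and method names are made up for illustration, and it only describes the desired result, not how any analyzer works internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: it describes the desired terms,
// not the behaviour of StandardAnalyzer or any other Lucene class.
class UnderscoreSplitSketch {

    // Splits a name at every underscore, whether or not a part contains digits.
    public static List<String> splitAtUnderscores(String input) {
        List<String> terms = new ArrayList<String>();
        for (String part : input.split("_")) {
            // Skip empty parts caused by leading or doubled underscores.
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(splitAtUnderscores("asdf_asdf"));
        System.out.println(splitAtUnderscores("asdf_333"));
    }
}
```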

Is there any way to get my expected result?
It would be great if anybody has a solution.

Thanks
Pat

hardy.ferentschik

Post subject: Re: Analyzer problem with "_"

Posted: Wed Mar 21, 2012 5:39 am

Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden

Hi,

As of Lucene 3.1, the StandardAnalyzer uses a new version of StandardTokenizer which implements Unicode Standard Annex #29. The old version of the StandardTokenizer is now called ClassicTokenizer.

The tokenizer now called ClassicTokenizer has always treated tokens containing numbers differently. Its documentation says, e.g.: "Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split." I would assume that's the behavior you are seeing.

As a solution, you can always create your own tokenizer, e.g. by starting with the code for ClassicTokenizer. Have a look at this thread as well - http://lucene.472066.n3.nabble.com/Inco ... 34767.html

--Hardy

PatPat

Post subject: Re: Analyzer problem with "_"

Posted: Wed Mar 21, 2012 12:17 pm

Newbie

Joined: Thu Nov 10, 2011 9:41 am
Posts: 2

Hi,

Thanks a lot.
In the meantime I found another solution using WordDelimiterFilterFactory.
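
For anyone finding this later, a rough sketch of what such a setup could look like with Hibernate Search annotations. The entity, the field and the analyzer name "underscoreSplit" are made up for illustration, and the choice of WhitespaceTokenizerFactory and of the two filter parameters is an assumption, not necessarily the exact configuration I ended up with. It needs Hibernate Search and the Solr analysis classes on the classpath.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.solr.analysis.WhitespaceTokenizerFactory;
import org.apache.solr.analysis.WordDelimiterFilterFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

// Hypothetical entity; only the analyzer definition matters here.
@Entity
@Indexed
@AnalyzerDef(name = "underscoreSplit",
    // Assumption: split on whitespace first, so "_" stays inside the token
    // until the word delimiter filter sees it.
    tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
            // Emit the alphabetic sub-words of each token.
            @Parameter(name = "generateWordParts", value = "1"),
            // Emit the numeric sub-words of each token.
            @Parameter(name = "generateNumberParts", value = "1")
        })
    })
public class Project {

    @Id
    private Long id;

    // Index this field with the analyzer defined above.
    @Field
    @Analyzer(definition = "underscoreSplit")
    private String name;
}
```

As far as I understand the filter, it also splits on case changes by default, so that may need to be switched off for mixed-case names.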

Pat
