Hibernate Community • View topic - StandardAnalyzer and Untokenised Fields


Board index » Projects » Search, Validator, Shards

All times are UTC - 5 hours [ DST ]

StandardAnalyzer and Untokenised Fields


amin-mc

Post subject: StandardAnalyzer and Untokenised Fields

Posted: Wed Feb 18, 2009 8:06 am

Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205

Hi

I have an object with fields that I have marked as untokenised (a requirement, as they should not be tokenised). As I understand it, untokenised fields are stored in the index exactly as they are entered. When I run a query such as email:email I get no results (Luke shows the same outcome). This is obviously because the token in the index is 'EMAIL'.

I use the same analyzer for searching as I use for indexing. What is the recommended approach for dealing with this situation, where the user may type a word in lowercase but the token in the index is held in uppercase?

Cheers
Amin


hardy.ferentschik

Post subject: Re: StandardAnalyzer and Untokenised Fields

Posted: Thu Feb 19, 2009 9:14 am

Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden

Hi,

Quote:

I have an object with fields that I have marked as untokenised (a requirement, as they should not be tokenised). As I understand it, untokenised fields are stored in the index exactly as they are entered.

Yes, un-tokenized fields are indexed as they are. This does not mean that they are stored in the index, though. Indexing a value and storing the original value in the index are two orthogonal concepts.

Quote:

I use the same analyzer for searching as I use for indexing.

Given that you index all properties un-tokenized you are actually not using any Analyzer at indexing time. So if you are using an analyzer at search time you are actually applying modifications to the search terms which are not applied during indexing.
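To make the mismatch concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: `AnalyzerMismatchDemo`, `indexUntokenized` and `analyzeQueryTerm` are hypothetical names that only stand in for "index the value verbatim" and "lowercase the query term at search time".

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; no Lucene classes are involved.
final class AnalyzerMismatchDemo {
    // Stand-in for an un-tokenised field: values are kept verbatim.
    private final Set<String> indexedTerms = new HashSet<>();

    void indexUntokenized(String value) {
        indexedTerms.add(value); // no analyzer runs, so case is preserved
    }

    // Stand-in for a query-time analyzer that lowercases terms.
    static String analyzeQueryTerm(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    boolean matches(String queryTerm) {
        return indexedTerms.contains(queryTerm); // exact term comparison
    }

    public static void main(String[] args) {
        AnalyzerMismatchDemo index = new AnalyzerMismatchDemo();
        index.indexUntokenized("EMAIL");
        // The query term is lowercased, the indexed term is not: prints false.
        System.out.println(index.matches(analyzeQueryTerm("EMAIL")));
        // The query term is left untouched: prints true.
        System.out.println(index.matches("EMAIL"));
    }
}
```

The point is only that the term comparison is exact, so any transformation applied on one side has to be applied on the other side as well.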

Quote:

What is the recommended approach for dealing with this situation, where the user may type a word in lowercase but the token in the index is held in uppercase?

Either manually upper-/lowercase both the data and the search terms, or use a custom analyzer which returns the whole field value as a single token. Chain this "tokenizer" with a LowerCaseFilter.
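As a rough sketch of the idea, again in plain Java rather than Lucene: the hypothetical `normalize` method below treats the whole value as one token and lowercases it, which is conceptually what a keyword-style tokenizer chained with a lowercase filter would produce. The class and method names are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; no Lucene classes are involved.
final class NormalizedKeywordIndex {
    private final Set<String> indexedTerms = new HashSet<>();

    // Whole value as a single token, lowercased. No splitting happens.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // The same normalization is applied when indexing...
    void index(String value) {
        indexedTerms.add(normalize(value));
    }

    // ...and when searching, so case differences no longer matter.
    boolean matches(String userInput) {
        return indexedTerms.contains(normalize(userInput));
    }

    public static void main(String[] args) {
        NormalizedKeywordIndex index = new NormalizedKeywordIndex();
        index.index("Amin@Example.COM");
        System.out.println(index.matches("amin@example.com")); // true
        System.out.println(index.matches("AMIN@EXAMPLE.COM")); // true
        System.out.println(index.matches("amin"));             // false, value is one token
    }
}
```

Whichever option is chosen, the important part is that the same normalization runs at indexing time and at search time.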

--Hardy


amin-mc

Post subject:

Posted: Sat Feb 21, 2009 3:28 pm

Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205

Hi

Thanks for your reply. After digging around I found that I can use KeywordAnalyzer for un-tokenised fields (from Lucene in Action). So I wrote the following code, which uses it for an un-tokenised field while applying the standard analyzer to all other (tokenised) fields.

Code:

// Start from the analyzer Hibernate Search uses for the entity.
Analyzer entityScopedAnalyzer = searchFactory.getAnalyzer(PersonalContact.class);
// Override only the un-tokenised "email" field so its query text stays a single token.
PerFieldAnalyzerWrapper perFieldAnalyzerWrapper = new PerFieldAnalyzerWrapper(entityScopedAnalyzer);
perFieldAnalyzerWrapper.addAnalyzer("email", new KeywordAnalyzer());
QueryParser parser = new MultiFieldQueryParser(new String[] {"email", "addresses.address1"}, perFieldAnalyzerWrapper);

I know that the standard analyzer deals with emails but the above just depicts an example of using the KeywordAnalyzer.
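For readers unfamiliar with the per-field idea, the following plain-Java sketch shows the dispatch it relies on: a default analysis function plus per-field overrides. It is not Lucene code. `PerFieldAnalysisSketch`, `splitAndLowercase` (a crude stand-in for a tokenising analyzer) and `keyword` (a stand-in for a whole-value analyzer) are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration only; no Lucene classes are involved.
final class PerFieldAnalysisSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> overrides = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a different analysis function for one field.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        overrides.put(field, analyzer);
    }

    // Use the field's override if there is one, otherwise the default.
    List<String> analyze(String field, String text) {
        return overrides.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Crude tokenising stand-in: lowercase, then split on whitespace.
    static List<String> splitAndLowercase(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Whole-value stand-in: the text becomes one token, unchanged.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }
}
```

With `keyword` registered for "email", `analyze("email", "EMAIL")` yields the single token `EMAIL`, while `analyze("addresses.address1", "10 High Street")` yields `10`, `high`, `street`. Note that in this sketch the whole-value path keeps the original case, so a case difference between query and index would still need the lowercasing suggested in the earlier reply.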

Cheers
Amin
