All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 3 posts ] 
Author Message
 Post subject: StandardAnalyzer and Untokenised Fields
PostPosted: Wed Feb 18, 2009 8:06 am 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
Hi

I have an object with fields that I have marked as untokenised (a requirement: they should not be tokenised). As I understand it, untokenised fields are stored in the index exactly as they are input. When I run a query such as email:email I get no results (Luke shows the same). This is obviously because the token in the index is 'EMAIL'.

I use the same analyzer as I use for indexing. What is the recommended approach for dealing with this situation, where the user may type a word in lowercase but the token in the index is held in uppercase?


Cheers
Amin


 
 Post subject: Re: StandardAnalyzer and Untokenised Fields
PostPosted: Thu Feb 19, 2009 9:14 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

Quote:
I have an object with fields that I have marked as untokenised (a requirement: they should not be tokenised). As I understand it, untokenised fields are stored in the index exactly as they are input.

Yes, un-tokenized fields are indexed as they are. This does not mean that they are stored in the index. Indexing a value and keeping the original value in the index are two orthogonal concepts.

Quote:
I use the same analyzer as I use for indexing.

Given that you index all properties un-tokenized, you are not using any Analyzer at indexing time. So if you use an analyzer at search time, you are applying modifications to the search terms that were never applied during indexing.
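
To make the mismatch concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene or Hibernate Search code: a map stands in for the index, and the class and method names (UntokenizedMismatchDemo, indexUntokenized, analyzeQuery) are invented for illustration. It assumes the search-time analyzer lowercases its input.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model only: a map stands in for the index; none of this is Lucene API.
class UntokenizedMismatchDemo {
    // Term -> document id. Un-tokenized indexing keeps the value unchanged.
    private final Map<String, Integer> terms = new HashMap<>();

    void indexUntokenized(String value, int docId) {
        terms.put(value, docId);
    }

    // Exact term lookup: a term only matches if it is identical to an indexed term.
    Integer lookup(String term) {
        return terms.get(term);
    }

    // Hypothetical stand-in for a search-time analyzer that lowercases the query text.
    static String analyzeQuery(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        UntokenizedMismatchDemo index = new UntokenizedMismatchDemo();
        index.indexUntokenized("EMAIL", 1);
        // The analyzed query term is "email", which was never indexed.
        System.out.println(index.lookup(analyzeQuery("EMAIL"))); // null
        // The unmodified term is identical to the indexed one.
        System.out.println(index.lookup("EMAIL"));               // 1
    }
}
```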

Quote:
What is the recommended approach for dealing with this situation, where the user may type a word in lowercase but the token in the index is held in uppercase?

Either manually upper-/lowercase both the data and the search terms, or use a custom analyzer which returns the whole field value as a single token. Chain this "tokenizer" with a LowerCaseFilter.
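
As a rough illustration of both options, the sketch below applies one normalisation step (whole value as a single token, lowercased) at index time and at search time. Again this is plain Java with a map as a stand-in index; CaseInsensitiveKeywordDemo and its methods are hypothetical names, and the sample address is made up. In Lucene itself the same idea would presumably be expressed with KeywordTokenizer followed by LowerCaseFilter.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model only: a map stands in for the index; none of this is Lucene API.
class CaseInsensitiveKeywordDemo {
    private final Map<String, Integer> terms = new HashMap<>();

    // Keeps the whole value as one token and lowercases it.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // The same normalisation runs at index time ...
    void index(String value, int docId) {
        terms.put(normalize(value), docId);
    }

    // ... and at search time, so case differences no longer matter.
    Integer search(String userInput) {
        return terms.get(normalize(userInput));
    }

    public static void main(String[] args) {
        CaseInsensitiveKeywordDemo index = new CaseInsensitiveKeywordDemo();
        index.index("Amin@Example.COM", 1);
        System.out.println(index.search("amin@example.com")); // 1
        System.out.println(index.search("AMIN@EXAMPLE.COM")); // 1
        // Still a single token: a fragment of the value does not match.
        System.out.println(index.search("amin"));             // null
    }
}
```

The important point is symmetry: whatever transformation is applied to the indexed value must also be applied to the query term.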

--Hardy


 
 Post subject:
PostPosted: Sat Feb 21, 2009 3:28 pm 
Pro

Joined: Wed Oct 03, 2007 2:31 pm
Posts: 205
Hi

Thanks for your reply. After digging around I found (in Lucene in Action) that I can use KeywordAnalyzer for un-tokenised fields. So I wrote the following code, which uses KeywordAnalyzer for the un-tokenised field while applying the standard analyzer to all other (tokenised) fields.

Code:
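// Analyzer that Hibernate Search uses for the PersonalContact entity
Analyzer entityScopedAnalyzer = searchFactory.getAnalyzer(PersonalContact.class);
// Use it by default, but treat the un-tokenised "email" field as a single token
PerFieldAnalyzerWrapper perFieldAnalyzerWrapper = new PerFieldAnalyzerWrapper(entityScopedAnalyzer);
perFieldAnalyzerWrapper.addAnalyzer("email", new KeywordAnalyzer());
// Parse queries against both fields with the per-field wrapper
QueryParser parser = new MultiFieldQueryParser(new String[] {"email", "addresses.address1"}, perFieldAnalyzerWrapper);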


I know that the standard analyzer already deals with email addresses; the above is just an example of using KeywordAnalyzer.
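
For anyone wondering what the per-field wrapper does conceptually, here is a small stand-alone sketch in plain Java. It is only a model of the dispatch idea and not the Lucene implementation: PerFieldAnalysisDemo, tokenizing and keyword are made-up names, the "analyzers" are simple functions, and the sample values are invented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analysis: each field name maps to its own analyzer,
// and every other field falls back to a default. Not Lucene API.
class PerFieldAnalysisDemo {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisDemo(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Picks the field-specific analyzer if one was registered, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Crude stand-in for a tokenising analyzer: lowercase, then split on whitespace.
    static List<String> tokenizing(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Stand-in for a keyword-style analyzer: the whole value is one unmodified token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        PerFieldAnalysisDemo wrapper = new PerFieldAnalysisDemo(PerFieldAnalysisDemo::tokenizing);
        wrapper.addAnalyzer("email", PerFieldAnalysisDemo::keyword);
        System.out.println(wrapper.analyze("email", "Amin@Example.com"));               // [Amin@Example.com]
        System.out.println(wrapper.analyze("addresses.address1", "10 Downing Street")); // [10, downing, street]
    }
}
```

Note that the keyword-style function here leaves case untouched, so on its own it does not make the email field case-insensitive.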

Cheers
Amin


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.