
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 7 posts ] 
 Post subject: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Mon Feb 14, 2011 8:15 am 
Newbie

Joined: Mon Feb 14, 2011 7:51 am
Posts: 4
I have a class that indexes several of its attributes into the same field name, mixing TOKENIZED and UN_TOKENIZED.
For example:

Code:
@Indexed
@Entity
public class SomeClass{
...
    @Field(name = "default", index = Index.TOKENIZED)
    private String foo;

    @Field(name = "default", index = Index.UN_TOKENIZED)
    private String bar;
...
}


This used to work fine until I upgraded to hibernate-search 3.3.0. Since then everything is untokenized unless I make "bar" tokenized as well.
Is this behavior intended?

Thx
Simon


 
 Post subject: Re: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Mon Feb 14, 2011 1:32 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
I don't think it's intended; I never expected anybody to store two attributes in the same field with different options. It seems quite dangerous and error-prone to handle, so I'd recommend keeping the fields separate and querying both of them.

What's the use case?

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Mon Feb 14, 2011 2:42 pm 
Newbie

Joined: Mon Feb 14, 2011 7:51 am
Posts: 4
The use case: I have a Google-like full-text search where the user can search for instances of different classes by just entering keywords, without specifying field names. These classes don't have a common attribute set.
This is implemented by indexing all the relevant attributes from the different classes into the same index field, "default". Some of these attributes shouldn't be tokenized because they are domain-specific acronyms.
With hibernate-search 3.1.1.GA this was no problem.


 
 Post subject: Re: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Mon Feb 14, 2011 2:56 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
OK, I understand the need for it across different classes, but in the same class, why reuse the same field?
I think it would make more sense to have a "field_tokenized" and a "field_untokenized", and then search on both fields using the different analyzers. The problem you have is that during query construction, the user input must be analyzed with the same analyzer as the field you are searching; I guess that with Search 3.1.1 your users couldn't search for the domain specific acronyms?
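
As a rough illustration of the analyzer point above, here is a minimal plain-Java sketch. It does not use Lucene or Hibernate Search; the class `AnalyzerMismatchDemo`, its helper methods and the sample value "ACME-X 42" are all invented for this example, and real analyzers are considerably more nuanced. It only shows why a value indexed as one untokenized term is not found when the query input is tokenized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: these helpers imitate, very roughly, what a tokenizing
// analyzer and an untokenized field do. They are not Lucene APIs.
final class AnalyzerMismatchDemo {

    // Roughly what a tokenizing analyzer does: split at characters that are
    // neither letters nor digits, then lowercase each piece.
    static List<String> tokenized(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Roughly what an untokenized field does: the whole value is one term.
    static List<String> untokenized(String text) {
        return Collections.singletonList(text);
    }

    // A query matches when every query term is among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Indexed untokenized: the only term is the exact string "ACME-X 42".
        List<String> indexed = untokenized("ACME-X 42");

        // Query input analyzed by tokenizing: acme, x, 42. None equals the indexed term.
        System.out.println(matches(indexed, tokenized("ACME-X 42")));

        // Query input analyzed the same way as the field: the terms line up.
        System.out.println(matches(indexed, untokenized("ACME-X 42")));
    }
}
```

In this toy model the first check fails and the second succeeds, which is the mismatch described above: the field and the query input have to go through the same analysis for the terms to be comparable.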

The quick and dirty workaround for you would be to have two fields in each entity, even if they are different, and to try matching user input on both.
The nicer, optimal solution would be to create a custom Analyzer that treats your special acronyms correctly. For example, Lucene has the SimpleAnalyzer, which deals with basic things such as splitting text at non-letter characters and lowercasing, and the StandardAnalyzer, which can recognize emails and similar more complex elements that should not be split up.
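
The custom-analyzer idea can be sketched in plain Java as follows. This is only a toy stand-in and not a Lucene `Analyzer` subclass: the class name `AcronymAwareAnalyzer`, the idea of passing in a list of known acronyms, and the sample acronyms are assumptions made for illustration. The point is that known acronyms survive as single terms while ordinary text is split and lowercased, and that the same analysis would be applied to both indexed values and user input.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for a custom analyzer; hypothetical, not part of Lucene.
final class AcronymAwareAnalyzer {

    // Known acronyms, stored lowercased so matching is case-insensitive.
    private final Set<String> acronyms = new HashSet<>();

    AcronymAwareAnalyzer(String... knownAcronyms) {
        for (String acronym : knownAcronyms) {
            acronyms.add(acronym.toLowerCase(Locale.ROOT));
        }
    }

    // Split on whitespace first. A word that is a known acronym is kept as
    // one term; any other word is split further at non-alphanumerics.
    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            if (lower.isEmpty()) {
                continue;
            }
            if (acronyms.contains(lower)) {
                terms.add(lower);
            } else {
                for (String part : lower.split("[^\\p{L}\\p{N}]+")) {
                    if (!part.isEmpty()) {
                        terms.add(part);
                    }
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        AcronymAwareAnalyzer analyzer = new AcronymAwareAnalyzer("R&D", "I/O");
        System.out.println(analyzer.analyze("R&D budget for high-speed I/O"));
    }
}
```

One limitation of this sketch: an acronym with punctuation attached (for example at the end of a sentence) is not recognized, so a real implementation would need more careful token handling.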

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Mon Feb 14, 2011 3:28 pm 
Newbie

Joined: Mon Feb 14, 2011 7:51 am
Posts: 4
s.grinovero wrote:
OK, I understand the need for it across different classes, but in the same class, why reuse the same field?

If I add a new attribute I can just add the @Field(name = "default") and it's searchable. Otherwise I would have a long list of fields to maintain for the query.

s.grinovero wrote:
I guess that with Search 3.1.1 your users couldn't search for the domain specific acronyms?

They could search for it, but maybe there were issues I hadn't noticed.

What about keeping this "default" field for all standard attributes and adding special fields for attributes that require special analyzers? This way I could make sure user input and field indexing are handled by the same analyzer.


Last edited by Simon.E on Mon Feb 14, 2011 5:39 pm, edited 1 time in total.

 
 Post subject: Re: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Mon Feb 14, 2011 3:32 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
What about keeping this "default" field for all standard attributes and adding special fields for attributes that require special analyzers? This way I could make sure user input and field indexing are handled by the same analyzer.

Yes, you need to do that. You definitely had issues before with properly matching special terms.
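
The layout agreed on here (a shared "default" field plus a special field with its own analysis) can be sketched with a small in-memory model. Everything in it is hypothetical and unrelated to the Hibernate Search API: the class `PerFieldSearchDemo`, the field name "acronym", and the rule that a document is a hit when any one field contains all query terms are assumptions chosen to keep the example short. It shows each field being indexed and queried through the same analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy in-memory index; field names and helpers are illustrative only.
final class PerFieldSearchDemo {

    // Tokenizing analysis for the shared "default" field.
    static List<String> words(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Whole-value, case-insensitive analysis for the special "acronym" field.
    static List<String> keyword(String text) {
        return Collections.singletonList(text.trim().toLowerCase(Locale.ROOT));
    }

    // Field name -> analyzer used for BOTH indexing and querying that field.
    private final Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
    // Document id -> field name -> indexed terms.
    private final Map<String, Map<String, Set<String>>> docs = new LinkedHashMap<>();

    PerFieldSearchDemo() {
        analyzers.put("default", PerFieldSearchDemo::words);
        analyzers.put("acronym", PerFieldSearchDemo::keyword);
    }

    // Index a value using the analyzer registered for that field
    // (only "default" and "acronym" are known in this sketch).
    void index(String docId, String field, String value) {
        docs.computeIfAbsent(docId, id -> new HashMap<>())
            .computeIfAbsent(field, f -> new HashSet<>())
            .addAll(analyzers.get(field).apply(value));
    }

    // Analyze the user input once per field with that field's own analyzer.
    // A document is a hit if any single field contains all its query terms.
    List<String> search(String userInput) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> doc : docs.entrySet()) {
            for (Map.Entry<String, Function<String, List<String>>> a : analyzers.entrySet()) {
                Set<String> indexed = doc.getValue()
                        .getOrDefault(a.getKey(), Collections.<String>emptySet());
                List<String> query = a.getValue().apply(userInput);
                if (!query.isEmpty() && indexed.containsAll(query)) {
                    hits.add(doc.getKey());
                    break;
                }
            }
        }
        return hits;
    }
}
```

With a document that has "R&D" in the "acronym" field and "Lab equipment" in "default", a search for "r&d" is analyzed as a single keyword for the "acronym" field and finds it, while a search for "equipment lab" is tokenized for "default" and finds the same document. New standard attributes only need to be added to "default"; only the attributes that need special handling get their own field.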

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes
PostPosted: Tue Feb 15, 2011 6:30 am 
Newbie

Joined: Mon Feb 14, 2011 7:51 am
Posts: 4
OK, thanks for your quick replies,
Simon


 