search 3.3.0, mixing TOKENIZED and UN_TOKENIZED attributes

Simon.E · **Joined:** Mon Feb 14, 2011 7:51 am **Posts:** 4

I have a class that indexes its attributes to the same field name and mixes TOKENIZED and UN_TOKENIZED.
For example like this:

Code:

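import javax.persistence.Entity;

import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;

@Indexed
@Entity
public class SomeClass {
...
    // Analyzed: the value is split into terms at indexing time
    @Field(name = "default", index = Index.TOKENIZED)
    private String foo;

    // Not analyzed: the whole value is indexed as a single term
    @Field(name = "default", index = Index.UN_TOKENIZED)
    private String bar;
...
}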

This used to work fine until I upgraded to hibernate-search 3.3.0. Since then everything is untokenized unless I make "bar" tokenized as well.
Is this behavior intended?

Thx
Simon

sanne.grinovero · **Posted:** Mon Feb 14, 2011 1:32 pm

I don't think it's intended; I never expected anybody to store two attributes in the same field with different options. It seems quite dangerous and error-prone to handle. I'd recommend keeping the fields separate and querying both of them.

What's the use case?

Simon.E · **Joined:** Mon Feb 14, 2011 7:51 am **Posts:** 4

The use case: I have a Google-like full-text search where the user can search for instances of different classes by just entering keywords, without specifying field names. These classes don't have a common attribute set.
This is implemented by indexing all the relevant attributes from the different classes into the same index field, "default". Some of these attributes shouldn't be tokenized because they are domain-specific acronyms.
With hibernate-search 3.1.1.GA this was no problem.

sanne.grinovero · **Posted:** Mon Feb 14, 2011 2:56 pm

OK, I understand the need for it across different classes. But in the same class, why reuse the same field?
I think it would make more sense to have a "field_tokenized" and a "field_untokenized", and then search on both fields using the different analyzers. The problem you have is that during query construction the user input must be analyzed with the same analyzer as the field you are searching. I guess that with Search 3.1.1 your users couldn't search for the domain-specific acronyms?

The quick and dirty workaround for you would be to have two fields in each entity, even though the entities differ, and to try matching the user input against both.
The nicer, optimal solution would be to create a custom Analyzer that treats your special acronyms correctly. For example, Lucene has the SimpleAnalyzer, which deals with basic things such as whitespace and lowercasing, and the StandardAnalyzer, which can recognize emails and similar, more complex elements that should not be tokenized.
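
To illustrate why the query has to be analyzed the same way as the field it targets, here is a minimal sketch in plain Java. It is a toy model only: it uses no Lucene or Hibernate Search classes, and the class name, the method names, the sample values and the simplistic "lowercase and split on non-alphanumerics" rule are all assumptions made for illustration, not how any real analyzer behaves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: no Lucene or Hibernate Search classes are involved.
class AnalyzerMismatchSketch {

    // Rough stand-in for a tokenizing analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenized(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Stand-in for an untokenized field: the whole value is one term, unchanged.
    static List<String> untokenized(String text) {
        return Collections.singletonList(text);
    }

    // A query matches when all of its terms are among the indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    // Two-field workaround: analyze the input once per field, each time
    // with the same rule that was used to index that field.
    static boolean matchesEither(Set<String> tokenizedField,
                                 Set<String> untokenizedField,
                                 String userInput) {
        return matches(tokenizedField, tokenized(userInput))
                || matches(untokenizedField, untokenized(userInput));
    }

    public static void main(String[] args) {
        Set<String> tokenizedField = new HashSet<>(tokenized("Report for the X-25 gateway"));
        Set<String> untokenizedField = new HashSet<>(untokenized("X-25"));
        String userInput = "X-25";

        // Input analyzed like the tokenized field: matches that field.
        System.out.println(matches(tokenizedField, tokenized(userInput)));     // true
        // Same analysis against the untokenized field: "x" and "25" are not indexed there.
        System.out.println(matches(untokenizedField, tokenized(userInput)));   // false
        // Input kept as a single term: matches the untokenized field.
        System.out.println(matches(untokenizedField, untokenized(userInput))); // true
    }
}
```

In this model the acronym stored as a single term is only found when the user input is also kept as a single term, which is why mixing both kinds of values in one field leaves no single correct way to analyze the query.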

Simon.E · **Joined:** Mon Feb 14, 2011 7:51 am **Posts:** 4

s.grinovero wrote:

OK, I understand the need for it across different classes. But in the same class, why reuse the same field?

If I add a new attribute I can just add the @Field(name = "default") and it's searchable. Otherwise I would have a long list of fields to maintain for the query.

s.grinovero wrote:

I guess that with Search 3.1.1 your users couldn't search for the domain-specific acronyms?

They could search for it, but maybe there were issues I hadn't noticed.

What about keeping this "default" field for all standard attributes and adding special fields for attributes that require special analyzers? This way I could make sure user input and field indexing are handled by the same analyzer.

sanne.grinovero · **Posted:** Mon Feb 14, 2011 3:32 pm

Quote:

What about keeping this "default" field for all standard attributes and adding special fields for attributes that require special analyzers? This way I could make sure user input and field indexing are handled by the same analyzer.

Yes, you need to do that. You definitely had issues before with proper matching of special terms.
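
A rough sketch of that layout, again as a toy model in plain Java rather than real Hibernate Search or Lucene code: each field name is registered with one analysis rule, and that same rule is applied both when a value is indexed and when user input is matched against the field. The "default" field name comes from this thread; the "acronym" field name, the class and method names, and the sample values are hypothetical. In Lucene itself, PerFieldAnalyzerWrapper plays a comparable role for real analyzers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model only: one analysis rule per field, shared by indexing and querying.
class PerFieldAnalysisSketch {

    private final Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
    private final Map<String, Set<String>> index = new HashMap<>();

    // Simplistic tokenizing rule (an assumption, not a real Lucene analyzer).
    static List<String> splitAndLowercase(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    void defineField(String field, Function<String, List<String>> analyzer) {
        analyzers.put(field, analyzer);
        index.put(field, new HashSet<>());
    }

    // Index time: analyze the value with the rule registered for the field.
    void add(String field, String value) {
        index.get(field).addAll(analyzers.get(field).apply(value));
    }

    // Query time: analyze the input once per field, with that field's own rule.
    List<String> matchingFields(String userInput) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Function<String, List<String>>> entry : analyzers.entrySet()) {
            List<String> queryTerms = entry.getValue().apply(userInput);
            if (!queryTerms.isEmpty() && index.get(entry.getKey()).containsAll(queryTerms)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Builds a small sample index; the values are made up for illustration.
    static PerFieldAnalysisSketch demo() {
        PerFieldAnalysisSketch sketch = new PerFieldAnalysisSketch();
        sketch.defineField("default", PerFieldAnalysisSketch::splitAndLowercase);
        sketch.defineField("acronym", text -> Collections.singletonList(text)); // hypothetical field
        sketch.add("default", "Quarterly report");
        sketch.add("acronym", "X-25");
        return sketch;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch sketch = demo();
        System.out.println(sketch.matchingFields("X-25"));             // [acronym]
        System.out.println(sketch.matchingFields("quarterly REPORT")); // [default]
    }
}
```

The query still has to list the special fields, but that list stays short because every standard attribute keeps going into "default".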

Simon.E · **Joined:** Mon Feb 14, 2011 7:51 am **Posts:** 4

ok, thanks for your quick replies,
Simon