All times are UTC - 5 hours [ DST ]

HibernateSeach - design query

Marx2

Post subject: HibernateSeach - design query

Posted: Wed Apr 09, 2008 4:36 am

Beginner

Joined: Thu Feb 28, 2008 4:58 am
Posts: 37

Hello
I don't know how to index my entities properly.

I have my own Analyzer (which tokenizes and analyzes text in Polish).

I would like to be able to:
1) search for exact phrases in an untokenized index (the default analyzer does that)
2) search for words in a tokenized index (my analyzer can do that)

The @Field annotation has two attributes:
- index=Index.(UN_)TOKENIZED
- store=Store.YES

- If I use Index.TOKENIZED together with Store.YES, does it mean that the value is added to the index twice (once tokenized and once as is)? If yes, how can I choose which form is used in a search?

- If not, should I have two different indexes (one tokenized and one untokenized) and annotate twice with @Field? What is the Store.YES attribute for, then?

hardy.ferentschik

Post subject:

Posted: Wed Apr 09, 2008 8:37 am

Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden

Hi,

If I understand your problem correctly, you should use the @Field annotation twice. Take this slightly modified example from the documentation (http://www.hibernate.org/hib_docs/search/reference/en/html_single/#d0e1209):

Code:

    @Fields( {
            @Field(index = Index.TOKENIZED),
            @Field(name = "summary_polnish", index = Index.TOKENIZED,
                   analyzer = @Analyzer(impl = PolishAnalyzer.class))
            } )
    public String getSummary() {
        return summary;
    }

In this example you end up with two fields in the Lucene document:
* summary which will be tokenized and analyzed using the StandardAnalyzer
* summary_polnish which will be tokenized and analyzed using the PolishAnalyzer

Note that in both cases I had to specify Index.TOKENIZED. If you want an analyzer to be applied, you have to specify TOKENIZED. If you choose UN_TOKENIZED, no analyzer is used.

Regarding Store.YES/NO: these values are orthogonal to the Index properties. Store.YES just means that the actual values are kept in the index (which makes e.g. debugging easier, but also increases the index size). It has no bearing on whether an analyzer is applied.
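The difference between the two Index settings, and their independence from Store, can be modelled in a few lines of plain Java. This is only a toy sketch: FieldSketch and its analyze method are hypothetical stand-ins written for illustration, not Lucene or Hibernate Search classes, and the splitting rule is far simpler than a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Toy model of one indexed field; not Lucene's or Hibernate Search's API. */
final class FieldSketch {
    final List<String> terms; // what a query term is compared against
    final String stored;      // original value, kept only when store == true

    private FieldSketch(List<String> terms, String stored) {
        this.terms = terms;
        this.stored = stored;
    }

    /** Stand-in analyzer: lower-case, then split on anything that is not a letter or digit. */
    static List<String> analyze(String value) {
        List<String> out = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** tokenized == false keeps the whole value as a single term, untouched by the analyzer. */
    static FieldSketch index(String value, boolean tokenized, boolean store) {
        List<String> terms = tokenized ? analyze(value) : Collections.singletonList(value);
        return new FieldSketch(terms, store ? value : null);
    }

    boolean matchesTerm(String term) {
        return terms.contains(term);
    }
}
```

In this model an untokenized field only matches the complete, unmodified value, while a tokenized field matches the individual analyzed words. Whether the original value is also stored is a separate switch.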

Last but not least, you could consider using a PerFieldAnalyzerWrapper (check Lucene's API). This way you can use a single analyzer for the whole project, which you can specify in the properties file. In the implementation of your custom per-field analyzer you can then switch analyzers depending on the field name.
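The per-field idea boils down to a lookup from field name to analyzer with a fallback. The sketch below models only that dispatch in plain Java; PerFieldSketch and its two toy analyzers are hypothetical names invented for illustration and do not reproduce Lucene's PerFieldAnalyzerWrapper API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Toy per-field dispatcher illustrating the idea; not Lucene's PerFieldAnalyzerWrapper. */
final class PerFieldSketch {
    private final Function<String, List<String>> fallback;
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    /** Registers a dedicated analyzer for one field name. */
    PerFieldSketch register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
        return this;
    }

    /** Uses the field's own analyzer if one is registered, otherwise the fallback. */
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    // Hypothetical stand-ins for real analyzers.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    static List<String> lowerCased(String text) {
        return whitespace(text.toLowerCase(Locale.ROOT));
    }
}
```

A wrapper built as `new PerFieldSketch(PerFieldSketch::whitespace).register("summary_polnish", PerFieldSketch::lowerCased)` would then treat summary_polnish differently from every other field.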

Hope this helps,

--Hardy

Marx2

Post subject:

Posted: Wed Apr 09, 2008 2:47 pm

Beginner

Joined: Thu Feb 28, 2008 4:58 am
Posts: 37

1) OK, @Fields works as expected.
Now I have a question. Let's say I set a global analyzer for the project (I did it in persistence.xml). Now I map a property as above, but once as UN_TOKENIZED and once as TOKENIZED.
When I search the TOKENIZED index, I use the same analyzer as the global one.
But how do I search the UN_TOKENIZED index? I tried StandardAnalyzer and it works to some extent, but I think I shouldn't use any analyzer for the UN_TOKENIZED index.

2) Next question: how do I reindex the whole database? Let's say the project didn't use Lucene, has a lot of data in the database, and I need to reindex 100 tables.
For now I did a dirty hack: I check every file in the source directory, call Class.forName on it, look for the @Entity annotation on the class, then look for the @Indexed annotation, and then I can reindex the class.
I feel there is an easier way.

3) Is there a simple way to search in any field? For now I use the same approach as in the point above, then inspect every field in the class looking for @Field or @Fields annotations. I keep a list of indexed fields for every indexed class in the database and use this list as needed. But maybe there is an easier way to do that?

4) How do I use the complex syntax for Lucene queries, for example "field:value"? The multi-field parser needs a list of fields, and the simple parser needs a single field. Where should I put "field:value"?

hardy.ferentschik

Post subject:

Posted: Thu Apr 10, 2008 2:46 am

Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden

Hi there,

I am not sure I understand all your questions correctly, but I'll give it a go. Maybe things will become clearer then.

Quote:

There is really only one index. There is no TOKENIZED vs. UN_TOKENIZED index. A Lucene index is basically a data structure built around the concept of documents, which are in essence sets of key/value pairs. Each of your entities is mapped to one document, and each property annotated with @Field creates one key/value pair. If you are using @Fields, each @Field has its own key/value pair. Of course this is very simplistic. I recommend that you read more about Lucene, either on the website or in a copy of 'Lucene in Action'.

Quote:

2) Next question: how do I reindex the whole database? Let's say the project didn't use Lucene, has a lot of data in the database, and I need to reindex 100 tables.
For now I did a dirty hack: I check every file in the source directory, call Class.forName on it, look for the @Entity annotation on the class, then look for the @Indexed annotation, and then I can reindex the class.
I feel there is an easier way.

http://www.hibernate.org/hib_docs/search/reference/en/html_single/#search-batchindex

Quote:

3) Is there a simple way to search in any field? For now I use the same approach as in the point above, then inspect every field in the class looking for @Field or @Fields annotations. I keep a list of indexed fields for every indexed class in the database and use this list as needed. But maybe there is an easier way to do that?
4) How do I use the complex syntax for Lucene queries, for example "field:value"? The multi-field parser needs a list of fields, and the simple parser needs a single field. Where should I put "field:value"?

By default Lucene searches over all fields. If you don't use the "field:value" syntax, all fields will be searched. You will have to familiarize yourself with the Lucene query API. Hibernate Search really just sits on top of it; when it comes to generating queries, it is Lucene that counts. Have a look at some examples. The best place to look is 'Lucene in Action' :)

--Hardy

Marx2

Post subject:

Posted: Thu Apr 10, 2008 3:39 am

Beginner

Joined: Thu Feb 28, 2008 4:58 am
Posts: 37

1) I know there is one index. Let's say the index contains a tokenized field and an untokenized field.

Code:

@Fields ({
  @Field(index=org.hibernate.search.annotations.Index.UN_TOKENIZED),
  @Field(name="name_tokenized",index=org.hibernate.search.annotations.Index.TOKENIZED,analyzer=@Analyzer(impl=PolishAnalyzer.class))
})

How do I search the untokenized field? Which analyzer should I choose?

2) I know this doc, but the example uses a Customer class. How do you know that Customer needs indexing? Let's say you have 500 entities and most of them need indexing. Only classes with the @Indexed annotation need to be indexed, and I don't think you are suggesting that I look for them manually...

3) Maybe Lucene searches over all fields by default, but let's look at the code:

Code:

QueryParser parser = new MultiFieldQueryParser(fields, standardAnalyzer);

I have to pass fields, and this parameter cannot be null. What should I put there if I want to search over all fields?

4) Let's look at another piece of code:

Code:

String searchPattern = "title:/"Titanic/"" ;
org.apache.lucene.search.Query luceneQuery = parser.parse(searchPattern );

It doesn't work.
Anyway, I don't understand why I need to fill in "fields" in the QueryParser when I later choose the fields in searchPattern.

sanne.grinovero

Post subject:

Posted: Thu Apr 10, 2008 4:11 am

Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun

Hi,
Hardy has given you excellent advice but I'll help a bit.

Quote:

1)How to search on untokenized field? which analyzer to choose?

You can search it just like a tokenized field; there is no difference in the query syntax. You probably don't want to use an analyzer, as the field has not been processed by one (being untokenized). You should open your index with Luke so you can see the fields and try some queries.

Quote:

2)You need to index classes with @Indexed attribute only, I don't think you suggest to search for them manually.

Hibernate Search indexes the entities automatically as they are persisted or otherwise changed. To test them we usually create an instance of the entity we are testing, persist it, and then search for it.
You've made a good point about the case of many indexable entities combined with an existing database. I'll think about that and recommend some improvement.
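Until something like that exists, picking out the indexable classes does not require scanning source files, provided a list of candidate entity classes is already available from somewhere (how that list is obtained is left open here). The sketch below filters such a list by annotation. It declares its own local Indexed annotation so that it compiles on its own; in a real project the check would use org.hibernate.search.annotations.Indexed instead. Customer, Invoice, AuditLog and IndexedClassFilter are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for the real @Indexed annotation, so the sketch compiles alone. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Indexed {}

// Hypothetical entities used only for the demonstration.
@Indexed class Customer {}
@Indexed class Invoice {}
class AuditLog {} // mapped, but not meant to be indexed

final class IndexedClassFilter {
    /** Returns only the candidates that carry @Indexed, preserving their order. */
    static List<Class<?>> indexedOnly(List<Class<?>> candidates) {
        List<Class<?>> out = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (c.isAnnotationPresent(Indexed.class)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

Each class in the result could then be fed to the batch indexing approach described in the reference documentation.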

Quote:

3)Maybe per default Lucene searches over all fields, but let's look at the code

Hmm, I didn't know that, and I don't know if that's true. I usually write my queries specifying a default field. Anyway, this is a Lucene question; as Hardy said, you should ask on the Lucene forum and/or read Lucene in Action.
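If a list of field names has to be passed to the parser anyway, it can be derived from the annotations at runtime rather than kept in the database. The sketch below is an illustration under stated assumptions: it defines local Field and Fields annotations that mirror only the name attribute of the real ones, handles annotated getters only, and assumes that an unnamed field defaults to the property name, as in the summary example earlier in the thread. Book and FieldNameCollector are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Local stand-ins for the real annotations; only the name attribute is modelled. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Field { String name() default ""; }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Fields { Field[] value(); }

// Hypothetical entity used only for the demonstration.
class Book {
    @Fields({ @Field, @Field(name = "summary_polnish") })
    public String getSummary() { return null; }

    @Field
    public String getTitle() { return null; }

    public String getInternalNote() { return null; } // not indexed
}

final class FieldNameCollector {
    /** Collects the index field names declared on public getters, sorted alphabetically. */
    static String[] fieldNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            Field single = m.getAnnotation(Field.class);
            if (single != null) {
                names.add(nameOf(single, m));
            }
            Fields many = m.getAnnotation(Fields.class);
            if (many != null) {
                for (Field f : many.value()) {
                    names.add(nameOf(f, m));
                }
            }
        }
        return names.toArray(new String[0]);
    }

    /** Explicit name if given, otherwise the property name derived from the getter. */
    private static String nameOf(Field f, Method getter) {
        if (!f.name().isEmpty()) {
            return f.name();
        }
        String n = getter.getName().replaceFirst("^(get|is)", "");
        return Character.toLowerCase(n.charAt(0)) + n.substring(1);
    }
}
```

The resulting array is the kind of value a multi-field parser expects as its fields argument.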

4)

Quote:

"title:/"Titanic/""

should be

Code:

"title:Titanic"

I think, but this really depends on the analyzers and tokenizers you are using. Again, we are happy to help with Hibernate Search, but you should read more about Lucene.
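On why a parser needs fields even though the query string can name one: the configured fields only apply to terms that carry no "field:" prefix. The toy resolver below models that rule in plain Java. It is a simplified illustration, not Lucene's parser: FieldResolutionSketch is a hypothetical name, and it handles only whitespace-separated terms with no phrases, operators or analysis.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of field resolution in a multi-field parser; not Lucene's implementation. */
final class FieldResolutionSketch {
    /** Returns one "field:term" clause per field a term would be searched in. */
    static List<String> resolve(String query, String... defaultFields) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query string
            }
            if (token.indexOf(':') > 0) {
                clauses.add(token); // explicit field wins over the defaults
            } else {
                for (String field : defaultFields) {
                    clauses.add(field + ":" + token); // expand across the default fields
                }
            }
        }
        return clauses;
    }
}
```

In this model, `resolve("title:Titanic", "title", "summary")` yields a single clause on title, while `resolve("Titanic", "title", "summary")` yields one clause per default field.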

regards,
Sanne
