
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 7 posts ] 
 Post subject: Search: creating a master index and @IndexedEmbedded
Posted: Wed Apr 27, 2011 11:45 am
Newbie

Joined: Wed Oct 31, 2007 6:04 am
Posts: 8
Location: Cologne, Germany
Hello everyone,

After some problems with the Compass search framework regarding updates of bi-directional associations in combination with JPA, I decided to give Hibernate Search a try and have set it up with some success.
However, one thing still isn't working as expected: I want to give my users the possibility to search across all fields and entities using a simple "Google-style" free-form text field, where they can either enter a string to be matched against all fields or enter a custom Lucene query for special cases.

To achieve this, I decided to introduce a special additional catch-all field called "ALL" (a kind of master index) on all searchable properties, like this:

Code:
@Fields({@Field(name="ALL", index=Index.TOKENIZED), @Field(name="name", index=Index.TOKENIZED)})
private String name;


While this works fine for simple models, as soon as I use @IndexedEmbedded I'd have to set prefix="" to get the same behaviour, because by default the embedded fields are prefixed with the property name (e.g. address.ALL). But when trying this, I stumbled upon http://opensource.atlassian.com/project ... SEARCH-183, which basically rules out this approach.
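A minimal model of the naming problem, assuming a hypothetical root entity that embeds another entity through a property called `address`. With the default prefix (the property name followed by a dot) the embedded entity's catch-all field ends up as `address.ALL`, so a query on `ALL` never sees it; only an empty prefix makes both land in `ALL`. This is plain Java that only mimics the naming rule, not Hibernate Search code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how @IndexedEmbedded composes field names:
// every field of the embedded entity is written as prefix + fieldName.
class EmbeddedFieldNames {
    static String indexedName(String prefix, String fieldName) {
        return prefix + fieldName;
    }

    // Field names produced for a root entity that has its own "ALL" field and
    // embeds another entity (which also declares "ALL") under the given prefix.
    static List<String> documentFields(String embeddedPrefix) {
        List<String> names = new ArrayList<String>();
        names.add(indexedName("", "ALL"));             // root entity's catch-all field
        names.add(indexedName(embeddedPrefix, "ALL")); // embedded entity's catch-all field
        return names;
    }

    public static void main(String[] args) {
        System.out.println(documentFields("address.")); // [ALL, address.ALL]
        System.out.println(documentFields(""));         // [ALL, ALL]
    }
}
```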

Do you have any tips or pointers on how to approach this problem?


 Post subject: Re: Search: creating a master index and @IndexedEmbedded
Posted: Wed Apr 27, 2011 1:21 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, welcome.
OK, I'll count this as another vote for HSEARCH-183; it was never a priority, as it seemed nobody would ever use an empty prefix.
I'll propose some alternatives so that, based on your feedback, we can decide whether we should actually check for it and throw an error, or accommodate it in some way. There's a tricky aspect to accommodating it: people should be able to use different analyzers per property, yet the same field name must always be analyzed with the same analyzer, or searching won't return the expected matches. If we allow duplicate field names coming from different models, that analyzer uniqueness can't be guaranteed.

Using the ALL field as in your example, you're actually making a copy of all data, which is now indexed/stored twice.
Problems:
- slower updates
- roughly twice the index size, hence slower queries
- as mentioned above, you have to use the same analyzer consistently, or you'll get inconsistent results; it might seem easy to always use the same one, but at some point you'll realize you need to tokenize some value differently, and then you're stuck.
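To illustrate the analyzer problem, here is a toy model (plain Java, not Lucene) in which one shared `ALL` field receives terms from two hypothetical properties: one analyzed by a lowercasing tokenizer, one indexed as a single untokenized term. A query analyzed the first way cannot match the value indexed the second way.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Toy model (not Lucene): a single "ALL" field receives terms from two properties
// that were analyzed differently, while the query is analyzed only one way.
class AnalyzerMismatch {
    // Stand-in for a tokenizing, lowercasing analyzer.
    static final Function<String, List<String>> LOWERCASE_TOKENS =
        s -> Arrays.asList(s.toLowerCase(Locale.ROOT).split("\\s+"));
    // Stand-in for an untokenized (keyword) analyzer: the whole value is one exact term.
    static final Function<String, List<String>> KEYWORD =
        s -> Collections.singletonList(s);

    static Set<String> allFieldTerms() {
        Set<String> terms = new HashSet<String>();
        terms.addAll(LOWERCASE_TOKENS.apply("Hibernate Search")); // hypothetical property A
        terms.addAll(KEYWORD.apply("Cologne"));                  // hypothetical property B
        return terms;
    }

    // In this toy, a document matches when every query term exists in the field.
    static boolean matches(String userInput) {
        return allFieldTerms().containsAll(LOWERCASE_TOKENS.apply(userInput));
    }

    public static void main(String[] args) {
        System.out.println(matches("search"));  // true
        System.out.println(matches("Cologne")); // false: indexed as "Cologne", queried as "cologne"
    }
}
```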

Assuming you use a QueryParser to parse the user input as free text, have you considered parsing the user query against each field instead?
It might look like an inefficient query, since the end result will have many clauses, but Lucene actually handles those very efficiently (compare it to how it transforms a RangeQuery and you'll understand).

If you do that, each value is stored only once, the annotations stay simpler, and you're completely free in your choice of analyzers.
The only downside I see with this approach is that you'll need a list of all field names; that's not really an issue, as you can extract this information from the index with a few lines of code.
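As a rough sketch of what parsing the user query against each field produces, the following plain-Java function mimics the shape of the query that MultiFieldQueryParser builds with its default OR operator: one group per input term, each group holding one optional clause per field. It is a simplification, not Lucene's parser: it ignores phrases, operators and real analysis (lowercasing stands in for the analyzer), and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified illustration (not Lucene's parser): each term of the user input becomes
// a group of optional clauses, one per field.
class PerFieldExpansion {
    static String expand(String userInput, List<String> fields) {
        List<String> groups = new ArrayList<String>();
        for (String term : userInput.trim().split("\\s+")) {
            List<String> clauses = new ArrayList<String>();
            for (String field : fields) {
                // lowercasing stands in for analysis; a real parser would apply
                // the analyzer configured for each field
                clauses.add(field + ":" + term.toLowerCase(Locale.ROOT));
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    // Number of term clauses: one per (term, field) pair.
    static int clauseCount(String userInput, List<String> fields) {
        return userInput.trim().split("\\s+").length * fields.size();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "address.city");
        System.out.println(expand("Hibernate Cologne", fields));
        // (name:hibernate address.city:hibernate) (name:cologne address.city:cologne)
        System.out.println(clauseCount("Hibernate Cologne", fields)); // 4
    }
}
```

The clause count grows as terms × fields, which is worth keeping in mind when the field list gets long.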

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Search: creating a master index and @IndexedEmbedded
Posted: Thu Apr 28, 2011 7:30 am
Newbie

Joined: Wed Oct 31, 2007 6:04 am
Posts: 8
Location: Cologne, Germany
Hello Sanne,

First of all, thank you for the quick reply and the insightful tips!

I had considered constructing a list of all fields available on all models and then using MultiFieldQueryParser to search across all of them. But I assumed that searching could take much longer if all indices needed to be searched than if I created an additional index, so I decided to try the "ALL" index approach first.

I will now try using a MultiFieldQueryParser with all fields instead and see how it works for me. Where would you suggest I get information about the indexed entities/fields: directly via the Lucene API or from Hibernate Search? If the latter, how would I best access the current configuration? After a short look through the API documentation, I found org.hibernate.search.cfg.EntityDescriptor, which seems to contain all the required information.

All the best
Marcus


 Post subject: Re: Search: creating a master index and @IndexedEmbedded
Posted: Thu Apr 28, 2011 7:51 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
I had considered constructing a list of all fields available on all models and then using MultiFieldQueryParser to search across all of them. But I assumed that searching could take much longer if all indices needed to be searched than if I created an additional index, so I decided to try the "ALL" index approach first.

Yes, I understand that's what common sense suggests, but in practice there is no additional cost, or if there is any, I could never measure it.

Quote:
I will now try using a MultiFieldQueryParser with all fields instead and see how it works for me. Where would you suggest I get information about the indexed entities/fields: directly via the Lucene API or from Hibernate Search? If the latter, how would I best access the current configuration? After a short look through the API documentation, I found org.hibernate.search.cfg.EntityDescriptor, which seems to contain all the required information.

No, EntityDescriptor is not populated by the framework; it's used when you want to define the mapping programmatically instead of using annotations. There's no code that maps the annotations onto those value holders.

I didn't test it, but something like this should give you all the field names defined in the index:
Code:
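import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
import org.hibernate.search.SearchFactory;
import org.hibernate.search.store.DirectoryProvider;

// Collects every field name that has at least one term in the index of YourEntityType
SearchFactory searchFactory = fullTextEntityManager.getSearchFactory();
DirectoryProvider[] directoryProviders = searchFactory.getDirectoryProviders( YourEntityType.class );
IndexReader indexReader = searchFactory.getReaderProvider().openReader( directoryProviders );
Set<String> fieldNames = new HashSet<String>();
try {
   TermEnum termEnum = indexReader.terms();
   try {
      while ( termEnum.next() ) {
         fieldNames.add( termEnum.term().field() );
      }
   }
   finally {
      termEnum.close(); // release the term enumeration
   }
}
finally {
   // hand the reader back to Hibernate Search rather than closing it directly
   searchFactory.getReaderProvider().closeReader( indexReader );
}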

It iterates over every term in the index, so you may want to cache and reuse the result; it shouldn't be too expensive, though, since executing a query does something similar and Lucene handles this very efficiently.
Of course it won't find field names you have never written to the index, but there's nothing to find there either ;)
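Following the suggestion to cache and reuse the result, a minimal per-entity cache could look like this. `FieldNameCache` is a hypothetical helper, and the loader function is assumed to run the TermEnum loop above and wrap any IOException, since `java.util.function.Function` cannot throw checked exceptions.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper: computes the field names of an entity type once
// (via the supplied loader) and serves later lookups from memory.
class FieldNameCache {
    private final ConcurrentMap<Class<?>, Set<String>> cache =
        new ConcurrentHashMap<Class<?>, Set<String>>();
    private final Function<Class<?>, Set<String>> loader;

    FieldNameCache(Function<Class<?>, Set<String>> loader) {
        this.loader = loader;
    }

    // Loads on first access for a type; returns an unmodifiable view afterwards.
    Set<String> fieldNames(Class<?> entityType) {
        return cache.computeIfAbsent(entityType,
            type -> Collections.unmodifiableSet(loader.apply(type)));
    }

    // Drop all cached entries, e.g. after new entities have been indexed.
    void invalidate() {
        cache.clear();
    }
}
```

Because field names that were never written to the index can't be discovered, clearing the cache after new data has been indexed keeps it from going stale.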

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Search: creating a master index and @IndexedEmbedded
Posted: Mon May 02, 2011 11:42 am
Newbie

Joined: Wed Oct 31, 2007 6:04 am
Posts: 8
Location: Cologne, Germany
Hurray, it's working! Thank you for the detailed answer!
After having to divert my attention to a different project for a couple of days, I could finally come back to this problem.
Now I simply added a loop that iterates over all my "root" classes and stores the fields in an array for easy access when constructing the MultiFieldQueryParser, and everything works as expected!
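The loop described above might look roughly like this. `RootFieldMerger` and the `fieldNamesOf` lookup (for example the per-entity extraction from the earlier reply) are hypothetical; a `LinkedHashSet` drops field names shared between entities while keeping a stable order.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: merge the field names of all root entity types into one
// duplicate-free array suitable for a multi-field parser.
class RootFieldMerger {
    static String[] mergeFieldNames(List<Class<?>> rootTypes,
                                    Function<Class<?>, Set<String>> fieldNamesOf) {
        Set<String> merged = new LinkedHashSet<String>(); // keeps first-seen order
        for (Class<?> type : rootTypes) {
            merged.addAll(fieldNamesOf.apply(type));
        }
        return merged.toArray(new String[0]);
    }
}
```

The resulting array is what you would pass to the MultiFieldQueryParser constructor that takes a `String[]` of fields and an analyzer (in Lucene 3.x preceded by a `Version` argument).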

All the best
Marcus


 Post subject: Re: Search: creating a master index and @IndexedEmbedded
Posted: Mon May 02, 2011 4:00 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
You're welcome, I'm glad it helped :)

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Search: creating a master index and @IndexedEmbedded
Posted: Thu May 12, 2011 6:00 am
Newbie

Joined: Tue May 03, 2011 2:23 pm
Posts: 3
Hi Sanne,

I also implemented a Google-like search using MultiFieldQueryParser and want to add a note on the performance of this solution (maybe this is stating the obvious). I also wanted to dynamically generate the list of fields for the MultiFieldQueryParser to search, but ended up with a list of over 500 fields, which caused the MultiFieldQueryParser to take at least 4 minutes to parse the query. So this approach works only for indices with a limited number of fields.

I am now using a predefined list of fields for full-text search. This also has the benefit of excluding fields that could produce results which are correct but not expected.
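A predefined list can still be checked against the field names discovered from the index, so it never references fields that don't exist. A sketch, where the whitelist contents are hypothetical; in the simplified terms × fields model, a two-word query over 500 fields would already mean about 1,000 term clauses, whereas a short whitelist keeps that number small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical whitelist: only these fields take part in free-text search.
class FullTextFields {
    static final List<String> SEARCHABLE = Arrays.asList("name", "description", "address.city");

    // Keep the whitelisted fields that actually exist in the index, in whitelist order.
    static String[] select(Collection<String> indexedFieldNames) {
        List<String> selected = new ArrayList<String>();
        for (String field : SEARCHABLE) {
            if (indexedFieldNames.contains(field)) {
                selected.add(field);
            }
        }
        return selected.toArray(new String[0]);
    }
}
```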

Br,

Sarris


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.