Does Hibernate Search support search in 'All' field?

Hue · **Joined:** Wed Dec 12, 2007 11:02 pm **Posts:** 2

When working with Lucene, there is no way to search on all the fields; we must manually make one field aggregate the others to support searching in an 'All' field. I want to know if Hibernate Search can do this automatically. It seems that Compass supports this.

hardy.ferentschik · **Posted:** Thu Dec 13, 2007 3:35 am

Hue wrote:

When working with Lucene, there is no way to search on all the fields; we must manually make one field aggregate the others to support searching in an 'All' field. I want to know if Hibernate Search can do this automatically. It seems that Compass supports this.

There is no special functionality/parameter to tell Hibernate Search to search all fields.

To search, you construct native Lucene queries (org.apache.lucene.search.Query), which get transparently wrapped in Hibernate queries (org.hibernate.Query) to offer you all the Hibernate-specific features.

However, you could map each of your fields multiple times (http://www.hibernate.org/hib_docs/search/reference/en/html_single/#d0e1191).

Code:

    @Fields( {
            @Field(index = Index.TOKENIZED),
            @Field(name = "ALL", index = Index.TOKENIZED)
            } )
    public String getTitle() {
        return title;
    }

    @Fields( {
            @Field(index = Index.TOKENIZED),
            @Field(name = "ALL", index = Index.TOKENIZED)
            } )
    public String getSummary() {
        return summary;
    }

In the above example you would end up with the fields title, summary and ALL.

emmanuel · **Posted:** Sat Dec 15, 2007 9:42 pm

From my experience, indexing everything in one single field is naive: it does not reflect the relative ranking/importance of the fields. In Hardy's example, title is more important than summary. You cannot express that in an all field.
Supporting an all feature in Hibernate Search would be trivial (as trivial as in native Lucene), but I think it's a misleading feature, and the Lucene team probably thinks the same.
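
To make the ranking point concrete, here is a toy sketch in plain Java. It is not Lucene's scoring model; the class name FieldWeightSketch, the "contains the word" matching rule, and the weights 2.0 and 1.0 are all assumptions made up for illustration. It only shows that per-field weights can tell a title hit from a summary hit, while a single ALL field cannot.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, NOT Lucene scoring: a document is a map of field name -> text,
// and a field "matches" when it contains the term as a whole word.
class FieldWeightSketch {

    // Adds up the weight of every weighted field whose text contains the term.
    static double score(Map<String, String> doc, Map<String, Double> weights, String term) {
        double total = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            String text = doc.get(w.getKey());
            if (text != null && containsWord(text, term)) {
                total += w.getValue();
            }
        }
        return total;
    }

    static boolean containsWord(String text, String term) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(term.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    // Builds a document with separate fields plus an aggregated ALL field.
    static Map<String, String> doc(String title, String summary) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("title", title);
        d.put("summary", summary);
        d.put("ALL", title + " " + summary);
        return d;
    }

    public static void main(String[] args) {
        Map<String, String> a = doc("Lucene basics", "An introduction to indexing");
        Map<String, String> b = doc("Indexing basics", "How Lucene stores terms");

        Map<String, Double> perField = new LinkedHashMap<>();
        perField.put("title", 2.0);   // hypothetical weight
        perField.put("summary", 1.0); // hypothetical weight

        Map<String, Double> allOnly = new LinkedHashMap<>();
        allOnly.put("ALL", 1.0);

        // Per-field weights rank the title hit (a) above the summary hit (b).
        System.out.println(score(a, perField, "lucene") + " vs " + score(b, perField, "lucene"));
        // The ALL field gives both documents the same score.
        System.out.println(score(a, allOnly, "lucene") + " vs " + score(b, allOnly, "lucene"));
    }
}
```

With the separate fields, document a (hit in title) outscores document b (hit in summary); with ALL alone the two are indistinguishable.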

ialpert · **Joined:** Fri Dec 17, 2010 11:14 am **Posts:** 11

emmanuel wrote:

From my experience, indexing everything in one single field is naive: it does not reflect the relative ranking/importance of the fields. In Hardy's example, title is more important than summary. You cannot express that in an all field.
Supporting an all feature in Hibernate Search would be trivial (as trivial as in native Lucene), but I think it's a misleading feature, and the Lucene team probably thinks the same.

It seems like the Lucene in Action guys actually use this (from Lucene in Action, section 5.5):

Quote:

Generally speaking, querying on multiple fields isn’t the best practice for user-entered queries. More commonly, all words you want searched are indexed into a contents or keywords field by combining various fields. A synthetic contents field in our test environment uses this scheme to put author and subjects together:

    doc.add(new Field("contents", author + " " + subjects, Field.Store.NO, Field.Index.ANALYZED));

We used a space (" ") between author and subjects to separate words for the analyzer. Allowing users to enter text in the simplest manner possible without the need to qualify field names, generally makes for a less confusing user experience.

The author does mention it's a test environment; I'm not sure of the significance of that.
I've looked into doing this using the ALL method mentioned above, which works well until you deal with @IndexEmbedded.

emmanuel · **Posted:** Wed Jun 01, 2011 9:15 am

Note that in Lucene in Action, they focus on ease of use for the search user. If your application (i.e. the search engine coded in your app and receiving the user query) selects the right fields and applies the right boost levels transparently for the user, who still simply provides a set of raw words, everything is good in the best of possible worlds :)
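
As a rough illustration of that idea, the plain-Java sketch below expands raw user words into Lucene query-parser syntax (field:term^boost), so the user never types a field name. The class name BoostedQuerySketch, the field names, and the boost values are assumptions for the example; it also assumes the words are plain alphanumerics, so real input would first need escaping (for instance with Lucene's QueryParser.escape) and would go through the same analyzer as the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoostedQuerySketch {

    // Turns "w1 w2" into "(f1:w1^b1 f2:w1^b2) (f1:w2^b1 f2:w2^b2)".
    // Assumes each word is plain alphanumeric text with no query-syntax characters.
    static String build(String userInput, Map<String, Float> fieldBoosts) {
        StringBuilder query = new StringBuilder();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('(');
            boolean first = true;
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                if (!first) {
                    query.append(' ');
                }
                query.append(e.getKey()).append(':').append(word)
                     .append('^').append(e.getValue());
                first = false;
            }
            query.append(')');
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 2.0f);   // hypothetical boost
        boosts.put("summary", 1.0f); // hypothetical boost
        System.out.println(build("hibernate search", boosts));
    }
}
```

The resulting string could then be handed to a query parser; which fields and boosts to use is an application decision the sketch does not try to answer.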

ialpert · **Joined:** Fri Dec 17, 2010 11:14 am **Posts:** 11

emmanuel wrote:

Note that in Lucene in Action, they focus on ease of use for the search user. If your application (i.e. the search engine coded in your app and receiving the user query) selects the right fields and applies the right boost levels transparently for the user, who still simply provides a set of raw words, everything is good in the best of possible worlds :)

Unfortunately for me, my users want to search all fields (because some existing in-house search technology does this), and they (currently) don't care about ranking (because I'm imposing a sort).

I was hoping I could make use of ClassBridge: just take the existing document it gets passed and grab the field data (then I don't need specific bridges for each class).

If you have any suggestions as to how to achieve this they'd be greatly appreciated.

sanne.grinovero · **Posted:** Fri Jun 03, 2011 9:16 am

You have two viable options:
1 - write it all in the same field, and search on that
2 - write each on its own field, and search on all of them

You can override the name of each field, so you can have everything indexed in the same field. You can also have each property indexed in multiple fields, so in your case I would index each property both in its own specific field and in the common one; that way you can easily build more advanced features on the specific fields.

Keep in mind that when searching you (usually) want to apply the same analyzer to the user input that you applied to the indexed text. Putting everything in the same field therefore implies a single global analyzer, used consistently at index and query time. That's why I would add the values to both fields.
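
A toy sketch of why that matters, in plain Java rather than real Lucene analyzers: the two methods below merely stand in for a tokenizing, lowercasing analyzer and for an untokenized field, and the class name and sample values (such as "SKU-42X") are made up. When values analyzed in different ways land in one shared field, a query analyzed in only one way cannot match all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SharedFieldAnalyzerSketch {

    // Stand-in for a tokenizing, lowercasing analyzer.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for an untokenized field: the whole value is a single term.
    static List<String> wholeValue(String text) {
        return Arrays.asList(text);
    }

    // A query matches when every query term exists among the indexed terms.
    static boolean matchesAll(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // One shared field fed by two differently analyzed properties.
        Set<String> allField = new HashSet<>();
        allField.addAll(lowercaseWords("Hibernate Search in Action")); // tokenized title
        allField.addAll(wholeValue("SKU-42X"));                        // untokenized code

        // The query side can only pick one analyzer for the shared field.
        System.out.println(matchesAll(allField, lowercaseWords("Hibernate"))); // title terms line up
        System.out.println(matchesAll(allField, lowercaseWords("SKU-42X")));   // code terms do not
    }
}
```

Keeping the per-property fields alongside the common one leaves room to query each value with the analyzer it was indexed with.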

It's pretty easy to search on multiple fields:

Code:

QueryBuilder monthQb = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity( Month.class ).get();
query = monthQb.
      phrase()
      .onField( "mythology" )
      .andField( "aliasNames" )
      .andField( "saints" )
      .sentence( "Month whitening" )
      .createQuery();

Also you can extract the name of all fields directly from the index, I've shown an example for that in a previous thread (but you shouldn't need that).

ialpert · **Joined:** Fri Dec 17, 2010 11:14 am **Posts:** 11

The real acknowledged problem here is that we are trying to mimic an older system.

I've found searching on all fields (generating queries for Lucene's query parser) to be quite slow (for one of our objects we have ~170 fields). Mapping every property to the same field name, like:

Code:

@Fields( {
            @Field(index = Index.TOKENIZED),
            @Field(name = "ALL", index = Index.TOKENIZED)
            } )

doesn't work for subobjects. To get around this, I did the following.

Put a class bridge on the object I want indexed:

Code:

@ClassBridge(name = "ALL", index = Index.TOKENIZED, store = Store.YES, impl = AllBridge.class)

Make an AllBridge class, which uses the same concept as http://community.jboss.org/wiki/Hiberna ... Extraction:

Code:

public class AllBridge<T extends Domain> implements FieldBridge, StringBridge {

   @Override
   public void set(String name, Object value, Document document, LuceneOptions luceneoptions) {
      if (value != null) {
         Domain domain = (Domain) value;
         LazyAllField lazyAllField = new LazyAllField(name, document, luceneoptions);
         // document.add(new Field(name, "foo bar biz bat", luceneoptions.getStore(), luceneoptions.getIndex(), luceneoptions
         // .getTermVector()));
         document.add(lazyAllField);
      }
   }

   @Override
   public String objectToString(Object incomingQuery) {
      return (String) incomingQuery;
   }

}

Finally the LazyAllField:

Code:

public class LazyAllField extends AbstractField implements Fieldable {
   public static TextExtractor extrator = new TextExtractor();
   private String content;
   private final Document document;

   public LazyAllField(String name, Document document, LuceneOptions luceneOptions) {
      super(name, luceneOptions.getStore(), luceneOptions.getIndex(), Field.TermVector.NO);
      this.document = document;
      // fundamental set: this instructs Lucene not to call the stringValue on field creation, but only when needed
      lazy = true;
   }

   public byte[] binaryValue() {
      return null;
   }

   public Reader readerValue() {
      return null;
   }

   public TokenStream tokenStreamValue() {
      return null;
   }

   public String stringValue() {
      if (content == null) {
         content = "";
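         // StringBuilder is enough here: the buffer never leaves this method
         StringBuilder buffer = new StringBuilder();
         for (Fieldable fieldable : document.getFields()) {
            if (fieldable != this) {
               String value = fieldable.stringValue();
               // fields without a string value (e.g. binary ones) return null; skip them
               if (value != null) {
                  buffer.append(value);
                  buffer.append(" ");
               }
            }
         }
         content = buffer.toString();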
      }
      return content;
   }
}

This isn't rigorously tested by me (so use at your own risk), but it works. It does take a massive amount of heap currently.
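
For readers who want the lazy-aggregation idea without the Lucene classes, here is a minimal plain-Java sketch. LazyConcatSketch and its Supplier-based sources are hypothetical stand-ins for the other fields of the document; nothing here is Hibernate Search or Lucene API. The concatenation runs only on the first read and is cached afterwards, and the cached string holds a second copy of all the text, which may be part of the heap cost mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyConcatSketch {
    private final List<Supplier<String>> sources = new ArrayList<>();
    private String cached;   // stays null until the first read
    int computations = 0;    // counts how often the concatenation actually ran

    void addSource(Supplier<String> source) {
        sources.add(source);
    }

    // Concatenates all non-null source values on first call, then reuses the result.
    String value() {
        if (cached == null) {
            StringBuilder sb = new StringBuilder();
            for (Supplier<String> source : sources) {
                String v = source.get(); // potentially expensive, e.g. text extraction
                if (v != null) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(v);
                }
            }
            cached = sb.toString();
            computations++;
        }
        return cached;
    }

    public static void main(String[] args) {
        LazyConcatSketch all = new LazyConcatSketch();
        all.addSource(() -> "title text");
        all.addSource(() -> null);            // a source with no string value
        all.addSource(() -> "attachment text");
        System.out.println(all.value());      // sources are read here, not before
        System.out.println(all.value());      // served from the cache
    }
}
```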

sanne.grinovero · **Posted:** Mon Jun 06, 2011 10:27 am

I don't understand why you're using the lazily loaded field. Is that for other reasons?

If you're fine with introducing a custom FieldBridge, you could do it simply ignoring the field name value in a single line of code:

Code:

public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
      String fieldValue = yourCustomObjectToString( value );
      luceneOptions.addFieldToDocument( "ALL", fieldValue, document );
}

ialpert · **Joined:** Fri Dec 17, 2010 11:14 am **Posts:** 11

Yeah, you're right. I'm only using the lazy bridge so that my attachment text extraction can continue to be (somewhat) lazy.

As for the custom field bridge (if I'm understanding you correctly): I still want to be able to search other fields individually in addition to the all field (i.e. ALL:foo +date:[20010101 TO 20020101]).

sanne.grinovero · **Posted:** Mon Jun 06, 2011 1:10 pm

ok, then you could do:

Code:

public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
      String fieldValue = yourCustomObjectToString( value );
      luceneOptions.addFieldToDocument( "ALL", fieldValue, document );
      luceneOptions.addFieldToDocument( name, fieldValue, document );
}

But this only works if you can apply such a FieldBridge to each of your types (it might be tedious work to map all dates, ints, and custom types).

ialpert · **Joined:** Fri Dec 17, 2010 11:14 am **Posts:** 11

Yeah, the class bridge means you keep the annotations on each class down a bit (it actually seems to be working quite well in terms of speed).

By the way, thanks for the feedback.